Both of these jobs have been failing frequently in CI with installer errors since yesterday:
- periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-virtualmedia
- periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-ovn-ipv6

See: https://sippy.ci.openshift.org/sippy-ng/jobs/4.9/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-virtualmedia%22%7D%2C%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-ovn-ipv6%22%7D%5D%2C%22linkOperator%22%3A%22or%22%7D

Here is a list of job runs that had installer errors: https://sippy.ci.openshift.org/sippy-ng/jobs/4.9/runs?filters=%7B%22items%22%3A%5B%7B%22id%22%3A37693%2C%22columnField%22%3A%22result%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22I%22%7D%2C%7B%22id%22%3A99%2C%22columnField%22%3A%22job%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22metal-ipi%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D&sort=desc&sortField=timestamp
After investigating the problem with @afasano, we found that the first bootstrap errors occurred on 26 August at 18:15 and 16:18:
https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-ovn-ipv6?buildId=1431332397214863360
https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-oc-master-e2e-metal-ipi-ovn-ipv6

Checking the PRs merged around these times, the most suspicious ones are in this release: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.9.0-0.nightly/release/4.9.0-0.nightly-2021-08-26-151113

The investigation is still ongoing, but it is likely that one of these PRs is causing this bootstrap control plane error.
It looks like that release was garbage-collected; any chance you saved the list of PRs? Just looking at GitHub, there was an installer RHCOS bump on that day: https://github.com/openshift/installer/pull/5168. It passed e2e-metal-ipi-ovn-ipv6, though.
I reproduced it locally and see:

Aug 31 08:40:25 master-1.ostest.test.metalkube.org nm-dispatcher[150065]: req:1 'dhcp6-change' [br-ex], "/etc/NetworkManager/dispatcher.d/30-resolv-prepender": complete: failed with Script '/etc/NetworkManager/dispatcher.d/30-resolv-prepender' exited with error status 1.

It looks like a network issue. I suspected https://github.com/openshift/machine-config-operator/pull/2706 and opened a revert PR to test this: https://github.com/openshift/machine-config-operator/pull/2744

I copied the PR list of https://amd64.ocp.releases.ci.openshift.org/releasestream/4.9.0-0.nightly/release/4.9.0-0.nightly-2021-08-26-151113 so the history is not lost:

BAREMETAL-OPERATOR
- Bug 1983190: Add LIVE_ISO_FORCE_PERSISTENT_BOOT_DEVICE variable #173

CLUSTER-KUBE-APISERVER-OPERATOR
- Bug 1997420: revert wrong change on the api-usage rules #1204

CLUSTER-MONITORING-OPERATOR
- Bug 1997528: remove use of etcd_object_counts metric #1345
- Bug 1996941: adding label check for node when creating daemon set #1339

CLUSTER-NODE-TUNING-OPERATOR
- Bug 1997486: Ship the latest TuneD and stalld. #265

CLUSTER-VERSION-OPERATOR
- Bug 1986707: lib/resourcedelete/helper: Never-installed alternative in deletion log message #642

CONSOLE
- Bug 1997102: Update gherkin for observe tab in workload sidebar #9865
- Bug 1987344: Set openshift doc version to 4.8 #9889
- Bug 1997655: Remove unused data-test-id which logs a react warning #9883

CSI-DRIVER-MANILA, OPENSTACK-CINDER-CSI-DRIVER, OPENSTACK-CLOUD-CONTROLLER-MANAGER
- Bug 1988374: UPSTREAM: 1988374: Disable uuid checks on XFS (#1614) #72
- Bug 1996031: Merge upstream tag ‘v1.22.0’ #70

MACHINE-CONFIG-OPERATOR
- Bug 1971715: configure-ovs: fix RHEL7 specific issues #2706

OPERATOR-LIFECYCLE-MANAGER, OPERATOR-REGISTRY
- Bug 1994648: fix(sub): Reset ResolutionFailed cond when error is resolved #176
- Bug 1996878: Add deprecation warnings for CLIs that use or depend on sqlite #177

SERVICE-CA-OPERATOR
- OWNERS: remove s-urbaniak #175

TESTS
- Bug 1986562: Only check status for image trigger tests #26411
I ran a revert PR, https://github.com/openshift/installer/pull/5180, of https://github.com/openshift/installer/pull/5168, and the e2e-metal-ipi-ovn-ipv6 jobs passed: https://prow.ci.openshift.org/pr-history/?org=openshift&repo=installer&pr=5180
TRT is marking this as a blocker due to its impact on IPv6/dual-stack.
*** Bug 1999594 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759