Overall pass rates for baremetal are pretty low, in the 30-40% range. TestGrid looks pretty bad for metal 4.8 and 4.9 CI:

4.8: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-blocking#periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi
4.9: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-blocking#periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi

That waterfall-like pattern of 'F's indicates that something fails in most runs, but not the same test each time. That has a high likelihood of being platform-related (or at least related to the on-prem networking). I filed https://bugzilla.redhat.com/show_bug.cgi?id=1974350 about a similar problem; someone from the metal platform team should dig into the failures and see if it's something similar.

Example test failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi/1417753887451910144

If you click on "open stdout" from the "local kubeconfig" test, you'll see networking-related problems:

+ oc --kubeconfig /etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost-recovery.kubeconfig get namespace kube-system
The connection to the server localhost:6443 was refused - did you specify the right host or port?

Definitely could be networking related. I'd try to correlate the time of that message (08:52:22) with the logs in the openshift-kni-infra namespace.
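
For anyone who wants to put a number on the "waterfall" reading above, here is a minimal sketch. It assumes you have already collected, per job run, the list of tests that failed; the run IDs, the placeholder test names, and the `summarize` helper are all hypothetical and are not taken from TestGrid. A low pass rate combined with a low share for the most common failing test points at spread-out (likely platform-level) failures rather than one broken test.

```python
from collections import Counter

# Hypothetical input: job run -> tests that failed in that run.
# Run IDs and the "test-c"/"test-d" names are made up for illustration.
runs = {
    "run-1": ["local kubeconfig"],
    "run-2": ["oc explain"],
    "run-3": [],
    "run-4": ["test-c"],
    "run-5": ["local kubeconfig"],
    "run-6": ["test-d"],
}


def summarize(runs):
    """Return (pass_rate, top_share, per_test_failure_counts)."""
    total = len(runs)
    failed_runs = [run for run, failed in runs.items() if failed]
    # Count each test at most once per run.
    per_test = Counter(t for failed in runs.values() for t in set(failed))
    pass_rate = (total - len(failed_runs)) / total if total else 0.0
    # Share of failed runs that include the single most frequent failing test.
    if failed_runs:
        top_share = per_test.most_common(1)[0][1] / len(failed_runs)
    else:
        top_share = 0.0
    return pass_rate, top_share, per_test


pass_rate, top_share, per_test = summarize(runs)
print(f"pass rate: {pass_rate:.0%}, top failing test share: {top_share:.0%}")
for test, count in per_test.most_common():
    print(f"{count:3d}  {test}")
```

If one test accounted for nearly all failed runs, the share would be close to 100% and the fix would more likely belong in that test than in the platform.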
Arda had a look, assigning to him for now. Looks like this is caused by several different bugs, some of which are being investigated or merged already.
There are 2 PRs for 2 flaky tests: https://github.com/openshift/origin/pull/26377 is for the kubeconfig tests, and https://github.com/openshift/origin/pull/26385 is for the oc explain tests. Once these PRs are merged, these tests are not expected to fail.
I'm removing this job from origin until it's stable; please revert https://github.com/openshift/release/pull/21200 when this BZ is resolved.
https://github.com/openshift/origin/pull/26377 has merged but there is another, follow-on PR: https://github.com/openshift/origin/pull/26407
origin#26407 has now merged as well. Both PRs from the previous comment link bug 1986003, which is still POST. Are we waiting for more of those to land before doing more with this bug?
Metal IPI looks pretty healthy[1], and gating jobs are regularly getting through on the first try. IMHO, I'd consider this fixed at this point, and TRT can open bugs for anything else we find if it affects metal. [1] https://sippy.ci.openshift.org/sippy-ng/jobs/4.9?filters=%7B%22items%22%3A%5B%7B%22id%22%3A1%2C%22columnField%22%3A%22current_runs%22%2C%22operatorValue%22%3A%22%3E%3D%22%2C%22value%22%3A%221%22%7D%2C%7B%22id%22%3A99%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22metal-ipi%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D&period=twoDay&sort=desc&sortField=current_pass_percentage
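
The Sippy link in [1] is hard to read because its filter is percent-encoded. As a small sketch, the standard-library snippet below decodes that exact URL so the filter is visible. It only parses the query string and does not contact Sippy; treating the `filters` parameter as JSON is an assumption based on how the decoded value looks, not on Sippy documentation.

```python
import json
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from footnote [1].
url = (
    "https://sippy.ci.openshift.org/sippy-ng/jobs/4.9?filters="
    "%7B%22items%22%3A%5B%7B%22id%22%3A1%2C%22columnField%22%3A%22current_runs%22"
    "%2C%22operatorValue%22%3A%22%3E%3D%22%2C%22value%22%3A%221%22%7D%2C%7B%22id%22"
    "%3A99%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22contains%22"
    "%2C%22value%22%3A%22metal-ipi%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D"
    "&period=twoDay&sort=desc&sortField=current_pass_percentage"
)

# parse_qs percent-decodes each value and returns a dict of lists.
query = parse_qs(urlsplit(url).query)

# Assumption: the decoded "filters" value is a JSON object.
filters = json.loads(query["filters"][0])

for item in filters["items"]:
    print(item["columnField"], item["operatorValue"], item["value"])
print("combined with:", filters["linkOperator"])
print("period:", query["period"][0], "| sorted by:", query["sortField"][0], query["sort"][0])
```

Decoded, the link selects 4.9 jobs whose name contains "metal-ipi" and that have at least one current run, sorted by current pass percentage in descending order.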
According to TestGrid (https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-blocking#periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi), the "kubeconfig" failures have been resolved. Other test failures are tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1998643. I'm closing this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759