Description of problem: Even after the disk space increase, we're still seeing some jobs fail.

See:
https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade

Several tests are failing, and some upgrades do not complete. Please investigate.

Note: OCP has a soft 75m time limit for upgrades, and the test that checks it is one of the failures. The upgrade is often just a little over the limit, so either the job needs to find a way to reduce the upgrade time, or you can add an exception like the one for AWS. This is a soft limit, though, so I don't think it's the root cause of the latest failures.
(In reply to Stephen Benjamin from comment #0)

Looking at the recent failures, 4 of the 7 jobs timed out with a report of how far they got, and all 4 stopped in the same place, "568 of 676 done (84% complete)":

"Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.8.0-0.nightly-2021-06-12-223426: 568 of 676 done (84% complete)"

The other 3 failures varied.
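To repeat this comparison across more runs, the progress figures can be pulled out of each failure message and tallied. The following is only a rough sketch: the regular expression is modeled on the single message quoted above, and the helper names (`parse_progress`, `tally_stall_points`) are hypothetical, not part of any CI tooling. Messages with a different shape are simply skipped.

```python
import re
from collections import Counter

# Pattern modeled on the one timeout message quoted in this bug;
# it is an assumption that other runs report progress in the same shape.
PROGRESS_RE = re.compile(
    r"Working towards (?P<version>\S+): "
    r"(?P<done>\d+) of (?P<total>\d+) done \((?P<percent>\d+)% complete\)"
)


def parse_progress(message):
    """Return (version, done, total) from a timeout message, or None if absent."""
    match = PROGRESS_RE.search(message)
    if match is None:
        return None
    return match["version"], int(match["done"]), int(match["total"])


def tally_stall_points(messages):
    """Count how many failure messages stalled at each (done, total) point."""
    counts = Counter()
    for message in messages:
        parsed = parse_progress(message)
        if parsed is not None:
            _, done, total = parsed
            counts[(done, total)] += 1
    return counts


# The message quoted in the comment above, used as sample input.
SAMPLE = (
    "Cluster did not complete upgrade: timed out waiting for the condition: "
    "Working towards 4.8.0-0.nightly-2021-06-12-223426: "
    "568 of 676 done (84% complete)"
)

if __name__ == "__main__":
    print(parse_progress(SAMPLE))
    print(tally_stall_points([SAMPLE, "some unrelated failure"]))
```

If most timed-out runs land on the same (done, total) pair, as the four here did, that points at a single stuck step rather than a generally slow upgrade.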
This was verified on CI.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759