Bug 1971602 - e2e-metal-ipi-upgrade for 4.7 to 4.8 is permafailing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Arda Guclu
QA Contact: Ori Michaeli
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-14 12:25 UTC by Stephen Benjamin
Modified: 2021-10-18 17:34 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:33:59 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift release pull 19228 0 None open Bug 1971602: Increase MASTER_DISK to 50GB for 4.7 to 4.8 metal upgrade 2021-06-15 13:48:55 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:34:23 UTC

Description Stephen Benjamin 2021-06-14 12:25:13 UTC
Description of problem:

Even after the disk space increase, we're still seeing some jobs fail.

See: 
  https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade


Several tests are failing, and some upgrades do not complete. Please investigate.

Note: OCP has a soft 75-minute time limit for upgrades, and the test that checks it is one of the failing tests. The upgrade is often just a little over the limit, so either the job needs to find a way to reduce the upgrade time, or you can add an exception like the one for AWS. Because this is only a soft limit, I don't think it's the root cause of the latest failures.

Comment 1 Derek Higgins 2021-06-15 11:39:04 UTC
(In reply to Stephen Benjamin from comment #0)

Looking at the recent failures, 4 of the 7 jobs timed out with a report of how far they got, and all four failed in the same place, "568 of 676 done (84% complete)":

"Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.8.0-0.nightly-2021-06-12-223426: 568 of 676 done (84% complete)"

The other 3 failures varied.
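
For anyone repeating this triage, here is a minimal sketch of how the timeout messages could be grouped by the step at which they stalled. It assumes the failure messages have already been collected as plain strings in the format quoted above. The function names `parse_progress` and `group_by_progress` are hypothetical and not part of any CI tooling.

```python
import re
from collections import Counter

# Matches the progress suffix of the timeout message quoted above,
# e.g. "568 of 676 done (84% complete)".
PROGRESS_RE = re.compile(r"(\d+) of (\d+) done \((\d+)% complete\)")


def parse_progress(message):
    """Return (done, total, percent) from an upgrade message, or None."""
    match = PROGRESS_RE.search(message)
    if match is None:
        return None
    done, total, percent = (int(group) for group in match.groups())
    return done, total, percent


def group_by_progress(messages):
    """Count how many failure messages stalled at each (done, total) step.

    Messages without a progress report are counted under the key None.
    """
    counts = Counter()
    for message in messages:
        progress = parse_progress(message)
        key = progress[:2] if progress else None
        counts[key] += 1
    return counts
```

If several runs stall at the same `(done, total)` pair, as the four runs above did, that points to a single stuck step rather than unrelated flakes.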

Comment 4 Ori Michaeli 2021-06-22 07:34:00 UTC
This was verified on CI.

Comment 7 errata-xmlrpc 2021-10-18 17:33:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

