Bug 2025722

Summary: [sig-cluster-lifecycle] cluster upgrade should complete in 75m: minor updates timeout
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: Test Framework Assignee: Devan Goodwin <dgoodwin>
Status: CLOSED CURRENTRELEASE QA Contact: Devan Goodwin <dgoodwin>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.8 CC: dgoodwin, sippy
Target Milestone: ---   
Target Release: 4.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2024766
Clones: 2028021 Environment:
Last Closed: 2021-12-01 10:40:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2024766    
Bug Blocks: 2028021    

Description W. Trevor King 2021-11-22 20:06:41 UTC
+++ This bug was initially created as a clone of Bug #2024766 +++

[sig-cluster-lifecycle] cluster upgrade should complete in 75m (105m on AWS)

is failing frequently in CI, see:
https://sippy.ci.openshift.org/sippy-ng/tests/4.8/analysis?test=%5Bsig-cluster-lifecycle%5D%20cluster%20upgrade%20should%20complete%20in%2075m%20(105m%20on%20AWS)

For example [1] blocked a 4.8 CI release:

  : [sig-cluster-lifecycle] cluster upgrade should complete in 75m (105m on AWS)	1h16m50s
  upgrade to registry.build03.ci.openshift.org/ci-op-ktnnb79c/release@sha256:326a14fdab07111e77882fdc34f26bed95f2254d3d8868faccd61cfe49f36017 took too long: 76.84 minutes
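
For reference, a minimal sketch of the comparison behind this failure, assuming only the limits named in the test title (75m by default, 105m on AWS). The helper names `duration_minutes` and `exceeds_limit` are hypothetical and are not taken from the test suite, which may compute its limits differently.

```python
import re

# Limits taken from the test title only: 75m by default, 105m on AWS.
# The real test may apply further adjustments; this is an assumption.
DEFAULT_LIMIT_MINUTES = 75.0
LIMIT_MINUTES = {"aws": 105.0}


def duration_minutes(text: str) -> float:
    """Convert a duration such as '1h16m50s' into minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return hours * 60 + minutes + seconds / 60


def exceeds_limit(duration: str, platform: str) -> bool:
    """Return True if the duration is over the limit for the platform."""
    limit = LIMIT_MINUTES.get(platform.lower(), DEFAULT_LIMIT_MINUTES)
    return duration_minutes(duration) > limit
```

With the JUnit duration above, `exceeds_limit("1h16m50s", "gcp")` is true, while the same duration would fit within the AWS limit.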

Common for several minor bumps:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=96h&type=junit&name=release-master-ci&search=cluster+upgrade+should+complete+in' | grep 'failures match' | grep -v rehearse | sort -V
  periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-upgrade (all) - 37 runs, 46% failed, 6% of failures match = 3% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 37 runs, 70% failed, 58% of failures match = 41% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 16 runs, 69% failed, 18% of failures match = 13% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-vsphere-upgrade (all) - 9 runs, 33% failed, 67% of failures match = 22% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-ovn-upgrade (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-upgrade (all) - 4 runs, 50% failed, 100% of failures match = 50% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade (all) - 16 runs, 50% failed, 25% of failures match = 13% impact
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-azure-upgrade (all) - 281 runs, 97% failed, 0% of failures match = 0% impact
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1461391694523011072
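
For anyone who wants to sort or filter the search output above, here is a rough sketch that parses one summary line. It assumes the line layout shown in this bug; `JobImpact` and `parse_line` are hypothetical names, not part of any CI tooling. The impact figure looks like roughly the failure rate times the match rate, but the printed percentages are rounded, so the product can be off by about a point.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout assumed from the output pasted above, e.g.
#   <job> (all) - 37 runs, 70% failed, 58% of failures match = 41% impact
LINE_RE = re.compile(
    r"^\s*(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, "
    r"(?P<failed>\d+)% failed, (?P<match>\d+)% of failures match "
    r"= (?P<impact>\d+)% impact\s*$"
)


@dataclass
class JobImpact:
    job: str
    runs: int
    failed_pct: int
    match_pct: int
    impact_pct: int

    def estimated_impact_pct(self) -> float:
        # Approximation only: the inputs are already-rounded percentages.
        return self.failed_pct * self.match_pct / 100.0


def parse_line(line: str) -> Optional[JobImpact]:
    """Parse one summary line; return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return JobImpact(
        job=m.group("job"),
        runs=int(m.group("runs")),
        failed_pct=int(m.group("failed")),
        match_pct=int(m.group("match")),
        impact_pct=int(m.group("impact")),
    )
```

Sorting the parsed entries by `impact_pct` and checking `runs` makes it easier to separate jobs with only a handful of runs from those with enough runs to be meaningful.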

Comment 2 W. Trevor King 2021-12-01 10:40:27 UTC
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&name=release-master-ci&search=cluster+upgrade+should+complete+in' | grep 'failures match' | grep -v rehearse | sort -V
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 16 runs, 38% failed, 200% of failures match = 75% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 8 runs, 75% failed, 50% of failures match = 38% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact

So OpenStack is still slow (as on 4.10, but that didn't stop us from closing bug 2024766 without patching).  Otherwise 4.8 -> 4.9 looks good since the 4.9 patch landed, so I'll move this to CURRENTRELEASE.