Bug 2024766

Summary: [sig-cluster-lifecycle] cluster upgrade should complete in 75m: minor updates timeout
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Test Framework
Assignee: Devan Goodwin <dgoodwin>
Status: CLOSED NOTABUG
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 4.8
CC: sippy
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2025722 (view as bug list)
Environment:
Last Closed: 2021-11-28 19:31:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2025722

Description W. Trevor King 2021-11-18 22:39:36 UTC
[sig-cluster-lifecycle] cluster upgrade should complete in 75m (105m on AWS)

is failing frequently in CI, see:
https://sippy.ci.openshift.org/sippy-ng/tests/4.8/analysis?test=%5Bsig-cluster-lifecycle%5D%20cluster%20upgrade%20should%20complete%20in%2075m%20(105m%20on%20AWS)

For example [1] blocked a 4.8 CI release:

  : [sig-cluster-lifecycle] cluster upgrade should complete in 75m (105m on AWS)	1h16m50s
  upgrade to registry.build03.ci.openshift.org/ci-op-ktnnb79c/release@sha256:326a14fdab07111e77882fdc34f26bed95f2254d3d8868faccd61cfe49f36017 took too long: 76.84 minutes
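
The arithmetic behind this failure can be sketched as follows. This is an illustrative Python snippet, not the test's actual implementation: the helper names `parse_duration_minutes` and `exceeds_limit` are hypothetical, and the limits (75 minutes, 105 minutes on AWS) are taken only from the test name.

```python
import re

# Limits taken from the test name: 75m by default, 105m on AWS.
DEFAULT_LIMIT_MINUTES = 75.0
AWS_LIMIT_MINUTES = 105.0


def parse_duration_minutes(text: str) -> float:
    """Convert a duration such as '1h16m50s' to minutes (hypothetical helper)."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 60 + minutes + seconds / 60


def exceeds_limit(duration_minutes: float, platform: str) -> bool:
    """Return True when the upgrade ran longer than the assumed platform limit."""
    limit = AWS_LIMIT_MINUTES if platform == "aws" else DEFAULT_LIMIT_MINUTES
    return duration_minutes > limit
```

Under these assumptions, the 76.84-minute GCP run above exceeds the 75-minute limit by less than two minutes, while the same duration would pass under the 105-minute AWS limit.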

The timeout is common across several minor-version upgrade jobs:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=96h&type=junit&name=release-master-ci&search=cluster+upgrade+should+complete+in' | grep 'failures match' | grep -v rehearse | sort -V
  periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-upgrade (all) - 37 runs, 46% failed, 6% of failures match = 3% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 37 runs, 70% failed, 58% of failures match = 41% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 16 runs, 69% failed, 18% of failures match = 13% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-vsphere-upgrade (all) - 9 runs, 33% failed, 67% of failures match = 22% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-ovn-upgrade (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-upgrade (all) - 4 runs, 50% failed, 100% of failures match = 50% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade (all) - 16 runs, 50% failed, 25% of failures match = 13% impact
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-azure-upgrade (all) - 281 runs, 97% failed, 0% of failures match = 0% impact
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
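
For reading the output above: the impact figure appears to be roughly the failure rate multiplied by the share of failures that match the search, i.e. the fraction of all runs that hit this test failure. That is an assumption based on the numbers shown, not documented behavior of the search tool, and because each percentage is rounded the product can differ from the reported impact by about a point. A minimal Python sketch with a hypothetical `parse_summary` helper:

```python
import re
from typing import NamedTuple

# Matches summary lines such as:
#   <job> (all) - 37 runs, 70% failed, 58% of failures match = 41% impact
LINE_RE = re.compile(
    r"^\s*(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, "
    r"(?P<failed>\d+)% failed, (?P<match>\d+)% of failures match = "
    r"(?P<impact>\d+)% impact\s*$"
)


class JobSummary(NamedTuple):
    job: str
    runs: int
    failed_pct: int
    match_pct: int
    impact_pct: int

    def estimated_impact_pct(self) -> float:
        # Assumption: impact ~= failure rate x share of failures that match.
        return self.failed_pct * self.match_pct / 100


def parse_summary(line: str) -> JobSummary:
    """Parse one summary line (hypothetical helper, not part of any CI tool)."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return JobSummary(
        m["job"],
        int(m["runs"]),
        int(m["failed"]),
        int(m["match"]),
        int(m["impact"]),
    )
```

For the 4.7-to-4.8 GCP job above, 70% failed times 58% matching gives about 40.6%, consistent with the reported 41% impact.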

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1461391694523011072

Comment 1 Scott Dodson 2021-11-28 19:32:51 UTC
We discussed loosening these tolerances only in 4.9, since the problem does not seem to be present in 4.10 upgrades. If it becomes an issue there, we would prefer to fix whatever regression increased the upgrade duration rather than amend the tests.