Bug 1872906
Summary: | Cluster did not acknowledge request to upgrade in a reasonable time | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
Component: | Cluster Version Operator | Assignee: | W. Trevor King <wking> |
Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 4.5 | CC: | aos-bugs, bleanhar, bparees, deads, dosmith, hongkliu, jack.ottofaro, jiajliu, jokerman, sdodson, wking, yanyang |
Target Milestone: | --- | ||
Target Release: | 4.5.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | 1843505 | Environment: | |
Last Closed: | 2020-10-19 14:54:24 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1843505 | ||
Bug Blocks: | |||
Description
W. Trevor King
2020-08-26 21:22:02 UTC
Setting "No Doc Update", because the effect of this was a new CVO possibly taking a minute or two to pick up the orphaned lease. Doesn't seem like a big deal, and we don't have a formal commitment around how quickly we turn that status acknowledgement around.

The 4.6 bug was re-opened, so the backport won't land until that's been verified.

Waiting on the patch manager to tag us in.

From CI test results in the past 48h, there is still one failure of "Cluster version operator acknowledges upgrade" in job "release-openshift-origin-installer-e2e-aws-upgrade":

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1315617602956955648

2020/10/12 11:37:47 Resolved release initial to registry.svc.ci.openshift.org/ocp/release:4.5.14
2020/10/12 11:37:47 Resolved release latest to registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-10-12-113401

fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:137]: during upgrade to registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-10-12-113401
Unexpected error:
    <*errors.errorString | 0xc001ac2440>: {
        s: "timed out waiting for cluster to acknowledge upgrade: timed out waiting for the condition",
    }
    timed out waiting for cluster to acknowledge upgrade: timed out waiting for the condition
occurred

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.5.14-x86_64 | grep cluster-version-operator
  cluster-version-operator https://github.com/openshift/cluster-version-operator 8bfbc20d65c296f48b21d40c47a443fefc0c8d77
$ git --no-pager log --first-parent --oneline -3 origin/release-4.5
0a34ac3632 (origin/release-4.5) Merge pull request #470 from wking/v0.18-go-clients
2c849e5729 Merge pull request #446 from wking/gracefully-release-leader-lease-4.5
8bfbc20d65 Merge pull request #433 from openshift-cherrypick-robot/cherry-pick-428-to-release-4.5

So the outgoing version (4.5.14) didn't have the fix yet: its CVO commit, 8bfbc20d65, sits below the gracefully-release-leader-lease merge (#446) on release-4.5.
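The check above (is the commit pinned by a release payload at or below the fix's merge commit on the branch?) can be done mechanically against a pasted listing. A minimal sketch, assuming the input is `git log --first-parent --oneline` output, newest first; the helper name `fix_in_release` is hypothetical and not part of `oc`, `git`, or the CVO:

```shell
# Hypothetical helper: reads `git log --first-parent --oneline` output
# (newest first) on stdin and succeeds only if the fix commit appears at or
# below the release commit, i.e. the release history already contains the fix.
# Hashes are compared by prefix, so abbreviated and full hashes both work.
fix_in_release() {
  awk -v rel="$1" -v fix="$2" '
    function same(a, b) { return index(a, b) == 1 || index(b, a) == 1 }
    NF == 0 { next }                         # skip blank lines
    same($1, rel) { seen_rel = 1 }           # reached the release commit
    seen_rel && same($1, fix) { found = 1 }  # fix is at or below the release
    END { exit(found ? 0 : 1) }
  '
}

# Listing copied from the comment above.
log='0a34ac3632 (origin/release-4.5) Merge pull request #470 from wking/v0.18-go-clients
2c849e5729 Merge pull request #446 from wking/gracefully-release-leader-lease-4.5
8bfbc20d65 Merge pull request #433 from openshift-cherrypick-robot/cherry-pick-428-to-release-4.5'

# 4.5.14 pins 8bfbc20d65..., which is below merge #446 (2c849e5729).
if printf '%s\n' "$log" | fix_in_release 8bfbc20d65c296f48b21d40c47a443fefc0c8d77 2c849e5729; then
  echo "fix present"
else
  echo "fix missing"
fi
```

With a full clone at hand, `git merge-base --is-ancestor <fix-commit> <release-commit>` answers the same question against the real commit graph instead of a pasted listing.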
According to the above, the original failure has not recurred in job "release-openshift-origin-installer-e2e-aws-upgrade" in the past 48h. The only failure of a v4.5-to-v4.6 upgrade is against the v4.5.14 stable build, which does not include the fix yet. Moreover, I checked that there are two successful v4.5 upgrade jobs [1][2] (4.5.0-0.ci-2020-10-09-151943 to 4.5.0-0.ci-2020-10-12-153413). So moving the bug to verified.

[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1315677855731945472
[2] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1315687645593997312

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.15 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4228

Saw another one today:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-ovn-upgrade-4.4-stable-to-4.5-ci/1328703700692111360

[1] shows that it picked up the 4.5 target shortly after the test timed out. Improving 4.4's lease-release logic would help, but a minute or so of bumpy lease handoff doesn't seem important enough to be worth mucking with the maintenance-phase 4.4 [2].

[1]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-ovn-upgrade-4.4-stable-to-4.5-ci/1328703700692111360/artifacts/e2e-gcp-upgrade/clusterversion.json
[2]: https://access.redhat.com/support/policy/updates/openshift#dates