Bug 1763821
Summary: | [upgrade-4.1-4.2] Canceling the task graph partway through should be an error even if no tasks fail | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> | |
Component: | Cluster Version Operator | Assignee: | W. Trevor King <wking> | |
Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> | |
Severity: | urgent | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 4.3.0 | CC: | aos-bugs, ccoleman, ChetRHosey, gblomqui, jeder, jokerman, mpatel, nmalik, scuppett, xtian | |
Target Release: | 4.3.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
: | 1763822 1763823 | Environment: | |
Last Closed: | 2020-01-23 11:08:30 UTC | Type: | Bug | |
Bug Blocks: | 1763822, 1763823 |
Description
W. Trevor King
2019-10-21 16:54:42 UTC
Even worse, we'll switch into reconciling mode, which means we might upgrade nodes before the control plane, which is not allowed. This is a 4.1 to 4.2 upgrade blocker.

I tried the normal upgrade steps from 4.1.20 to 4.2.0. The upgrade actually finished, and there was no worker cancel message in the CVO log. According to the discussion with devs on Slack (https://coreos.slack.com/archives/CEGKQ43CP/p1571710624054300), this is a race condition that only shows up in a fraction of e2e test runs. The symptom is that although the e2e job reports the upgrade as successful/finished, the co did not finish syncing (cluster upgrade is Progressing: Working towards 0.0.1-2019-10-21-095122: 25% complete). This checkpoint is already included in our upgrade test case, so I don't think QE needs another case for it. As for reproducing and verifying the bug, it reproduces more easily in e2e tests such as the example jobs in https://coreos.slack.com/archives/CEKNRGF25/p1571655534427500, so we can run some regression tests against the target build once the PR lands. Please feel free to correct me.

I tried upgrading from 4.2.1 to 4.3.0-0.nightly-2019-10-24-203507, and it failed. I checked our CI test results on https://openshift-release.svc.ci.openshift.org, and there is still no 4.2.1 to 4.3 upgrade path available. So this bug's regression test is blocked.

Regression test passed. Upgraded v4.2.2 to 4.3.0-0.nightly-2019-10-31-022441 successfully.

# oc get clusterversion -o json|jq .items[0].status.history
[
  {
    "completionTime": "2019-10-31T07:00:44Z",
    "image": "registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-10-31-022441",
    "startedTime": "2019-10-31T06:13:16Z",
    "state": "Completed",
    "verified": false,
    "version": "4.3.0-0.nightly-2019-10-31-022441"
  },
  {
    "completionTime": "2019-10-31T03:23:31Z",
    "image": "registry.svc.ci.openshift.org/ocp/release@sha256:dc782b44cac3d59101904cc5da2b9d8bdb90e55a07814df50ea7a13071b0f5f0",
    "startedTime": "2019-10-31T02:58:46Z",
    "state": "Completed",
    "verified": false,
    "version": "4.2.2"
  }
]

No extra test case is needed according to comment 2, so removing needtestcase.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062
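The regression check in the comments above (inspecting `status.history` of the ClusterVersion object) could be automated roughly as follows. This is only a sketch: the function name `latest_update_completed` is hypothetical, the sample data is trimmed from the output quoted in this bug, and it assumes the newest history entry is listed first, as it is in that output.

```python
import json


def latest_update_completed(clusterversion_json: str, target_version: str) -> bool:
    """Return True if the newest history entry is target_version in state "Completed".

    Assumption: history is ordered newest first, as in the output quoted in this bug.
    """
    doc = json.loads(clusterversion_json)
    history = doc["items"][0]["status"]["history"]
    if not history:
        return False
    newest = history[0]
    # Any state other than "Completed" is treated as "upgrade not finished".
    return newest["version"] == target_version and newest["state"] == "Completed"


# Sample shaped like `oc get clusterversion -o json`, trimmed to the fields used here.
SAMPLE = json.dumps({
    "items": [{
        "status": {
            "history": [
                {
                    "completionTime": "2019-10-31T07:00:44Z",
                    "startedTime": "2019-10-31T06:13:16Z",
                    "state": "Completed",
                    "verified": False,
                    "version": "4.3.0-0.nightly-2019-10-31-022441",
                },
                {
                    "completionTime": "2019-10-31T03:23:31Z",
                    "startedTime": "2019-10-31T02:58:46Z",
                    "state": "Completed",
                    "verified": False,
                    "version": "4.2.2",
                },
            ]
        }
    }]
})

if __name__ == "__main__":
    print(latest_update_completed(SAMPLE, "4.3.0-0.nightly-2019-10-31-022441"))
```

In practice the JSON string would come from the output of `oc get clusterversion -o json` rather than from the embedded sample. A check like this only covers the recorded history; the "Progressing: Working towards ..." symptom described earlier would still need the cluster status conditions to be inspected separately.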