Bug 1843526
| Summary: | [RHOCP4.4] Unable to upgrade OCP4.3.19 to OCP4.4 in disconnected env: CVO enters reconciling mode without applying any manifests in update mode | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> | |
| Component: | Cluster Version Operator | Assignee: | W. Trevor King <wking> | |
| Status: | CLOSED ERRATA | QA Contact: | Johnny Liu <jialiu> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 4.3.z | CC: | aos-bugs, jniu, jokerman | |
| Target Milestone: | --- | |||
| Target Release: | 4.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Cause: The Cluster Version Operator had a race where it would consider a timed-out update reconciliation cycle as a successful update. The race was very rare, except for restricted-network clusters where the operator timed out attempting to fetch release image signatures.
Consequence: The Cluster Version Operator would enter its shuffled-manifest reconciliation mode, possibly breaking the cluster if the manifests were applied in an order that the components could not handle.
Fix: The Cluster Version Operator now treats those timed-out updates as failures.
Result: The Cluster Version Operator no longer enters reconciling mode before the update succeeds.
| Story Points: | --- | |
| Clone Of: | ||||
| : | 1843732 (view as bug list) | Environment: | ||
| Last Closed: | 2020-07-13 17:42:56 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1838497 | |||
| Bug Blocks: | 1843732, 1843987 | |||
|
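The race summarized in the Doc Text above can be modeled roughly as follows. This is a minimal Python sketch, not the operator's code (the Cluster Version Operator is written in Go): `CycleResult`, `summarize`, `next_mode`, and the `treat_timeout_as_failure` flag are hypothetical names introduced only to contrast the behavior before and after the fix.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mode names, used only for this illustration.
UPDATING = "updating"
RECONCILING = "reconciling"


@dataclass
class CycleResult:
    applied: int     # manifests applied before the cycle ended
    total: int       # manifests in the release payload
    timed_out: bool  # True if the cycle hit its deadline


def summarize(result: CycleResult, treat_timeout_as_failure: bool) -> Optional[str]:
    """Return an error message for a failed cycle, or None for success."""
    if result.timed_out:
        if treat_timeout_as_failure:
            # Behavior after the fix: a timed-out cycle is a failure.
            return f"update timed out at {result.applied} of {result.total}"
        # Behavior before the fix: the timeout is swallowed and the
        # cycle looks like a success to the caller.
        return None
    if result.applied < result.total:
        return f"only {result.applied} of {result.total} manifests applied"
    return None


def next_mode(mode: str, error: Optional[str]) -> str:
    """Leave update mode only after a cycle that reported no error."""
    if mode == UPDATING and error is None:
        return RECONCILING
    return mode


if __name__ == "__main__":
    # A cycle that timed out (for example while fetching signatures)
    # before applying any manifest.
    stuck = CycleResult(applied=0, total=500, timed_out=True)
    print(next_mode(UPDATING, summarize(stuck, treat_timeout_as_failure=False)))
    print(next_mode(UPDATING, summarize(stuck, treat_timeout_as_failure=True)))
```

With the flag off, the stuck cycle moves the model into reconciling mode even though nothing was applied, which matches the symptom in the summary. With the flag on, the model stays in update mode and retries.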
Description
W. Trevor King
2020-06-03 13:26:09 UTC
Verified this bug with 4.5.0-0.nightly-2020-06-04-001344, Passed.

Set up a pure disconnected cluster with 4.4.4, trigger upgrade with --force to 4.5.0-0.nightly-2020-06-04-001344.

[root@preserve-jialiu-ansible ~]# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h24m
cloud-credential                           4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h22m
cluster-autoscaler                         4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h53m
config-operator                            4.5.0-0.nightly-2020-06-04-001344   True        False         False      111m
console                                    4.5.0-0.nightly-2020-06-04-001344   True        False         False      81m
csi-snapshot-controller                    4.5.0-0.nightly-2020-06-04-001344   True        False         False      97m
dns                                        4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h57m
etcd                                       4.5.0-0.nightly-2020-06-04-001344   True        False         False      3h31m
image-registry                             4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h52m
ingress                                    4.5.0-0.nightly-2020-06-04-001344   True        False         False      3h29m
insights                                   4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h54m
kube-apiserver                             4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h56m
kube-controller-manager                    4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h56m
kube-scheduler                             4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h56m
kube-storage-version-migrator              4.5.0-0.nightly-2020-06-04-001344   True        False         False      83m
machine-api                                4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h57m
machine-approver                           4.5.0-0.nightly-2020-06-04-001344   True        False         False      175m
machine-config                             4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h57m
marketplace                                4.5.0-0.nightly-2020-06-04-001344   True        False         False      81m
monitoring                                 4.5.0-0.nightly-2020-06-04-001344   True        False         False      80m
network                                    4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h58m
node-tuning                                4.5.0-0.nightly-2020-06-04-001344   True        False         False      108m
openshift-apiserver                        4.5.0-0.nightly-2020-06-04-001344   True        False         False      98m
openshift-controller-manager               4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h52m
openshift-samples                          4.5.0-0.nightly-2020-06-04-001344   True        False         False      99m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h57m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h57m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-06-04-001344   True        False         False      81m
service-ca                                 4.5.0-0.nightly-2020-06-04-001344   True        False         False      5h58m
storage                                    4.5.0-0.nightly-2020-06-04-001344   True        False         False      109m

[root@preserve-jialiu-ansible ~]# oc get node
NAME                                        STATUS   ROLES    AGE     VERSION
ip-10-0-50-112.us-east-2.compute.internal   Ready    master   6h3m    v1.18.3+a637491
ip-10-0-50-64.us-east-2.compute.internal    Ready    master   6h3m    v1.18.3+a637491
ip-10-0-62-65.us-east-2.compute.internal    Ready    worker   5h54m   v1.18.3+a637491
ip-10-0-67-224.us-east-2.compute.internal   Ready    master   6h3m    v1.18.3+a637491
ip-10-0-75-234.us-east-2.compute.internal   Ready    worker   5h53m   v1.18.3+a637491
ip-10-0-78-237.us-east-2.compute.internal   Ready    worker   5h53m   v1.18.3+a637491

[root@preserve-jialiu-ansible ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-04-001344   True        False         71m     Cluster version is 4.5.0-0.nightly-2020-06-04-001344

[root@preserve-jialiu-ansible ~]# grep -r "update was cancelled at" cvo.log
<--empty-->

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
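The manual checks in the verification comment above (every cluster operator at the target version and healthy, and no "update was cancelled at" line in the operator log) could be scripted along these lines. This is a minimal sketch that works on already-captured text; `unhealthy_operators` and `has_cancelled_update` are hypothetical helper names, and the parser assumes the six-column `oc get co` layout shown in the comment with every column populated.

```python
from typing import List

CANCELLED_MARKER = "update was cancelled at"


def unhealthy_operators(table: str, target: str) -> List[str]:
    """Return the names of operators that are not at `target` or not healthy.

    `table` is the text printed by `oc get co`. Rows are split on
    whitespace, so a row with an empty column is reported as unhealthy
    rather than guessed at.
    """
    rows = [line for line in table.splitlines() if line.strip()]
    bad = []
    for row in rows[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 5:
            bad.append(fields[0])
            continue
        name, version, available, progressing, degraded = fields[:5]
        healthy = (version, available, progressing, degraded) == (
            target, "True", "False", "False")
        if not healthy:
            bad.append(name)
    return bad


def has_cancelled_update(log_text: str) -> bool:
    """True if the log contains the message grepped for in the comment."""
    return CANCELLED_MARKER in log_text


if __name__ == "__main__":
    target = "4.5.0-0.nightly-2020-06-04-001344"
    # Two rows copied from the verification output above.
    sample = (
        "NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE\n"
        f"authentication {target} True False False 5h24m\n"
        f"etcd {target} True False False 3h31m\n"
    )
    print(unhealthy_operators(sample, target))
    print(has_cancelled_update(""))
```

An empty list from `unhealthy_operators` together with `False` from `has_cancelled_update` corresponds to the "Passed" result reported in the comment.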