Bug 1783221
| Summary: | CVO panics when downgrading to 4.2.10 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gaoyun Pei <gpei> |
| Component: | Cluster Version Operator | Assignee: | W. Trevor King <wking> |
| Status: | CLOSED ERRATA | QA Contact: | Gaoyun Pei <gpei> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.2.z | CC: | aos-bugs, ccoleman, jokerman, padillon, wking |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1798049 (view as bug list) | Environment: | |
| Last Closed: | 2020-05-13 21:55:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1798049 | | |
Comment 2
Clayton Coleman
2019-12-16 15:07:54 UTC
```
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-12-155629 | grep version
cluster-version-operator  https://github.com/openshift/cluster-version-operator  da28418b76e0a4c2f2946a914ac2c649dbaf1dc5
```

So the stack trace hits [1] and [2]. I bet the "index out of range" is from [2]'s:

`existingCurr = &(*existing)[i]`

because we don't re-enter the loop over `existing` [3] when we drop an entry [4]. This might be a common pattern among our resourcemerge implementations.

[1]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/pkg/payload/task_graph.go#L591
[2]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L69
[3]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L65
[4]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L76
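A minimal sketch of the suspected failure mode, assuming the merge loop drops entries from `existing` while iterating by index; the `container` type and the `buggyMerge`/`safeMerge` helpers below are hypothetical stand-ins, not the actual resourcemerge code linked above:

```go
package main

import "fmt"

type container struct{ name string }

func nameSet(cs []container) map[string]bool {
	set := map[string]bool{}
	for _, c := range cs {
		set[c.name] = true
	}
	return set
}

// buggyMerge mirrors the suspected pattern: `range` fixes the number of
// iterations up front, so after an element is deleted the index lookup
// (*existing)[i] can run past the shrunken slice and panic with
// "index out of range".
func buggyMerge(existing *[]container, required []container) {
	keep := nameSet(required)
	for i := range *existing {
		existingCurr := &(*existing)[i] // panics once i >= len(*existing)
		if !keep[existingCurr.name] {
			// drop the entry; the slice shrinks, but the loop does not notice
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
		}
	}
}

// safeMerge re-checks len on every pass and steps the index back after a
// deletion, so the element that slid into slot i is not skipped.
func safeMerge(existing *[]container, required []container) {
	keep := nameSet(required)
	for i := 0; i < len(*existing); i++ {
		if !keep[(*existing)[i].name] {
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
			i--
			continue
		}
		existingCurr := &(*existing)[i] // always in range here
		_ = existingCurr                // field-by-field merge would go here
	}
}

func main() {
	existing := []container{{"a"}, {"b"}, {"c"}}
	safeMerge(&existing, []container{{"b"}})
	fmt.Println(existing) // [{b}]

	existing = []container{{"a"}, {"b"}, {"c"}}
	buggyMerge(&existing, []container{{"b"}}) // panics: index out of range
}
```

The safe variant re-evaluates `len(*existing)` on every pass, which is one common way to delete from a slice in place without outrunning it.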
Verifying this bug with 4.4.0-0.nightly-2020-02-05-220946:

1. Install a latest 4.4 nightly cluster on AWS:

```
# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-02-05-220946   True        False         4h8m    Cluster version is 4.4.0-0.nightly-2020-02-05-220946
```

2. Downgrade the cluster to 4.3.0:

```
# ./oc adm upgrade --to-image='quay.io/openshift-release-dev/ocp-release@sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d' --allow-explicit-upgrade
Updating to release image quay.io/openshift-release-dev/ocp-release@sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d
```

3. The downgrade finished; the CVO pod is running well and no panic happened (see the log-check sketch at the end of this report):

```
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0     True        False         76m     Cluster version is 4.3.0

# oc get pod -n openshift-cluster-version
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-version-operator-584fddff45-wgjps   1/1     Running   1          101m
```

4. Several operators remained at 4.4.0-0.nightly-2020-02-05-220946 even though the CVO reported the downgrade as finished:

```
# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0                               True        False         False      6h31m
cloud-credential                           4.3.0                               True        False         False      6h52m
cluster-autoscaler                         4.3.0                               True        False         False      6h40m
console                                    4.3.0                               True        False         False      112m
csi-snapshot-controller                    4.4.0-0.nightly-2020-02-05-220946   True        False         False      112m
dns                                        4.3.0                               True        False         False      6h44m
etcd                                       4.4.0-0.nightly-2020-02-05-220946   True        False         False      107m
image-registry                             4.3.0                               True        False         False      113m
ingress                                    4.3.0                               True        False         False      112m
insights                                   4.3.0                               True        False         False      6h46m
kube-apiserver                             4.3.0                               True        False         False      6h43m
kube-controller-manager                    4.3.0                               True        False         False      6h43m
kube-scheduler                             4.3.0                               True        False         False      6h43m
kube-storage-version-migrator              4.4.0-0.nightly-2020-02-05-220946   True        False         False      114m
machine-api                                4.3.0                               True        False         False      6h44m
machine-config                             4.3.0                               True        False         False      6h45m
marketplace                                4.3.0                               True        False         False      109m
monitoring                                 4.3.0                               True        False         False      106m
network                                    4.3.0                               True        False         False      6h46m
node-tuning                                4.3.0                               True        False         False      107m
openshift-apiserver                        4.3.0                               True        False         False      107m
openshift-controller-manager               4.3.0                               True        False         False      6h43m
openshift-samples                          4.3.0                               True        False         False      134m
operator-lifecycle-manager                 4.3.0                               True        False         False      6h45m
operator-lifecycle-manager-catalog         4.3.0                               True        False         False      6h44m
operator-lifecycle-manager-packageserver   4.3.0                               True        False         False      108m
service-ca                                 4.3.0                               True        False         False      6h46m
service-catalog-apiserver                  4.3.0                               True        False         False      6h46m
service-catalog-controller-manager         4.3.0                               True        False         False      6h46m
storage                                    4.3.0                               True        False         False      136m

# oc get clusterversion -o json | jq -r '.items[0].status.history[]|.startedTime + "|" + .completionTime + "|" + .state + "|" + .version'
2020-02-06T07:03:49Z|2020-02-06T07:47:17Z|Completed|4.3.0
2020-02-06T02:32:16Z|2020-02-06T02:53:34Z|Completed|4.4.0-0.nightly-2020-02-05-220946
```

Found an existing bug, https://bugzilla.redhat.com/show_bug.cgi?id=1794360, about the downgrade issue from 4.4 to 4.3; item 4 above will be tracked in BZ#1794360. Moving this bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
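For reference, one generic way to double-check step 3 above is to scan the CVO logs for the panic signature; a minimal sketch, assuming the stock deployment name in the openshift-cluster-version namespace:

```
# Scan current CVO logs for the panic signature; no output means no panic was logged.
oc -n openshift-cluster-version logs deployment/cluster-version-operator \
  | grep -i -e 'panic' -e 'index out of range'
```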