Bug 1783221 - CVO got panic when downgrading to 4.2.10
Summary: CVO got panic when downgrading to 4.2.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: W. Trevor King
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 1798049
 
Reported: 2019-12-13 11:02 UTC by Gaoyun Pei
Modified: 2020-05-13 21:55 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1798049 (view as bug list)
Environment:
Last Closed: 2020-05-13 21:55:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 282 0 None closed Bug 1783221: lib/resourcemerge/core: Fix panic on container/port removal 2020-09-24 02:23:30 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-13 21:55:14 UTC

Comment 2 Clayton Coleman 2019-12-16 15:07:54 UTC
This must be fixed for 4.3 GA; otherwise we may not have downgrading working.

Comment 4 W. Trevor King 2019-12-17 00:06:01 UTC
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-12-155629 | grep version
  cluster-version-operator                      https://github.com/openshift/cluster-version-operator                      da28418b76e0a4c2f2946a914ac2c649dbaf1dc5

so the stack trace hits [1] and [2].  I bet the "index out of range" is from [2]'s:

  existingCurr = &(*existing)[i]

because we don't re-enter the loop over existing [3] when we drop an entry [4].  This might be a common pattern among our resourcemerge implementations.

[1]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/pkg/payload/task_graph.go#L591
[2]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L69
[3]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L65
[4]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L76
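The removal-during-iteration pattern described above can be reduced to a minimal Go sketch. This is a hypothetical simplification, not the actual resourcemerge code: the container type and removeUnwanted* helpers are invented for illustration. The key point is that a `for i := range s` loop fixes its iteration count when the range expression is evaluated, so once an entry is dropped the index can run past the shortened slice and indexing panics with "index out of range":

```go
package main

import "fmt"

type container struct{ name string }

// removeUnwantedBuggy mirrors the broken pattern: the iteration count is
// frozen at the slice's original length, so after a removal a later
// (*existing)[i] can index past the end and panic.
func removeUnwantedBuggy(existing *[]container, keep map[string]bool) {
	for i := range *existing { // iteration count frozen at original len
		if !keep[(*existing)[i].name] { // panics once the slice shrinks
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
		}
	}
}

// removeUnwantedFixed re-checks the length every iteration and stays on
// the same index after a removal, since a new element shifted into it.
// This is the kind of fix applied in PR 282 (exact diff not reproduced here).
func removeUnwantedFixed(existing *[]container, keep map[string]bool) {
	for i := 0; i < len(*existing); i++ {
		if !keep[(*existing)[i].name] {
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
			i--
		}
	}
}

func main() {
	keep := map[string]bool{"c": true}

	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("buggy version panicked:", r)
			}
		}()
		e := []container{{"a"}, {"b"}, {"c"}}
		removeUnwantedBuggy(&e, keep)
	}()

	e := []container{{"a"}, {"b"}, {"c"}}
	removeUnwantedFixed(&e, keep)
	fmt.Println("fixed version left:", e) // only "c" remains
}
```

With input {a, b, c} and only "c" required, the buggy version removes "a" at index 0, skips "c" (now at index 1), and then indexes position 2 of a two-element slice, reproducing the panic; the fixed version simply leaves {c}.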

Comment 7 Gaoyun Pei 2020-02-06 09:42:41 UTC
Verified this bug in 4.4.0-0.nightly-2020-02-05-220946.

1. Install the latest 4.4 nightly cluster on AWS
# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-02-05-220946   True        False         4h8m    Cluster version is 4.4.0-0.nightly-2020-02-05-220946

2. Downgrade the cluster to 4.3.0
# ./oc adm upgrade --to-image='quay.io/openshift-release-dev/ocp-release@sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d' --allow-explicit-upgrade
Updating to release image quay.io/openshift-release-dev/ocp-release@sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d

3. The downgrade finished; the CVO pod is running well, and no panic occurred.
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0     True        False         76m     Cluster version is 4.3.0

# oc get pod -n openshift-cluster-version
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-version-operator-584fddff45-wgjps   1/1     Running   1          101m


4. Several operators remained at 4.4.0-0.nightly-2020-02-05-220946 even though the CVO shows the downgrade as finished.
# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0                               True        False         False      6h31m
cloud-credential                           4.3.0                               True        False         False      6h52m
cluster-autoscaler                         4.3.0                               True        False         False      6h40m
console                                    4.3.0                               True        False         False      112m
csi-snapshot-controller                    4.4.0-0.nightly-2020-02-05-220946   True        False         False      112m
dns                                        4.3.0                               True        False         False      6h44m
etcd                                       4.4.0-0.nightly-2020-02-05-220946   True        False         False      107m
image-registry                             4.3.0                               True        False         False      113m
ingress                                    4.3.0                               True        False         False      112m
insights                                   4.3.0                               True        False         False      6h46m
kube-apiserver                             4.3.0                               True        False         False      6h43m
kube-controller-manager                    4.3.0                               True        False         False      6h43m
kube-scheduler                             4.3.0                               True        False         False      6h43m
kube-storage-version-migrator              4.4.0-0.nightly-2020-02-05-220946   True        False         False      114m
machine-api                                4.3.0                               True        False         False      6h44m
machine-config                             4.3.0                               True        False         False      6h45m
marketplace                                4.3.0                               True        False         False      109m
monitoring                                 4.3.0                               True        False         False      106m
network                                    4.3.0                               True        False         False      6h46m
node-tuning                                4.3.0                               True        False         False      107m
openshift-apiserver                        4.3.0                               True        False         False      107m
openshift-controller-manager               4.3.0                               True        False         False      6h43m
openshift-samples                          4.3.0                               True        False         False      134m
operator-lifecycle-manager                 4.3.0                               True        False         False      6h45m
operator-lifecycle-manager-catalog         4.3.0                               True        False         False      6h44m
operator-lifecycle-manager-packageserver   4.3.0                               True        False         False      108m
service-ca                                 4.3.0                               True        False         False      6h46m
service-catalog-apiserver                  4.3.0                               True        False         False      6h46m
service-catalog-controller-manager         4.3.0                               True        False         False      6h46m
storage                                    4.3.0                               True        False         False      136m

# oc get clusterversion -o json|jq -r '.items[0].status.history[]|.startedTime + "|" + .completionTime + "|" + .state + "|" + .version'
2020-02-06T07:03:49Z|2020-02-06T07:47:17Z|Completed|4.3.0
2020-02-06T02:32:16Z|2020-02-06T02:53:34Z|Completed|4.4.0-0.nightly-2020-02-05-220946

Found an existing bug https://bugzilla.redhat.com/show_bug.cgi?id=1794360 about the downgrade issue from 4.4 to 4.3; issue 4 above will be tracked in BZ#1794360. Moving this bug to VERIFIED.

Comment 9 errata-xmlrpc 2020-05-13 21:55:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

