Bug 1783221 - CVO got panic when downgrading to 4.2.10
Summary: CVO got panic when downgrading to 4.2.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: W. Trevor King
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 1798049
 
Reported: 2019-12-13 11:02 UTC by Gaoyun Pei
Modified: 2020-05-13 21:55 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1798049 (view as bug list)
Environment:
Last Closed: 2020-05-13 21:55:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 282 0 None closed Bug 1783221: lib/resourcemerge/core: Fix panic on container/port removal 2020-09-24 02:23:30 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-13 21:55:14 UTC

Comment 2 Clayton Coleman 2019-12-16 15:07:54 UTC
This must be fixed for 4.3 GA; otherwise we may not have downgrading working.

Comment 4 W. Trevor King 2019-12-17 00:06:01 UTC
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-12-155629 | grep version
  cluster-version-operator                      https://github.com/openshift/cluster-version-operator                      da28418b76e0a4c2f2946a914ac2c649dbaf1dc5

so the stack trace hits [1] and [2].  I bet the "index out of range" is from [2]'s:

  existingCurr = &(*existing)[i]

because we don't re-enter the loop over existing [3] when we drop an entry [4].  This might be a common pattern among our resourcemerge implementations.

[1]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/pkg/payload/task_graph.go#L591
[2]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L69
[3]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L65
[4]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L76
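The removal-during-iteration pattern described above can be reduced to a minimal Go sketch. This is a hypothetical simplification, not the actual resourcemerge code: the container type and removeUnwanted* helpers are invented for illustration. The key point is that a `for i := range s` loop fixes its iteration count when the range expression is evaluated, so once an entry is dropped the index can run past the shortened slice and indexing panics with "index out of range":

```go
package main

import "fmt"

type container struct{ name string }

// removeUnwantedBuggy mirrors the broken pattern: the iteration count is
// frozen at the slice's original length, so after a removal a later
// (*existing)[i] can index past the end and panic.
func removeUnwantedBuggy(existing *[]container, keep map[string]bool) {
	for i := range *existing { // iteration count frozen at original len
		if !keep[(*existing)[i].name] { // panics once the slice shrinks
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
		}
	}
}

// removeUnwantedFixed re-checks the length every iteration and stays on
// the same index after a removal, since a new element shifted into it.
// This is the kind of fix applied in PR 282 (exact diff not reproduced here).
func removeUnwantedFixed(existing *[]container, keep map[string]bool) {
	for i := 0; i < len(*existing); i++ {
		if !keep[(*existing)[i].name] {
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
			i--
		}
	}
}

func main() {
	keep := map[string]bool{"c": true}

	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("buggy version panicked:", r)
			}
		}()
		e := []container{{"a"}, {"b"}, {"c"}}
		removeUnwantedBuggy(&e, keep)
	}()

	e := []container{{"a"}, {"b"}, {"c"}}
	removeUnwantedFixed(&e, keep)
	fmt.Println("fixed version left:", e) // only "c" remains
}
```

With input {a, b, c} and only "c" required, the buggy version removes "a" at index 0, skips "c" (now at index 1), and then indexes position 2 of a two-element slice, reproducing the panic; the fixed version simply leaves {c}.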

Comment 7 Gaoyun Pei 2020-02-06 09:42:41 UTC
Verified this bug in 4.4.0-0.nightly-2020-02-05-220946.

1. Install the latest 4.4 nightly cluster on AWS
# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-02-05-220946   True        False         4h8m    Cluster version is 4.4.0-0.nightly-2020-02-05-220946

2. Downgrade the cluster to 4.3.0
# ./oc adm upgrade --to-image='quay.io/openshift-release-dev/ocp-release@sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d' --allow-explicit-upgrade
Updating to release image quay.io/openshift-release-dev/ocp-release@sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d

3. The downgrade finished; the CVO pod is running well, and no panic occurred.
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0     True        False         76m     Cluster version is 4.3.0

# oc get pod -n openshift-cluster-version
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-version-operator-584fddff45-wgjps   1/1     Running   1          101m


4. Several operators remained at 4.4.0-0.nightly-2020-02-05-220946 even though the CVO shows the downgrade as finished.
# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0                               True        False         False      6h31m
cloud-credential                           4.3.0                               True        False         False      6h52m
cluster-autoscaler                         4.3.0                               True        False         False      6h40m
console                                    4.3.0                               True        False         False      112m
csi-snapshot-controller                    4.4.0-0.nightly-2020-02-05-220946   True        False         False      112m
dns                                        4.3.0                               True        False         False      6h44m
etcd                                       4.4.0-0.nightly-2020-02-05-220946   True        False         False      107m
image-registry                             4.3.0                               True        False         False      113m
ingress                                    4.3.0                               True        False         False      112m
insights                                   4.3.0                               True        False         False      6h46m
kube-apiserver                             4.3.0                               True        False         False      6h43m
kube-controller-manager                    4.3.0                               True        False         False      6h43m
kube-scheduler                             4.3.0                               True        False         False      6h43m
kube-storage-version-migrator              4.4.0-0.nightly-2020-02-05-220946   True        False         False      114m
machine-api                                4.3.0                               True        False         False      6h44m
machine-config                             4.3.0                               True        False         False      6h45m
marketplace                                4.3.0                               True        False         False      109m
monitoring                                 4.3.0                               True        False         False      106m
network                                    4.3.0                               True        False         False      6h46m
node-tuning                                4.3.0                               True        False         False      107m
openshift-apiserver                        4.3.0                               True        False         False      107m
openshift-controller-manager               4.3.0                               True        False         False      6h43m
openshift-samples                          4.3.0                               True        False         False      134m
operator-lifecycle-manager                 4.3.0                               True        False         False      6h45m
operator-lifecycle-manager-catalog         4.3.0                               True        False         False      6h44m
operator-lifecycle-manager-packageserver   4.3.0                               True        False         False      108m
service-ca                                 4.3.0                               True        False         False      6h46m
service-catalog-apiserver                  4.3.0                               True        False         False      6h46m
service-catalog-controller-manager         4.3.0                               True        False         False      6h46m
storage                                    4.3.0                               True        False         False      136m

# oc get clusterversion -o json|jq -r '.items[0].status.history[]|.startedTime + "|" + .completionTime + "|" + .state + "|" + .version'
2020-02-06T07:03:49Z|2020-02-06T07:47:17Z|Completed|4.3.0
2020-02-06T02:32:16Z|2020-02-06T02:53:34Z|Completed|4.4.0-0.nightly-2020-02-05-220946

Found an existing bug https://bugzilla.redhat.com/show_bug.cgi?id=1794360 about the downgrade issue from 4.4 to 4.3; issue 4 above will be tracked in BZ#1794360. Moving this bug to VERIFIED.

Comment 9 errata-xmlrpc 2020-05-13 21:55:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

