Bug 1800346

Summary: CVO got panic when downgrading to 4.2.10
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: Cluster Version Operator Assignee: W. Trevor King <wking>
Status: CLOSED ERRATA QA Contact: Gaoyun Pei <gpei>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2.0 CC: aos-bugs, ccoleman, gpei, jokerman, padillon, sdodson, wking
Target Milestone: ---   
Target Release: 4.2.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1798049 Environment:
Last Closed: 2020-03-10 11:41:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1798049    
Bug Blocks:    

Description W. Trevor King 2020-02-06 22:11:04 UTC
+++ This bug was initially created as a clone of Bug #1798049 +++

+++ This bug was initially created as a clone of Bug #1783221 +++

--- Additional comment from W. Trevor King on 2019-12-16 19:06:01 EST ---

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-12-155629 | grep version
  cluster-version-operator                      https://github.com/openshift/cluster-version-operator                      da28418b76e0a4c2f2946a914ac2c649dbaf1dc5

So the stack trace hits [1] and [2]. I bet the "index out of range" comes from this line at [2]:

  existingCurr = &(*existing)[i]

because we don't re-enter the loop over existing [3] after we drop an entry [4], so a later index can point past the end of the shortened slice. This might be a common pattern among our resourcemerge implementations.

[1]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/pkg/payload/task_graph.go#L591
[2]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L69
[3]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L65
[4]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L76
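
For illustration only, here is a minimal Go sketch of the suspected pattern. It is not the code at [2]: the Container type and the isRequired, pruneUnsafe, and pruneSafe helpers are hypothetical names made up for this example. The sketch assumes the loop ranges over the slice while also shrinking it. Go evaluates the range expression once, so the loop keeps counting up to the original length even after an entry has been dropped. The backwards loop in pruneSafe is just one possible way to avoid that, not necessarily the fix that was shipped.

```go
package main

// Container is a hypothetical stand-in for the real container type;
// only the name matters for this sketch.
type Container struct {
	Name string
}

// isRequired reports whether a container with the given name is in required.
func isRequired(name string, required []Container) bool {
	for _, r := range required {
		if r.Name == name {
			return true
		}
	}
	return false
}

// pruneUnsafe mimics the suspected pattern: it drops entries from *existing
// while ranging over it. The range bounds are fixed when the loop starts, so
// after a removal the index i can exceed the new length. The recover only
// exists so the sketch can report the panic instead of crashing.
func pruneUnsafe(existing *[]Container, required []Container) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	for i, c := range *existing {
		if isRequired(c.Name, required) {
			// Panics with "index out of range" once i >= len(*existing).
			_ = &(*existing)[i]
			continue
		}
		// Drop entry i; *existing is now one element shorter.
		*existing = append((*existing)[:i], (*existing)[i+1:]...)
	}
	return false
}

// pruneSafe walks the slice from the end, so removing entry i never
// disturbs the indices that are still to be visited.
func pruneSafe(existing *[]Container, required []Container) {
	for i := len(*existing) - 1; i >= 0; i-- {
		if !isRequired((*existing)[i].Name, required) {
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
		}
	}
}
```

With existing containers a, b, c and only c required, pruneUnsafe drops a, skips b, and then panics when it indexes position 2 of the now two-element slice. pruneSafe leaves just c.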

Comment 3 Gaoyun Pei 2020-02-27 10:05:30 UTC
Verified this bug on payload 4.2.0-0.nightly-2020-02-26-233230.

Downgraded a 4.3 cluster to 4.2.0-0.nightly-2020-02-26-233230. There is no panic in the CVO pod, and it is running well:

# oc get pod -n openshift-cluster-version
NAME                                        READY   STATUS      RESTARTS   AGE
cluster-version-operator-5498bcdf8f-z8fch   1/1     Running     0          152m
version--gkvd4-bjv4d                        0/1     Completed   0          153m
version--xw9vl-xzrg8                        0/1     Completed   0          3h15m

Comment 5 errata-xmlrpc 2020-03-10 11:41:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0685