Bug 1800346

Summary:	CVO got panic when downgrading to 4.2.10
Product:	OpenShift Container Platform	Reporter:	W. Trevor King <wking>
Component:	Cluster Version Operator	Assignee:	W. Trevor King <wking>
Status:	CLOSED ERRATA	QA Contact:	Gaoyun Pei <gpei>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.2.0	CC:	aos-bugs, ccoleman, gpei, jokerman, padillon, sdodson, wking
Target Milestone:	---
Target Release:	4.2.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1798049	Environment:
Last Closed:	2020-03-10 11:41:10 UTC	Type:	---
Bug Depends On:	1798049
Bug Blocks:

Description W. Trevor King 2020-02-06 22:11:04 UTC

+++ This bug was initially created as a clone of Bug #1798049 +++

+++ This bug was initially created as a clone of Bug #1783221 +++

--- Additional comment from W. Trevor King on 2019-12-16 19:06:01 EST ---

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-12-155629 | grep version
  cluster-version-operator                      https://github.com/openshift/cluster-version-operator                      da28418b76e0a4c2f2946a914ac2c649dbaf1dc5

so the stack trace hits [1] and [2].  I bet the "index out of range" is from [2]'s:

  existingCurr = &(*existing)[i]

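because we don't re-enter the loop over existing [3] when we drop an entry [4], so the range keeps counting up to the original length and i can run past the end of the shortened slice.  This might be a common pattern among our resourcemerge implementations.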

[1]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/pkg/payload/task_graph.go#L591
[2]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L69
[3]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L65
[4]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L76
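
A minimal sketch of the suspected failure mode, not the actual resourcemerge code: the item type, pruneBuggy, and pruneFixed are hypothetical names made up for illustration. It assumes the panic comes from removing entries from a slice while ranging over it, then indexing the shortened slice with the range index. The backward loop in pruneFixed is one common way to avoid that, and is not necessarily the fix that landed.

```go
package main

import "fmt"

// item is a hypothetical stand-in for a named entry such as a container.
type item struct {
	Name   string
	Merged bool
}

// pruneBuggy drops entries that are not required while ranging over *existing.
// The range expression is evaluated once, so i still counts up to the original
// length after an entry has been dropped, and (*existing)[i] can go out of range.
// The panic is recovered and returned as an error so the sketch can report it.
func pruneBuggy(existing *[]item, required map[string]bool) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	for i, e := range *existing {
		if required[e.Name] {
			curr := &(*existing)[i] // panics once i >= len(*existing)
			curr.Merged = true
			continue
		}
		// Drop entry i; the slice is now shorter than the range expects.
		*existing = append((*existing)[:i], (*existing)[i+1:]...)
	}
	return nil
}

// pruneFixed walks the slice backward by index, so removing entry i never
// shifts an entry that has not been visited yet.
func pruneFixed(existing *[]item, required map[string]bool) {
	for i := len(*existing) - 1; i >= 0; i-- {
		if required[(*existing)[i].Name] {
			(*existing)[i].Merged = true
			continue
		}
		*existing = append((*existing)[:i], (*existing)[i+1:]...)
	}
}

func main() {
	required := map[string]bool{"b": true}

	buggy := []item{{Name: "a"}, {Name: "b"}}
	fmt.Println("buggy:", pruneBuggy(&buggy, required))

	fixed := []item{{Name: "a"}, {Name: "b"}}
	pruneFixed(&fixed, required)
	fmt.Println("fixed:", fixed)
}
```

In the buggy variant, dropping "a" at index 0 leaves a one-element slice, but the range still visits index 1, which is where the out-of-range access happens.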

Comment 3 Gaoyun Pei 2020-02-27 10:05:30 UTC

Verified this bug on payload 4.2.0-0.nightly-2020-02-26-233230.

Downgraded a 4.3 cluster to 4.2.0-0.nightly-2020-02-26-233230. There is no panic in the CVO pod, and it is running well:

# oc get pod -n openshift-cluster-version
NAME                                        READY   STATUS      RESTARTS   AGE
cluster-version-operator-5498bcdf8f-z8fch   1/1     Running     0          152m
version--gkvd4-bjv4d                        0/1     Completed   0          153m
version--xw9vl-xzrg8                        0/1     Completed   0          3h15m

Comment 5 errata-xmlrpc 2020-03-10 11:41:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0685