1800346 – CVO got panic when downgrading to 4.2.10

Summary: CVO got panic when downgrading to 4.2.10

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.2.z
Assignee:	W. Trevor King
QA Contact:	Gaoyun Pei
Docs Contact:
URL:
Whiteboard:
Depends On:	1798049
Blocks:

Reported:	2020-02-06 22:11 UTC by W. Trevor King
Modified:	2020-03-10 11:41 UTC
CC List:	7 users
Fixed In Version:
Doc Text:
Clone Of:	1798049
Environment:
Last Closed:	2020-03-10 11:41:10 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-version-operator pull 314	0	None	closed	Bug 1800346: lib/resourcemerge/core: Fix panic on container removal	2020-09-24 02:46:10 UTC
Red Hat Product Errata	RHBA-2020:0685	0	None	None	None	2020-03-10 11:41:29 UTC

Description W. Trevor King 2020-02-06 22:11:04 UTC

+++ This bug was initially created as a clone of Bug #1798049 +++

+++ This bug was initially created as a clone of Bug #1783221 +++

--- Additional comment from W. Trevor King on 2019-12-16 19:06:01 EST ---

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-12-155629 | grep version
  cluster-version-operator                      https://github.com/openshift/cluster-version-operator                      da28418b76e0a4c2f2946a914ac2c649dbaf1dc5

So the stack trace hits [1] and [2].  I suspect the "index out of range" comes from this line at [2]:

  existingCurr = &(*existing)[i]

because when we drop an entry [4] we keep iterating the loop over existing [3] instead of re-entering it, so i can run past the end of the shortened slice.  This might be a common pattern among our resourcemerge implementations.

[1]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/pkg/payload/task_graph.go#L591
[2]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L69
[3]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L65
[4]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L76
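
The suspected failure mode can be reproduced outside the CVO. The following is a minimal sketch, not the code at [2]: the container type, the helper names, and the container names are all hypothetical stand-ins, and pruneSafe is just one common way to avoid the problem, not necessarily the approach taken in the linked pull request. It relies on the fact that Go evaluates a range expression once, so the loop index keeps counting up to the original length even after the slice has been shortened.

```go
package main

import "fmt"

// container is a hypothetical stand-in for corev1.Container;
// only the name matters for this sketch.
type container struct {
	Name string
}

// hasName reports whether any entry in list carries the given name.
func hasName(list []container, name string) bool {
	for _, c := range list {
		if c.Name == name {
			return true
		}
	}
	return false
}

// pruneUnsafe imitates the suspected pattern: it drops entries from
// *existing while ranging over it. The range expression is evaluated
// once, so i still runs to the original length after a removal.
// Any resulting panic is recovered and returned as an error.
func pruneUnsafe(existing *[]container, required []container) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	for i, e := range *existing {
		if !hasName(required, e.Name) {
			// Drop entry i; the slice is now one element shorter.
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
			continue
		}
		// Same shape as the line quoted above; i may now be out of range.
		_ = &(*existing)[i]
	}
	return nil
}

// pruneSafe filters in place without indexing into the shrinking slice.
// This is an assumed alternative, not the fix from the pull request.
func pruneSafe(existing *[]container, required []container) {
	kept := (*existing)[:0]
	for _, e := range *existing {
		if hasName(required, e.Name) {
			kept = append(kept, e)
		}
	}
	*existing = kept
}

func main() {
	// Hypothetical case: three existing containers, only "c" still required.
	bad := []container{{"a"}, {"b"}, {"c"}}
	fmt.Println("unsafe:", pruneUnsafe(&bad, []container{{"c"}}))

	good := []container{{"a"}, {"b"}, {"c"}}
	pruneSafe(&good, []container{{"c"}})
	fmt.Println("safe:", good)
}
```

Whether the unsafe variant panics depends on where the dropped entry sits: removing only the last entry happens to work, which may be why this pattern can go unnoticed until a downgrade removes a container earlier in the list.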

Comment 3 Gaoyun Pei 2020-02-27 10:05:30 UTC

Verified this bug on payload 4.2.0-0.nightly-2020-02-26-233230.

Downgraded a 4.3 cluster to 4.2.0-0.nightly-2020-02-26-233230; there was no panic in the CVO pod and it is running normally:

# oc get pod -n openshift-cluster-version
NAME                                        READY   STATUS      RESTARTS   AGE
cluster-version-operator-5498bcdf8f-z8fch   1/1     Running     0          152m
version--gkvd4-bjv4d                        0/1     Completed   0          153m
version--xw9vl-xzrg8                        0/1     Completed   0          3h15m

Comment 5 errata-xmlrpc 2020-03-10 11:41:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0685
