Bug 1800346 - CVO got panic when downgrading to 4.2.10
Summary: CVO got panic when downgrading to 4.2.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.2.z
Assignee: W. Trevor King
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On: 1798049
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-02-06 22:11 UTC by W. Trevor King
Modified: 2020-03-10 11:41 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1798049
Environment:
Last Closed: 2020-03-10 11:41:10 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 314 None closed Bug 1800346: lib/resourcemerge/core: Fix panic on container removal 2020-03-05 15:38:17 UTC
Red Hat Product Errata RHBA-2020:0685 None None None 2020-03-10 11:41:29 UTC

Description W. Trevor King 2020-02-06 22:11:04 UTC
+++ This bug was initially created as a clone of Bug #1798049 +++

+++ This bug was initially created as a clone of Bug #1783221 +++

--- Additional comment from W. Trevor King on 2019-12-16 19:06:01 EST ---

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-12-155629 | grep version
  cluster-version-operator                      https://github.com/openshift/cluster-version-operator                      da28418b76e0a4c2f2946a914ac2c649dbaf1dc5

so the stack trace hits [1] and [2].  I bet the "index out of range" is from [2]'s:

  existingCurr = &(*existing)[i]

because we don't re-enter the loop over existing [3] when we drop an entry [4].  This might be a common pattern among our resourcemerge implementations.

[1]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/pkg/payload/task_graph.go#L591
[2]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L69
[3]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L65
[4]: https://github.com/openshift/cluster-version-operator/blob/e5e468961b5fd687f65844d511690d7ed0046447/lib/resourcemerge/core.go#L76

Comment 3 Gaoyun Pei 2020-02-27 10:05:30 UTC
Verify this bug on payload 4.2.0-0.nightly-2020-02-26-233230

Downgrade a 4.3 cluster to 4.2.0-0.nightly-2020-02-26-233230, no panic in CVO pod, it's running well:

# oc get pod -n openshift-cluster-version
NAME                                        READY   STATUS      RESTARTS   AGE
cluster-version-operator-5498bcdf8f-z8fch   1/1     Running     0          152m
version--gkvd4-bjv4d                        0/1     Completed   0          153m
version--xw9vl-xzrg8                        0/1     Completed   0          3h15m

Comment 5 errata-xmlrpc 2020-03-10 11:41:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0685


Note You need to log in before you can comment on or make changes to this bug.