Bug 2027585 - CVO crashes when changing spec.upstream to a cincinnati graph which includes invalid conditional edges
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: W. Trevor King
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-30 07:17 UTC by Yang Yang
Modified: 2022-03-10 16:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:31:00 UTC
Target Upstream Version:
Embargoed:


Attachments
CVO log file (677.39 KB, text/plain)
2021-11-30 07:17 UTC, Yang Yang


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 697 0 None open Bug 2027585: pkg/cincinnati: Fix panic for conditional edges with risks after an invalid risk 2021-11-30 19:20:53 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:31:22 UTC

Description Yang Yang 2021-11-30 07:17:43 UTC
Created attachment 1844118
CVO log file

Description of problem:
When spec.upstream points at a dummy Cincinnati graph that includes invalid conditional edges, the CVO keeps restarting and eventually goes into CrashLoopBackOff.

Cincinnati graph is available online: https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid.json

Snipped CVO log:

W1130 07:10:17.721748       1 cincinnati.go:220] Conditional update to 4.10.0-0.nightly-2021-11-26-195620, risk "TypeNull", has empty pruned matchingRules; dropping this target to avoid rejections when pushing to the Kubernetes API server. Pruning results: Skipping unrecognized cluster condition type ""
I1130 07:10:17.721793       1 cvo.go:582] Finished syncing available updates "openshift-cluster-version/version" (136.217034ms)
E1130 07:10:17.721916       1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 177 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x196f5a0, 0xc000eee2d0})
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000af1680})
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x196f5a0, 0xc000eee2d0})
        /usr/lib/golang/src/runtime/panic.go:1038 +0x215
github.com/openshift/cluster-version-operator/pkg/cincinnati.Client.GetUpdates({{0xa2, 0x74, 0x1e, 0xaf, 0x95, 0xa0, 0x44, 0x40, 0x97, 0x24, ...}, ...}, ...)
        /go/src/github.com/openshift/cluster-version-operator/pkg/cincinnati/cincinnati.go:218 +0x2f14
github.com/openshift/cluster-version-operator/pkg/cvo.calculateAvailableUpdatesStatus({0x1ce3a10, 0xc000ecf0c0}, {0xc001404120, 0x24}, 0xc00017a140, {0xc001270000, 0x69}, {0x1a653f5, 0x5}, {0xc0025004b3, ...}, ...)
        /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/availableupdates.go:226 +0x9c5
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).syncAvailableUpdates(0xc00060a240, {0x1ce3a10, 0xc000ecf0c0}, 0xc001ec6000)
        /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/availableupdates.go:53 +0x353
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).availableUpdatesSync(0xc00060a240, {0x1ce3a10, 0xc000ecf0c0}, {0xc00040ab70, 0x21})
        /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:592 +0x3dc
github.com/openshift/cluster-version-operator/pkg/cvo.processNextWorkItem({0x1ce3a10, 0xc000ecf0c0}, {0x1d15a88, 0xc0002f6940}, 0xc0016e7d68, 0x8)
<snipped>
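
The trace shows an index into an empty slice inside Client.GetUpdates, right after a target was dropped for the risk "TypeNull". One way a panic like this can arise is sketched below. This is a hypothetical reduction, not the actual CVO code: the types, pruneRules, pruneInPlace, and the set of recognized rule types are all assumptions made for illustration. The idea is that a target is deleted from the slice while its remaining risks are still being visited, so the next risk indexes a slice that has shrunk.

```go
package main

import "fmt"

// Hypothetical, simplified types; the real CVO types differ.
type Risk struct {
	Name          string
	MatchingRules []string // rule types only, e.g. "Always" or "PromQL"
}

type ConditionalUpdate struct {
	Target string
	Risks  []Risk
}

// pruneRules keeps only rule types this sketch treats as recognized (an assumption).
func pruneRules(rules []string) []string {
	var kept []string
	for _, r := range rules {
		if r == "Always" || r == "PromQL" {
			kept = append(kept, r)
		}
	}
	return kept
}

// pruneInPlace drops every target that has a risk with no usable rules.
// With stopAfterDrop=false it keeps visiting the dropped target's later
// risks, so updates[i] may no longer exist on the next inner iteration.
func pruneInPlace(updates []ConditionalUpdate, stopAfterDrop bool) []ConditionalUpdate {
	for i := len(updates) - 1; i >= 0; i-- {
		for j := range updates[i].Risks {
			updates[i].Risks[j].MatchingRules = pruneRules(updates[i].Risks[j].MatchingRules)
			if len(updates[i].Risks[j].MatchingRules) == 0 {
				updates = append(updates[:i], updates[i+1:]...)
				if stopAfterDrop {
					break // the target is gone; do not touch its later risks
				}
			}
		}
	}
	return updates
}

// sample returns one target whose first risk is invalid and whose second is valid.
func sample() []ConditionalUpdate {
	return []ConditionalUpdate{{
		Target: "example-target",
		Risks: []Risk{
			{Name: "TypeNull", MatchingRules: []string{""}},
			{Name: "LaterRisk", MatchingRules: []string{"Always"}},
		},
	}}
}

// run reports the panic message (if any) and the remaining targets.
func run(stopAfterDrop bool) (panicMsg string, remaining []ConditionalUpdate) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	remaining = pruneInPlace(sample(), stopAfterDrop)
	return "", remaining
}

func main() {
	msg, _ := run(false)
	fmt.Printf("without break: panic=%q\n", msg)
	msg, left := run(true)
	fmt.Printf("with break:    panic=%q remaining=%d\n", msg, len(left))
}
```

Whether the actual fix takes this shape is not stated in this report. The sketch only shows why a valid risk listed after an invalid one could trigger the crash, while a graph with only a single invalid risk per target would not.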

Version-Release number of the following components:
4.10.0-0.nightly-2021-11-26-145635

How reproducible:
100%

Steps to Reproduce:
1. Prepare a dummy Cincinnati graph that contains an invalid conditional edge, such as https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid.json

2. Patch cluster to use the graph
# oc patch clusterversion/version --patch '{"spec":{"upstream":"https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid.json"}}' --type=merge

Actual results:
CVO crashes.

# oc get all -n openshift-cluster-version
NAME                                            READY   STATUS             RESTARTS       AGE
pod/cluster-version-operator-7b58db6899-sjsnh   0/1     CrashLoopBackOff   32 (90s ago)   28h

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   172.30.16.143   <none>        9099/TCP   28h

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   0/1     1            0           28h

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-7b58db6899   1         1         0       28h


Expected results:
The CVO detects the invalid configuration and reports an error instead of crashing.


Comment 4 Yang Yang 2021-12-02 06:20:54 UTC
Verifying with 4.10.0-0.nightly-2021-12-01-164437

1. Install a cluster with 4.10.0-0.nightly-2021-12-01-164437
# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-01-164437   True        False         86m     Cluster version is 4.10.0-0.nightly-2021-12-01-164437

2. Patch to use the dummy cincinnati graph

# oc adm upgrade --include-not-recommended
Cluster version is 4.10.0-0.nightly-2021-12-01-164437

Upstream: https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid.json
Channel: stable-4.10
No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and may result in downtime or data loss.

No updates which are not recommended based on your cluster configuration are available.

So the CVO drops all of the invalid conditional edges instead of crashing.

3. Check CVO

# oc get all -n openshift-cluster-version
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cluster-version-operator-6f5d9777dc-j7vvj   1/1     Running   0          109m

NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   172.30.137.0   <none>        9099/TCP   109m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   1/1     1            1           110m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-6f5d9777dc   1         1         1       109m

CVO is running well. Moving it to verified state.

Comment 8 errata-xmlrpc 2022-03-10 16:31:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

