Bug 2027585

Summary: CVO crashes when changing spec.upstream to a cincinnati graph which includes invalid conditional edges
Product: OpenShift Container Platform
Component: Cluster Version Operator
Version: 4.10
Target Release: 4.10.0
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Yang Yang <yanyang>
Assignee: W. Trevor King <wking>
QA Contact: Yang Yang <yanyang>
CC: aos-bugs, bleanhar, wking
Last Closed: 2022-03-10 16:31:00 UTC

Description Yang Yang 2021-11-30 07:17:43 UTC
Created attachment 1844118: CVO log file

Description of problem:
When spec.upstream points to a dummy Cincinnati graph that includes invalid conditional edges, the CVO panics, restarts repeatedly, and eventually ends up in CrashLoopBackOff.

The graph is available at: https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid.json

Snippet of the CVO log:

W1130 07:10:17.721748       1 cincinnati.go:220] Conditional update to 4.10.0-0.nightly-2021-11-26-195620, risk "TypeNull", has empty pruned matchingRules; dropping this target to avoid rejections when pushing to the Kubernetes API server. Pruning results: Skipping unrecognized cluster condition type ""
I1130 07:10:17.721793       1 cvo.go:582] Finished syncing available updates "openshift-cluster-version/version" (136.217034ms)
E1130 07:10:17.721916       1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 177 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x196f5a0, 0xc000eee2d0})
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000af1680})
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x196f5a0, 0xc000eee2d0})
        /usr/lib/golang/src/runtime/panic.go:1038 +0x215
github.com/openshift/cluster-version-operator/pkg/cincinnati.Client.GetUpdates({{0xa2, 0x74, 0x1e, 0xaf, 0x95, 0xa0, 0x44, 0x40, 0x97, 0x24, ...}, ...}, ...)
        /go/src/github.com/openshift/cluster-version-operator/pkg/cincinnati/cincinnati.go:218 +0x2f14
github.com/openshift/cluster-version-operator/pkg/cvo.calculateAvailableUpdatesStatus({0x1ce3a10, 0xc000ecf0c0}, {0xc001404120, 0x24}, 0xc00017a140, {0xc001270000, 0x69}, {0x1a653f5, 0x5}, {0xc0025004b3, ...}, ...)
        /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/availableupdates.go:226 +0x9c5
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).syncAvailableUpdates(0xc00060a240, {0x1ce3a10, 0xc000ecf0c0}, 0xc001ec6000)
        /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/availableupdates.go:53 +0x353
github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).availableUpdatesSync(0xc00060a240, {0x1ce3a10, 0xc000ecf0c0}, {0xc00040ab70, 0x21})
        /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:592 +0x3dc
github.com/openshift/cluster-version-operator/pkg/cvo.processNextWorkItem({0x1ce3a10, 0xc000ecf0c0}, {0x1d15a88, 0xc0002f6940}, 0xc0016e7d68, 0x8)
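The trace shows GetUpdates panicking with "index out of range [0] with length 0" right after the CVO logged that it was dropping a conditional update whose pruned matchingRules was empty. runtime.HandleCrash logs the panic and, by default, re-panics, so the container exits and the kubelet restarts it, which is how the pod reaches CrashLoopBackOff. The real fix lives in the CVO's pkg/cincinnati package and is not reproduced here. The following is only a minimal sketch of the general defensive pattern, assuming hypothetical types (ConditionalUpdate, Risk, MatchingRule), a hypothetical pruneRules helper, hypothetical version strings, and an assumed set of recognized rule types ("Always", "PromQL"); none of these are the CVO's actual identifiers.

```go
package main

import "fmt"

// MatchingRule is a hypothetical stand-in for a risk's matching rule.
type MatchingRule struct {
	Type string // assumed recognized values: "Always", "PromQL"; "" is unrecognized
}

// Risk is a hypothetical stand-in for a conditional-update risk.
type Risk struct {
	Name          string
	MatchingRules []MatchingRule
}

// ConditionalUpdate is a hypothetical stand-in for a conditional edge target.
type ConditionalUpdate struct {
	Version string
	Risks   []Risk
}

// recognizedTypes is an assumption made for this sketch.
var recognizedTypes = map[string]bool{"Always": true, "PromQL": true}

// pruneRules keeps only the rules whose type is recognized.
func pruneRules(rules []MatchingRule) []MatchingRule {
	var kept []MatchingRule
	for _, r := range rules {
		if recognizedTypes[r.Type] {
			kept = append(kept, r)
		}
	}
	return kept
}

// filterUpdates builds a new slice instead of deleting from the one being
// iterated, and drops any update with a risk whose pruned rules are empty.
func filterUpdates(updates []ConditionalUpdate) []ConditionalUpdate {
	var out []ConditionalUpdate
	for _, u := range updates {
		valid := true
		risks := make([]Risk, 0, len(u.Risks))
		for _, risk := range u.Risks {
			pruned := pruneRules(risk.MatchingRules)
			if len(pruned) == 0 {
				fmt.Printf("dropping %s: risk %q has empty pruned matchingRules\n", u.Version, risk.Name)
				valid = false
				break
			}
			risk.MatchingRules = pruned // risk is a copy; the input is not mutated
			risks = append(risks, risk)
		}
		if valid {
			u.Risks = risks
			out = append(out, u)
		}
	}
	return out
}

func main() {
	updates := []ConditionalUpdate{
		{Version: "4.10.0-a", Risks: []Risk{{Name: "TypeNull", MatchingRules: []MatchingRule{{Type: ""}}}}},
		{Version: "4.10.0-b", Risks: []Risk{{Name: "Ok", MatchingRules: []MatchingRule{{Type: "Always"}}}}},
	}
	kept := filterUpdates(updates)
	// Only read kept[0] after checking the length.
	if len(kept) > 0 {
		fmt.Println("first kept:", kept[0].Version)
	}
	fmt.Println("kept:", len(kept))
}
```

Building a fresh slice, rather than removing elements from the slice being ranged over, avoids index bookkeeping that can leave a later [0] or [i] access pointing past the end.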

Steps to Reproduce:
1. Prepare a dummy Cincinnati graph that contains at least one invalid conditional edge, for example https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid.json

2. Patch the cluster to use the graph:
# oc patch clusterversion/version --patch '{"spec":{"upstream":"https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid.json"}}' --type=merge
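The linked graph is not reproduced here. As a rough illustration only, assuming the Cincinnati graph JSON carries a top-level "conditionalEdges" array whose entries hold "edges" (from/to version pairs) and "risks" with "matchingRules" (an assumption about the schema, not checked against the linked file), this Go sketch decodes such a document with encoding/json and lists the target versions of conditional edges whose risks contain a rule of unrecognized type. That is the shape the CVO log reports for risk "TypeNull". The embedded graph, its version strings, the invalidTargets helper, and the set of recognized types are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Field names below are assumptions about the graph schema.
type graph struct {
	ConditionalEdges []conditionalEdge `json:"conditionalEdges"`
}

type conditionalEdge struct {
	Edges []struct {
		From string `json:"from"`
		To   string `json:"to"`
	} `json:"edges"`
	Risks []struct {
		Name          string `json:"name"`
		MatchingRules []struct {
			Type string `json:"type"`
		} `json:"matchingRules"`
	} `json:"risks"`
}

// invalidTargets (hypothetical) returns the "to" versions of conditional
// edges that have at least one rule whose type is not recognized.
func invalidTargets(data []byte) ([]string, error) {
	var g graph
	if err := json.Unmarshal(data, &g); err != nil {
		return nil, err
	}
	known := map[string]bool{"Always": true, "PromQL": true} // assumption
	var bad []string
	for _, ce := range g.ConditionalEdges {
		invalid := false
		for _, r := range ce.Risks {
			for _, rule := range r.MatchingRules {
				if !known[rule.Type] {
					invalid = true
				}
			}
		}
		if invalid {
			for _, e := range ce.Edges {
				bad = append(bad, e.To)
			}
		}
	}
	return bad, nil
}

// dummyGraph is hypothetical; a JSON null "type" decodes to "" in Go.
const dummyGraph = `{
  "nodes": [],
  "edges": [],
  "conditionalEdges": [
    {"edges": [{"from": "4.10.0-a", "to": "4.10.0-b"}],
     "risks": [{"name": "TypeNull", "matchingRules": [{"type": null}]}]},
    {"edges": [{"from": "4.10.0-a", "to": "4.10.0-c"}],
     "risks": [{"name": "AlwaysRisk", "matchingRules": [{"type": "Always"}]}]}
  ]
}`

func main() {
	bad, err := invalidTargets([]byte(dummyGraph))
	if err != nil {
		panic(err)
	}
	fmt.Println(bad) // [4.10.0-b]
}
```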

Actual results:
The CVO panics repeatedly and its pod ends up in CrashLoopBackOff:

# oc get all -n openshift-cluster-version
NAME                                            READY   STATUS             RESTARTS       AGE
pod/cluster-version-operator-7b58db6899-sjsnh   0/1     CrashLoopBackOff   32 (90s ago)   28h

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   <none>        9099/TCP   28h

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   0/1     1            0           28h

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-7b58db6899   1         1         0       28h

Expected results:
The CVO detects the invalid graph data and reports an error instead of crashing.

Comment 4 Yang Yang 2021-12-02 06:20:54 UTC
Verifying with 4.10.0-0.nightly-2021-12-01-164437

1. Install a cluster with 4.10.0-0.nightly-2021-12-01-164437
# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-01-164437   True        False         86m     Cluster version is 4.10.0-0.nightly-2021-12-01-164437

2. Patch the cluster to use the dummy Cincinnati graph, then list the available updates, including those that are not recommended:

# oc adm upgrade --include-not-recommended
Cluster version is 4.10.0-0.nightly-2021-12-01-164437

Upstream: https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid.json
Channel: stable-4.10
No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and may result in downtime or data loss.

No updates which are not recommended based on your cluster configuration are available.

So the CVO drops all of the invalid conditional edges.

3. Check the CVO pod:

# oc get all -n openshift-cluster-version
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cluster-version-operator-6f5d9777dc-j7vvj   1/1     Running   0          109m

NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   <none>        9099/TCP   109m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   1/1     1            1           110m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-6f5d9777dc   1         1         1       109m

The CVO is running with no restarts. Moving the bug to VERIFIED.

Comment 8 errata-xmlrpc 2022-03-10 16:31:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update) and where to find the updated
files, consult that advisory.

If the solution does not work for you, open a new bug report.