+++ This bug was initially created as a clone of Bug #1916021 +++

Created attachment 1747236 [details]
cpu flame graph from olm process

Description of problem:

The following test in ./pkg/controller/operators/olm never terminates:

func TestGetReplacementChain(t *testing.T) {
	csv := &v1alpha1.ClusterServiceVersion{
		ObjectMeta: metav1.ObjectMeta{
			Name: "foo",
		},
		Spec: v1alpha1.ClusterServiceVersionSpec{
			Replaces: "foo",
		},
	}
	(&Operator{}).getReplacementChain(csv, map[string]*v1alpha1.ClusterServiceVersion{csv.GetName(): csv})
}

Version-Release number of selected component (if applicable):
4.6.1

How reproducible:
Always?

Steps to Reproduce:
1. Create a CSV that replaces itself (sample attached).

Actual results:
The olm-operator pod jumps to 100% CPU utilization and doesn't make progress reconciling the CSV. Even after deleting the CSV, the olm-operator pod has to be deleted in order to recover.

Expected results:
CSV reconciled as normal.

--- Additional comment from bluddy on 2021-01-13 23:54:52 UTC ---

Created attachment 1747237 [details]
sample bad CSV manifest
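For illustration only, here is a minimal sketch of how a walk over the Replaces links can guard against a CSV that replaces itself (or any longer cycle). It is not the actual OLM fix: `csvStub` and `replacementChain` are hypothetical names, and `csvStub` is an assumed stand-in for v1alpha1.ClusterServiceVersion that keeps only the name and the Replaces field.

```go
package main

import "fmt"

// csvStub is a hypothetical stand-in for v1alpha1.ClusterServiceVersion,
// keeping only the two fields that matter for the replacement chain.
type csvStub struct {
	Name     string
	Replaces string
}

// replacementChain follows the Replaces links starting at start and returns
// the names visited in order. If a name is reached a second time it returns
// the chain collected so far plus an error, instead of looping forever.
func replacementChain(start string, csvs map[string]*csvStub) ([]string, error) {
	visited := map[string]struct{}{}
	var chain []string
	current := start
	for current != "" {
		csv, ok := csvs[current]
		if !ok {
			break // the replaced CSV is not known; the chain ends here
		}
		if _, seen := visited[current]; seen {
			return chain, fmt.Errorf("replacement cycle detected at %q", current)
		}
		visited[current] = struct{}{}
		chain = append(chain, current)
		current = csv.Replaces
	}
	return chain, nil
}

func main() {
	// Same shape as the failing test: "foo" replaces "foo".
	self := &csvStub{Name: "foo", Replaces: "foo"}
	chain, err := replacementChain("foo", map[string]*csvStub{"foo": self})
	fmt.Println(chain, err)
}
```

With the self-replacing input from the description, this sketch returns after one step with an error rather than spinning. The visited set costs one map entry per CSV in the chain, which should be negligible next to an unbounded loop.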
[scolange@scolange ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-01-30-211400   True        False         23m     Cluster version is 4.6.0-0.nightly-2021-01-30-211

[scolange@scolange ~]$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-5f9bfcf948-dm25n -- olm --version
OLM version: 0.16.1
git commit: 4268b669a6f90423a4eea3d5bdcf6bf00af48a6d

[scolange@scolange .kube]$ oc create ns olm
namespace/olm created

1. Create an OperatorGroup

[scolange@scolange .kube]$ oc create -f operatorGroup.yaml
operatorgroup.operators.coreos.com/default-og created

2. Create the attached CSV and verify it

[scolange@scolange .kube]$ oc create -f testing.yaml
clusterserviceversion.operators.coreos.com/packageserver created

3. Verify whether the olm-operator CPU usage goes to 100%

[scolange@scolange .kube]$ oc get csv -n olm
NAME            DISPLAY          VERSION   REPLACES        PHASE
packageserver   Package Server   1.0.0     packageserver   Pending

kubectl -n openshift-operator-lifecycle-manager exec --stdin --tty olm-operator-5d865c694d-fjwjj -- /bin/bash

top - 12:00:09 up 23 min,  0 users,  load average: 1.76, 1.45, 0.94
Tasks:   3 total,   1 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 37.6 us, 10.2 sy,  0.0 ni, 46.0 id,  0.3 wa,  2.1 hi,  3.9 si,  0.0 st
MiB Mem :  15025.6 total,   6455.2 free,   5547.2 used,   3023.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  10277.8 avail Mem

  PID USER   PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    1 1001   20   0 1709652 252460  32804 S   0.0   1.6   0:07.16 olm
   18 1001   20   0   12020   3160   2744 S   0.0   0.0   0:00.00 bash
   25 1001   20   0   49040   3828   3228 R   0.0   0.0   0:00.02 top

4. Delete the CSV

[scolange@scolange .kube]$ oc delete csv packageserver -n olm
clusterserviceversion.operators.coreos.com "packageserver" deleted

LGTM
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0308