Bug 1918525 - OLM enters infinite loop if Pending CSV replaces itself
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.z
Assignee: Ben Luddy
QA Contact: Salvatore Colangelo
Depends On: 1916021
Reported: 2021-01-20 23:17 UTC by OpenShift BugZilla Robot
Modified: 2021-02-08 13:51 UTC (History)

Last Closed: 2021-02-08 13:51:26 UTC

Links:
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 1970 | 0 | None | closed | [release-4.6] Bug 1918525: Fix infinite loop when a CSV replacement chain contains a cycle. | 2021-02-19 05:58:58 UTC
Red Hat Product Errata | RHSA-2021:0308 | 0 | None | None | None | 2021-02-08 13:51:57 UTC

Description OpenShift BugZilla Robot 2021-01-20 23:17:04 UTC
+++ This bug was initially created as a clone of Bug #1916021 +++

Created attachment 1747236 [details]
cpu flame graph from olm process

Description of problem:

The following test in ./pkg/controller/operators/olm never terminates:

func TestGetReplacementChain(t *testing.T) {
	// A CSV whose spec.replaces points at its own name.
	csv := &v1alpha1.ClusterServiceVersion{
		ObjectMeta: metav1.ObjectMeta{
			Name: "foo",
		},
		Spec: v1alpha1.ClusterServiceVersionSpec{
			Replaces: "foo",
		},
	}
	// This call never returns.
	(&Operator{}).getReplacementChain(csv, map[string]*v1alpha1.ClusterServiceVersion{csv.GetName(): csv})
}

Version-Release number of selected component (if applicable): 4.6.1

How reproducible: Always?

Steps to Reproduce:
1. Create a CSV that replaces itself (sample attached).

Actual results:

The olm-operator pod jumps to 100% CPU utilization and doesn't make progress reconciling the CSV. Even after deleting the CSV, the olm-operator pod has to be deleted in order to recover.

Expected results:

CSV reconciled as normal.
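
For illustration only, here is a minimal sketch of one way a replacement-chain walk can be made to terminate when the chain contains a cycle: track the names already visited and stop on the first repeat. This is not the code from the OLM fix. The `csv` type and the `replacementChain` function are hypothetical stand-ins that keep only the name and the `Replaces` field, and the walk follows `Replaces` links in one direction only.

```go
package main

import "fmt"

// csv is a hypothetical stand-in for v1alpha1.ClusterServiceVersion,
// reduced to the two fields that matter for this sketch.
type csv struct {
	Name     string
	Replaces string
}

// replacementChain follows Replaces links starting at start and returns
// the names visited, in order. It stops when the next CSV is not in
// byName (this includes an empty Replaces, assuming no CSV is stored
// under the empty name) or when it reaches a name it has already
// visited. In the second case cycle is true.
func replacementChain(start *csv, byName map[string]*csv) (chain []string, cycle bool) {
	seen := map[string]struct{}{}
	for cur := start; cur != nil; cur = byName[cur.Replaces] {
		if _, ok := seen[cur.Name]; ok {
			return chain, true // already visited: the chain loops back on itself
		}
		seen[cur.Name] = struct{}{}
		chain = append(chain, cur.Name)
	}
	return chain, false
}

func main() {
	// Same shape as the failing test above: "foo" replaces "foo".
	foo := &csv{Name: "foo", Replaces: "foo"}
	chain, cycle := replacementChain(foo, map[string]*csv{foo.Name: foo})
	fmt.Println(chain, cycle) // prints "[foo] true"
}
```

With a visited set the walk is bounded by the number of CSVs in the map, so a self-replacing CSV is reported as a cycle instead of spinning.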

--- Additional comment from bluddy on 2021-01-13 23:54:52 UTC ---

Created attachment 1747237 [details]
sample bad CSV manifest

Comment 4 Salvatore Colangelo 2021-02-01 12:03:07 UTC
[scolange@scolange ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-01-30-211400   True        False         23m     Cluster version is 4.6.0-0.nightly-2021-01-30-211

[scolange@scolange ~]$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-5f9bfcf948-dm25n -- olm --version
OLM version: 0.16.1
git commit: 4268b669a6f90423a4eea3d5bdcf6bf00af48a6d

[scolange@scolange .kube]$ oc create ns olm
namespace/olm created

1. Create an operatorGroup

[scolange@scolange .kube]$ oc create -f operatorGroup.yaml 
operatorgroup.operators.coreos.com/default-og created

2. Create the CSV from the attached manifest and verify it

[scolange@scolange .kube]$ oc create -f testing.yaml
clusterserviceversion.operators.coreos.com/packageserver created

3. Check whether the olm-operator CPU usage goes to 100%

[scolange@scolange .kube]$ oc get csv -n olm
NAME            DISPLAY          VERSION   REPLACES        PHASE
packageserver   Package Server   1.0.0     packageserver   Pending

kubectl -n openshift-operator-lifecycle-manager exec --stdin --tty olm-operator-5d865c694d-fjwjj -- /bin/bash

top - 12:00:09 up 23 min,  0 users,  load average: 1.76, 1.45, 0.94
Tasks:   3 total,   1 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 37.6 us, 10.2 sy,  0.0 ni, 46.0 id,  0.3 wa,  2.1 hi,  3.9 si,  0.0 st
MiB Mem :  15025.6 total,   6455.2 free,   5547.2 used,   3023.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  10277.8 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                   
      1 1001      20   0 1709652 252460  32804 S   0.0   1.6   0:07.16 olm                                                       
     18 1001      20   0   12020   3160   2744 S   0.0   0.0   0:00.00 bash                                                      
     25 1001      20   0   49040   3828   3228 R   0.0   0.0   0:00.02 top 

4. Delete the CSV

[scolange@scolange .kube]$ oc delete csv packageserver -n olm
clusterserviceversion.operators.coreos.com "packageserver" deleted


Comment 6 errata-xmlrpc 2021-02-08 13:51:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

