1918525 – OLM enters infinite loop if Pending CSV replaces itself

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.6.z
Assignee:	Ben Luddy
QA Contact:	Salvatore Colangelo
Docs Contact:
URL:
Whiteboard:
Depends On:	1916021
Blocks:

Reported:	2021-01-20 23:17 UTC by OpenShift BugZilla Robot
Modified:	2021-02-08 13:51 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-08 13:51:26 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	operator-framework operator-lifecycle-manager pull 1970	0	None	closed	[release-4.6] Bug 1918525: Fix infinite loop when a CSV replacement chain contains a cycle.	2021-02-19 05:58:58 UTC
Red Hat Product Errata	RHSA-2021:0308	0	None	None	None	2021-02-08 13:51:57 UTC

Description OpenShift BugZilla Robot 2021-01-20 23:17:04 UTC

+++ This bug was initially created as a clone of Bug #1916021 +++

Created attachment 1747236
cpu flame graph from olm process

Description of problem:

The following test in ./pkg/controller/operators/olm never terminates:

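func TestGetReplacementChain(t *testing.T) {
	// A CSV named "foo" whose spec.replaces points at itself.
	csv := &v1alpha1.ClusterServiceVersion{
		ObjectMeta: metav1.ObjectMeta{
			Name: "foo",
		},
		Spec: v1alpha1.ClusterServiceVersionSpec{
			Replaces: "foo", // self-reference: the replacement chain is a cycle
		},
	}
	// This call never returns because the chain walk keeps revisiting "foo".
	(&Operator{}).getReplacementChain(csv, map[string]*v1alpha1.ClusterServiceVersion{csv.GetName(): csv})
}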
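
For illustration only, here is a minimal, self-contained sketch of how a walk over a replacement chain can be made to terminate when the chain contains a cycle. It is not the OLM code and not the change made in the linked pull request. The csv type and the replacementChain function are hypothetical stand-ins, and the walk simply follows each Replaces link while remembering the names it has already seen.

```go
package main

import "fmt"

// csv is a hypothetical, stripped-down stand-in for a ClusterServiceVersion:
// it keeps only the CSV name and the name of the CSV it replaces.
type csv struct {
	Name     string
	Replaces string
}

// replacementChain follows Replaces links starting at start and returns the
// names visited, in order. The boolean is true if a link points back to a
// name that was already visited, i.e. the chain contains a cycle.
func replacementChain(start string, byName map[string]csv) ([]string, bool) {
	visited := map[string]bool{}
	var chain []string
	for cur := start; cur != ""; {
		if visited[cur] {
			return chain, true // seen before: stop instead of looping forever
		}
		c, ok := byName[cur]
		if !ok {
			break // the replaced CSV is not present; the chain ends here
		}
		visited[cur] = true
		chain = append(chain, cur)
		cur = c.Replaces
	}
	return chain, false
}

func main() {
	// Same shape as the failing test: "foo" replaces "foo".
	csvs := map[string]csv{"foo": {Name: "foo", Replaces: "foo"}}
	chain, cyclic := replacementChain("foo", csvs)
	fmt.Println(chain, cyclic) // prints "[foo] true"
}
```

With the visited set in place, the self-replacing CSV from the test above is reported as a cycle after a single step, and an acyclic chain is returned unchanged with the boolean set to false.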


Version-Release number of selected component (if applicable): 4.6.1


How reproducible: Always?


Steps to Reproduce:
1. Create a CSV that replaces itself (sample attached).

Actual results:

The olm-operator pod jumps to 100% CPU utilization and doesn't make progress reconciling the CSV. Even after deleting the CSV, the olm-operator pod has to be deleted in order to recover.

Expected results:

CSV reconciled as normal.

--- Additional comment from bluddy on 2021-01-13 23:54:52 UTC ---

Created attachment 1747237
sample bad CSV manifest

Comment 4 Salvatore Colangelo 2021-02-01 12:03:07 UTC

[scolange@scolange ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-01-30-211400   True        False         23m     Cluster version is 4.6.0-0.nightly-2021-01-30-211

[scolange@scolange ~]$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-5f9bfcf948-dm25n -- olm --version
OLM version: 0.16.1
git commit: 4268b669a6f90423a4eea3d5bdcf6bf00af48a6d

[scolange@scolange .kube]$ oc create ns olm
namespace/olm created

1. Create an OperatorGroup

[scolange@scolange .kube]$ oc create -f operatorGroup.yaml 
operatorgroup.operators.coreos.com/default-og created

2. Create the CSV from the attached manifest and verify it

[scolange@scolange .kube]$ oc create -f testing.yaml
clusterserviceversion.operators.coreos.com/packageserver created

3. Check whether the olm-operator CPU usage goes to 100%

[scolange@scolange .kube]$ oc get csv -n olm
NAME            DISPLAY          VERSION   REPLACES        PHASE
packageserver   Package Server   1.0.0     packageserver   Pending


kubectl -n openshift-operator-lifecycle-manager exec --stdin --tty olm-operator-5d865c694d-fjwjj -- /bin/bash

top - 12:00:09 up 23 min,  0 users,  load average: 1.76, 1.45, 0.94
Tasks:   3 total,   1 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 37.6 us, 10.2 sy,  0.0 ni, 46.0 id,  0.3 wa,  2.1 hi,  3.9 si,  0.0 st
MiB Mem :  15025.6 total,   6455.2 free,   5547.2 used,   3023.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  10277.8 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                   
      1 1001      20   0 1709652 252460  32804 S   0.0   1.6   0:07.16 olm                                                       
     18 1001      20   0   12020   3160   2744 S   0.0   0.0   0:00.00 bash                                                      
     25 1001      20   0   49040   3828   3228 R   0.0   0.0   0:00.02 top 

4. Delete the csv 

[scolange@scolange .kube]$ oc delete csv packageserver -n olm
clusterserviceversion.operators.coreos.com "packageserver" deleted


LGTM

Comment 6 errata-xmlrpc 2021-02-08 13:51:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0308
