Bug 1732911

Summary: catalog-operator panics when the installing operator's ClusterRole/ClusterRoleBinding already exist
Product: OpenShift Container Platform
Component: OLM
OLM sub component: OLM
Reporter: Evan Cordell <ecordell>
Assignee: Evan Cordell <ecordell>
QA Contact: Jian Zhang <jiazha>
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
CC: bandrade, chezhang, chuo, jfan, jiazha, scolange, zitang
Version: 4.1.z
Target Milestone: ---
Target Release: 4.1.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 1732302
Last Closed: 2019-07-24 16:58:19 UTC
Bug Depends On: 1732214, 1732302, 1733324
Bug Blocks:

Description Evan Cordell 2019-07-24 16:55:33 UTC
+++ This bug was initially created as a clone of Bug #1732302 +++

Backport for 4.1

Description of problem:
This bug is a clone of bug 1732214; the fix needs to be backported to the 4.1.z stream.
The catalog-operator pod is in a crash loop with the following error:

```
time="2019-07-22T21:21:53Z" level=info msg="log level info"
time="2019-07-22T21:21:53Z" level=info msg="TLS keys set, using https for metrics"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="connection established. cluster-version: v1.13.4+6569b4f"
time="2019-07-22T21:21:53Z" level=info msg="operator ready"
time="2019-07-22T21:21:53Z" level=info msg="starting informers..."
time="2019-07-22T21:21:53Z" level=info msg="waiting for caches to sync..."
time="2019-07-22T21:21:53Z" level=info msg="starting workers..."
time="2019-07-22T21:21:53Z" level=info msg=syncing id=J0bpl ip=install-ssz97 namespace=openshift-dedicated-admin phase=Installing
time="2019-07-22T21:21:53Z" level=info msg="building connection to registry" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=uqL/b ip=install-7h475 namespace=openshift-logging phase=Complete
time="2019-07-22T21:21:53Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=BZh1k ip=install-c2thj namespace=openshift-monitoring phase=Complete
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-logging"
E0722 21:21:54.040243       1 queueinformer_operator.go:186] Sync "openshift-logging" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-operator-lifecycle-manager"
E0722 21:21:54.239699       1 queueinformer_operator.go:186] Sync "openshift-operator-lifecycle-manager" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="building connection to registry" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
time="2019-07-22T21:21:54Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
panic: assignment to entry in nil map

goroutine 183 [running]:
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).ExecutePlan(0xc4204bec00, 0xc421924000, 0x1, 0x1)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:1187 +0x4243
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.transitionInstallPlanState(0xc420084180, 0x1601da0, 0xc4204bec00, 0xc420cde4c0, 0xb, 0xc4206cb5a0, 0x1d, 0xc420cde500, 0xd, 0xc420cde4e0, ...)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:992 +0x2ad
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).syncInstallPlans(0xc4204bec00, 0x14a01c0, 0xc420888000, 0xc420888000, 0x1)
```

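For context on the panic itself: in Go, assigning to a key of a nil map panics, and an object read back from the Kubernetes API has a nil Labels map if it was created without any labels. A minimal sketch of the failure mode, assuming a manually created ClusterRole with no labels (the struct and the olm.owner key are illustrative stand-ins for the real rbacv1.ClusterRole handling in ExecutePlan):

```go
package main

import "fmt"

// clusterRole stands in for rbacv1.ClusterRole; only the fields that
// matter for the panic are shown.
type clusterRole struct {
	Name   string
	Labels map[string]string // nil when the object was created without labels
}

func main() {
	// Simulates a ClusterRole that was applied to the cluster manually,
	// so no labels were ever set and Labels is nil.
	existing := clusterRole{Name: "example-operator-role"}

	fmt.Println("re-labeling", existing.Name)
	// This is the shape of the failing assignment: writing an ownership
	// label into a nil map.
	existing.Labels["olm.owner"] = "example-operator" // panic: assignment to entry in nil map
}
```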
Version-Release number of selected component (if applicable):
OLM: 4.1.6 (release-4.1)

How reproducible:
always

Steps to Reproduce:
1. Manually create a ClusterRole/ClusterRoleBinding in the cluster first (see the sketch after these steps)
2. Deploy an operator that attempts to create the same ClusterRole/ClusterRoleBinding
3. Observe the panic and crash loop in the catalog-operator pod
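
For step 1, a hedged client-go sketch of what "pre-creating" looks like (the role name and rules are hypothetical, and the same thing can be done with a plain YAML manifest). The key detail is that a manually created ClusterRole carries no OLM ownership labels:

```go
package main

import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig. Signatures match recent
	// client-go releases; older releases take no context argument.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Pre-create a ClusterRole, with no labels, that the operator's
	// InstallPlan will later try to create as well.
	role := &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "example-operator-role"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{""},
			Resources: []string{"pods"},
			Verbs:     []string{"get", "list"},
		}},
	}
	if _, err := client.RbacV1().ClusterRoles().Create(context.TODO(), role, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```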

Actual results:
Crash looping catalog-operator pod

Expected results:
catalog-operator re-labels the existing ClusterRole/ClusterRoleBinding and continues with the installation
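
A hedged sketch of the nil-guard that makes this possible (the helper is hypothetical and not necessarily the exact change in the PR referenced below; it just shows the standard Go pattern for avoiding the panic above):

```go
// ensureLabel is a hypothetical helper: it initializes the map if the
// object was fetched with Labels == nil, then applies the label, so
// re-labeling a pre-existing ClusterRole/ClusterRoleBinding cannot panic.
func ensureLabel(labels map[string]string, key, value string) map[string]string {
	if labels == nil {
		labels = map[string]string{}
	}
	labels[key] = value
	return labels
}

// Usage against an object fetched from the API (label key illustrative):
//   role.Labels = ensureLabel(role.Labels, "olm.owner", csvName)
```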


Additional info:
Proposed fix for the master branch (4.2): https://github.com/operator-framework/operator-lifecycle-manager/pull/959

This will also require adapting the fix for the release branches, since the github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog package was substantially restructured in https://github.com/operator-framework/operator-lifecycle-manager/pull/892

Comment 1 Evan Cordell 2019-07-24 16:58:19 UTC

*** This bug has been marked as a duplicate of bug 1732302 ***