Bug 1732911

Summary: catalog-operator panics when the installing operator's ClusterRole/ClusterRoleBinding already exist
Product: OpenShift Container Platform
Component: OLM
OLM sub component: OLM
Reporter: Evan Cordell <ecordell>
Assignee: Evan Cordell <ecordell>
QA Contact: Jian Zhang <jiazha>
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
CC: bandrade, chezhang, chuo, jfan, jiazha, scolange, zitang
Version: 4.1.z
Target Milestone: ---
Target Release: 4.1.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 1732302
Last Closed: 2019-07-24 16:58:19 UTC
Bug Depends On: 1732214, 1732302, 1733324
Bug Blocks:

Description Evan Cordell 2019-07-24 16:55:33 UTC
+++ This bug was initially created as a clone of Bug #1732302 +++

Backport for 4.1

Description of problem:
This bug is a clone of bug 1732214; the fix needs to be backported to the 4.1.z stream.
The catalog-operator pod is in a crash loop with the following error:

```
time="2019-07-22T21:21:53Z" level=info msg="log level info"
time="2019-07-22T21:21:53Z" level=info msg="TLS keys set, using https for metrics"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="connection established. cluster-version: v1.13.4+6569b4f"
time="2019-07-22T21:21:53Z" level=info msg="operator ready"
time="2019-07-22T21:21:53Z" level=info msg="starting informers..."
time="2019-07-22T21:21:53Z" level=info msg="waiting for caches to sync..."
time="2019-07-22T21:21:53Z" level=info msg="starting workers..."
time="2019-07-22T21:21:53Z" level=info msg=syncing id=J0bpl ip=install-ssz97 namespace=openshift-dedicated-admin phase=Installing
time="2019-07-22T21:21:53Z" level=info msg="building connection to registry" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=uqL/b ip=install-7h475 namespace=openshift-logging phase=Complete
time="2019-07-22T21:21:53Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=BZh1k ip=install-c2thj namespace=openshift-monitoring phase=Complete
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-logging"
E0722 21:21:54.040243       1 queueinformer_operator.go:186] Sync "openshift-logging" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-operator-lifecycle-manager"
E0722 21:21:54.239699       1 queueinformer_operator.go:186] Sync "openshift-operator-lifecycle-manager" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="building connection to registry" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
time="2019-07-22T21:21:54Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
panic: assignment to entry in nil map

goroutine 183 [running]:
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).ExecutePlan(0xc4204bec00, 0xc421924000, 0x1, 0x1)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:1187 +0x4243
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.transitionInstallPlanState(0xc420084180, 0x1601da0, 0xc4204bec00, 0xc420cde4c0, 0xb, 0xc4206cb5a0, 0x1d, 0xc420cde500, 0xd, 0xc420cde4e0, ...)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:992 +0x2ad
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).syncInstallPlans(0xc4204bec00, 0x14a01c0, 0xc420888000, 0xc420888000, 0x1)
```

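For context on the panic itself: in Go, assigning to a key of a nil map panics, and an object read back from the Kubernetes API has a nil Labels map if it was created without any labels. A minimal sketch of the failure mode, assuming a manually created ClusterRole with no labels (the struct and the olm.owner key are illustrative stand-ins for the real rbacv1.ClusterRole handling in ExecutePlan):

```go
package main

import "fmt"

// clusterRole stands in for rbacv1.ClusterRole; only the fields that
// matter for the panic are shown.
type clusterRole struct {
	Name   string
	Labels map[string]string // nil when the object was created without labels
}

func main() {
	// Simulates a ClusterRole that was applied to the cluster manually,
	// so no labels were ever set and Labels is nil.
	existing := clusterRole{Name: "example-operator-role"}

	fmt.Println("re-labeling", existing.Name)
	// This is the shape of the failing assignment: writing an ownership
	// label into a nil map.
	existing.Labels["olm.owner"] = "example-operator" // panic: assignment to entry in nil map
}
```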
Version-Release number of selected component (if applicable):
OLM: 4.1.6 (release-4.1)

How reproducible:
always

Steps to Reproduce:
1. Manually create a ClusterRole/ClusterRoleBinding in the cluster first (see the sketch after these steps)
2. Deploy an operator that attempts to create the same ClusterRole/ClusterRoleBinding
3. Observe the panic and crash loop in the catalog-operator pod
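
For step 1, a hedged client-go sketch of what "pre-creating" looks like (the role name and rules are hypothetical, and the same thing can be done with a plain YAML manifest). The key detail is that a manually created ClusterRole carries no OLM ownership labels:

```go
package main

import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig. Signatures match recent
	// client-go releases; older releases take no context argument.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Pre-create a ClusterRole, with no labels, that the operator's
	// InstallPlan will later try to create as well.
	role := &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "example-operator-role"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{""},
			Resources: []string{"pods"},
			Verbs:     []string{"get", "list"},
		}},
	}
	if _, err := client.RbacV1().ClusterRoles().Create(context.TODO(), role, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```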

Actual results:
Crash looping catalog-operator pod

Expected results:
catalog-operator re-labels the existing ClusterRole/ClusterRoleBinding and continues with the installation
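
A hedged sketch of the nil-guard that makes this possible (the helper is hypothetical and not necessarily the exact change in the PR referenced below; it just shows the standard Go pattern for avoiding the panic above):

```go
// ensureLabel is a hypothetical helper: it initializes the map if the
// object was fetched with Labels == nil, then applies the label, so
// re-labeling a pre-existing ClusterRole/ClusterRoleBinding cannot panic.
func ensureLabel(labels map[string]string, key, value string) map[string]string {
	if labels == nil {
		labels = map[string]string{}
	}
	labels[key] = value
	return labels
}

// Usage against an object fetched from the API (label key illustrative):
//   role.Labels = ensureLabel(role.Labels, "olm.owner", csvName)
```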


Additional info:
Proposed fix for the master branch (4.2): https://github.com/operator-framework/operator-lifecycle-manager/pull/959

This will also require adapting the fix for the release branches, since the github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog package was substantially restructured in https://github.com/operator-framework/operator-lifecycle-manager/pull/892

Comment 1 Evan Cordell 2019-07-24 16:58:19 UTC

*** This bug has been marked as a duplicate of bug 1732302 ***