Bug 1732911 - catalog-operator will panic when the installing operator's ClusterRole/ClusterRoleBinding exist
Status: CLOSED DUPLICATE of bug 1732302
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.z
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On: 1732214 1732302 1733324
Blocks:
 
Reported: 2019-07-24 16:55 UTC by Evan Cordell
Modified: 2019-08-06 21:27 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1732302
Environment:
Last Closed: 2019-07-24 16:58:19 UTC
Target Upstream Version:
Embargoed:



Description Evan Cordell 2019-07-24 16:55:33 UTC
+++ This bug was initially created as a clone of Bug #1732302 +++

Backport for 4.1

Description of problem:
This bug is a clone of bug 1732214 and should be fixed in 4.1.z.
The catalog-operator pod is in a crash loop with the following error:

```
time="2019-07-22T21:21:53Z" level=info msg="log level info"
time="2019-07-22T21:21:53Z" level=info msg="TLS keys set, using https for metrics"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="connection established. cluster-version: v1.13.4+6569b4f"
time="2019-07-22T21:21:53Z" level=info msg="operator ready"
time="2019-07-22T21:21:53Z" level=info msg="starting informers..."
time="2019-07-22T21:21:53Z" level=info msg="waiting for caches to sync..."
time="2019-07-22T21:21:53Z" level=info msg="starting workers..."
time="2019-07-22T21:21:53Z" level=info msg=syncing id=J0bpl ip=install-ssz97 namespace=openshift-dedicated-admin phase=Installing
time="2019-07-22T21:21:53Z" level=info msg="building connection to registry" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=uqL/b ip=install-7h475 namespace=openshift-logging phase=Complete
time="2019-07-22T21:21:53Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=BZh1k ip=install-c2thj namespace=openshift-monitoring phase=Complete
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-logging"
E0722 21:21:54.040243       1 queueinformer_operator.go:186] Sync "openshift-logging" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-operator-lifecycle-manager"
E0722 21:21:54.239699       1 queueinformer_operator.go:186] Sync "openshift-operator-lifecycle-manager" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="building connection to registry" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
time="2019-07-22T21:21:54Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
panic: assignment to entry in nil map

goroutine 183 [running]:
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).ExecutePlan(0xc4204bec00, 0xc421924000, 0x1, 0x1)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:1187 +0x4243
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.transitionInstallPlanState(0xc420084180, 0x1601da0, 0xc4204bec00, 0xc420cde4c0, 0xb, 0xc4206cb5a0, 0x1d, 0xc420cde500, 0xd, 0xc420cde4e0, ...)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:992 +0x2ad
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).syncInstallPlans(0xc4204bec00, 0x14a01c0, 0xc420888000, 0xc420888000, 0x1)
```

Version-Release number of selected component (if applicable):
OLM: 4.1.6 (release-4.1)

How reproducible:
always

Steps to Reproduce:
1. Deploy ClusterRole/ClusterRoleBinding to the cluster first manually
2. Deploy operator that attempts to create the same ClusterRole/ClusterRoleBinding
3. Observe panic and crash loop in catalog-operator pod

Actual results:
Crash looping catalog-operator pod

Expected results:
The catalog-operator should re-label the existing ClusterRole/ClusterRoleBinding and continue the installation.
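The panic message points at Go's rule that assigning into a nil map panics: a ClusterRole that already exists on the cluster can come back from the API server with a nil Labels map, and labeling it without initializing the map first crashes the operator. The following is a minimal sketch of that failure mode and the guarded fix; the `clusterRole` type, label key, and `applyLabel*` helpers are hypothetical stand-ins, not OLM's actual code.

```go
package main

import "fmt"

// clusterRole is a hypothetical stand-in for the Kubernetes ClusterRole
// object; only the Labels field matters for this sketch.
type clusterRole struct {
	Labels map[string]string
}

// applyLabelUnsafe mirrors the buggy pattern: if the role pre-existed on
// the cluster its Labels map may be nil, and this assignment panics with
// "assignment to entry in nil map".
func applyLabelUnsafe(cr *clusterRole) {
	cr.Labels["olm.owner"] = "my-operator"
}

// applyLabelSafe shows the guarded fix: initialize the map before writing.
func applyLabelSafe(cr *clusterRole) {
	if cr.Labels == nil {
		cr.Labels = map[string]string{}
	}
	cr.Labels["olm.owner"] = "my-operator"
}

func main() {
	existing := &clusterRole{} // Labels is nil, as for a manually created role

	// Demonstrate the crash in isolation via recover.
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("panic:", r)
			}
		}()
		applyLabelUnsafe(existing)
	}()

	// The guarded version succeeds and the role is re-labeled.
	applyLabelSafe(existing)
	fmt.Println("labeled:", existing.Labels["olm.owner"])
}
```

This matches the expected behavior above: instead of crash-looping, the operator initializes the label map on the pre-existing object and proceeds with the install plan.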


Additional info:
Proposed fix for the master branch (4.2): https://github.com/operator-framework/operator-lifecycle-manager/pull/959

A fix will also need to be backported to the release branches, since the github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog package was substantially reworked in https://github.com/operator-framework/operator-lifecycle-manager/pull/892.

Comment 1 Evan Cordell 2019-07-24 16:58:19 UTC

*** This bug has been marked as a duplicate of bug 1732302 ***

