Description of problem:
The catalog-operator pod is in a crash loop with the following error:
```
time="2019-07-22T21:21:53Z" level=info msg="log level info"
time="2019-07-22T21:21:53Z" level=info msg="TLS keys set, using https for metrics"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="connection established. cluster-version: v1.13.4+6569b4f"
time="2019-07-22T21:21:53Z" level=info msg="operator ready"
time="2019-07-22T21:21:53Z" level=info msg="starting informers..."
time="2019-07-22T21:21:53Z" level=info msg="waiting for caches to sync..."
time="2019-07-22T21:21:53Z" level=info msg="starting workers..."
time="2019-07-22T21:21:53Z" level=info msg=syncing id=J0bpl ip=install-ssz97 namespace=openshift-dedicated-admin phase=Installing
time="2019-07-22T21:21:53Z" level=info msg="building connection to registry" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=uqL/b ip=install-7h475 namespace=openshift-logging phase=Complete
time="2019-07-22T21:21:53Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=BZh1k ip=install-c2thj namespace=openshift-monitoring phase=Complete
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-logging"
E0722 21:21:54.040243 1 queueinformer_operator.go:186] Sync "openshift-logging" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-operator-lifecycle-manager"
E0722 21:21:54.239699 1 queueinformer_operator.go:186] Sync "openshift-operator-lifecycle-manager" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="building connection to registry" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
time="2019-07-22T21:21:54Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
panic: assignment to entry in nil map

goroutine 183 [running]:
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).ExecutePlan(0xc4204bec00, 0xc421924000, 0x1, 0x1)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:1187 +0x4243
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.transitionInstallPlanState(0xc420084180, 0x1601da0, 0xc4204bec00, 0xc420cde4c0, 0xb, 0xc4206cb5a0, 0x1d, 0xc420cde500, 0xd, 0xc420cde4e0, ...)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:992 +0x2ad
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).syncInstallPlans(0xc4204bec00, 0x14a01c0, 0xc420888000, 0xc420888000, 0x1)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:938 +0x458
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).(github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.syncInstallPlans)-fm(0x14a01c0, 0xc420888000, 0x27, 0x14a01c0)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:149 +0x3e
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*Operator).sync(0xc4201c41c0, 0xc420309ec0, 0xc42086fcb0, 0x27, 0xc4207193c0, 0x0)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:215 +0x1a4
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*Operator).processNextWorkItem(0xc4201c41c0, 0xc420309ec0, 0x0)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:183 +0xfa
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*Operator).worker(0xc4201c41c0, 0xc420309ec0)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:169 +0x35
created by github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*Operator).Run.func1
	/go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:151 +0x9cd
```

Version-Release number of selected component (if applicable):
4.1.6 (release-4.1)

How reproducible:
Consistently, whenever this code path is exercised.
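The panic message is Go's runtime error for writing a key into a map that was never allocated. The trace places the write in `ExecutePlan` (operator.go:1187) but does not say which map was nil; the sketch below assumes, for illustration only, that it is the labels map of a pre-existing, unlabeled object. The `object` type and the `setLabel*`/`tryUnsafe` helpers are hypothetical and are not OLM code.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Kubernetes object's metadata;
// only the Labels field matters for this sketch.
type object struct {
	Name   string
	Labels map[string]string
}

// setLabelUnsafe writes straight into obj.Labels. If the object was created
// without any labels, Labels is nil and this write panics with
// "assignment to entry in nil map".
func setLabelUnsafe(obj *object, key, value string) {
	obj.Labels[key] = value
}

// setLabelSafe allocates the map before the first write, so it never panics.
func setLabelSafe(obj *object, key, value string) {
	if obj.Labels == nil {
		obj.Labels = map[string]string{}
	}
	obj.Labels[key] = value
}

// tryUnsafe runs setLabelUnsafe and turns a panic into an error so the
// failure can be observed without crashing the whole program.
func tryUnsafe(obj *object, key, value string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	setLabelUnsafe(obj, key, value)
	return nil
}

func main() {
	// A ClusterRole created by hand (step 1 of the reproducer) may carry no labels.
	existing := &object{Name: "etcdoperator.v0.9.4-clusterwide-test"}

	// "example.com/owner" is a placeholder label key.
	fmt.Println(tryUnsafe(existing, "example.com/owner", "etcd"))
	setLabelSafe(existing, "example.com/owner", "etcd")
	fmt.Println(existing.Labels)
}
```

Reading from a nil map is safe in Go, but writing to one always panics, so the usual guard is to allocate the map before the first write.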
Steps to Reproduce:
1. Manually deploy a ClusterRole/ClusterRoleBinding to the cluster.
2. Deploy an operator that attempts to create a ClusterRole/ClusterRoleBinding with the same name.
3. Observe the panic and crash loop in the catalog-operator pod.

Actual results:
The catalog-operator pod is crash looping.

Expected results:
catalog-operator is able to re-label the existing ClusterRole/ClusterRoleBinding and continue.

Additional info:
Proposed fix for the master branch: https://github.com/operator-framework/operator-lifecycle-manager/pull/959
This will also require backporting a fix to the release branches, because the github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog package was substantially changed in https://github.com/operator-framework/operator-lifecycle-manager/pull/892.
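The expected result amounts to a create-or-adopt step: try to create the object, and on an AlreadyExists conflict fetch the existing one, allocate its labels if needed, and add an ownership label instead of failing. This is a minimal sketch under stated assumptions and is not taken from PR 959: `store` is a hypothetical in-memory substitute for the API server, `errAlreadyExists` stands in for its AlreadyExists error, and `example.com/owner` is a placeholder label key, not the one OLM uses.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the API server's AlreadyExists error.
var errAlreadyExists = errors.New("already exists")

// clusterRole is a hypothetical, trimmed-down ClusterRole.
type clusterRole struct {
	Name   string
	Labels map[string]string
}

// store is a hypothetical in-memory substitute for the cluster API.
type store struct {
	objects map[string]*clusterRole
}

func (s *store) create(cr *clusterRole) error {
	if _, ok := s.objects[cr.Name]; ok {
		return errAlreadyExists
	}
	s.objects[cr.Name] = cr
	return nil
}

func (s *store) get(name string) (*clusterRole, bool) {
	cr, ok := s.objects[name]
	return cr, ok
}

// ownerLabel is a placeholder key for illustration only.
const ownerLabel = "example.com/owner"

// ensureClusterRole creates desired; if an object with the same name already
// exists, it adopts that object by labeling it instead of returning an error.
func ensureClusterRole(s *store, desired *clusterRole, owner string) error {
	err := s.create(desired)
	if err == nil || !errors.Is(err, errAlreadyExists) {
		return err // created successfully, or failed for an unrelated reason
	}
	existing, ok := s.get(desired.Name)
	if !ok {
		return fmt.Errorf("%s reported as existing but not found", desired.Name)
	}
	// A hand-made object may have no labels: allocate before writing.
	if existing.Labels == nil {
		existing.Labels = map[string]string{}
	}
	existing.Labels[ownerLabel] = owner
	return nil
}

func main() {
	const name = "etcdoperator.v0.9.4-clusterwide-test"
	s := &store{objects: map[string]*clusterRole{}}

	// Step 1 of the reproducer: the ClusterRole is created by hand, without labels.
	s.objects[name] = &clusterRole{Name: name}

	// Step 2: installing the operator tries to create a ClusterRole with the same name.
	if err := ensureClusterRole(s, &clusterRole{Name: name}, "etcdoperator.v0.9.4-clusterwide"); err != nil {
		fmt.Println("error:", err)
		return
	}
	cr, _ := s.get(name)
	fmt.Println(cr.Labels)
}
```

Against a real cluster the labeled object would then be written back with an update call, which can itself hit a conflict and need a retry; the in-memory store skips that step because it holds pointers.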
Hi Christoph, thanks for your report. I created bug 1732302 for the 4.1.z version. @Evan, do we need to submit a separate fix PR to the release-4.1 branch, or can we just cherry-pick this fix PR from the master branch?
I cherry-picked the master PR to 4.1; it should merge after approval.
*** Bug 1733324 has been marked as a duplicate of this bug. ***
LGTM. Verification steps:

Cluster version is 4.1.0-0.nightly-2019-08-19-173358
OLM version:
io.openshift.build.commit.url=https://github.com/operator-framework/operator-lifecycle-manager/commit/e782ca5034ae1fc706145ffd4634ebdffb58b2ba
io.openshift.build.source-location=https://github.com/operator-framework/operator-lifecycle-manager

1) Create a CatalogSource that contains additional ClusterRole/ClusterRoleBinding files.
mac:~ jianzhang$ cat cs-bug.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: etcd-bug-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/jiazha/etcd-operator:bug-1732302
  displayName: ETCD Bug Operators
  publisher: jian
mac:~ jianzhang$ oc create -f cs-bug.yaml
catalogsource.operators.coreos.com/etcd-bug-operator created
mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
NAME                  NAME                  TYPE   PUBLISHER   AGE
certified-operators   Certified Operators   grpc   Red Hat     139m
community-operators   Community Operators   grpc   Red Hat     139m
etcd-bug-operator     ETCD Bug Operators    grpc   jian        17s
redhat-operators      Red Hat Operators     grpc   Red Hat     139m

2) Create the static ClusterRole/ClusterRoleBinding objects.
mac:operator-registry jianzhang$ oc create -f manifests/etcd/etcdclusterrole.yaml
clusterrole.rbac.authorization.k8s.io/etcdoperator.v0.9.4-clusterwide-test created
mac:operator-registry jianzhang$ oc create -f manifests/etcd/etcdclusterrolebinding.yaml
clusterrolebinding.rbac.authorization.k8s.io/etcdoperator.v0.9.4-clusterrolebinding-test created
mac:operator-registry jianzhang$ oc get clusterrolebinding |grep etcd
etcdoperator.v0.9.4-clusterrolebinding-test   12s
mac:operator-registry jianzhang$ oc get clusterrole |grep etcd
etcdoperator.v0.9.4-clusterwide-test   43s

3) Create an OperatorGroup in the openshift-marketplace project.
mac:~ jianzhang$ oc get og -n openshift-marketplace
NAME     AGE
bug-og   32s

4) Subscribe to this test operator.
mac:~ jianzhang$ cat sub-bug.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  generateName: etcd-bug-
  namespace: openshift-marketplace
spec:
  source: etcd-bug-operator
  sourceNamespace: openshift-marketplace
  name: etcd
  startingCSV: etcdoperator.v0.9.4-clusterwide
  channel: clusterwide-alpha
mac:~ jianzhang$ oc get csv -n openshift-marketplace
NAME                              DISPLAY   VERSION             REPLACES   PHASE
etcdoperator.v0.9.4-clusterwide   etcd      0.9.4-clusterwide              Succeeded
mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-68f759cbc7-v4q4r    1/1     Running   0          154m
community-operators-6c5ffdc5f-ldg5f     1/1     Running   0          154m
etcd-bug-operator-gqkqf                 1/1     Running   0          15m
etcd-operator-bf4866946-m7vdz           3/3     Running   0          47s
marketplace-operator-5fc975bc86-c9qsv   1/1     Running   0          154m
redhat-operators-775568dd5-ckb5k        1/1     Running   0          154m

5) Check the status of the OLM pods.
mac:~ jianzhang$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-5d48c4d4bc-xmg5t   1/1     Running   0          164m
olm-operator-7f66446cfb-cb9zq       1/1     Running   0          164m
olm-operators-jcqbz                 1/1     Running   0          160m
packageserver-5c6d7445df-45j9v      1/1     Running   0          160m
packageserver-5c6d7445df-sd8hj      1/1     Running   0          160m

6) Re-run steps 1, 2, 4 and 5 above with a new registry image (quay.io/jiazha/etcd-operator:bug2-1732302) whose CSV has no `clusterPermissions` configured.
mac:~ jianzhang$ cat cs-bug.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: etcd-bug-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/jiazha/etcd-operator:bug2-1732302
  displayName: ETCD Bug Operators
  publisher: jian
mac:~ jianzhang$ oc create -f cs-bug.yaml
catalogsource.operators.coreos.com/etcd-bug-operator created
mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
NAME                  NAME                  TYPE   PUBLISHER   AGE
certified-operators   Certified Operators   grpc   Red Hat     160m
community-operators   Community Operators   grpc   Red Hat     160m
etcd-bug-operator     ETCD Bug Operators    grpc   jian        5s
redhat-operators      Red Hat Operators     grpc   Red Hat     160m
mac:~ jianzhang$ oc get sub -n openshift-marketplace
NAME             PACKAGE   SOURCE              CHANNEL
etcd-bug-4ls2t   etcd      etcd-bug-operator   clusterwide-alpha
mac:~ jianzhang$ oc get csv -n openshift-marketplace
NAME                              DISPLAY   VERSION             REPLACES   PHASE
etcdoperator.v0.9.4-clusterwide   etcd      0.9.4-clusterwide              Succeeded
mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-68f759cbc7-v4q4r    1/1     Running   0          162m
community-operators-6c5ffdc5f-ldg5f     1/1     Running   0          162m
etcd-bug-operator-w2jb9                 1/1     Running   0          119s
etcd-operator-bf4866946-vrwfj           3/3     Running   0          79s
marketplace-operator-5fc975bc86-c9qsv   1/1     Running   0          162m
redhat-operators-775568dd5-ckb5k        1/1     Running   0          162m
mac:~ jianzhang$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-5d48c4d4bc-xmg5t   1/1     Running   0          169m
olm-operator-7f66446cfb-cb9zq       1/1     Running   0          169m
olm-operators-jcqbz                 1/1     Running   0          165m
packageserver-5c6d7445df-45j9v      1/1     Running   0          165m
packageserver-5c6d7445df-sd8hj      1/1     Running   0          165m
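The pass criterion in steps 5 and 6 is checked by eye: every OLM pod, catalog-operator in particular, is Running with 0 RESTARTS. As a hedged sketch, the same check can be automated by parsing the default `oc get pods` table. It assumes the five-column layout shown in this report (NAME READY STATUS RESTARTS AGE); `parsePods` and `unhealthy` are hypothetical helpers, not part of `oc` or OLM.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// podRow holds the `oc get pods` columns this check uses.
type podRow struct {
	Name     string
	Ready    string
	Status   string
	Restarts int
}

// parsePods parses the default tabular output of `oc get pods`,
// skipping the header line and any blank lines.
func parsePods(out string) ([]podRow, error) {
	var rows []podRow
	sc := bufio.NewScanner(strings.NewReader(out))
	header := true
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if header {
			header = false
			continue
		}
		if len(fields) < 5 {
			return nil, fmt.Errorf("unexpected line: %q", sc.Text())
		}
		n, err := strconv.Atoi(fields[3])
		if err != nil {
			return nil, fmt.Errorf("bad RESTARTS value in %q: %v", sc.Text(), err)
		}
		rows = append(rows, podRow{Name: fields[0], Ready: fields[1], Status: fields[2], Restarts: n})
	}
	return rows, sc.Err()
}

// unhealthy returns the names of pods that are not Running or have restarted.
func unhealthy(rows []podRow) []string {
	var bad []string
	for _, r := range rows {
		if r.Status != "Running" || r.Restarts > 0 {
			bad = append(bad, r.Name)
		}
	}
	return bad
}

func main() {
	// Output copied from step 5 of the verification above.
	out := `NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-5d48c4d4bc-xmg5t   1/1     Running   0          164m
olm-operator-7f66446cfb-cb9zq       1/1     Running   0          164m
olm-operators-jcqbz                 1/1     Running   0          160m
packageserver-5c6d7445df-45j9v      1/1     Running   0          160m
packageserver-5c6d7445df-sd8hj      1/1     Running   0          160m`
	rows, err := parsePods(out)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%d pods, unhealthy: %v\n", len(rows), unhealthy(rows))
}
```

Checking RESTARTS as well as STATUS matters here: a crash-looping pod can briefly report Running between restarts, but its restart count keeps climbing.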
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2547