Bug 1732214 - catalog-operator panic on labelling ClusterRole/ClusterRoleBinding
Summary: catalog-operator panic on labelling ClusterRole/ClusterRoleBinding
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.z
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Duplicates: 1733324
Depends On:
Blocks: 1732911
 
Reported: 2019-07-23 03:25 UTC by Christoph Blecker
Modified: 2019-08-28 19:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-28 19:54:49 UTC
Target Upstream Version:



Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2547 None None None 2019-08-28 19:54:59 UTC
Github operator-framework operator-lifecycle-manager pull 964 None None None 2019-08-06 21:24:48 UTC

Description Christoph Blecker 2019-07-23 03:25:02 UTC
Description of problem:
The catalog-operator pod is in a crash loop with the following error:

```
time="2019-07-22T21:21:53Z" level=info msg="log level info"
time="2019-07-22T21:21:53Z" level=info msg="TLS keys set, using https for metrics"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="Using in-cluster kube client config"
time="2019-07-22T21:21:53Z" level=info msg="connection established. cluster-version: v1.13.4+6569b4f"
time="2019-07-22T21:21:53Z" level=info msg="operator ready"
time="2019-07-22T21:21:53Z" level=info msg="starting informers..."
time="2019-07-22T21:21:53Z" level=info msg="waiting for caches to sync..."
time="2019-07-22T21:21:53Z" level=info msg="starting workers..."
time="2019-07-22T21:21:53Z" level=info msg=syncing id=J0bpl ip=install-ssz97 namespace=openshift-dedicated-admin phase=Installing
time="2019-07-22T21:21:53Z" level=info msg="building connection to registry" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=uqL/b ip=install-7h475 namespace=openshift-logging phase=Complete
time="2019-07-22T21:21:53Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{community-operators openshift-marketplace}" id=WiU2e source=community-operators
time="2019-07-22T21:21:53Z" level=info msg=syncing id=BZh1k ip=install-c2thj namespace=openshift-monitoring phase=Complete
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-logging"
E0722 21:21:54.040243       1 queueinformer_operator.go:186] Sync "openshift-logging" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="retrying openshift-operator-lifecycle-manager"
E0722 21:21:54.239699       1 queueinformer_operator.go:186] Sync "openshift-operator-lifecycle-manager" failed: no catalog sources available
time="2019-07-22T21:21:54Z" level=info msg="building connection to registry" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
time="2019-07-22T21:21:54Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{configure-alertmanager-operator-registry openshift-operator-lifecycle-manager}" id=lLs1i source=configure-alertmanager-operator-registry
panic: assignment to entry in nil map

goroutine 183 [running]:
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).ExecutePlan(0xc4204bec00, 0xc421924000, 0x1, 0x1)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:1187 +0x4243
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.transitionInstallPlanState(0xc420084180, 0x1601da0, 0xc4204bec00, 0xc420cde4c0, 0xb, 0xc4206cb5a0, 0x1d, 0xc420cde500, 0xd, 0xc420cde4e0, ...)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:992 +0x2ad
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).syncInstallPlans(0xc4204bec00, 0x14a01c0, 0xc420888000, 0xc420888000, 0x1)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:938 +0x458
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.(*Operator).(github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog.syncInstallPlans)-fm(0x14a01c0, 0xc420888000, 0x27, 0x14a01c0)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go:149 +0x3e
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*Operator).sync(0xc4201c41c0, 0xc420309ec0, 0xc42086fcb0, 0x27, 0xc4207193c0, 0x0)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:215 +0x1a4
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*Operator).processNextWorkItem(0xc4201c41c0, 0xc420309ec0, 0x0)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:183 +0xfa
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*Operator).worker(0xc4201c41c0, 0xc420309ec0)
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:169 +0x35
created by github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*Operator).Run.func1
        /go/src/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:151 +0x9cd
```
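
The panic message in the trace above is the standard Go runtime error for writing to a map that was never initialized. The following is a minimal sketch of that failure mode, not the actual OLM code: `objectMeta`, `unsafeSetLabel`, `tryUnsafeSetLabel`, and the label key are hypothetical names chosen for illustration, and the assumption is that a manually created ClusterRole carries no labels, so its label map is nil.

```go
package main

import "fmt"

// objectMeta is a hypothetical stand-in for the metadata of a ClusterRole or
// ClusterRoleBinding. It is not the real Kubernetes type.
type objectMeta struct {
	Name   string
	Labels map[string]string
}

// unsafeSetLabel writes to Labels without checking it first.
// If Labels is nil, the Go runtime panics on the assignment.
func unsafeSetLabel(m *objectMeta, key, value string) {
	m.Labels[key] = value
}

// tryUnsafeSetLabel calls unsafeSetLabel and converts a panic into an error,
// so the failure can be inspected instead of crashing the process.
func tryUnsafeSetLabel(m *objectMeta, key, value string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	unsafeSetLabel(m, key, value)
	return nil
}

func main() {
	// Assumption: an object created by hand has no labels, so Labels is nil.
	existing := &objectMeta{Name: "example-clusterrole"}
	fmt.Println(tryUnsafeSetLabel(existing, "example/owner", "demo"))
}
```

Reading from a nil map is safe in Go; only the write panics, which would explain why the crash appears only on this labelling code path.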

Version-Release number of selected component (if applicable):
4.1.6 (release-4.1)


How reproducible:
Consistent if you exercise this code path.


Steps to Reproduce:
1. Manually deploy a ClusterRole/ClusterRoleBinding to the cluster
2. Deploy an operator that attempts to create a ClusterRole/ClusterRoleBinding with the same name
3. Observe the panic and crash loop in the catalog-operator pod

Actual results:
Crash looping catalog-operator pod


Expected results:
catalog-operator is able to re-label ClusterRole/ClusterRoleBinding and continue
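
As a rough illustration of the expected behaviour, re-labelling can be made safe by initializing the label map before writing to it. This is only a sketch under assumptions: `objectMeta`, `ensureLabel`, and the label key are hypothetical and do not reflect how the linked pull request implements the fix.

```go
package main

import "fmt"

// objectMeta is a hypothetical stand-in for the metadata of a ClusterRole or
// ClusterRoleBinding. It is not the real Kubernetes type.
type objectMeta struct {
	Name   string
	Labels map[string]string
}

// ensureLabel sets key=value on the object, creating the label map first
// when the object has none. Existing labels are left untouched.
func ensureLabel(m *objectMeta, key, value string) {
	if m.Labels == nil {
		m.Labels = make(map[string]string)
	}
	m.Labels[key] = value
}

func main() {
	// Object without labels, as assumed for a manually created ClusterRole.
	manual := &objectMeta{Name: "example-clusterrole"}
	ensureLabel(manual, "example/owner", "demo")

	// Object that already has labels keeps them.
	labelled := &objectMeta{
		Name:   "example-clusterrolebinding",
		Labels: map[string]string{"team": "infra"},
	}
	ensureLabel(labelled, "example/owner", "demo")

	fmt.Println(len(manual.Labels), len(labelled.Labels))
}
```

The nil check costs nothing when labels are already present, so the same helper can be applied to both freshly created and pre-existing objects.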


Additional info:
Proposed fix for master branch: https://github.com/operator-framework/operator-lifecycle-manager/pull/959

This will also require backporting a fix to release branches, as the github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog package was changed in a substantial way in https://github.com/operator-framework/operator-lifecycle-manager/pull/892

Comment 1 Jian Zhang 2019-07-23 07:03:28 UTC
Hi, Christoph

Thanks for your report. I created bug 1732302 for the 4.1.z version.

@Evan
Do we need to submit a separate fix PR to the release-4.1 branch, or can we just cherry-pick the fix PR from the master branch?

Comment 2 Evan Cordell 2019-08-06 21:26:06 UTC
I cherry-picked the master PR to 4.1; it should merge after approval.

Comment 3 Evan Cordell 2019-08-06 21:27:06 UTC
*** Bug 1733324 has been marked as a duplicate of this bug. ***

Comment 5 Jian Zhang 2019-08-20 05:59:18 UTC
LGTM. Verification steps are below:
Cluster version is 4.1.0-0.nightly-2019-08-19-173358
OLM version:                
io.openshift.build.commit.url=https://github.com/operator-framework/operator-lifecycle-manager/commit/e782ca5034ae1fc706145ffd4634ebdffb58b2ba
io.openshift.build.source-location=https://github.com/operator-framework/operator-lifecycle-manager

1) Create a CatalogSource that contains additional ClusterRole/ClusterRoleBinding files.
mac:~ jianzhang$ cat cs-bug.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: etcd-bug-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/jiazha/etcd-operator:bug-1732302
  displayName: ETCD Bug Operators
  publisher: jian

mac:~ jianzhang$ oc create -f cs-bug.yaml 
catalogsource.operators.coreos.com/etcd-bug-operator created
mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
NAME                  NAME                  TYPE   PUBLISHER   AGE
certified-operators   Certified Operators   grpc   Red Hat     139m
community-operators   Community Operators   grpc   Red Hat     139m
etcd-bug-operator     ETCD Bug Operators    grpc   jian        17s
redhat-operators      Red Hat Operators     grpc   Red Hat     139m

2) Create the static ClusterRole/ClusterRoleBinding objects.
mac:operator-registry jianzhang$ oc create -f manifests/etcd/etcdclusterrole.yaml 
clusterrole.rbac.authorization.k8s.io/etcdoperator.v0.9.4-clusterwide-test created
mac:operator-registry jianzhang$ oc create -f manifests/etcd/etcdclusterrolebinding.yaml 
clusterrolebinding.rbac.authorization.k8s.io/etcdoperator.v0.9.4-clusterrolebinding-test created
mac:operator-registry jianzhang$ oc get clusterrolebinding |grep etcd
etcdoperator.v0.9.4-clusterrolebinding-test                                       12s
mac:operator-registry jianzhang$ oc get clusterrole |grep etcd
etcdoperator.v0.9.4-clusterwide-test                                   43s

3) Create an OperatorGroup in the openshift-marketplace project.
mac:~ jianzhang$ oc get og -n openshift-marketplace
NAME     AGE
bug-og   32s

4) Subscribe to this test operator.
mac:~ jianzhang$ cat sub-bug.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  generateName: etcd-bug-
  namespace: openshift-marketplace
spec:
  source: etcd-bug-operator
  sourceNamespace: openshift-marketplace
  name: etcd
  startingCSV: etcdoperator.v0.9.4-clusterwide
  channel: clusterwide-alpha

mac:~ jianzhang$ oc get csv -n openshift-marketplace
NAME                              DISPLAY   VERSION             REPLACES   PHASE
etcdoperator.v0.9.4-clusterwide   etcd      0.9.4-clusterwide              Succeeded

mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-68f759cbc7-v4q4r    1/1     Running   0          154m
community-operators-6c5ffdc5f-ldg5f     1/1     Running   0          154m
etcd-bug-operator-gqkqf                 1/1     Running   0          15m
etcd-operator-bf4866946-m7vdz           3/3     Running   0          47s
marketplace-operator-5fc975bc86-c9qsv   1/1     Running   0          154m
redhat-operators-775568dd5-ckb5k        1/1     Running   0          154m

5) Check the status of the OLM pods.
mac:~ jianzhang$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-5d48c4d4bc-xmg5t   1/1     Running   0          164m
olm-operator-7f66446cfb-cb9zq       1/1     Running   0          164m
olm-operators-jcqbz                 1/1     Running   0          160m
packageserver-5c6d7445df-45j9v      1/1     Running   0          160m
packageserver-5c6d7445df-sd8hj      1/1     Running   0          160m

6) Re-run steps 1, 2, 4, and 5 above with a new registry image (quay.io/jiazha/etcd-operator:bug2-1732302) that has no `clusterPermission` configured in the CSV.

mac:~ jianzhang$ cat cs-bug.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: etcd-bug-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/jiazha/etcd-operator:bug2-1732302
  displayName: ETCD Bug Operators
  publisher: jian

mac:~ jianzhang$ oc create -f cs-bug.yaml 
catalogsource.operators.coreos.com/etcd-bug-operator created
mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
NAME                  NAME                  TYPE   PUBLISHER   AGE
certified-operators   Certified Operators   grpc   Red Hat     160m
community-operators   Community Operators   grpc   Red Hat     160m
etcd-bug-operator     ETCD Bug Operators    grpc   jian        5s
redhat-operators      Red Hat Operators     grpc   Red Hat     160m

mac:~ jianzhang$ oc get sub -n openshift-marketplace
NAME             PACKAGE   SOURCE              CHANNEL
etcd-bug-4ls2t   etcd      etcd-bug-operator   clusterwide-alpha
mac:~ jianzhang$ oc get csv -n openshift-marketplace
NAME                              DISPLAY   VERSION             REPLACES   PHASE
etcdoperator.v0.9.4-clusterwide   etcd      0.9.4-clusterwide              Succeeded
mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-68f759cbc7-v4q4r    1/1     Running   0          162m
community-operators-6c5ffdc5f-ldg5f     1/1     Running   0          162m
etcd-bug-operator-w2jb9                 1/1     Running   0          119s
etcd-operator-bf4866946-vrwfj           3/3     Running   0          79s
marketplace-operator-5fc975bc86-c9qsv   1/1     Running   0          162m
redhat-operators-775568dd5-ckb5k        1/1     Running   0          162m

mac:~ jianzhang$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-5d48c4d4bc-xmg5t   1/1     Running   0          169m
olm-operator-7f66446cfb-cb9zq       1/1     Running   0          169m
olm-operators-jcqbz                 1/1     Running   0          165m
packageserver-5c6d7445df-45j9v      1/1     Running   0          165m
packageserver-5c6d7445df-sd8hj      1/1     Running   0          165m

Comment 7 errata-xmlrpc 2019-08-28 19:54:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2547

