Description of problem:
Could not create the CR in a namespace that the global operator is watching. For example, with etcd-operator, creating the etcd pods fails in the target namespace.

Version-Release number of selected component (if applicable):
mac:project jianzhang$ oc exec olm-operator-75f785f98b-fgvtl -- olm -version
OLM version: 0.8.0
git commit: 8429cb3

How reproducible:
Always

Steps to Reproduce:
1. Create a global OperatorGroup in project "default".

apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: global-operators
  namespace: default
spec:
  selector: {}

2. Create an operator in the project "default"; it will watch all namespaces. For example, etcd-operator.

mac:project jianzhang$ oc get pods -n default
NAME                             READY   STATUS    RESTARTS   AGE
...
etcd-operator-68b4997899-wnnqd   3/3     Running   0          37m

3. Create a project called "jian", and create an "etcdcluster" in it, like below:

mac:project jianzhang$ cat etcd-cluster.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  namespace: "jian"
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.2.13"

mac:project jianzhang$ oc get csv -n jian
NAME                  DISPLAY   VERSION   REPLACES              PHASE
etcdoperator.v0.9.2   etcd      0.9.2     etcdoperator.v0.9.0

4. Check the "etcdcluster" resource.

mac:project jianzhang$ oc get etcdcluster -n jian
NAME                   AGE
example-etcd-cluster   30m

5. Check the etcd pods.

Actual results:
mac:project jianzhang$ oc get operatorgroup -n jian
No resources found.

Expected results:
The etcd pods should be running in the target namespace.

Additional info:
I guess the info below has nothing to do with this OperatorGroup issue; it is just listed for your reference.
I debugged the etcd-operator by enabling the `-cluster-wide` option and got the following error:

E1213 08:07:09.279571 1 reflector.go:205] github.com/coreos/etcd-operator/pkg/controller/informer.go:78: Failed to list *v1beta2.EtcdCluster: etcdclusters.etcd.database.coreos.com is forbidden: User "system:serviceaccount:default:etcd-operator" cannot list etcdclusters.etcd.database.coreos.com at the cluster scope: no RBAC policy matched
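For reference, the "no RBAC policy matched" error means the operator's service account has no cluster-scope permissions on the CRD. Below is a minimal sketch of the kind of ClusterRole/ClusterRoleBinding that would grant them; the names are illustrative (only the service account and resource come from the error message above), and in a working OperatorGroup install these grants would be expected to come from OLM rather than be created by hand:

```yaml
# Illustrative sketch only -- the role names are made up; the service account
# ("default:etcd-operator") and resource are taken from the error above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: etcd-operator-clusterwide
rules:
- apiGroups: ["etcd.database.coreos.com"]
  resources: ["etcdclusters"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: etcd-operator-clusterwide
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: etcd-operator-clusterwide
subjects:
- kind: ServiceAccount
  name: etcd-operator
  namespace: default
```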
I also tested the Couchbase operator; it does not support the OperatorGroup feature yet.

1) Create an OperatorGroup to watch all namespaces:

[jzhang@dhcp-140-18 installer]$ oc get operatorgroup global-operators -o yaml
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  creationTimestamp: 2018-12-17T08:52:37Z
  generation: 1
  name: global-operators
  namespace: default
  resourceVersion: "235543"
  selfLink: /apis/operators.coreos.com/v1alpha2/namespaces/default/operatorgroups/global-operators
  uid: 1a7d7c3b-01d9-11e9-a3c9-3635c3f43365
spec:
  selector: {}
status:
  lastUpdated: 2018-12-17T08:52:39Z
  namespaces:
  - ""

2) Create the Couchbase operator in project "default" (the operator namespace).

[jzhang@dhcp-140-18 installer]$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES   PHASE
couchbase-operator.v1.0.0   Couchbase Operator   1.0.0                Succeeded

[jzhang@dhcp-140-18 installer]$ oc get pods
NAME                                  READY   STATUS    RESTARTS   AGE
couchbase-operator-85c8574796-sb7vw   1/1     Running   0          18h

[jzhang@dhcp-140-18 installer]$ cat couchbase-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: couchbase-admin-creds
  namespace: couchbase-test
type: Opaque
stringData:
  username: admin
  password: password

3) Create the Couchbase cluster in project "couchbase-test" (the target namespace).

[jzhang@dhcp-140-18 installer]$ oc create -f couchbase-secret.yaml
[jzhang@dhcp-140-18 installer]$ oc create -f couchbase-cluster.yaml
[jzhang@dhcp-140-18 installer]$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES   PHASE
couchbase-operator.v1.0.0   Couchbase Operator   1.0.0
[jzhang@dhcp-140-18 installer]$ oc get pods
No resources found.
[jzhang@dhcp-140-18 installer]$ oc get couchbasecluster
NAME         AGE
cb-example   1h
[jzhang@dhcp-140-18 installer]$ oc get pods
No resources found.

The pods were not created as expected, and there are no logs in the Couchbase operator. Filing this bug to track the operators' support for the OperatorGroup feature.
The related YAML files:

[jzhang@dhcp-140-18 installer]$ cat couchbase-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: couchbase-admin-creds
  namespace: couchbase-test
type: Opaque
stringData:
  username: admin
  password: password

[jzhang@dhcp-140-18 installer]$ cat couchbase-cluster.yaml
apiVersion: couchbase.com/v1
kind: CouchbaseCluster
metadata:
  name: cb-example
  namespace: couchbase-test
spec:
  authSecret: couchbase-admin-creds
  baseImage: registry.connect.redhat.com/couchbase/server
  buckets:
  - conflictResolution: seqno
    enableFlush: true
    evictionPolicy: fullEviction
    ioPriority: high
    memoryQuota: 128
    name: default
    replicas: 1
    type: couchbase
  cluster:
    analyticsServiceMemoryQuota: 1024
    autoFailoverMaxCount: 3
    autoFailoverOnDataDiskIssues: true
    autoFailoverOnDataDiskIssuesTimePeriod: 120
    autoFailoverServerGroup: false
    autoFailoverTimeout: 120
    clusterName: cb-example
    dataServiceMemoryQuota: 256
    eventingServiceMemoryQuota: 256
    indexServiceMemoryQuota: 256
    indexStorageSetting: memory_optimized
    searchServiceMemoryQuota: 256
  servers:
  - name: all_services
    services:
    - data
    - index
    - query
    - search
    - eventing
    - analytics
    size: 3
  version: 5.5.1-1
Not all of the operators have been validated to work with operator groups yet. However, the etcd one does work, which is why it was recommended for testing.

In step 3, in order to create an etcd cluster in a namespace that's not the same as the operator's, you have to include the clusterwide annotation:

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
  namespace: "jian"
  annotations:
    etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3
  version: "3.2.13"

Before doing that (assuming you installed the CSVs from a subscription), it's easiest to modify the etcd deploy and add the cluster-wide=true argument as you mentioned.

--

My test workflow from start to finish:
1) Deploy operator group to operator namespace
2) Create etcd subscription in operator namespace
3) Create target namespace, allow time for CSV to be copied
4) Edit etcd deployment, add cluster-wide=true to args
5) Deploy etcdcluster in target namespace
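Step 4 of the workflow can also be done non-interactively with a JSON patch instead of an interactive edit. A hedged sketch (the container index, and the assumption that the flag belongs in the container's command array rather than args, need to be verified against `oc get deployment etcd-operator -o yaml` first):

```json
[
  { "op": "add",
    "path": "/spec/template/spec/containers/0/command/-",
    "value": "--cluster-wide=true" }
]
```

This could be applied with something like `oc patch deployment etcd-operator --type=json -p "$(cat patch.json)"`. Keep in mind that OLM reconciles deployments from the CSV, so a manual deployment edit may be reverted; patching the CSV's deployment spec instead is more durable.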
Jeff, thanks very much for your information!

> Not all of the operators have been validated to work with operator groups yet.

I'd suggest that we point out clearly in our docs which operators support the OperatorGroup feature. What do you think? Maybe we can write it here: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/Documentation/design/architecture.md#operator-group-design

As for the test workflow you mentioned above: yes, it works. But there are two problems here:
1. There is no documentation for this. I highly suggest we point this out in our documents.
2. The copied CSV wasn't created in the new project. In other words, it won't work in a new project. Details below:

1) Create an OperatorGroup called "test-operators" to watch all namespaces.

mac:aws-ocp jianzhang$ cat og-all.yaml
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: test-operators
  namespace: etcd-operator
spec:
  selector: {}

mac:aws-ocp jianzhang$ oc get operatorgroup -n etcd-operator
NAME             AGE
test-operators   4h

2) Create the etcd-operator in project "etcd-operator".

mac:aws-ocp jianzhang$ cat etcd-sub.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  namespace: etcd-operator
  generateName: etcd-
spec:
  source: rh-operators
  name: etcd
  startingCSV: etcdoperator.v0.9.2
  channel: alpha

mac:aws-ocp jianzhang$ oc get csv -n etcd-operator
NAME                          DISPLAY                VERSION   REPLACES              PHASE
etcdoperator.v0.9.2           etcd                   0.9.2     etcdoperator.v0.9.0   Succeeded
marketplace-operator.v0.0.1   marketplace-operator   0.0.1

mac:aws-ocp jianzhang$ oc get pods -n etcd-operator
NAME                             READY   STATUS    RESTARTS   AGE
etcd-operator-5696dbc4c8-7vr6s   3/3     Running   0          4h

3) Modify the CSV/deployment to add `--cluster-wide=true`, like below:

...
- command:
  - etcd-operator
  - --create-crd=false
  - --cluster-wide=true

4) Check the copied CSV in project "default".
mac:aws-ocp jianzhang$ oc get csv -n default
NAME                   DISPLAY          VERSION   REPLACES              PHASE
etcdoperator.v0.9.2    etcd             0.9.2     etcdoperator.v0.9.0
packageserver.v0.8.0   Package Server   0.8.0

5) Create the etcd cluster in it. It did work.

mac:aws-ocp jianzhang$ cat etcd-cluster.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
  namespace: "default"
  annotations:
    etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3
  version: "3.2.13"

mac:aws-ocp jianzhang$ oc get pods -n default
NAME                              READY   STATUS    RESTARTS   AGE
example-etcd-cluster-57w94x82nh   1/1     Running   0          7m
example-etcd-cluster-cxtvlvwgpw   1/1     Running   0          6m
example-etcd-cluster-jnprtd77wd   1/1     Running   0          6m

There is no copied CSV in it. @Jeff, is it a bug? Or am I missing something?

6) Create a new project called "jian", and create the etcd cluster in it.

mac:aws-ocp jianzhang$ oc get csv -n jian
No resources found.
mac:aws-ocp jianzhang$ oc get ns jian
NAME   STATUS   AGE
jian   Active   2h

I enabled the debug mode of the olm-operator and got the info below:
...
time="2018-12-19T13:50:28Z" level=info msg="getting from queue" key=jian queue=namespaces
time="2018-12-19T13:50:28Z" level=debug msg=syncing name=jian namespace=

I also checked the "test-operators" OperatorGroup; it does watch all namespaces, as below:

mac:aws-ocp jianzhang$ oc get operatorgroup test-operators -o yaml -n etcd-operator
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  creationTimestamp: 2018-12-19T10:12:42Z
  generation: 1
  name: test-operators
  namespace: etcd-operator
  resourceVersion: "11661"
  selfLink: /apis/operators.coreos.com/v1alpha2/namespaces/etcd-operator/operatorgroups/test-operators
  uid: 9f4cb714-0376-11e9-9ef3-0ee8c8774ff2
spec:
  selector: {}
status:
  lastUpdated: 2018-12-19T10:12:42Z
  namespaces:
  - ""

The version info:

mac:aws-ocp jianzhang$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2018-12-19-084728   True        False         3h      Cluster version is 4.0.0-0.alpha-2018-12-19-084728

mac:aws-ocp jianzhang$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-67954f4744-bbjkz   1/1     Running   0          3h
certified-operators-4sfhr           1/1     Running   0          3h
olm-operator-7484d5697d-qn2k4       1/1     Running   0          17m
olm-operators-5swx7                 1/1     Running   0          3h
packageserver-7b9bb7478f-mt878      1/1     Running   0          3h
rh-operators-7dh7f                  1/1     Running   0          3h

mac:aws-ocp jianzhang$ oc exec olm-operator-7484d5697d-qn2k4 -- olm -version
OLM version: 0.8.0
git commit: c53c51a
1) I hope that documentation isn't necessary, as eventually the operators will be written to support watching all namespaces by default.
2) The CSV copying does not currently happen immediately in all cases. If you find that the CSV still doesn't exist after 5 minutes, then yes, it's definitely a bug. Otherwise, it might be a reasonable request to ensure copying happens faster.
Jeff,

> I hope that documentation isn't necessary as eventually the operators will be written to support watching all namespaces by default.

Glad to hear it. But I still think we'd better point out which operators support the OperatorGroup feature.

> If you find that after 5 mins the CSV still doesn't exist, then yes it's definitely a bug.

Yes, as you can see in step 6 of comment 3, the copied CSV still didn't exist in the new project after 2 hours. As below:

mac:aws-ocp jianzhang$ oc get csv -n jian
No resources found.
mac:aws-ocp jianzhang$ oc get ns jian
NAME   STATUS   AGE
jian   Active   2h
In progress: https://github.com/operator-framework/operator-lifecycle-manager/pull/675
LGTM, verified it.

OLM version:
io.openshift.build.commit.id=840d806a3b20e5ebb7229631d0168864b1cfed12
io.openshift.build.commit.url=https://github.com/operator-framework/operator-lifecycle-manager/commit/840d806a3b20e5ebb7229631d0168864b1cfed12
io.openshift.build.source-location=https://github.com/operator-framework/operator-lifecycle-manager

Steps:
1. Install the Descheduler from the web console. It will be installed in the `openshift-operators` namespace since its InstallMode is `AllNamespaces`.
2. Create a new project called "test".
3. Check the CSV of the Descheduler in the "test" namespace. LGTM.

[jzhang@dhcp-140-18 ocp14]$ oc get csv -n test
NAME                 DISPLAY       VERSION   REPLACES   PHASE
descheduler.v0.0.1   Descheduler   0.0.1                Succeeded
[jzhang@dhcp-140-18 ocp14]$ oc get csv descheduler.v0.0.1 -n test -o yaml | grep -i copied
    olm.copiedFrom: openshift-operators
  reason: Copied
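For context on the verification above: a CSV advertises namespace support through `spec.installModes`, and an `AllNamespaces` operator gets its CSV copied into every namespace. A rough sketch of what such a CSV declares (illustrative only; not copied from the actual descheduler.v0.0.1 CSV):

```yaml
# Illustrative installModes fragment -- the supported/unsupported values here
# are assumptions, not the real Descheduler CSV; check with "oc get csv -o yaml".
spec:
  installModes:
  - type: OwnNamespace
    supported: true
  - type: SingleNamespace
    supported: true
  - type: MultiNamespace
    supported: false
  - type: AllNamespaces
    supported: true
```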
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758