Bug 1659522
| Summary: | [OperatorGroup] the copied CSV won't exist in new created project | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jian Zhang <jiazha> |
| Component: | OLM | Assignee: | Evan Cordell <ecordell> |
| Status: | CLOSED ERRATA | QA Contact: | Jian Zhang <jiazha> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.0 | CC: | jpeeler, xtian |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:41:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
I also tested the Couchbase operator; it doesn't support the OperatorGroup feature yet.
1) Create an OperatorGroup that watches all namespaces:
[jzhang@dhcp-140-18 installer]$ oc get operatorgroup global-operators -o yaml
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  creationTimestamp: 2018-12-17T08:52:37Z
  generation: 1
  name: global-operators
  namespace: default
  resourceVersion: "235543"
  selfLink: /apis/operators.coreos.com/v1alpha2/namespaces/default/operatorgroups/global-operators
  uid: 1a7d7c3b-01d9-11e9-a3c9-3635c3f43365
spec:
  selector: {}
status:
  lastUpdated: 2018-12-17T08:52:39Z
  namespaces:
  - ""
2) Create the Couchbase operator in project "default" (the operator namespace).
[jzhang@dhcp-140-18 installer]$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES   PHASE
couchbase-operator.v1.0.0   Couchbase Operator   1.0.0                Succeeded
[jzhang@dhcp-140-18 installer]$ oc get pods
NAME                                  READY     STATUS    RESTARTS   AGE
couchbase-operator-85c8574796-sb7vw   1/1       Running   0          18h
3) Create the Couchbase cluster in project "couchbase-test" (the target namespace).
[jzhang@dhcp-140-18 installer]$ oc create -f couchbase-secret.yaml
[jzhang@dhcp-140-18 installer]$ oc create -f couchbase-cluster.yaml
[jzhang@dhcp-140-18 installer]$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES   PHASE
couchbase-operator.v1.0.0   Couchbase Operator   1.0.0
[jzhang@dhcp-140-18 installer]$ oc get pods
No resources found.
[jzhang@dhcp-140-18 installer]$ oc get couchbasecluster
NAME         AGE
cb-example   1h
[jzhang@dhcp-140-18 installer]$ oc get pods
No resources found.
The pods weren't created as expected, and the Couchbase operator logged nothing.
I'm filing this bug to track which operators support the OperatorGroup feature.
The related YAML files:
[jzhang@dhcp-140-18 installer]$ cat couchbase-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: couchbase-admin-creds
  namespace: couchbase-test
type: Opaque
stringData:
  username: admin
  password: password
[jzhang@dhcp-140-18 installer]$ cat couchbase-cluster.yaml
apiVersion: couchbase.com/v1
kind: CouchbaseCluster
metadata:
  name: cb-example
  namespace: couchbase-test
spec:
  authSecret: couchbase-admin-creds
  baseImage: registry.connect.redhat.com/couchbase/server
  buckets:
  - conflictResolution: seqno
    enableFlush: true
    evictionPolicy: fullEviction
    ioPriority: high
    memoryQuota: 128
    name: default
    replicas: 1
    type: couchbase
  cluster:
    analyticsServiceMemoryQuota: 1024
    autoFailoverMaxCount: 3
    autoFailoverOnDataDiskIssues: true
    autoFailoverOnDataDiskIssuesTimePeriod: 120
    autoFailoverServerGroup: false
    autoFailoverTimeout: 120
    clusterName: cb-example
    dataServiceMemoryQuota: 256
    eventingServiceMemoryQuota: 256
    indexServiceMemoryQuota: 256
    indexStorageSetting: memory_optimized
    searchServiceMemoryQuota: 256
  servers:
  - name: all_services
    services:
    - data
    - index
    - query
    - search
    - eventing
    - analytics
    size: 3
  version: 5.5.1-1
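The OperatorGroup in step 1 reports `status.namespaces` as a single empty string, the same value later checked to confirm that `test-operators` watches all namespaces. A minimal Python sketch of that reading, treating `""` as "every namespace", is below; the helper name `watches_namespace` is hypothetical and this is not OLM code.

```python
from typing import Sequence


def watches_namespace(status_namespaces: Sequence[str], namespace: str) -> bool:
    """Return True if an OperatorGroup with this status.namespaces targets `namespace`.

    Hypothetical helper: an entry of "" is read as "all namespaces", matching the
    status shown for global-operators and test-operators in this report.
    """
    if "" in status_namespaces:
        return True
    return namespace in status_namespaces


# status.namespaces of the global-operators OperatorGroup from step 1
print(watches_namespace([""], "couchbase-test"))         # True
print(watches_namespace(["default"], "couchbase-test"))  # False
```

Under this reading `couchbase-test` is in scope, which is consistent with the reply below that the missing pods come from the Couchbase operator not yet being validated with OperatorGroups.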
Not all of the operators have been validated to work with operator groups yet. However, the etcd one does work, which is why it was recommended for testing.
In step 3, to create an etcd cluster in a namespace other than the operator's, you have to include the clusterwide annotation:
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
  namespace: "jian"
  annotations:
    etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3
  version: "3.2.13"
Before doing that (assuming you installed the CSVs from a subscription), the easiest approach is to edit the etcd deployment and add the cluster-wide=true argument, as you mentioned.
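The advice above combines two conditions: the operator runs with cluster-wide=true, and the EtcdCluster carries the `etcd.database.coreos.com/scope: clusterwide` annotation. The Python sketch below is a simplified model of that rule, not the etcd-operator source; the function name `operator_handles` and the namespaced-mode behavior (only unannotated clusters in the operator's own namespace) are assumptions.

```python
from typing import Mapping

# Annotation key quoted in this thread; the value "clusterwide" opts a cluster in.
SCOPE_ANNOTATION = "etcd.database.coreos.com/scope"


def operator_handles(cluster_wide: bool, operator_ns: str,
                     cluster_ns: str, annotations: Mapping[str, str]) -> bool:
    """Simplified, assumed model of whether an etcd-operator instance acts on an EtcdCluster."""
    if cluster_wide:
        # Started with --cluster-wide=true: only clusters annotated as clusterwide,
        # in any namespace.
        return annotations.get(SCOPE_ANNOTATION) == "clusterwide"
    # Assumed namespaced mode: only clusters in the operator's own namespace
    # that carry no scope annotation at all.
    return cluster_ns == operator_ns and SCOPE_ANNOTATION not in annotations


# Operator in "default", unannotated cluster in "jian": not handled.
print(operator_handles(False, "default", "jian", {}))
# cluster-wide=true plus the annotation: handled.
print(operator_handles(True, "default", "jian", {SCOPE_ANNOTATION: "clusterwide"}))
```

In this model, dropping either the flag or the annotation leaves a cluster in another namespace unmanaged, which is why both edits are needed.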
--
My test workflow from start to finish:
1) Deploy an OperatorGroup to the operator namespace
2) Create the etcd subscription in the operator namespace
3) Create the target namespace and allow time for the CSV to be copied
4) Edit the etcd deployment and add cluster-wide=true to its args
5) Deploy an EtcdCluster in the target namespace
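Step 3 of this workflow depends on OLM copying the CSV from the operator namespace into each namespace the OperatorGroup targets. The Python sketch below computes where a copied CSV would be expected, so the result can be compared against `oc get csv -n <namespace>`. It assumes an all-namespaces group (`[""]`) targets every existing namespace except the operator's own; `expected_copy_targets` is a hypothetical helper, not OLM code.

```python
from typing import Iterable, List, Sequence


def expected_copy_targets(operator_ns: str, status_namespaces: Sequence[str],
                          all_namespaces: Iterable[str]) -> List[str]:
    """Namespaces that should hold a copied CSV (simplified assumption, not OLM code)."""
    if "" in status_namespaces:
        # All-namespaces group: every namespace, including ones created later.
        targets = set(all_namespaces)
    else:
        targets = set(status_namespaces)
    # The original CSV lives in the operator namespace, so it is not a copy target.
    targets.discard(operator_ns)
    return sorted(targets)


namespaces = ["default", "etcd-operator", "jian"]
print(expected_copy_targets("etcd-operator", [""], namespaces))  # ['default', 'jian']
```

A namespace created after the group, such as "jian", is still a target under this model; that newly created namespace is the case this bug is about.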
Jeff

Thanks very much for your information!

> Not all of the operators have been validated to work with operator groups yet.

I'd suggest that our docs clearly state which operators support the OperatorGroup feature. What do you think? Maybe we can write it here: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/Documentation/design/architecture.md#operator-group-design

As for the test workflow you mentioned above: yes, it works. But there are two problems:
1, There are no documents for this. I highly suggest we point this out in our documents.
2, The copied CSV wasn't created in the new project. In other words, it won't work in a new project. Details below:

1) Create an OperatorGroup called "test-operators" to watch all namespaces.
mac:aws-ocp jianzhang$ cat og-all.yaml
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: test-operators
  namespace: etcd-operator
spec:
  selector: {}
mac:aws-ocp jianzhang$ oc get operatorgroup -n etcd-operator
NAME             AGE
test-operators   4h
2) Create the etcd-operator in project "etcd-operator".
mac:aws-ocp jianzhang$ cat etcd-sub.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  namespace: etcd-operator
  generateName: etcd-
spec:
  source: rh-operators
  name: etcd
  startingCSV: etcdoperator.v0.9.2
  channel: alpha
mac:aws-ocp jianzhang$ oc get csv -n etcd-operator
NAME                          DISPLAY                VERSION   REPLACES              PHASE
etcdoperator.v0.9.2           etcd                   0.9.2     etcdoperator.v0.9.0   Succeeded
marketplace-operator.v0.0.1   marketplace-operator   0.0.1
mac:aws-ocp jianzhang$ oc get pods -n etcd-operator
NAME                             READY     STATUS    RESTARTS   AGE
etcd-operator-5696dbc4c8-7vr6s   3/3       Running   0          4h
3) Modify the CSV/deployment to add `--cluster-wide=true`, like below:
...
- command:
  - etcd-operator
  - --create-crd=false
  - --cluster-wide=true
4) Check the copied CSV in project "default".
mac:aws-ocp jianzhang$ oc get csv -n default
NAME                   DISPLAY          VERSION   REPLACES              PHASE
etcdoperator.v0.9.2    etcd             0.9.2     etcdoperator.v0.9.0
packageserver.v0.8.0   Package Server   0.8.0
5) Create the etcd cluster in it. It worked.
mac:aws-ocp jianzhang$ cat etcd-cluster.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
  namespace: "default"
  annotations:
    etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3
  version: "3.2.13"
mac:aws-ocp jianzhang$ oc get pods -n default
NAME                              READY     STATUS    RESTARTS   AGE
example-etcd-cluster-57w94x82nh   1/1       Running   0          7m
example-etcd-cluster-cxtvlvwgpw   1/1       Running   0          6m
example-etcd-cluster-jnprtd77wd   1/1       Running   0          6m
No copied CSV in it. @Jeff Is this a bug, or am I missing something?
6) Create a new project called "jian", and create the etcd cluster in it.
mac:aws-ocp jianzhang$ oc get csv -n jian
No resources found.
mac:aws-ocp jianzhang$ oc get ns jian
NAME      STATUS    AGE
jian      Active    2h
I enabled debug mode on the olm-operator and got the following info:
...
time="2018-12-19T13:50:28Z" level=info msg="getting from queue" key=jian queue=namespaces
time="2018-12-19T13:50:28Z" level=debug msg=syncing name=jian namespace=
I also checked test-operators; it does watch all namespaces, as below:
mac:aws-ocp jianzhang$ oc get operatorgroup test-operators -o yaml -n etcd-operator
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  creationTimestamp: 2018-12-19T10:12:42Z
  generation: 1
  name: test-operators
  namespace: etcd-operator
  resourceVersion: "11661"
  selfLink: /apis/operators.coreos.com/v1alpha2/namespaces/etcd-operator/operatorgroups/test-operators
  uid: 9f4cb714-0376-11e9-9ef3-0ee8c8774ff2
spec:
  selector: {}
status:
  lastUpdated: 2018-12-19T10:12:42Z
  namespaces:
  - ""
The version info:
mac:aws-ocp jianzhang$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2018-12-19-084728   True        False         3h        Cluster version is 4.0.0-0.alpha-2018-12-19-084728
mac:aws-ocp jianzhang$ oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
catalog-operator-67954f4744-bbjkz   1/1       Running   0          3h
certified-operators-4sfhr           1/1       Running   0          3h
olm-operator-7484d5697d-qn2k4       1/1       Running   0          17m
olm-operators-5swx7                 1/1       Running   0          3h
packageserver-7b9bb7478f-mt878      1/1       Running   0          3h
rh-operators-7dh7f                  1/1       Running   0          3h
mac:aws-ocp jianzhang$ oc exec olm-operator-7484d5697d-qn2k4 -- olm -version
OLM version: 0.8.0
git commit: c53c51a

1) I hope that documentation isn't necessary, as eventually the operators will be written to support watching all namespaces by default.
2) CSV copying currently does not happen immediately in all cases. If the CSV still doesn't exist after 5 minutes, then yes, it's definitely a bug. Otherwise, it might be a reasonable request to make copying happen faster.

Jeff

> I hope that documentation isn't necessary as eventually the operators will be written to support watching all namespaces by default.

Glad to know it.
But I think we'd better point out which operators support the OperatorGroup feature.

> If you find that after 5 mins the CSV still doesn't exist, then yes it's definitely a bug.

Yes, see step 6 in comment 3: the copied CSV still didn't exist in the new project after 2 hours, as below:
mac:aws-ocp jianzhang$ oc get csv -n jian
No resources found.
mac:aws-ocp jianzhang$ oc get ns jian
NAME      STATUS    AGE
jian      Active    2h

LGTM, verify it.
OLM version:
io.openshift.build.commit.id=840d806a3b20e5ebb7229631d0168864b1cfed12
io.openshift.build.commit.url=https://github.com/operator-framework/operator-lifecycle-manager/commit/840d806a3b20e5ebb7229631d0168864b1cfed12
io.openshift.build.source-location=https://github.com/operator-framework/operator-lifecycle-manager
Steps:
1) Install the Descheduler from the web console. It is installed in the `openshift-operators` namespace because its InstallMode is `AllNamespaces`.
2) Create a new project called "test".
3) Check the Descheduler CSV in the "test" namespace. LGTM.
[jzhang@dhcp-140-18 ocp14]$ oc get csv -n test
NAME                 DISPLAY       VERSION   REPLACES   PHASE
descheduler.v0.0.1   Descheduler   0.0.1                Succeeded
[jzhang@dhcp-140-18 ocp14]$ oc get csv descheduler.v0.0.1 -n test -o yaml |grep -i copied
olm.copiedFrom: openshift-operators
reason: Copied
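The verification above greps for the `olm.copiedFrom` marker and the `Copied` reason. The Python sketch below turns that check, plus the five-minute rule of thumb from earlier in the thread, into a function over a CSV already parsed into a dict (for example from `oc get csv <name> -n <namespace> -o json`). The grep output does not show where the fields sit, so reading the marker from `metadata.labels` and the reason from `status.reason` is an assumption, and `classify_copy` is a hypothetical helper.

```python
from datetime import timedelta
from typing import Any, Mapping, Optional

# Rule of thumb from this thread: a CSV still missing after 5 minutes is a bug.
COPY_DEADLINE = timedelta(minutes=5)


def classify_copy(csv: Optional[Mapping[str, Any]], namespace_age: timedelta,
                  source_ns: str) -> str:
    """Return "ok", "pending", "bug", or "unexpected" for an expected copied CSV.

    Assumed field locations: metadata.labels["olm.copiedFrom"] and status.reason.
    """
    if csv is None:
        # A missing CSV only counts as a bug once the namespace is past the deadline.
        return "bug" if namespace_age > COPY_DEADLINE else "pending"
    labels = (csv.get("metadata") or {}).get("labels") or {}
    reason = (csv.get("status") or {}).get("reason")
    if labels.get("olm.copiedFrom") == source_ns and reason == "Copied":
        return "ok"
    # A CSV exists but does not look like a copy from the expected namespace.
    return "unexpected"


# "jian" was 2h old with no CSV: a bug under the 5-minute rule.
print(classify_copy(None, timedelta(hours=2), "etcd-operator"))
# The verified Descheduler copy in "test".
descheduler = {"metadata": {"labels": {"olm.copiedFrom": "openshift-operators"}},
               "status": {"reason": "Copied"}}
print(classify_copy(descheduler, timedelta(minutes=1), "openshift-operators"))
```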
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758
Description of problem:
Could not create the CR in a namespace that the global OperatorGroup is watching. For example, with etcd-operator, the etcd pods fail to be created in the target namespace.

Version-Release number of selected component (if applicable):
mac:project jianzhang$ oc exec olm-operator-75f785f98b-fgvtl -- olm -version
OLM version: 0.8.0
git commit: 8429cb3

How reproducible:
always

Steps to Reproduce:
1. Create a global OperatorGroup in project "default".
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: global-operators
  namespace: default
spec:
  selector: {}
2. Create an operator in project "default". It will watch all namespaces. For example, etcd-operator.
mac:project jianzhang$ oc get pods -n default
NAME                             READY     STATUS    RESTARTS   AGE
...
etcd-operator-68b4997899-wnnqd   3/3       Running   0          37m
3. Create a project called "jian", and create an "etcdcluster" in it, like below:
mac:project jianzhang$ cat etcd-cluster.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  namespace: "jian"
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.2.13"
mac:project jianzhang$ oc get csv -n jian
NAME                  DISPLAY   VERSION   REPLACES              PHASE
etcdoperator.v0.9.2   etcd      0.9.2     etcdoperator.v0.9.0
4. Check the "etcdcluster" resource.
mac:project jianzhang$ oc get etcdcluster -n jian
NAME                   AGE
example-etcd-cluster   30m
5. Check the etcd pods.

Actual results:
mac:project jianzhang$ oc get operatorgroup -n jian
No resources found.

Expected results:
The etcd pods should be running in the target namespace.

Additional info:
I suspect the following has nothing to do with this OperatorGroup issue; I'm listing it for your reference.
I debugged the etcd-operator by enabling the `-cluster-wide` option and got the following error:
E1213 08:07:09.279571 1 reflector.go:205] github.com/coreos/etcd-operator/pkg/controller/informer.go:78: Failed to list *v1beta2.EtcdCluster: etcdclusters.etcd.database.coreos.com is forbidden: User "system:serviceaccount:default:etcd-operator" cannot list etcdclusters.etcd.database.coreos.com at the cluster scope: no RBAC policy matched
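The error says the `etcd-operator` service account in `default` may not list `etcdclusters` at cluster scope, which a cluster-wide operator needs. One standard Kubernetes way to grant that is a ClusterRole bound to the service account with a ClusterRoleBinding. The Python sketch below builds those two manifests as dicts that can be dumped to JSON for `oc apply -f`. The object name and the verb list are assumptions, and this covers only the permission named in the error; a cluster-wide etcd-operator likely also needs rights on pods and services in target namespaces.

```python
import json
from typing import Any, Dict, List


def etcd_cluster_rbac(sa_name: str, sa_namespace: str,
                      name: str = "etcd-operator-clusterwide") -> List[Dict[str, Any]]:
    """Build a ClusterRole and ClusterRoleBinding letting a service account read
    EtcdClusters cluster-wide. `name` is a hypothetical object name."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [{
            "apiGroups": ["etcd.database.coreos.com"],
            "resources": ["etcdclusters"],
            # Assumed verbs: the error names "list"; an informer also needs "watch".
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "ClusterRole", "name": name},
        "subjects": [{"kind": "ServiceAccount", "name": sa_name,
                      "namespace": sa_namespace}],
    }
    return [role, binding]


# Subject taken from the error: system:serviceaccount:default:etcd-operator
manifest = {"apiVersion": "v1", "kind": "List",
            "items": etcd_cluster_rbac("etcd-operator", "default")}
print(json.dumps(manifest, indent=2))
```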