Bug 1659522 - [OperatorGroup] the copied CSV won't exist in new created project
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-14 15:21 UTC by Jian Zhang
Modified: 2019-06-04 10:41 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:41:16 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:41:21 UTC

Description Jian Zhang 2018-12-14 15:21:22 UTC
Description of problem:
A CR cannot be created in a namespace that the global operator is watching. For example, with etcd-operator, creating the etcd pods fails in the target namespace.

Version-Release number of selected component (if applicable):
mac:project jianzhang$ oc exec olm-operator-75f785f98b-fgvtl -- olm -version
OLM version: 0.8.0
git commit: 8429cb3

How reproducible:
always

Steps to Reproduce:
1. Create a global OperatorGroup in project "default".
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: global-operators
  namespace: default
spec:
  selector: {}

2. Create an operator in the project "default". It will watch all namespaces. For example, etcd-operator.
mac:project jianzhang$ oc get pods -n default
NAME                                  READY     STATUS             RESTARTS   AGE
...        
etcd-operator-68b4997899-wnnqd        3/3       Running            0          37m

3. Create a project called "jian" and create an "etcdcluster" resource in it, as below:
mac:project jianzhang$ cat etcd-cluster.yaml 
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  namespace: "jian"
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.2.13"
mac:project jianzhang$ oc get csv -n jian
NAME                  DISPLAY   VERSION   REPLACES              PHASE
etcdoperator.v0.9.2   etcd      0.9.2     etcdoperator.v0.9.0   

4. Check the "etcdcluster" resource.
mac:project jianzhang$ oc get etcdcluster -n jian
NAME                   AGE
example-etcd-cluster   30m

5. Check the etcd pods.

Actual results:
mac:project jianzhang$ oc get operatorgroup -n jian
No resources found.


Expected results:
The etcd pods should be running in the target namespace.

Additional info:
I guess the info below is unrelated to this OperatorGroup issue; I am listing it only for reference.
I debugged the etcd-operator by enabling the `-cluster-wide` option and got the errors below:
E1213 08:07:09.279571       1 reflector.go:205] github.com/coreos/etcd-operator/pkg/controller/informer.go:78: Failed to list *v1beta2.EtcdCluster: etcdclusters.etcd.database.coreos.com is forbidden: User "system:serviceaccount:default:etcd-operator" cannot list etcdclusters.etcd.database.coreos.com at the cluster scope: no RBAC policy matched
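
For reference, the permission named in the error above can be checked directly. This is only a minimal sketch: it assumes the client provides `oc auth can-i` with the `--all-namespaces` and `--as` options, and the function name `can_list_cluster_wide` is hypothetical, not part of OLM or etcd-operator.

```shell
#!/bin/sh
# Hypothetical helper (not part of OLM or etcd-operator): ask the API server
# whether a service account may list a resource across all namespaces.
# Assumes `oc auth can-i` is available; it prints "yes" or "no".
can_list_cluster_wide() {
    sa="$1"        # e.g. system:serviceaccount:default:etcd-operator
    resource="$2"  # e.g. etcdclusters.etcd.database.coreos.com
    answer=$(oc auth can-i list "$resource" --all-namespaces --as="$sa" 2>/dev/null)
    # Succeed only on an explicit "yes".
    [ "$answer" = "yes" ]
}
```

With the names from the log line, it could be called as `can_list_cluster_wide system:serviceaccount:default:etcd-operator etcdclusters.etcd.database.coreos.com || echo "no cluster-scope list permission"`.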

Comment 1 Jian Zhang 2018-12-18 03:29:17 UTC
I also tested the Couchbase operator; it doesn't support the OperatorGroup feature yet.

1) create an operatorgroup to watch all namespaces:
[jzhang@dhcp-140-18 installer]$ oc get operatorgroup global-operators -o yaml
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  creationTimestamp: 2018-12-17T08:52:37Z
  generation: 1
  name: global-operators
  namespace: default
  resourceVersion: "235543"
  selfLink: /apis/operators.coreos.com/v1alpha2/namespaces/default/operatorgroups/global-operators
  uid: 1a7d7c3b-01d9-11e9-a3c9-3635c3f43365
spec:
  selector: {}
status:
  lastUpdated: 2018-12-17T08:52:39Z
  namespaces:
  - ""

2) Create the Couchbase operator in project "default"(operator namespace).
[jzhang@dhcp-140-18 installer]$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES   PHASE
couchbase-operator.v1.0.0   Couchbase Operator   1.0.0                Succeeded
[jzhang@dhcp-140-18 installer]$ oc get pods
NAME                                  READY     STATUS    RESTARTS   AGE
couchbase-operator-85c8574796-sb7vw   1/1       Running   0          18h
[jzhang@dhcp-140-18 installer]$ cat couchbase-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: couchbase-admin-creds
  namespace: couchbase-test
type: Opaque
stringData:
  username: admin
  password: password

3) Create the Couchbase cluster in project "couchbase-test"(target namespace).
[jzhang@dhcp-140-18 installer]$ oc create -f couchbase-secret.yaml 
[jzhang@dhcp-140-18 installer]$ oc create -f couchbase-cluster.yaml 
[jzhang@dhcp-140-18 installer]$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES   PHASE
couchbase-operator.v1.0.0   Couchbase Operator   1.0.0                
[jzhang@dhcp-140-18 installer]$ oc get pods
No resources found.
[jzhang@dhcp-140-18 installer]$ oc get couchbasecluster
NAME         AGE
cb-example   1h
[jzhang@dhcp-140-18 installer]$ oc get pods
No resources found.

The pods weren't created as expected, and there are no logs in the Couchbase operator.
I am filing this bug to track operator support for the OperatorGroup feature.

The related YAML files:
[jzhang@dhcp-140-18 installer]$ cat couchbase-secret.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: couchbase-admin-creds
  namespace: couchbase-test 
type: Opaque
stringData:
  username: admin
  password: password
[jzhang@dhcp-140-18 installer]$ cat couchbase-cluster.yaml 
apiVersion: couchbase.com/v1
kind: CouchbaseCluster
metadata:
  name: cb-example
  namespace: couchbase-test
spec:
  authSecret: couchbase-admin-creds
  baseImage: registry.connect.redhat.com/couchbase/server
  buckets:
    - conflictResolution: seqno
      enableFlush: true
      evictionPolicy: fullEviction
      ioPriority: high
      memoryQuota: 128
      name: default
      replicas: 1
      type: couchbase
  cluster:
    analyticsServiceMemoryQuota: 1024
    autoFailoverMaxCount: 3
    autoFailoverOnDataDiskIssues: true
    autoFailoverOnDataDiskIssuesTimePeriod: 120
    autoFailoverServerGroup: false
    autoFailoverTimeout: 120
    clusterName: cb-example
    dataServiceMemoryQuota: 256
    eventingServiceMemoryQuota: 256
    indexServiceMemoryQuota: 256
    indexStorageSetting: memory_optimized
    searchServiceMemoryQuota: 256
  servers:
    - name: all_services
      services:
        - data
        - index
        - query
        - search
        - eventing
        - analytics
      size: 3
  version: 5.5.1-1

Comment 2 Jeff Peeler 2018-12-18 22:51:27 UTC
Not all of the operators have been validated to work with operator groups yet. However, the etcd one does work, which is why it was recommended for testing.

In step 3, in order to create an etcd cluster in a namespace different from the operator's, you have to include the clusterwide annotation:

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
  namespace: "jian"
  annotations:
      etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3
  version: "3.2.13"

Before doing that (assuming you installed the CSVs from a subscription), it's easiest to modify the etcd deployment and add the cluster-wide=true argument as you mentioned.

--

My test workflow from start to finish:

1) Deploy operator group to operator namespace
2) Create etcd subscription in operator namespace
3) Create target namespace, allow time for CSV to be copied
4) Edit etcd deployment, add cluster-wide=true to args
5) Deploy etcdcluster in target namespace

Comment 3 Jian Zhang 2018-12-19 14:00:13 UTC
Jeff

Thanks very much for your information!
> Not all of the operators have been validated to work with operator groups yet.
I'd suggest that we clearly point out in our docs which operators support the OperatorGroup feature. What do you think?
Maybe we can write it in here: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/Documentation/design/architecture.md#operator-group-design

The test workflow you mentioned above does work, but there are two problems:
1. This is not documented. I strongly suggest we point it out in our documents.
2. The copied CSV wasn't created in the new project. In other words, it doesn't work for a newly created project.

Details as below:
1) create an operatorgroup called "test-operators" to watch all namespaces.
mac:aws-ocp jianzhang$ cat og-all.yaml 
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: test-operators
  namespace: etcd-operator
spec:
  selector: {}
mac:aws-ocp jianzhang$ oc get operatorgroup -n etcd-operator
NAME             AGE
test-operators   4h

2) create the etcd-operator in project "etcd-operator".
mac:aws-ocp jianzhang$ cat etcd-sub.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  namespace: etcd-operator
  generateName: etcd-
spec:
  source: rh-operators
  name: etcd
  startingCSV: etcdoperator.v0.9.2
  channel: alpha
mac:aws-ocp jianzhang$ oc get csv -n etcd-operator
NAME                          DISPLAY                VERSION   REPLACES              PHASE
etcdoperator.v0.9.2           etcd                   0.9.2     etcdoperator.v0.9.0   Succeeded
marketplace-operator.v0.0.1   marketplace-operator   0.0.1                           
mac:aws-ocp jianzhang$ oc get pods -n etcd-operator
NAME                             READY     STATUS    RESTARTS   AGE
etcd-operator-5696dbc4c8-7vr6s   3/3       Running   0          4h

3) Modify the CSV/deployment to add `--cluster-wide=true`, as below:
...
      - command:
        - etcd-operator
        - --create-crd=false
        - --cluster-wide=true

4) Check the copied csv in project "default".
mac:aws-ocp jianzhang$ oc get csv -n default
NAME                   DISPLAY          VERSION   REPLACES              PHASE
etcdoperator.v0.9.2    etcd             0.9.2     etcdoperator.v0.9.0   
packageserver.v0.8.0   Package Server   0.8.0 

5) Create the etcd cluster in it. It did work.
mac:aws-ocp jianzhang$ cat etcd-cluster.yaml 
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
  namespace: "default"
  annotations:
      etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3
  version: "3.2.13"
mac:aws-ocp jianzhang$ oc get pods -n default
NAME                              READY     STATUS    RESTARTS   AGE
example-etcd-cluster-57w94x82nh   1/1       Running   0          7m
example-etcd-cluster-cxtvlvwgpw   1/1       Running   0          6m
example-etcd-cluster-jnprtd77wd   1/1       Running   0          6m

However, no copied CSV exists in a newly created project (step 6 below). @Jeff, is this a bug, or am I missing something?
6) Create a new project called "jian", and create the etcd cluster in it.
mac:aws-ocp jianzhang$ oc get csv -n jian
No resources found.
mac:aws-ocp jianzhang$ oc get ns jian
NAME      STATUS    AGE
jian      Active    2h

I enabled the debug mode of the olm-operator and got the info below:
...
time="2018-12-19T13:50:28Z" level=info msg="getting from queue" key=jian queue=namespaces
time="2018-12-19T13:50:28Z" level=debug msg=syncing name=jian namespace=

I also checked test-operators; it does watch all namespaces, as below:
mac:aws-ocp jianzhang$ oc get operatorgroup test-operators -o yaml -n etcd-operator
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  creationTimestamp: 2018-12-19T10:12:42Z
  generation: 1
  name: test-operators
  namespace: etcd-operator
  resourceVersion: "11661"
  selfLink: /apis/operators.coreos.com/v1alpha2/namespaces/etcd-operator/operatorgroups/test-operators
  uid: 9f4cb714-0376-11e9-9ef3-0ee8c8774ff2
spec:
  selector: {}
status:
  lastUpdated: 2018-12-19T10:12:42Z
  namespaces:
  - ""

The version info:
mac:aws-ocp jianzhang$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2018-12-19-084728   True        False         3h        Cluster version is 4.0.0-0.alpha-2018-12-19-084728
mac:aws-ocp jianzhang$ oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
catalog-operator-67954f4744-bbjkz   1/1       Running   0          3h
certified-operators-4sfhr           1/1       Running   0          3h
olm-operator-7484d5697d-qn2k4       1/1       Running   0          17m
olm-operators-5swx7                 1/1       Running   0          3h
packageserver-7b9bb7478f-mt878      1/1       Running   0          3h
rh-operators-7dh7f                  1/1       Running   0          3h
mac:aws-ocp jianzhang$ oc exec olm-operator-7484d5697d-qn2k4 -- olm -version
OLM version: 0.8.0
git commit: c53c51a

Comment 4 Jeff Peeler 2018-12-19 16:58:45 UTC
1) I hope that documentation isn't necessary as eventually the operators will be written to support watching all namespaces by default.

2) Currently, CSV copying does not always happen immediately. If you find that the CSV still doesn't exist after 5 minutes, then yes, it's definitely a bug. Otherwise, it might be a reasonable request to ensure copying happens faster.
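
The 5-minute check above could be scripted along these lines. This is a sketch under stated assumptions, not an OLM tool: the function name `wait_for_csv`, the 300-second default timeout, and the 10-second polling interval are all illustrative choices, and it relies only on `oc get csv NAME -n NAMESPACE` exiting non-zero when the CSV is absent.

```shell
#!/bin/sh
# Hypothetical helper: poll until a CSV with the given name appears in a
# namespace, or give up after a timeout (default 300 s, the 5 minutes
# suggested above). Timeout and interval defaults are assumptions.
wait_for_csv() {
    ns="$1"; csv="$2"
    timeout="${3:-300}"; interval="${4:-10}"
    elapsed=0
    while :; do
        # `oc get csv NAME -n NS` exits non-zero when the CSV does not exist.
        if oc get csv "$csv" -n "$ns" >/dev/null 2>&1; then
            echo "CSV $csv found in $ns after ${elapsed}s"
            return 0
        fi
        # Stop once the time budget is used up.
        if [ "$elapsed" -ge "$timeout" ]; then
            break
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "CSV $csv still missing in $ns after ${elapsed}s" >&2
    return 1
}
```

For the case in comment 3 this would be `wait_for_csv jian etcdoperator.v0.9.2`; a non-zero exit after the timeout corresponds to the "definitely a bug" case.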

Comment 5 Jian Zhang 2018-12-20 01:35:34 UTC
Jeff

> I hope that documentation isn't necessary as eventually the operators will be written to support watching all namespaces by default.

Glad to know it. But I think we'd better point out which operators support the OperatorGroup feature.

> If you find that after 5 mins the CSV still doesn't exist, then yes it's definitely a bug.

Yes. As step 6 in comment 3 shows, the copied CSV still didn't exist in the new project after 2 hours:
mac:aws-ocp jianzhang$ oc get csv -n jian
No resources found.
mac:aws-ocp jianzhang$ oc get ns jian
NAME      STATUS    AGE
jian      Active    2h

Comment 13 Jian Zhang 2019-03-14 07:23:02 UTC
LGTM, marking it as verified.

OLM version:
               io.openshift.build.commit.id=840d806a3b20e5ebb7229631d0168864b1cfed12
               io.openshift.build.commit.url=https://github.com/operator-framework/operator-lifecycle-manager/commit/840d806a3b20e5ebb7229631d0168864b1cfed12
               io.openshift.build.source-location=https://github.com/operator-framework/operator-lifecycle-manager

Steps:
1. Install the Descheduler from the Web console. It will be installed in the `openshift-operators` namespace since its InstallMode is `AllNamespaces`.
2. Create a new project called "test".
3. Check the CSV of the Descheduler in the "test" namespace. LGTM.
[jzhang@dhcp-140-18 ocp14]$ oc get csv -n test
NAME                     DISPLAY           VERSION   REPLACES   PHASE
descheduler.v0.0.1       Descheduler       0.0.1                Succeeded

[jzhang@dhcp-140-18 ocp14]$ oc get csv descheduler.v0.0.1 -n test -o yaml |grep -i copied
    olm.copiedFrom: openshift-operators
  reason: Copied
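
The same check can be wrapped in a small function for repeated verification. This is a sketch only: the name `copied_from` is hypothetical, and it assumes the `olm.copiedFrom` entry appears in the CSV's YAML in the form shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the source namespace recorded on a copied CSV,
# mirroring the `grep -i copied` check above. Exits non-zero when the CSV's
# YAML contains no olm.copiedFrom entry.
copied_from() {
    csv="$1"; ns="$2"
    oc get csv "$csv" -n "$ns" -o yaml \
        | sed -n 's/^[[:space:]]*olm\.copiedFrom:[[:space:]]*//p' \
        | head -n 1 \
        | grep .   # non-empty value: print it and succeed; otherwise fail
}
```

For the case above, `copied_from descheduler.v0.0.1 test` would be expected to print the operator namespace recorded on the copy.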

Comment 15 errata-xmlrpc 2019-06-04 10:41:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

