Description of problem:
Sometimes, the generated InstallPlan object refers to the wrong Subscription object, and no CSV object is generated.

Version-Release number of selected component (if applicable):
Cluster version: 4.1.0-0.nightly-2019-05-24-040103
OLM source commit: id=586ffaf57b5da9cc2301b01e2ea10ce6117928c9

How reproducible:
Sometimes, but we have encountered this kind of issue several times recently. Similar to bug 1702540.

Steps to Reproduce:
1. Install the Couchbase, anchore-engine and TiDB operators, selecting the "All Namespaces ..." option.
2. Remove the Couchbase and anchore-engine operators.
3. Install the anchore-engine operator again, selecting the "All Namespaces ..." option.

Actual results:
There are two problems here:

1) No anchore-engine operator CSV object is generated, and its InstallPlan object refers only to the "TiDB" Subscription, not to its own Subscription object.

mac:~ jianzhang$ oc get sub -n openshift-operators
NAME                      PACKAGE                   SOURCE                                    CHANNEL
anchore-engine            anchore-engine            installed-certified-openshift-operators   alpha
tidb-operator-certified   tidb-operator-certified   installed-certified-openshift-operators   beta

mac:~ jianzhang$ oc get ip -n openshift-operators
NAME            CSV                              SOURCE   APPROVAL    APPROVED
install-c5qpl   anchore-engine-operator.v0.0.1            Automatic   true
install-hhnx7   tidb-operator.1.0.0-beta1                 Automatic   true
install-sbbv9   anchore-engine-operator.v0.0.1            Automatic   true

No anchore-engine CSV object is generated:

mac:~ jianzhang$ oc get csv -n openshift-operators
NAME                        DISPLAY         VERSION       REPLACES   PHASE
tidb-operator.1.0.0-beta1   TiDB Operator   1.0.0-beta1              Succeeded

Those two InstallPlans both refer only to the `tidb-operator-certified` Subscription; they should refer to the `anchore-engine` Subscription object as well.
mac:~ jianzhang$ oc get ip -n openshift-operators install-c5qpl -o yaml | grep ownerReferences: -A 9
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: tidb-operator-certified
    uid: ec38df40-803e-11e9-b9dc-02c6e457e9d6
  resourceVersion: "76521"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/installplans/install-c5qpl
  uid: 86d4ef99-8041-11e9-b9dc-02c6e457e9d6

mac:~ jianzhang$ oc get ip -n openshift-operators install-sbbv9 -o yaml | grep ownerReferences: -A 9
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: tidb-operator-certified
    uid: ec38df40-803e-11e9-b9dc-02c6e457e9d6
  resourceVersion: "79779"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/installplans/install-sbbv9
  uid: bdcce16d-8042-11e9-b9dc-02c6e457e9d6

2) The old InstallPlan wasn't removed, and it refers to two CSV objects — is that expected? In my opinion, an InstallPlan object should refer only to its own CSV object. Also, the InstallPlan object still contains the Couchbase CSV, but there are no Couchbase Subscription/CSV objects in this cluster anymore.

mac:~ jianzhang$ oc get ip -n openshift-operators install-c5qpl -o yaml | grep clusterServiceVersionNames: -A 3
  clusterServiceVersionNames:
  - anchore-engine-operator.v0.0.1
  - couchbase-operator.v1.1.0
  source: ""

Expected results:
Each operator's InstallPlan object should refer to its own Subscription object and contain only its own CSV object, and its CSV object should be generated successfully.
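To spot this mis-ownership at a glance, the ownerReferences of every InstallPlan in the namespace can be summarized in one pass. This is a minimal diagnostic sketch, not part of the original report: the `ip_owners` helper name is my own, and it assumes `oc` and `python3` are on the PATH.

```shell
# Hypothetical helper: print "<installplan> -> <owning Subscriptions>" for
# every InstallPlan in the JSON fed on stdin (e.g. `oc get ip -o json`).
ip_owners() {
  python3 -c '
import json, sys
for item in json.load(sys.stdin).get("items", []):
    meta = item["metadata"]
    owners = [o["name"] for o in meta.get("ownerReferences", [])
              if o.get("kind") == "Subscription"]
    print(meta["name"], "->", ",".join(owners) or "<none>")
'
}

# Typical use against the cluster from the report:
#   oc get ip -n openshift-operators -o json | ip_owners
# An anchore-engine InstallPlan whose only listed owner is
# tidb-operator-certified reproduces the bug described above.
```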
Additional info:

mac:~ jianzhang$ oc get catalogsource -n openshift-operators installed-certified-openshift-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2019-05-27T05:17:05Z"
  generation: 10
  labels:
    csc-owner-name: installed-certified-openshift-operators
    csc-owner-namespace: openshift-marketplace
  name: installed-certified-openshift-operators
  namespace: openshift-operators
  resourceVersion: "80804"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/catalogsources/installed-certified-openshift-operators
  uid: aa9b8594-803e-11e9-bf5b-06c6297e18aa
spec:
  address: 172.30.70.66:50051
  displayName: Certified Operators
  icon:
    base64data: ""
    mediatype: ""
  publisher: Certified
  sourceType: grpc
status:
  lastSync: "2019-05-27T05:48:54Z"
  registryService:
    createdAt: "2019-05-27T05:48:54Z"
    protocol: grpc

mac:~ jianzhang$ oc get svc -n openshift-marketplace
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
certified-operators                       ClusterIP   172.30.61.153    <none>        50051/TCP   4h57m
community-operators                       ClusterIP   172.30.112.217   <none>        50051/TCP   4h57m
installed-certified-default               ClusterIP   172.30.149.30    <none>        50051/TCP   4h2m
installed-certified-jian                  ClusterIP   172.30.246.18    <none>        50051/TCP   128m
installed-certified-openshift-operators   ClusterIP   172.30.70.66     <none>        50051/TCP   105m
installed-community-jian                  ClusterIP   172.30.71.92     <none>        50051/TCP   138m
installed-redhat-jian                     ClusterIP   172.30.67.141    <none>        50051/TCP   135m
redhat-operators                          ClusterIP   172.30.13.32     <none>        50051/TCP   4h57m

mac:~ jianzhang$ oc logs installed-certified-openshift-operators-65d48c4456-9mxwx -n openshift-marketplace
time="2019-05-27T05:48:40Z" level=info msg="Using in-cluster kube client config" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="operator source(s) specified are - [https://quay.io/cnr|certified-operators]" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="package(s) specified are - tidb-operator-certified,anchore-engine" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="input has been sanitized" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="sources: [https://quay.io/cnr/certified-operators]" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="packages: [tidb-operator-certified anchore-engine]" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="resolved the following packages: [certified-operators/anchore-engine:1.0.0 certified-operators/tidb-operator-certified:1.0.0]" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="downloading repository: certified-operators/anchore-engine:1.0.0 from https://quay.io/cnr" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="downloading repository: certified-operators/tidb-operator-certified:1.0.0 from https://quay.io/cnr" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="download complete - 2 repositories have been downloaded" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="decoding the downloaded operator manifest(s)" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="manifest format is - flattened" port=50051 repository="certified-operators/anchore-engine:1.0.0" type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="decoded successfully" port=50051 repository="certified-operators/anchore-engine:1.0.0" type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="manifest format is - flattened" port=50051 repository="certified-operators/tidb-operator-certified:1.0.0" type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="decoded successfully" port=50051 repository="certified-operators/tidb-operator-certified:1.0.0" type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="merging all flattened manifests into a single configmap 'data' section" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="decoded 2 flattened and 0 nested operator manifest(s)" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="loading flattened operator manifest(s) into sqlite" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="using configmap loader to build sqlite database" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="loading CRDs" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="loading Bundles" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="loading Packages" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="extracting provided API information" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="serving registry" port=50051 type=appregistry
Created attachment 1573849 [details]
The whole catalog-operator logs
Although it does not occur often, it blocks users from installing operators. Increasing severity.
I believe I have seen this before, and I wrote down what I thought was a description of the bug:

- When you uninstall an operator and remove the Subscription, the InstallPlan for that Subscription has an ownerReference pointing back to the Subscription.
- If you create a new Subscription for the same operator before kube garbage-collects the InstallPlan, we detect it as farther along in the install process (because the InstallPlan exists) and just sit, waiting for the operator to start up (but it never does, because we skipped that part).

The mitigation for this is to wait for the InstallPlans to be garbage-collected before re-installing an operator.

Because there is a workaround, I'm moving this down to medium severity. Moved to 4.1.z so that we can work on a fix for this.
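The mitigation above amounts to making sure no stale InstallPlan survives its deleted Subscription before re-installing. A rough sketch of how to find such leftovers — not an official OLM procedure; the `orphaned_installplans` helper and its argument convention are my own, and it assumes `python3` is available:

```shell
# Hypothetical helper: given `oc get ip -o json` on stdin and the names of
# the Subscriptions that still exist as arguments, print the InstallPlans
# whose owning Subscription has been removed (candidates for `oc delete ip`).
# Only the first ownerReference is checked in this simplified sketch.
orphaned_installplans() {
  live=" $* "
  python3 -c '
import json, sys
for item in json.load(sys.stdin).get("items", []):
    meta = item["metadata"]
    owners = [o["name"] for o in meta.get("ownerReferences", [])]
    print(meta["name"], owners[0] if owners else "-")
' |
  while read -r ip owner; do
    case "$live" in
      *" $owner "*) ;;      # owning Subscription still present: keep
      *) echo "$ip" ;;      # orphaned: delete before re-installing
    esac
  done
}

# Typical use before re-creating a Subscription:
#   oc get ip -n openshift-operators -o json \
#     | orphaned_installplans $(oc get sub -n openshift-operators -o jsonpath='{.items[*].metadata.name}')
```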
Evan, Jeff, I think the fix PR (https://github.com/operator-framework/operator-lifecycle-manager/pull/965) was only merged into the master branch, not release-4.1. Could you help cherry-pick it to the release-4.1 branch? Thanks!

Verification failed, since there is no fix PR for the 4.1.z version.
The 4.1 version of this is in progress.
Targeting this to 4.2, will clone for 4.1
Evan,

> Targeting this to 4.2, will clone for 4.1

OK, created bug 1740491 for the 4.1.z version.
LGTM, verified it; detailed steps below:

Cluster version: 4.2.0-0.nightly-2019-08-13-183722
OLM version:
mac:~ jianzhang$ oc exec catalog-operator-5cc66dd5c4-7sw95 -- olm --version
OLM version: 0.11.0
git commit: 586e941bd1f42ea1f331453ed431fb43699fef70

1. Install the Couchbase, anchore-engine and TiDB operators, selecting the "All Namespaces ..." option.

mac:~ jianzhang$ oc get ip
NAME            CSV                              SOURCE   APPROVAL    APPROVED
install-jlwm7   couchbase-operator.v1.1.0                 Automatic   true
install-s546g   anchore-engine-operator.v0.0.2            Automatic   true
install-zjd7j   tidb-operator.1.0.0-beta1                 Automatic   true

mac:~ jianzhang$ oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
anchore-engine-operator-5b55589d6f-829md   1/1     Running   0          10m
couchbase-operator-79b995b87d-t8qvh        1/1     Running   0          10m
tidb-controller-manager-7546d898df-6zzbk   1/1     Running   0          4m55s

2. Remove the Couchbase and anchore-engine operators.
3. Install the anchore-engine operator again, selecting the "All Namespaces ..." option.

mac:~ jianzhang$ oc get ip
NAME            CSV                              SOURCE   APPROVAL    APPROVED
install-6dkxn   anchore-engine-operator.v0.0.2            Automatic   true
install-zjd7j   tidb-operator.1.0.0-beta1                 Automatic   true

mac:~ jianzhang$ oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
anchore-engine-operator-5b55589d6f-cw5cr   1/1     Running   0          9m6s
tidb-controller-manager-7546d898df-6zzbk   1/1     Running   0          20m

Now the Anchore operator works well, and the TiDB InstallPlan refers only to its own Subscription; the others have been deleted.
mac:~ jianzhang$ oc get ip install-zjd7j -o yaml | grep "ownerReferences" -A 10
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: tidb-operator-certified
    uid: 9a80fb2d-be76-11e9-874b-022328418248
  resourceVersion: "126830"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/installplans/install-zjd7j
  uid: 9a8d7a9d-be76-11e9-874b-022328418248
spec:

mac:~ jianzhang$ oc get ip install-6dkxn -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  creationTimestamp: "2019-08-14T09:45:16Z"
  generateName: install-
  generation: 1
  name: install-6dkxn
  namespace: openshift-operators
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: anchore-engine
    uid: 388aabb4-be78-11e9-874b-022328418248
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: tidb-operator-certified
    uid: 9a80fb2d-be76-11e9-874b-022328418248
  resourceVersion: "127334"
...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922