Description of problem:
When a CatalogSource is deleted, all associated subscriptions torn down, etc. and then recreated, and a new subscription is created to something offered by that CatalogSource, the subscription appears to be 'stuck' (i.e., never gains a status) until the catalog-operator pod is deleted and the subscription is recreated/modified.

Version-Release number of selected component (if applicable):

How reproducible:
1. Create a catalog source.
2. Create a subscription to something that comes from that catalog source.
3. Everything works OK.
4. Delete the subscription.
5. Delete the catalog source.
6. Create the catalog source.
7. Create the subscription.
8. Nothing happens.
9. Delete the catalog-operator pod.
10. Sometimes, things get unstuck at this point; many times, however, it takes bumping the subscription in some way for the Subscription to gain a status.

Sometimes, restarting the catalog-operator pod appears to be insufficient and manually bumping the subscription resource is required.

Actual results:
Subscription does not gain a status without outside intervention.

Expected results:
Subscriptions should be serviced when they are created; if there is a problem servicing a resource it should be reflected in the status.

Additional info:
I am happy to provide a reproducer for this interactively with someone; I know that folks have said that they have trouble reproducing this.
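The reproduction steps 1-7 above can be sketched as a small shell function. This is only a sketch under assumptions: the function name repro_cycle, the two manifest files passed to it, and the settle delay are all hypothetical and do not come from the report.

```shell
# Sketch of reproduction steps 1-7. Assumptions (not from the report):
# the manifest files and the settle delay are hypothetical.
# Usage: repro_cycle CATSRC_MANIFEST SUB_MANIFEST [SETTLE_SECONDS]
repro_cycle() {
    catsrc_file=$1
    sub_file=$2
    settle=${3:-60}

    # Steps 1-3: create the catalog source, subscribe, and let things settle.
    oc create -f "$catsrc_file" || return 1
    oc create -f "$sub_file" || return 1
    sleep "$settle"

    # Steps 4-5: delete the subscription first, then the catalog source.
    oc delete -f "$sub_file" || return 1
    oc delete -f "$catsrc_file" || return 1

    # Steps 6-7: recreate both; per the report, the new subscription
    # may now sit without a status.
    oc create -f "$catsrc_file" || return 1
    oc create -f "$sub_file" || return 1
}
```

A call such as repro_cycle catsrc.yaml sub.yaml (file names are placeholders) would run one cycle. Because the problem appears to be intermittent, a single cycle may not be enough to trigger it.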
Paul, thanks for reporting this! I could NOT reproduce the issue with the version below:

OLM version: io.openshift.build.commit.id=b2d1cd21368bc8cc10e4ca11a231f09077630c33
Cluster version: 4.1.0-0.nightly-2019-05-06-011159

1. Create a new project called "debug" and install the "AMQ" operator in it.

mac:~ jianzhang$ oc get pods
NAME                                            READY   STATUS    RESTARTS   AGE
amq-streams-cluster-operator-779f9ffbd4-dfz69   1/1     Running   0          8m28s

2. Delete the subscription and catalog source.
3. Recreate the subscription and catalog source.
4. Check the status of the subscription. It works well, as shown here:

mac:~ jianzhang$ oc get sub
NAME          PACKAGE       SOURCE                   CHANNEL
amq-streams   amq-streams   installed-redhat-debug   stable
mac:~ jianzhang$ oc get sub amq-streams -o go-template='{{ .status.state }}'
AtLatestKnown

Could you share the detailed steps to reproduce this issue? Thanks!
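The one-off status check in step 4 could be turned into a polling helper, which makes a stuck subscription easier to tell apart from a slow one. This is a minimal sketch, not part of the original comment: the function name wait_for_sub_state and its retry defaults are hypothetical, and it uses a jsonpath query rather than the go-template shown above on the assumption that a missing status then prints as an empty string.

```shell
# Hypothetical helper: poll a Subscription until .status.state is set.
# Usage: wait_for_sub_state NAME NAMESPACE [ATTEMPTS] [SLEEP_SECONDS]
wait_for_sub_state() {
    name=$1
    ns=$2
    attempts=${3:-30}
    delay=${4:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Assumption: an unset status yields empty output with jsonpath.
        state=$(oc get sub "$name" -n "$ns" -o jsonpath='{.status.state}' 2>/dev/null)
        if [ -n "$state" ]; then
            echo "$state"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "subscription $name in $ns has no status after $attempts checks" >&2
    return 1
}
```

For the subscription in this comment, the call would be wait_for_sub_state amq-streams debug. A non-zero exit code means the subscription never gained a status within the retry window, which matches the 'stuck' symptom in the description.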
I was finally able to reproduce this by running the same test multiple times in a cluster - thanks for the report! This is fixed in this commit: https://github.com/operator-framework/operator-lifecycle-manager/pull/846/commits/8d9664a6e3ecbf5615a1e74911a6a87efb11e998 (may go in a different PR depending on how other PRs merge) After this change, I can no longer reproduce the stuck subscription bug.
Proposed fix doesn't address the issue.
This PR contains the fix for the issue: https://github.com/operator-framework/operator-lifecycle-manager/pull/847
*** Bug 1704940 has been marked as a duplicate of this bug. ***
Verification failed with the version below:

OLM version: io.openshift.build.commit.id=19e7914e33f723c6f77f7aaa0892c7684ce94ed4
Cluster version: 4.1.0-0.nightly-2019-05-09-182710

1. Install the "etcd" operator in project "default".

[chuo@dhcp-140-165 .kube]$ oc get sub
NAME                             PACKAGE                          SOURCE                        CHANNEL
couchbase-enterprise-certified   couchbase-enterprise-certified   installed-certified-default   preview
etcd                             etcd                             installed-community-default   singlenamespace-alpha
[chuo@dhcp-140-165 .kube]$ oc get catsrc
NAME                          NAME                  TYPE   PUBLISHER   AGE
installed-certified-default   Certified Operators   grpc   Certified   38m
installed-community-default   Community Operators   grpc   Community   46s
[chuo@dhcp-140-165 .kube]$ oc get ip
NAME            CSV                         SOURCE   APPROVAL    APPROVED
install-cgsm4   couchbase-operator.v1.1.0            Automatic   true
install-g7wmf   etcdoperator.v0.9.4                  Manual      false

2. Manually approve the InstallPlan.

[chuo@dhcp-140-165 .kube]$ oc edit ip install-g7wmf
installplan.operators.coreos.com/install-g7wmf edited
[chuo@dhcp-140-165 .kube]$ oc get ip
NAME            CSV                   SOURCE   APPROVAL   APPROVED
install-g7wmf   etcdoperator.v0.9.4            Manual     true
[chuo@dhcp-140-165 .kube]$ oc get csv
NAME                  DISPLAY   VERSION   REPLACES              PHASE
etcdoperator.v0.9.4   etcd      0.9.4     etcdoperator.v0.9.2   Succeeded

3. Delete the subscription and catalog source.

[chuo@dhcp-140-165 .kube]$ oc delete catsrc installed-community-default
catalogsource.operators.coreos.com "installed-community-default" deleted
[chuo@dhcp-140-165 .kube]$ oc delete sub etcd
subscription.operators.coreos.com "etcd" deleted
[chuo@dhcp-140-165 .kube]$ oc get catsrc
NAME                          NAME                  TYPE   PUBLISHER   AGE
installed-certified-default   Certified Operators   grpc   Certified   46m
[chuo@dhcp-140-165 .kube]$ oc get sub
NAME                             PACKAGE                          SOURCE                        CHANNEL
couchbase-enterprise-certified   couchbase-enterprise-certified   installed-certified-default   preview
[chuo@dhcp-140-165 .kube]$ oc get ip
NAME            CSV                         SOURCE   APPROVAL    APPROVED
install-cgsm4   couchbase-operator.v1.1.0            Automatic   true
install-g7wmf   etcdoperator.v0.9.4                  Manual      true

The subscription can be deleted from the back end, but the InstallPlan still exists. Meanwhile, in the web console under Installed Operators (for project "default"), the etcd operator still appears with status "InstallSucceeded".
Verification succeeded with the version below:

OLM version: io.openshift.build.commit.id=19e7914e33f723c6f77f7aaa0892c7684ce94ed4
Cluster version: 4.1.0-0.nightly-2019-05-09-182710

1. Install the "etcd" operator in project "test".

[chuo@dhcp-140-165 .kube]$ oc get sub
NAME   PACKAGE   SOURCE                     CHANNEL
etcd   etcd      installed-community-test   singlenamespace-alpha
[chuo@dhcp-140-165 .kube]$ oc get catsrc
NAME                       NAME                  TYPE   PUBLISHER   AGE
installed-community-test   Community Operators   grpc   Community   2m16s
[chuo@dhcp-140-165 .kube]$ oc get ip
NAME            CSV                   SOURCE   APPROVAL    APPROVED
install-7wnvx   etcdoperator.v0.9.4            Automatic   true

2. Delete the subscription and catalog source.
3. Re-create the subscription and catalog source.

[chuo@dhcp-140-165 .kube]$ oc get sub
NAME   PACKAGE   SOURCE                     CHANNEL
etcd   etcd      installed-community-test   singlenamespace-alpha
[chuo@dhcp-140-165 .kube]$ oc get catsrc
NAME                       NAME                  TYPE   PUBLISHER   AGE
installed-community-test   Community Operators   grpc   Community   2m16s
[chuo@dhcp-140-165 .kube]$ oc get ip
NAME            CSV                   SOURCE   APPROVAL    APPROVED
install-7wnvx   etcdoperator.v0.9.4            Automatic   true

4. Repeat steps 2 and 3 ten times; the subscription succeeded all 10 times.
5. Delete the catalog-operator pod and wait until the new pod is running.

[chuo@dhcp-140-165 .kube]$ oc delete po catalog-operator-569b689878-g8zzh -n openshift-operator-lifecycle-manager
pod "catalog-operator-569b689878-g8zzh" deleted
[chuo@dhcp-140-165 .kube]$ oc get po -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-569b689878-p7c2f   0/1     Running   0          14s

6. Re-create the subscription and catalog source.

[chuo@dhcp-140-165 .kube]$ oc get sub
NAME   PACKAGE   SOURCE                     CHANNEL
etcd   etcd      installed-community-test   singlenamespace-alpha
[chuo@dhcp-140-165 .kube]$ oc get ip
NAME            CSV                   SOURCE   APPROVAL    APPROVED
install-znv6w   etcdoperator.v0.9.4            Automatic   true
[chuo@dhcp-140-165 .kube]$ oc get catsrc
NAME                       NAME                  TYPE   PUBLISHER   AGE
installed-community-test   Community Operators   grpc   Community   2m33s
[chuo@dhcp-140-165 .kube]$ oc get sub etcd -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2019-05-13T08:34:10Z"
  generation: 1
  labels:
    csc-owner-name: installed-community-test
    csc-owner-namespace: openshift-marketplace
  name: etcd
  namespace: test
  resourceVersion: "149673"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/test/subscriptions/etcd
  uid: e1912e81-7559-11e9-8544-0aa12d6c2fce
spec:
  channel: singlenamespace-alpha
  installPlanApproval: Automatic
  name: etcd
  source: installed-community-test
  sourceNamespace: test
  startingCSV: etcdoperator.v0.9.4
status:
  currentCSV: etcdoperator.v0.9.4
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-znv6w
    namespace: test
    resourceVersion: "149643"
    uid: e2095587-7559-11e9-8bba-02c4299c1f3a
  installedCSV: etcdoperator.v0.9.4
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-znv6w
    uuid: e2095587-7559-11e9-8bba-02c4299c1f3a
  lastUpdated: "2019-05-13T08:34:14Z"
  state: AtLatestKnown
[chuo@dhcp-140-165 .kube]$ oc get catsrc installed-community-test -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2019-05-13T08:34:02Z"
  generation: 1
  labels:
    csc-owner-name: installed-community-test
    csc-owner-namespace: openshift-marketplace
  name: installed-community-test
  namespace: test
  resourceVersion: "152087"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/test/catalogsources/installed-community-test
  uid: dcdf4ca8-7559-11e9-b532-06d8365d7bd0
spec:
  address: 172.30.208.199:50051
  displayName: Community Operators
  icon:
    base64data: ""
    mediatype: ""
  publisher: Community
  sourceType: grpc
status:
  lastSync: "2019-05-13T08:39:09Z"
  registryService:
    createdAt: "2019-05-13T08:39:07Z"
    protocol: grpc
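Step 5 in this comment (delete the catalog-operator pod, then wait for its replacement) could be scripted roughly as follows. This is a sketch under assumptions: the function name restart_catalog_operator and the default timeout are hypothetical, and the deployment name catalog-operator is inferred from the pod names in the transcript rather than stated anywhere in this report.

```shell
# Sketch of step 5: delete the catalog-operator pod and wait for its
# replacement. Assumption: the owning deployment is named
# "catalog-operator" (inferred from the pod names, not confirmed).
# Usage: restart_catalog_operator POD_NAME [TIMEOUT]
restart_catalog_operator() {
    pod=$1
    timeout=${2:-120s}
    ns=openshift-operator-lifecycle-manager

    oc delete pod "$pod" -n "$ns" || return 1
    # Wait for the deployment to report its replicas as available, so a
    # pod that is Running but still 0/1 ready is not treated as done.
    oc rollout status deployment/catalog-operator -n "$ns" --timeout="$timeout"
}
```

With the pod from the transcript, the call would be restart_catalog_operator catalog-operator-569b689878-g8zzh. Waiting on the rollout instead of checking the pod list once should avoid re-creating the subscription while the new pod is still at 0/1.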
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758