Description of problem:

ClusterServiceVersion objects with "Deleting" status appear to sometimes get leaked - that is, they never get deleted. This appears to contradict the following statement from the GitHub docs [1]: "Deleting: the GC loop has determined this CSV is safe to delete from the cluster. It will disappear soon."

[1] https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/architecture.md#clusterserviceversion-control-loop

For example:

oc get clusterserviceversions.operators.coreos.com --all-namespaces | grep Deleting
clustervalidation                      elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                Deleting
hipster                                elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                Deleting
openshift-metering                     elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                Deleting
openshift-monitoring                   elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                Deleting
openshift-operator-lifecycle-manager   elasticsearch-operator.4.2.10-201912022352   Elasticsearch Op...   4.2.10-201912022352   elasticsearch-operator.4.2.9-201911261133   Deleting
openshift-operator-lifecycle-manager   elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                Deleting
openshift-operator-lifecycle-manager   elasticsearch-operator.4.2.8-201911190952    Elasticsearch Op...   4.2.8-201911190952    elasticsearch-operator.4.2.5-201911121709   Deleting
openshift-operator-lifecycle-manager   elasticsearch-operator.4.2.9-201911261133    Elasticsearch Op...   4.2.9-201911261133    elasticsearch-operator.4.2.8-201911190952   Deleting
openshift-operators                    elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                Deleting

How reproducible:
Uncertain

Actual results:
ClusterServiceVersion objects with "Deleting" status remain in the cluster indefinitely, even after an operator upgrade has completed.

Expected results:
ClusterServiceVersion objects with "Deleting" status would be automatically removed (per the documentation).
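For reference, an equivalent query (my own variant, not taken from the report above) that prints only the namespace, name, and phase columns, which makes the stuck entries a bit easier to scan:

# Variant of the command above using custom-columns to expose only the relevant fields.
oc get clusterserviceversions.operators.coreos.com --all-namespaces \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
  | grep Deleting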
This appears, at first glance, to be a bug that can occur in the projection of ownership and operatorgroup labels. The non-deleted CSVs in question have conflicting metadata:

  annotations:
    olm.operatorGroup: global-operators
    olm.operatorNamespace: openshift-operators
  labels:
    olm.api.e43efcaa45c9f8d0: provided
    olm.copiedFrom: openshift-operators-redhat
  name: elasticsearch-operator.4.2.5-201911121709
  namespace: openshift-operators

This indicates that although this copied CSV originally came from `openshift-operators-redhat`, its current "parent" operator (the non-copied version) is recorded as being in `openshift-operators`. The `olm.operatorNamespace` annotation is checked to find the parent - in this case, since the copied CSV is itself in the namespace listed as the parent, it is never cleaned up (even though it is in the Deleting phase).

This will require a bit more investigation, but I suspect that the operatorgroup annotations are projected too aggressively: they should not be overwritten on copied CSVs.
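To illustrate the mismatch described above, here is a diagnostic sketch of my own (not taken from the OLM code or from this report; it only assumes the olm.copiedFrom label and olm.operatorNamespace annotation shown in the metadata above). It lists copied CSVs whose olm.operatorNamespace annotation points back at their own namespace - the condition under which, per the analysis above, the copy is treated as its own parent and never garbage-collected:

# List copied CSVs whose olm.operatorNamespace annotation equals their own namespace.
oc get csv --all-namespaces -l olm.copiedFrom \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.metadata.annotations.olm\.operatorNamespace}{"\n"}{end}' \
  | awk '$1 == $3 {print $1, $2}'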
Cluster version is 4.4.0-0.nightly-2020-02-04-171905

mac:~ jianzhang$ oc exec catalog-operator-75df97fddd-t9tfz -- olm --version
OLM version: 0.14.1
git commit: 8775d5a0d7c632b1b0f1b0ccaf5c7c27fcd81f95

1, Create an OperatorGroup that selects all namespaces in the openshift-operators-redhat project.

mac:~ jianzhang$ oc get operatorgroup -n openshift-operators-redhat -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1
  kind: OperatorGroup
  metadata:
    annotations:
      olm.providedAPIs: Elasticsearch.v1.logging.openshift.io
    creationTimestamp: "2020-02-05T08:23:46Z"
    generation: 1
    name: openshift-operators-redhat
    namespace: openshift-operators-redhat
    resourceVersion: "68155"
    selfLink: /apis/operators.coreos.com/v1/namespaces/openshift-operators-redhat/operatorgroups/openshift-operators-redhat
    uid: c3b55bb5-4545-4232-a8cb-afad8b1c8792
  spec: {}
  status:
    lastUpdated: "2020-02-05T08:23:46Z"
    namespaces:
    - ""
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

2, Subscribe to the elasticsearch-operator in this project.

mac:~ jianzhang$ oc get sub -n openshift-operators-redhat
NAME                     PACKAGE                  SOURCE            CHANNEL
elasticsearch-operator   elasticsearch-operator   qe-app-registry   4.4

mac:~ jianzhang$ oc get csv -n openshift-operators-redhat
NAME                                        DISPLAY                  VERSION              REPLACES   PHASE
elasticsearch-operator.4.4.0-202001241932   Elasticsearch Operator   4.4.0-202001241932              Succeeded

mac:~ jianzhang$ oc get pods -n openshift-operators-redhat
NAME                                      READY   STATUS    RESTARTS   AGE
elasticsearch-operator-55b8b5f49f-b2s5s   1/1     Running   0          97m

3, Check the copied CSV in the openshift-operators namespace, and then check the olm.operatorGroup annotation.

mac:~ jianzhang$ oc get csv elasticsearch-operator.4.4.0-202001241932 -n openshift-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  annotations:
    ...
    olm.operatorGroup: openshift-operators-redhat
    olm.operatorNamespace: openshift-operators-redhat
    olm.skipRange: '>=4.2.0 <4.4.0'
    support: AOS Cluster Logging, Jaeger
  creationTimestamp: "2020-02-05T08:24:21Z"
  generation: 1
  labels:
    olm.api.e43efcaa45c9f8d0: provided
    olm.copiedFrom: openshift-operators-redhat

mac:~ jianzhang$ oc get operatorgroup -n openshift-operators
NAME               AGE
global-operators   4h37m

As we can see, the olm.operatorGroup value is openshift-operators-redhat, not global-operators. LGTM.

4, Delete the elasticsearch-operator.

mac:~ jianzhang$ oc delete sub elasticsearch-operator -n openshift-operators-redhat
subscription.operators.coreos.com "elasticsearch-operator" deleted
mac:~ jianzhang$ oc delete csv elasticsearch-operator.4.4.0-202001241932 -n openshift-operators-redhat
clusterserviceversion.operators.coreos.com "elasticsearch-operator.4.4.0-202001241932" deleted

mac:~ jianzhang$ oc get csv -A
NAMESPACE                              NAME                                DISPLAY           VERSION              REPLACES   PHASE
openshift-logging                      clusterlogging.4.4.0-202001241616   Cluster Logging   4.4.0-202001241616              Succeeded
openshift-operator-lifecycle-manager   packageserver                       Package Server    0.14.1                          Succeeded

All copied CSVs are deleted as expected. LGTM, verify it.
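For anyone re-checking this, a quick follow-up query (my own addition, not part of the verification transcript above) that should return no results once the parent CSV has been deleted, since every copied CSV carries the olm.copiedFrom label pointing at the original namespace:

# Should list nothing after the parent CSV in openshift-operators-redhat is deleted.
oc get csv --all-namespaces -l olm.copiedFrom=openshift-operators-redhat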
*** Bug 1775518 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581