Bug 1789920 - ClusterServiceVersion resource with Deleting status is never removed
Summary: ClusterServiceVersion resource with Deleting status is never removed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Duplicates: 1775518
Depends On:
Blocks: 1797019
Reported: 2020-01-10 17:44 UTC by Luke Stanton
Modified: 2020-05-04 11:24 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1797019
Environment:
Last Closed: 2020-05-04 11:23:58 UTC
Target Upstream Version:


Links
System                  ID                                                       Priority  Status  Summary                                   Last Updated
Github                  operator-framework operator-lifecycle-manager pull 1267  None      closed  Bug 1789920: Fix bad opgroup annotations  2020-10-28 01:32:28 UTC
Red Hat Product Errata  RHBA-2020:0581                                           None      None    None                                      2020-05-04 11:24:24 UTC

Description Luke Stanton 2020-01-10 17:44:18 UTC
Description of problem:

ClusterServiceVersion objects with "Deleting" status appear to be leaked in some cases; that is, they are never deleted. This contradicts the following statement from the GitHub docs[1]: "Deleting: the GC loop has determined this CSV is safe to delete from the cluster. It will disappear soon."

[1] https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/architecture.md#clusterserviceversion-control-loop

For example:

oc get clusterserviceversions.operators.coreos.com --all-namespaces | grep Deleting                                                                                                         
clustervalidation                      elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                 Deleting
hipster                                elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                 Deleting
openshift-metering                     elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                 Deleting
openshift-monitoring                   elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                 Deleting
openshift-operator-lifecycle-manager   elasticsearch-operator.4.2.10-201912022352   Elasticsearch Op...   4.2.10-201912022352   elasticsearch-operator.4.2.9-201911261133    Deleting
openshift-operator-lifecycle-manager   elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                 Deleting
openshift-operator-lifecycle-manager   elasticsearch-operator.4.2.8-201911190952    Elasticsearch Op...   4.2.8-201911190952    elasticsearch-operator.4.2.5-201911121709    Deleting
openshift-operator-lifecycle-manager   elasticsearch-operator.4.2.9-201911261133    Elasticsearch Op...   4.2.9-201911261133    elasticsearch-operator.4.2.8-201911190952    Deleting
openshift-operators                    elasticsearch-operator.4.2.5-201911121709    Elasticsearch Op...   4.2.5-201911121709                                                 Deleting
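The same filter can be applied to structured output rather than grep. The following is a minimal sketch, assuming the usual list shape returned by `oc get csv --all-namespaces -o json` (`items[].metadata` and `items[].status.phase`); the function name `find_deleting_csvs` is made up for illustration.

```python
def find_deleting_csvs(csv_list):
    """Return (namespace, name) pairs for CSVs whose status.phase is "Deleting".

    csv_list is the parsed JSON of a ClusterServiceVersion list, i.e. a dict
    with an "items" array. Items without a status are skipped.
    """
    stuck = []
    for item in csv_list.get("items", []):
        meta = item.get("metadata") or {}
        phase = (item.get("status") or {}).get("phase")
        if phase == "Deleting":
            stuck.append((meta.get("namespace"), meta.get("name")))
    return stuck
```

To use it, parse the command output with `json.load(sys.stdin)` and pass the result in. Running it periodically would show whether the same CSVs stay in "Deleting" across runs.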


How reproducible:

Uncertain


Actual results:

ClusterServiceVersion objects with "Deleting" status remain in the cluster indefinitely, even after an operator upgrade has completed.


Expected results:

ClusterServiceVersion objects with "Deleting" status should be removed automatically (per the documentation).

Comment 2 Evan Cordell 2020-01-21 23:25:21 UTC
This appears, at first glance, to be a bug that can occur in the projection of ownership and operatorgroup labels.

The non-deleted CSVs in question have conflicting metadata:

    annotations:
      olm.operatorGroup: global-operators
      olm.operatorNamespace: openshift-operators
    labels:
      olm.api.e43efcaa45c9f8d0: provided
      olm.copiedFrom: openshift-operators-redhat

    name: elasticsearch-operator.4.2.5-201911121709
    namespace: openshift-operators

This indicates that although this copied CSV originally came from `openshift-operators-redhat`, its current "parent" operator (the non-copied version) is recorded as being in `openshift-operators`.

The `olm.operatorNamespace` annotation is checked to find the parent. In this case, since the copied CSV is itself in the namespace listed as the parent, it is never cleaned up (even though it is in the Deleting phase).

This will require a bit more investigation, but I suspect that the annotations for operatorgroups are applied too aggressively. They should not be overwritten on copied CSVs.
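To make the failure mode concrete, here is a simplified model of the check described in this comment. It is not OLM's actual code: the function name `is_treated_as_copy` and the flattened metadata dict are assumptions for illustration only, and the only inputs are the fields shown in the metadata excerpt.

```python
OPERATOR_NAMESPACE_ANNOTATION = "olm.operatorNamespace"


def is_treated_as_copy(csv_metadata):
    """Model of the parent lookup: a CSV counts as a copy only when its
    olm.operatorNamespace annotation names a namespace other than its own."""
    annotations = csv_metadata.get("annotations") or {}
    parent_ns = annotations.get(OPERATOR_NAMESPACE_ANNOTATION)
    return parent_ns is not None and parent_ns != csv_metadata.get("namespace")


# Metadata of one leaked CSV, taken from the excerpt in this comment.
leaked = {
    "name": "elasticsearch-operator.4.2.5-201911121709",
    "namespace": "openshift-operators",
    "annotations": {
        "olm.operatorGroup": "global-operators",
        "olm.operatorNamespace": "openshift-operators",
    },
    "labels": {"olm.copiedFrom": "openshift-operators-redhat"},
}

# The overwritten annotation points at the CSV's own namespace, so the
# model classifies it as a parent rather than a copy.
print(is_treated_as_copy(leaked))  # False
```

Under this model, a CSV that is never classified as a copy is never collected as an orphaned copy, which matches the observed leak. With the annotation left at `openshift-operators-redhat`, the same function returns `True`.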

Comment 4 Jian Zhang 2020-02-05 10:18:33 UTC
Cluster version is 4.4.0-0.nightly-2020-02-04-171905
mac:~ jianzhang$ oc exec catalog-operator-75df97fddd-t9tfz -- olm --version
OLM version: 0.14.1
git commit: 8775d5a0d7c632b1b0f1b0ccaf5c7c27fcd81f95

1. Create an OperatorGroup that selects all namespaces in the openshift-operators-redhat project.
mac:~ jianzhang$ oc get operatorgroup -n openshift-operators-redhat -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1
  kind: OperatorGroup
  metadata:
    annotations:
      olm.providedAPIs: Elasticsearch.v1.logging.openshift.io
    creationTimestamp: "2020-02-05T08:23:46Z"
    generation: 1
    name: openshift-operators-redhat
    namespace: openshift-operators-redhat
    resourceVersion: "68155"
    selfLink: /apis/operators.coreos.com/v1/namespaces/openshift-operators-redhat/operatorgroups/openshift-operators-redhat
    uid: c3b55bb5-4545-4232-a8cb-afad8b1c8792
  spec: {}
  status:
    lastUpdated: "2020-02-05T08:23:46Z"
    namespaces:
    - ""
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

2. Subscribe to the elasticsearch-operator in this project.
mac:~ jianzhang$ oc get sub -n openshift-operators-redhat
NAME                     PACKAGE                  SOURCE            CHANNEL
elasticsearch-operator   elasticsearch-operator   qe-app-registry   4.4
mac:~ jianzhang$ oc get csv -n openshift-operators-redhat
NAME                                        DISPLAY                  VERSION              REPLACES   PHASE
elasticsearch-operator.4.4.0-202001241932   Elasticsearch Operator   4.4.0-202001241932              Succeeded

mac:~ jianzhang$ oc get pods -n openshift-operators-redhat
NAME                                      READY   STATUS    RESTARTS   AGE
elasticsearch-operator-55b8b5f49f-b2s5s   1/1     Running   0          97m

3. Check the copied CSV in the openshift-operators namespace, then check its olm.operatorGroup annotation.
mac:~ jianzhang$ oc get csv elasticsearch-operator.4.4.0-202001241932 -n openshift-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  annotations:
...
    olm.operatorGroup: openshift-operators-redhat
    olm.operatorNamespace: openshift-operators-redhat
    olm.skipRange: '>=4.2.0 <4.4.0'
    support: AOS Cluster Logging, Jaeger
  creationTimestamp: "2020-02-05T08:24:21Z"
  generation: 1
  labels:
    olm.api.e43efcaa45c9f8d0: provided
    olm.copiedFrom: openshift-operators-redhat

mac:~ jianzhang$ oc get operatorgroup -n openshift-operators 
NAME               AGE
global-operators   4h37m

As shown, the olm.operatorGroup value is openshift-operators-redhat, not global-operators. LGTM.

4. Delete the elasticsearch-operator.
mac:~ jianzhang$ oc delete sub elasticsearch-operator -n openshift-operators-redhat 
subscription.operators.coreos.com "elasticsearch-operator" deleted
mac:~ jianzhang$ oc delete csv elasticsearch-operator.4.4.0-202001241932 -n openshift-operators-redhat 
clusterserviceversion.operators.coreos.com "elasticsearch-operator.4.4.0-202001241932" deleted

mac:~ jianzhang$ oc get csv -A
NAMESPACE                              NAME                                DISPLAY           VERSION              REPLACES   PHASE
openshift-logging                      clusterlogging.4.4.0-202001241616   Cluster Logging   4.4.0-202001241616              Succeeded
openshift-operator-lifecycle-manager   packageserver                       Package Server    0.14.1                          Succeeded

All copied CSVs are deleted as expected. LGTM, marking as verified.
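The annotation check in step 3 could also be scripted across all namespaces. This is a rough sketch, assuming that on a healthy copied CSV the `olm.operatorNamespace` annotation matches the `olm.copiedFrom` label (as it does in the output above); the helper name `has_conflicting_parent` is hypothetical.

```python
def has_conflicting_parent(csv):
    """Return True when a copied CSV's olm.operatorNamespace annotation
    disagrees with its olm.copiedFrom label.

    csv is one parsed ClusterServiceVersion object (a dict with "metadata").
    CSVs without the olm.copiedFrom label are not copies and return False.
    """
    meta = csv.get("metadata") or {}
    copied_from = (meta.get("labels") or {}).get("olm.copiedFrom")
    if copied_from is None:
        return False
    operator_ns = (meta.get("annotations") or {}).get("olm.operatorNamespace")
    return operator_ns != copied_from
```

Applied to the copied CSV from step 3 this returns `False`; applied to the metadata quoted in comment 2 (annotation `openshift-operators`, label `openshift-operators-redhat`) it returns `True`.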

Comment 5 Ben Luddy 2020-02-06 17:21:31 UTC
*** Bug 1775518 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2020-05-04 11:23:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

