Bug 1975543 - [OLM] Remove stale cruft installed by CVO in earlier releases
Summary: [OLM] Remove stale cruft installed by CVO in earlier releases
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Alexander Greene
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-23 21:38 UTC by Jack Ottofaro
Modified: 2022-08-10 10:36 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Each component managed by the CVO consists of a set of YAML manifests defined in the `/manifest` directory at the root of the project's repo. When removing a YAML from the `/manifest` directory, you must apply the `release.openshift.io/delete: "true"` annotation, otherwise the CVO will not delete the corresponding resources from the cluster. Consequence: Resources whose manifests were dropped without this annotation are left behind on upgraded clusters as unmaintained orphans. Fix: Reintroduce any resources that were removed from the `/manifest` directory and include the `release.openshift.io/delete: "true"` annotation so that the CVO cleans up the resources. Result: Resources that are no longer required by the OLM component are removed from customer clusters.
Clone Of: 1975533
Environment:
Last Closed: 2022-08-10 10:36:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Spreadsheet containing leaked resources. (11.48 KB, text/plain)
2021-06-23 21:38 UTC, Jack Ottofaro


Links
GitHub openshift/operator-framework-olm pull 245 (open): Bug 1975543: Use CVO to delete manifests removed from OLM (last updated 2022-01-24 20:30:43 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:36:35 UTC)

Description Jack Ottofaro 2021-06-23 21:38:14 UTC
Created attachment 1793639
Spreadsheet containing leaked resources.

+++ This bug was initially created as a clone of Bug #1975533 +++

This "stale cruft" is created as a result of the following scenario. Release A had manifest M that lead the CVO to reconcile resource R. But then the component maintainers decided they didn't need R any longer, so they dropped manifest M in release B. The new CVO will no longer reconcile R, but clusters updating from A to B will still have resource R in-cluster, as an unmaintained orphan.

Now that https://issues.redhat.com/browse/OTA-222 has been implemented, teams can go back through and create deletion manifests for these leaked resources.

The attachment delete-candidates.csv contains a list of leaked resources as compared to a freshly installed 4.9 cluster. Use this list to find your component's resources and use the manifest delete annotation (https://github.com/openshift/cluster-version-operator/pull/438) to remove them.

Note also that a cluster-scoped resource may not need to be removed at all; its manifest may simply need to be modified to drop the namespace field.
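
For illustration, a deletion manifest for one of these leaked resources could look roughly like the sketch below. This is a minimal example assuming the olm-operators ConfigMap from the attached list; the delete annotation is the one from the PR linked above, while any additional include.release.openshift.io/* annotations a component normally carries are omitted here.

```
# Hypothetical deletion manifest for the leaked olm-operators ConfigMap.
# The annotation tells the CVO to delete the resource instead of reconciling it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: olm-operators
  namespace: openshift-operator-lifecycle-manager
  annotations:
    release.openshift.io/delete: "true"
```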

Comment 1 Alexander Greene 2022-01-24 19:57:01 UTC
It seems like OLM has 4 items that need cleanup:
```
v1,ConfigMap,olm-operators,openshift-operator-lifecycle-manager,4.1,4.1,0000_50_olm_11-olm-operators.configmap.yaml
operators.coreos.com,CatalogSource,olm-operators,openshift-operator-lifecycle-manager,4.1,4.1,0000_50_olm_12-olm-operators.catalogsource.yaml
operators.coreos.com,Subscription,packageserver,openshift-operator-lifecycle-manager,4.1,4.1,0000_50_olm_14-packageserver.subscription.yaml
operators.coreos.com,ClusterServiceVersion,packageserver.v0.9.0,openshift-operator-lifecycle-manager,4.1,4.1,0000_50_olm_16-packageserver.clusterserviceversion.yaml
```
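
As a quick illustration of how to check whether a given cluster still carries these orphans (the resource kinds and names below are taken directly from the list above; the commands are ordinary oc queries, not part of the fix):

```
# Look for the leaked OLM resources in the openshift-operator-lifecycle-manager namespace.
oc get configmap olm-operators -n openshift-operator-lifecycle-manager
oc get catalogsource olm-operators -n openshift-operator-lifecycle-manager
oc get subscription packageserver -n openshift-operator-lifecycle-manager
oc get clusterserviceversion packageserver.v0.9.0 -n openshift-operator-lifecycle-manager
```

On a freshly installed cluster these should all return NotFound; on a cluster that has been upgraded since 4.1 they are expected to still exist until the deletion manifests remove them.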

Comment 3 Jian Zhang 2022-03-07 08:10:11 UTC
1, Install a cluster that doesn't contain the fixed PR, for example,
[cloud-user@preserve-olm-env2 jian]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-03-05-023708   True        False         6h18m   Cluster version is 4.10.0-0.nightly-2022-03-05-023708

2, Create the ConfigMap, CatalogSource, and Subscription resources in the openshift-operator-lifecycle-manager namespace, each carrying the release.openshift.io/delete: "true" annotation.

[cloud-user@preserve-olm-env2 jian]$ cat cm-bug1975543.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: olm-operators
  namespace: openshift-operator-lifecycle-manager
  annotations:
    release.openshift.io/delete: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
[cloud-user@preserve-olm-env2 jian]$ oc create -f cm-bug1975543.yaml 
configmap/olm-operators created

[cloud-user@preserve-olm-env2 jian]$ cat cs-bug1975543.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: olm-operators
  namespace: openshift-operator-lifecycle-manager
  annotations:
    release.openshift.io/delete: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
spec:
  sourceType: grpc
[cloud-user@preserve-olm-env2 jian]$ oc create -f cs-bug1975543.yaml 
catalogsource.operators.coreos.com/olm-operators created

[cloud-user@preserve-olm-env2 jian]$ cat sub-bug1975543.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: packageserver
  namespace: openshift-operator-lifecycle-manager
  annotations:
    release.openshift.io/delete: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
spec:
  name: packageserver
  source: qe-app-registry
  sourceNamespace: openshift-marketplace
[cloud-user@preserve-olm-env2 jian]$ oc create -f sub-bug1975543.yaml 
subscription.operators.coreos.com/packageserver created

[cloud-user@preserve-olm-env2 jian]$ oc get catalogsource
NAME            DISPLAY   TYPE   PUBLISHER   AGE
olm-operators             grpc               2m37s
[cloud-user@preserve-olm-env2 jian]$ oc get cm
NAME                            DATA   AGE
catalog-operator-heap-l6ml9     1      9m45s
collect-profiles-config         1      6h56m
kube-root-ca.crt                1      6h56m
olm-operator-heap-cmqfh         1      9m45s
olm-operators                   0      4m44s
openshift-service-ca.crt        1      6h56m
packageserver-controller-lock   0      6h52m
[cloud-user@preserve-olm-env2 jian]$ oc get sub
NAME            PACKAGE         SOURCE            CHANNEL
packageserver   packageserver   qe-app-registry 

3, Upgrade the cluster to a release that contains the fix.

[cloud-user@preserve-olm-env2 jian]$ oc adm release info registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-03-06-112819 -a .dockerconfigjson --commits |grep olm
  operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         494df5aa9d10d3b0719e24a1ce819223cbe6cd69
  operator-registry                              https://github.com/openshift/operator-framework-olm                         494df5aa9d10d3b0719e24a1ce819223cbe6cd69

[cloud-user@preserve-olm-env2 jian]$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:1d076c247014219fbcc389e8e260f6847fcc258148c1abe927252882b2507658 --force --allow-explicit-upgrade --allow-not-recommended
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.ci.openshift.org/ocp/release@sha256:1d076c247014219fbcc389e8e260f6847fcc258148c1abe927252882b2507658
[cloud-user@preserve-olm-env2 jian]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-03-05-023708   True        True          38s     Working towards 4.11.0-0.nightly-2022-03-06-112819: 9 of 776 done (1% complete)

[cloud-user@preserve-olm-env2 client]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-06-112819   True        False         16m     Cluster version is 4.11.0-0.nightly-2022-03-06-112819

[cloud-user@preserve-olm-env2 client]$ oc project
Using project "openshift-operator-lifecycle-manager" on server "https://api.qe-daily-0307.qe.devcluster.openshift.com:6443".
[cloud-user@preserve-olm-env2 client]$ oc get sub
No resources found in openshift-operator-lifecycle-manager namespace.
[cloud-user@preserve-olm-env2 client]$ oc get catalogsource
No resources found in openshift-operator-lifecycle-manager namespace.
[cloud-user@preserve-olm-env2 client]$ oc get cm
NAME                            DATA   AGE
catalog-operator-heap-j2ggz     1      6m6s
collect-profiles-config         1      8h
kube-root-ca.crt                1      8h
olm-operator-heap-jb28v         1      6m6s
openshift-service-ca.crt        1      8h
packageserver-controller-lock   0      8h

The resources created above were removed successfully. LGTM, marking it verified.

Comment 5 errata-xmlrpc 2022-08-10 10:36:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

