Bug 1975538

Summary: [Storage] Remove stale cruft installed by CVO in earlier releases
Product: OpenShift Container Platform Reporter: Jack Ottofaro <jack.ottofaro>
Component: Storage Assignee: Jonathan Dobson <jdobson>
Storage sub component: Operators QA Contact: Wei Duan <wduan>
Status: CLOSED WONTFIX Docs Contact:
Severity: low    
Priority: low CC: aos-bugs, jsafrane, mfojtik, sttts, xxia, yanyang
Version: 4.9   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1975533 Environment:
Last Closed: 2023-02-10 23:53:16 UTC Type: ---
Attachments:
Description Flags
Spreadsheet containing leaked resources. none

Description Jack Ottofaro 2021-06-23 21:30:12 UTC
Created attachment 1793636 [details]
Spreadsheet containing leaked resources.

+++ This bug was initially created as a clone of Bug #1975533 +++

This "stale cruft" is created as a result of the following scenario. Release A had manifest M, which led the CVO to reconcile resource R. The component maintainers then decided they no longer needed R, so they dropped manifest M in release B. The new CVO no longer reconciles R, but clusters updating from A to B still have resource R in-cluster as an unmaintained orphan.
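
The scenario above amounts to a set difference between what the manifests of two releases reconcile. A minimal Python sketch, assuming each resource can be identified by (kind, namespace, name); the `ResourceKey` type, the `leaked_resources` helper, and the sample data are illustrative and are not CVO code:

```python
from typing import NamedTuple, Set


class ResourceKey(NamedTuple):
    """Hypothetical identity of a resource reconciled from a release manifest."""
    kind: str
    namespace: str  # empty string for cluster-scoped resources
    name: str


def leaked_resources(release_a: Set[ResourceKey],
                     release_b: Set[ResourceKey]) -> Set[ResourceKey]:
    """Resources reconciled in release A whose manifests were dropped in release B."""
    return release_a - release_b


# Illustrative sample data only; not read from a real release payload.
release_a = {
    ResourceKey("ClusterRole", "", "cluster-storage-operator"),
    ResourceKey("Role", "openshift-cluster-storage-operator", "cluster-storage-operator"),
    ResourceKey("Deployment", "openshift-cluster-storage-operator", "cluster-storage-operator"),
}
release_b = {
    ResourceKey("Deployment", "openshift-cluster-storage-operator", "cluster-storage-operator"),
}

# Anything printed here would remain in-cluster as an orphan after an A -> B update.
for key in sorted(leaked_resources(release_a, release_b)):
    print(key.kind, key.namespace or "<none>", key.name)
```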

Now that https://issues.redhat.com/browse/OTA-222 has been implemented, teams can go back and create deletion manifests for these leaked resources.

The attachment delete-candidates.csv contains a list of leaked resources as compared to a freshly installed 4.9 cluster. Use this list to find your component's resources and use the manifest delete annotation (https://github.com/openshift/cluster-version-operator/pull/438) to remove them.

Note also that a cluster-scoped resource on this list may not need to be removed at all; its manifest may simply need to be modified to drop the namespace.
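
To illustrate why such entries can show up as false positives, here is a small Python sketch. It assumes the comparison keys on (kind, namespace, name) and that a cluster-scoped manifest once carried a stray namespace; the `normalize` helper and the `CLUSTER_SCOPED_KINDS` set are hypothetical names introduced for this example:

```python
from typing import Tuple

Key = Tuple[str, str, str]  # (kind, namespace, name)

# Assumption: only these two kinds matter for this sketch. A real tool would
# ask the API server which kinds are cluster-scoped.
CLUSTER_SCOPED_KINDS = {"ClusterRole", "ClusterRoleBinding"}


def normalize(key: Key) -> Key:
    """Drop the namespace from cluster-scoped kinds, where it has no meaning."""
    kind, namespace, name = key
    if kind in CLUSTER_SCOPED_KINDS:
        return (kind, "", name)
    return key


# The same object as described by an older manifest (stray namespace) and a newer one.
old = ("ClusterRoleBinding", "openshift-cluster-storage-operator",
       "cluster-storage-operator-role")
new = ("ClusterRoleBinding", "", "cluster-storage-operator-role")

print(old == new)                        # False: naive comparison flags a leak
print(normalize(old) == normalize(new))  # True: it is the same object
```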

Comment 1 Tomas Smetana 2021-06-24 08:01:56 UTC
I checked the attached CSV: the potential "cruft" seems to belong to cluster-storage-operator or csi-snapshot-controller operator. Changing subcomponent accordingly.

Comment 2 Jan Safranek 2021-06-25 14:34:33 UTC
It looks like we should make sure these objects are cleaned:

Namespaced in openshift-cluster-storage-operator:
ClusterRoleBinding	csi-snapshot-controller-operator-role
RoleBinding	cluster-storage-operator
Role	cluster-storage-operator
ClusterRoleBinding	cluster-storage-operator-role

Non-namespaced:
ClusterRoleBinding	cluster-storage-operator
ClusterRole	cluster-storage-operator

Comment 3 Jonathan Dobson 2021-06-30 23:05:54 UTC
The following 2 objects are still present and do not need to be removed. It's just that the namespace was removed from each of them starting in 4.7 with the following commits.

ClusterRoleBinding      csi-snapshot-controller-operator-role   openshift-cluster-storage-operator      4.4     4.6     0000_50_cluster-csi-snapshot-controller-operator_05_operator_rbac.yaml
https://github.com/openshift/cluster-csi-snapshot-controller-operator/commit/fb5d0a4e2171276a81d319eedfe73b284e08f439

ClusterRoleBinding      cluster-storage-operator-role   openshift-cluster-storage-operator      4.6     4.6     0000_50_cluster-storage-operator_08_operator_rbac.yaml
https://github.com/openshift/cluster-storage-operator/commit/27fc35b95b5f71b218544b4b187f6f23f74b60ef


The following 4 objects were removed starting in 4.6:

ClusterRoleBinding      cluster-storage-operator        <none>  4.1     4.5     0000_50_cluster-storage-operator_01-cluster-role-binding.yaml

ClusterRole     cluster-storage-operator        <none>  4.1     4.5     0000_50_cluster-storage-operator_01-cluster-role.yaml

RoleBinding     cluster-storage-operator        openshift-cluster-storage-operator      4.1     4.5     0000_50_cluster-storage-operator_01-role-binding.yaml

Role    cluster-storage-operator        openshift-cluster-storage-operator      4.1     4.5     0000_50_cluster-storage-operator_01-role.yaml

They were renamed from manifests/01-* to manifests/03-* with this commit:
https://github.com/openshift/cluster-storage-operator/commit/471ec786f01cb00106691a2b43f7b6c571feaf37

And then later removed altogether with this commit:
https://github.com/openshift/cluster-storage-operator/commit/f0411e5a596164aeda9e74fc269278d3abf01bc3

These were removed in favor of creating driver-specific objects under assets/csidriveroperators/*.

So these 4 files need to be restored from f0411e5a596164aeda9e74fc269278d3abf01bc3 and given the release.openshift.io/delete annotation so that CVO cleans up stale objects that may be left behind from previous releases:

manifests/03-cluster-role-binding.yaml
manifests/03-cluster-role.yaml
manifests/03-role-binding.yaml
manifests/03-role.yaml

See "Manifest Annotation For Object Deletion" doc:
https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/object-deletion.md
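
As a rough illustration of the step above, the following Python sketch adds the delete annotation to a restored manifest. It is not the change that was made in the repository: the `mark_for_deletion` helper is hypothetical, the manifest is a stripped-down stand-in that keeps only the fields identifying the object, and the annotation value "true" is assumed from the linked doc.

```python
import copy
import json

DELETE_ANNOTATION = "release.openshift.io/delete"


def mark_for_deletion(manifest: dict) -> dict:
    """Return a copy of the manifest carrying the CVO delete annotation."""
    marked = copy.deepcopy(manifest)
    annotations = marked.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[DELETE_ANNOTATION] = "true"
    return marked


# Stand-in for a restored manifests/03-cluster-role.yaml, shown as a dict.
# Only the identifying fields are kept; the real file has more content.
restored = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "cluster-storage-operator"},
}

print(json.dumps(mark_for_deletion(restored), indent=2))
```

The helper copies the input rather than mutating it, so the restored manifest can still be compared against the annotated one when reviewing the change.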

Comment 5 Jonathan Dobson 2022-02-02 23:34:51 UTC
Need to re-test this on a newer build; I let it go stale for too long:
https://github.com/openshift/cluster-storage-operator/pull/182