Created attachment 1793634 [details]
Spreadsheet containing leaked resources.
+++ This bug was initially created as a clone of Bug #1975533 +++
This "stale cruft" is created as a result of the following scenario. Release A had manifest M that lead the CVO to reconcile resource R. But then the component maintainers decided they didn't need R any longer, so they dropped manifest M in release B. The new CVO will no longer reconcile R, but clusters updating from A to B will still have resource R in-cluster, as an unmaintained orphan.
Now that https://issues.redhat.com/browse/OTA-222 has been implemented teams can go back through and create deletion manifests for these leaked resources.
The attachment delete-candidates.csv contains a list of leaked resources as compared to a freshly installed 4.9 cluster. Use this list to find your component's resources and use the manifest delete annotation (https://github.com/openshift/cluster-version-operator/pull/438) to remove them.
Note also that in the case of a cluster-scoped resource it may not need to be removed but simply be modified to remove namespace.
CMO seems to occur once in the linked spreadsheet:
GROUP KIND NAME NAMESPACE Born In Last In YAML File
extensions Deployment cluster-monitoring-operator openshift-monitoring 4.1 4.1 0000_50_cluster-monitoring-operator_04-deployment.yaml
The manifest in question seems to be this https://github.com/openshift/cluster-monitoring-operator/blob/release-4.1/manifests/04-deployment.yaml
Iiuc the manifest was just renamed and there shouldn't be any orphans left behind.
Leaving the final decision to Simon though.
Changing the manifest in a given release does not change the resource if it was installed in a prior release. So if a customer had a 4.1 cluster this resource would be present and if they upgraded that cluster to 4.9.z the resource would still be present. The purpose of the manifest delete is to remove such resources during the upgrade.
IIUC the changes that happened between 4.1 and 4.2 are:
* manifests/04-deployment.yaml renamed to manifests/0000_50_cluster_monitoring_operator_04-deployment.yaml 
* switching from "apiVersion: extensions/v1beta1" to "apiVersion: apps/v1"
Having said that, the manifest maps in both releases to the deployment resource named "cluster-monitoring-operator" in the "openshift-monitoring" namespace so there's no leaked resource left behind and nothing to be removed. Did I miss something?
(In reply to Simon Pasquier from comment #3)
The issue is that Group 'extensions' != 'apps' so the Deployments are not the same. So there will be an orphaned 'extension openshift-monitoring/cluster-monitoring-operator' in a 4.9 that was upgraded from a 4.1. If it was just a version change, extensions/v1beta1 ->
extensions/v1, that wouldn't be an issue.
I don't see how this can possibly be the case. Deployments were moved from `extensions/v1beta1` to `apps/v1` and the old representation in the `extensions` group was actually removed in Kubernetes v1.16 altogether -- OpenShift 4.9 doesn't know about Deployments with version `extensions/v1beta1`. The previous versions would have been converted to the preferred storage version by the API server, right?
If there was an orphaned resource, we would have a bunch of clusters with multiple cluster-monitoring-operator deployments.
If we create a deletion manifest for this in 4.9, I don't think it would work, and if it did I think it would delete our operator deployment. Am I misunderstanding something here?
(In reply to Brad Ison from comment #5)
> I don't see how this can possibly be the case. Deployments were moved from
> `extensions/v1beta1` to `apps/v1` and the old representation in the
> `extensions` group was actually removed in Kubernetes v1.16 altogether --
> OpenShift 4.9 doesn't know about Deployments with version
> `extensions/v1beta1`. The previous versions would have been converted to the
> preferred storage version by the API server, right?
> If there was an orphaned resource, we would have a bunch of clusters with
> multiple cluster-monitoring-operator deployments.
> If we create a deletion manifest for this in 4.9, I don't think it would
> work, and if it did I think it would delete our operator deployment. Am I
> misunderstanding something here?
That makes sense to me as well. This was one I couldn't easily verify since it was born in 4.1. Feel free to close.
Sounds good. Thanks.