Created attachment 1795797 [details]
Spreadsheet containing leaked resources.

+++ This bug was initially created as a clone of Bug #1975533 +++

This "stale cruft" is created by the following scenario: release A had manifest M that led the CVO to reconcile resource R. The component maintainers then decided they no longer needed R, so they dropped manifest M in release B. The new CVO will no longer reconcile R, but clusters updating from A to B will still have resource R in-cluster, as an unmaintained orphan.

Now that https://issues.redhat.com/browse/OTA-222 has been implemented, teams can go back through and create deletion manifests for these leaked resources. The attachment delete-candidates.csv contains a list of leaked resources as compared to a freshly installed 4.9 cluster. Use this list to find your component's resources, and use the manifest delete annotation (https://github.com/openshift/cluster-version-operator/pull/438) to remove them. Note also that a cluster-scoped resource may not need to be removed, but simply modified to remove its namespace.

The two lines thought to be owned by Hive are:

Service controller-manager-service openshift-cloud-credential-operator 4.1 4.5 0000_50_cloud-credential-operator_01_deployment.yaml
ConfigMap cloud-credential-operator-config openshift-cloud-credential-operator 4.3 4.4 0000_50_cloud-credential-operator_01_operator_configmap.yaml
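For reference, a deletion manifest is an ordinary manifest carrying the delete annotation from the CVO PR linked above; a minimal sketch for the stale Service might look like the following (the annotation name is taken from that PR's description, so treat its exact form as an assumption rather than something confirmed in this thread):

```yaml
# Hypothetical deletion-manifest sketch: asks the CVO to delete the
# orphaned Service instead of reconciling it. The annotation comes from
# https://github.com/openshift/cluster-version-operator/pull/438.
apiVersion: v1
kind: Service
metadata:
  name: controller-manager-service
  namespace: openshift-cloud-credential-operator
  annotations:
    release.openshift.io/delete: "true"
```

The manifest only needs enough fields (group/version/kind, name, and namespace for namespaced resources) for the CVO to identify the object to delete.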
From 4.1 to present, the name of the file for the cloud-cred-operator deployment changed from 0000_50_cloud-credential-operator_01_deployment.yaml to 0000_50_cloud-credential-operator_03-deployment.yaml. The resource that CVO manages is still the same Deployment at openshift-cloud-credential-operator/cloud-credential-operator, so there is no orphaned resource.

From 4.3 to present, cloud-cred-operator has stopped deploying the "default" ConfigMap, which used to be deployed via 0000_50_cloud-credential-operator_01_operator_configmap.yaml. While the ConfigMap is deprecated, the cloud-cred-operator still supports that resource if it exists. Marking it for removal by CVO can have unintended effects if a cluster is relying on the old ConfigMap to enable/disable the cloud-cred-operator.

It doesn't appear that there is anything to do here, as there are no orphaned resources. When we finally drop support for the old ConfigMap, we will have to ensure that it gets cleaned up (but we have our own "cleanup" controller to self-manage objects that we orphan/retire). Closing...
(In reply to Joel Diaz from comment #3)
> From 4.1 to present the name of the file for the cloud-cred-operator
> deployment changed from 0000_50_cloud-credential-operator_01_deployment.yaml
> to 0000_50_cloud-credential-operator_03-deployment.yaml . The resource that
> CVO manages is still the same Deployment at
> openshift-cloud-credential-operator/cloud-credential-operator. So there is
> no orphaned resource.

Comment 0's:

  Service controller-manager-service openshift-cloud-credential-operator 4.1 4.5 0000_50_cloud-credential-operator_01_deployment.yaml

was talking about a Service in that file, not the Deployment.

$ git log -p -G 'kind: Service$' manifests | grep '^commit \|kind: Service$'
commit 04c400f160500202ea48468b10847f237bf2fcf4
+kind: Service
-kind: Service
-kind: Service
commit f8da01cd8b275a1ce766ee30bc84a13da2f1e09f
 kind: Service
+kind: Service
commit fd2cc043dbc7223ac4f361f92be06507e8be2eb5
+kind: Service

So yeah, looks like 04c400f16050 dropped a Service. Checking names:

$ git show 04c400f16050 manifests | grep -A5 'kind: Service$'
+kind: Service
+metadata:
+  name: cco-metrics
+  namespace: openshift-cloud-credential-operator
+spec:
+  ports:
--
-kind: Service
-metadata:
-  name: cco-metrics
-  namespace: openshift-cloud-credential-operator
-spec:
-  ports:
--
-kind: Service
-metadata:
-  labels:
-    control-plane: controller-manager
-    controller-tools.k8s.io: "1.0"
-  name: controller-manager-service

So yup, it seems you dropped the controller-manager-service Service, and should grow a delete manifest to remove it from born-before-4.6 clusters. Unless you're handling that in your own orphan/cleanup controller already?

> From 4.3 to present cloud-cred-operator has stopped deploying the "default"
> configmap which used to be deployed via
> 0000_50_cloud-credential-operator_01_operator_configmap.yaml. While the
> ConfigMap is deprecated, the cloud-cred-operator still supports that
> resource if it exists.
> Marking it for removal by CVO can have unintended
> effects if a cluster is relying on the old ConfigMap to enable/disable the
> cloud-cred-operator.
>
> It doesn't appear that there is anything to do here, as there are no
> orphaned resources. When we finally drop support for the old ConfigMap, we
> will have to ensure that it gets cleaned up...

Can you share more details on how this works? If there is a new config object that, when set, masks the config from the old ConfigMap, it would be safe to remove the ConfigMap in those clusters, right? You wouldn't want admins tweaking the (masked) ConfigMap under the impression that it was still driving operator config. But yeah, if the ConfigMap is a source of defaults for a new config object, and the new config object is unset, that would be one way the old ConfigMap might still be having some effect on a modern cluster.
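For context on the enable/disable mechanism being discussed, the legacy ConfigMap historically looked something like the sketch below. The `disabled` key is an assumption about the old CCO disable switch; this thread never spells out the ConfigMap's schema, so verify against the operator's own documentation before relying on it:

```yaml
# Hypothetical sketch of the legacy config object discussed above.
# The "disabled" key is an assumption, not confirmed in this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-credential-operator-config
  namespace: openshift-cloud-credential-operator
data:
  disabled: "true"
```

This illustrates the concern above: if the CVO blindly deleted this ConfigMap on upgrade, a cluster using it to disable the operator could see the operator unexpectedly re-enabled.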
Sorry, I missed that it was a Service resource that was reported as orphaned. I do see that the Service named 'controller-manager-service' was embedded in the "deployment" file. I'll put up a PR to add a deletion manifest for it as an orphaned object. Thanks.

We wrote a controller to clean up anything we dropped from our manifests. Whether it pre-dates this CVO "delete" functionality, I'm not sure. But you can see in https://github.com/openshift/cloud-credential-operator/blob/master/pkg/operator/cleanup/cleanup_controller.go that we watch for a list of orphaned CredentialsRequest resources that need cleaning up. There is only a single one that we watch for and clean up (https://github.com/openshift/cloud-credential-operator/blob/master/pkg/operator/constants/constants.go#L150-L155), and our cleanup controller only really cleans up a single kind of resource (as presently written).

Given that CVO can delete resources for us, I think we can look into retiring our cleanup controller going forward.
Opened https://issues.redhat.com/browse/CCO-146 to track removal of the CCO cleanup controller, now that we know CVO can do this work for us.
Verified using a nightly build.

a. A fresh 4.10 cluster doesn't have this stale Service:

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-12-203611   True        False         144m    Cluster version is 4.10.0-0.nightly-2021-10-12-203611
$ oc get service -n openshift-cloud-credential-operator controller-manager-service
Error from server (NotFound): services "controller-manager-service" not found

b. Install a 4.9 cluster, create a service/controller-manager-service manually, then upgrade to 4.10.

#### before upgrade:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-10-12-084355   True        False         144m    Cluster version is 4.9.0-0.nightly-2021-10-12-084355
$ oc get service -n openshift-cloud-credential-operator
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
cco-metrics                  ClusterIP   172.30.34.147    <none>        8443/TCP   165m
controller-manager-service   ClusterIP   172.30.242.209   <none>        443/TCP    12m
pod-identity-webhook         ClusterIP   172.30.87.47     <none>        443/TCP    156m

#### after upgrade:
$ oc get service -n openshift-cloud-credential-operator controller-manager-service
No resources found in openshift-cloud-credential-operator namespace.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056