Bug 1977319
Summary: [Hive] Remove stale cruft installed by CVO in earlier releases
Product: OpenShift Container Platform
Reporter: Jack Ottofaro <jack.ottofaro>
Component: Cloud Credential Operator
Assignee: Nobody <nobody>
Status: CLOSED ERRATA
QA Contact: wang lin <lwan>
Severity: low
Docs Contact:
Priority: medium
Version: 4.9
CC: aos-bugs, arane, lwan, mfojtik, sttts, wking, xxia, yanyang
Target Milestone: ---
Keywords: Reopened
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Older versions of CCO created a 'controller-manager-service' Service resource that is no longer needed.
Consequence: The stale 'controller-manager-service' Service resource remained on the cluster even though it was no longer used.
Fix: The Service was re-added to the manifests with the delete annotation so that CVO can clean it up.
Result: The stale 'controller-manager-service' Service resource created by CCO is no longer present.
Story Points: ---
Clone Of: 1975533
Environment:
Last Closed: 2022-03-10 16:04:21 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:

Description
Jack Ottofaro
2021-06-29 13:11:08 UTC
From 4.1 to present the name of the file for the cloud-cred-operator deployment changed from 0000_50_cloud-credential-operator_01_deployment.yaml to 0000_50_cloud-credential-operator_03-deployment.yaml . The resource that CVO manages is still the same Deployment at openshift-cloud-credential-operator/cloud-credential-operator. So there is no orphaned resource.

From 4.3 to present cloud-cred-operator has stopped deploying the "default" configmap which used to be deployed via 0000_50_cloud-credential-operator_01_operator_configmap.yaml. While the ConfigMap is deprecated, the cloud-cred-operator still supports that resource if it exists. Marking it for removal by CVO can have unintended effects if a cluster is relying on the old ConfigMap to enable/disable the cloud-cred-operator.

It doesn't appear that there is anything to do here, as there are no orphaned resources. When we finally drop support for the old ConfigMap, we will have to ensure that it gets cleaned up (but we have our own "cleanup" controller to self-manage objects that we orphan/retire).

Closing...

(In reply to Joel Diaz from comment #3)
> From 4.1 to present the name of the file for the cloud-cred-operator
> deployment changed from 0000_50_cloud-credential-operator_01_deployment.yaml
> to 0000_50_cloud-credential-operator_03-deployment.yaml . The resource that
> CVO manages is still the same Deployment at
> openshift-cloud-credential-operator/cloud-credential-operator. So there is
> no orphaned resource.

Comment 0's:

Service controller-manager-service openshift-cloud-credential-operator 4.1 4.5 0000_50_cloud-credential-operator_01_deployment.yaml

was talking about a Service in that file, not the Deployment.

$ git log -p -G 'kind: Service$' manifests | grep '^commit \|kind: Service$'
commit 04c400f160500202ea48468b10847f237bf2fcf4
+kind: Service
-kind: Service
-kind: Service
commit f8da01cd8b275a1ce766ee30bc84a13da2f1e09f
 kind: Service
+kind: Service
commit fd2cc043dbc7223ac4f361f92be06507e8be2eb5
+kind: Service

So yeah, looks like 04c400f16050 dropped a Service. Checking names:

$ git show 04c400f16050 manifests | grep -A5 'kind: Service$'
+kind: Service
+metadata:
+  name: cco-metrics
+  namespace: openshift-cloud-credential-operator
+spec:
+  ports:
--
-kind: Service
-metadata:
-  name: cco-metrics
-  namespace: openshift-cloud-credential-operator
-spec:
-  ports:
--
-kind: Service
-metadata:
-  labels:
-    control-plane: controller-manager
-    controller-tools.k8s.io: "1.0"
-  name: controller-manager-service

So yup, seems like you dropped the controller-manager-service Service, and should grow a delete manifest to remove it from born-before-4.6 clusters. Unless you're handling that in your own orphan/cleanup controller already?

> From 4.3 to present cloud-cred-operator has stopped deploying the "default"
> configmap which used to be deployed via
> 0000_50_cloud-credential-operator_01_operator_configmap.yaml. While the
> ConfigMap is deprecated, the cloud-cred-operator still supports that
> resource if it exists. Marking it for removal by CVO can have unintended
> effects if a cluster is relying on the old ConfigMap to enable/disable the
> cloud-cred-operator.
>
> It doesn't appear that there is anything to do here, as there are no
> orphaned resources. When we finally drop support for the old ConfigMap, we
> will have to ensure that it gets cleaned up...

Can you share more details on how this works. If there is a new config object that, when set, masks the config from the old ConfigMap, it would be safe to remove the ConfigMap in those clusters, right? You wouldn't want admins tweaking the (masked) ConfigMap under the impression that that was still driving operator config.
But yeah, if the ConfigMap is a source of defaults for a new config object, and the new config object is unset, that would be one way that the old ConfigMap might still be having some effect on a modern cluster.

Sorry, I missed that it was a Service resource that was reported as orphaned. I do see the Service named 'controller-manager-service' was embedded in the "deployment" file. I'll put up a PR to re-add it as an orphaned object. Thanks.

We wrote a controller to clean up anything we dropped from our manifests. Whether it pre-dates this CVO "delete" functionality, I'm not sure. But you can see in https://github.com/openshift/cloud-credential-operator/blob/master/pkg/operator/cleanup/cleanup_controller.go that we watch for a list of orphaned CredentialsRequest resources that need cleaning up. There is only a single one that we watch for and clean up https://github.com/openshift/cloud-credential-operator/blob/master/pkg/operator/constants/constants.go#L150-L155 , and our cleanup controller only really cleans up a single kind of resource (as presently written). Given that CVO can delete resources for us, I think we can look into retiring our cleanup controller going forward.

Opened https://issues.redhat.com/browse/CCO-146 to add removal of the CCO cleanup controller now that we know CVO can do this work for us.

Verified using nightly build

a. A fresh 4.10 cluster doesn't have this stale service.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-12-203611   True        False         144m    Cluster version is 4.10.0-0.nightly-2021-10-12-203611

$ oc get service -n openshift-cloud-credential-operator controller-manager-service
Error from server (NotFound): services "controller-manager-service" not found

b. Install a 4.9 cluster, create a service/controller-manager-service manually, then upgrade to 4.10.

#### before upgrade:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-10-12-084355   True        False         144m    Cluster version is 4.9.0-0.nightly-2021-10-12-084355

$ oc get service -n openshift-cloud-credential-operator
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
cco-metrics                  ClusterIP   172.30.34.147    <none>        8443/TCP   165m
controller-manager-service   ClusterIP   172.30.242.209   <none>        443/TCP    12m
pod-identity-webhook         ClusterIP   172.30.87.47     <none>        443/TCP    156m

#### after upgrade:
$ oc get service -n openshift-cloud-credential-operator controller-manager-service
No resources found in openshift-cloud-credential-operator controller-manager-service namespace.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
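
For readers who want a picture of the fix discussed in this bug (re-adding the dropped Service to the manifests with a delete annotation so that CVO removes it on upgrade), here is a minimal sketch. It is not the manifest from the actual PR. The Service name and namespace come from the comments above; the annotation key release.openshift.io/delete is the one CVO's object-deletion convention uses, as far as I know. The function name delete_manifest and the choice to omit spec are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only: print a manifest that re-declares the retired Service and
# marks it for deletion by CVO. This is NOT the manifest from the real PR.
# Assumptions: the helper name delete_manifest is hypothetical, and leaving
# out "spec" is a guess that CVO only needs identity plus the annotation.
delete_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: controller-manager-service
  namespace: openshift-cloud-credential-operator
  annotations:
    release.openshift.io/delete: "true"
EOF
}

# Capture and print the manifest so it can be reviewed or redirected to a file.
manifest=$(delete_manifest)
printf '%s\n' "$manifest"
```

On a cluster upgraded to a release carrying such a manifest, the check used in the verification comment above (oc get service -n openshift-cloud-credential-operator controller-manager-service) would be expected to report that the Service is gone.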