Bug 1959586
| Summary: | [master] All resources not being cleaned up after clusterdeployment deletion | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Trey West <trwest> |
| Component: | assisted-installer | Assignee: | Moti Asayag <masayag> |
| Sub component: | Deployment Operator | QA Contact: | bjacot |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | Priority: | urgent |
| Version: | 4.8 | Target Release: | 4.9.0 |
| Target Milestone: | --- | Keywords: | Triaged |
| Hardware: | Unspecified | OS: | Unspecified |
| CC: | aos-bugs, atraeger, danili, masayag, mfilanov | Whiteboard: | AI-Team-Projects KNI-EDGE-4.8 |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Cloned To: | 1974743 |
| Bug Blocks: | 1974743 | Environment: | |
| Last Closed: | 2021-10-18 17:31:04 UTC | Type: | Bug |
Description
Trey West
2021-05-11 20:15:11 UTC
@atraeger @masayag I know that GC was disabled for kube-api, but what needs to be done for a manual cleanup? We need to think about several cases, such as the ClusterDeployment no longer existing while the cluster in the backend may be gone as well; for example, the user deleted the ClusterDeployment and only then deleted the InfraEnv. We can probably ignore this case for 4.8, but at least when the deletion is done in the right order we need to do some cleanup. A similar flow can happen after installation finishes: in 4.8 the InfraEnv is no longer relevant, and the cluster is deleted from the backend once installation is done.

When the ClusterDeployment CR is deleted, the controller calls DeregisterClusterInternal, which deletes and releases some of the cluster's resources. However, it does not delete all of the resources relevant to the cluster that used to be removed by the garbage collector's PermanentClustersDeletion. We should remove the rest of the resources that the GC deletes (manifests, logs, any files under the cluster's directory on fs/s3) when the ClusterDeployment is removed. Even if we kept the GC service enabled, at that point the cluster is already removed from the DB and cannot be picked up for completing its removal.

The discovery image is also removed as part of cluster deregistration. Removing the InfraEnv makes no changes to cluster resources nor to the discovery image. This needs to be reconsidered when late binding is implemented.

You are right regarding late binding, but in other cases we can change the GC logic a bit; for example, it can look for deleted clusters. When a cluster is deleted, for some time it is only marked as deleted, so we can still fetch it from the DB. In this way we can monitor deleted clusters and clean up the image and other resources.

Once we have late binding:
- Deleting the image/InfraEnv should delete the ISO.
- Deleting the cluster should delete the ignition and other files.

Today you can't delete the image without deleting the cluster, so deleting the cluster should delete everything.

When a cluster is deregistered, either by ClusterDeployment deletion or through the API, the cluster record is soft-deleted, meaning the deleted_at attribute in the DB is set. It is the responsibility of the garbage collector to run periodically and remove all of the cluster's resources. For that purpose, garbage collection should be enabled for clearing cluster resources in the operator deployment as well.

Verified on 2.3.0-DOWNSTREAM-2021-07-06-15-17-20

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
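For reference, a minimal sketch of the cleanup flow described above. DeregisterClusterInternal and PermanentClustersDeletion are the real names mentioned in this report, but the interfaces, signatures, and key layout here are illustrative assumptions, not assisted-service's actual API:

```go
package main

import (
	"context"
	"fmt"
)

// InstallerAPI stands in for the backend API the controller already calls;
// DeregisterClusterInternal is the entry point named in this report, with a
// simplified signature for illustration.
type InstallerAPI interface {
	DeregisterClusterInternal(ctx context.Context, clusterID string) error
}

// ObjectStore is a hypothetical abstraction over the fs/s3 storage that
// holds the per-cluster files (manifests, logs, ignition configs).
type ObjectStore interface {
	DeleteFolder(ctx context.Context, prefix string) error
}

// onClusterDeploymentDeleted shows the intended flow: deregister the cluster
// in the backend, then remove the files that the garbage collector's
// PermanentClustersDeletion used to clean up.
func onClusterDeploymentDeleted(ctx context.Context, api InstallerAPI, store ObjectStore, clusterID string) error {
	if err := api.DeregisterClusterInternal(ctx, clusterID); err != nil {
		return fmt.Errorf("deregister cluster %s: %w", clusterID, err)
	}
	// Everything under the cluster's directory on fs/s3; this prefix layout
	// is an assumption, not the service's actual key scheme.
	if err := store.DeleteFolder(ctx, clusterID+"/"); err != nil {
		return fmt.Errorf("delete files of cluster %s: %w", clusterID, err)
	}
	return nil
}

func main() {} // placeholder so the sketch compiles standalone
```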
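A sketch of the late-binding ownership rules discussed above: the discovery ISO follows the InfraEnv, while the ignition and other files follow the cluster. All names and prefixes are hypothetical:

```go
package main

import "context"

// ObjectStore as in the previous sketch (hypothetical).
type ObjectStore interface {
	DeleteFolder(ctx context.Context, prefix string) error
}

// With late binding, each resource is removed when its owner goes away.

// The discovery ISO belongs to the InfraEnv, so deleting the InfraEnv (or
// the image) removes only the ISO.
func onInfraEnvDeleted(ctx context.Context, store ObjectStore, infraEnvID string) error {
	return store.DeleteFolder(ctx, "infra-envs/"+infraEnvID+"/")
}

// The ignition configs and other files belong to the cluster, so deleting
// the cluster removes them.
func onClusterDeleted(ctx context.Context, store ObjectStore, clusterID string) error {
	return store.DeleteFolder(ctx, "clusters/"+clusterID+"/")
}

func main() {}
```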
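Finally, a sketch of the soft-delete contract the fix relies on, assuming a plain SQL table with a deleted_at column (the real service's schema and ORM may differ): deregistration only marks the row, and the periodic GC pass finishes the cleanup:

```go
package main

import (
	"context"
	"database/sql"
	"time"
)

// ObjectStore as in the earlier sketches (hypothetical).
type ObjectStore interface {
	DeleteFolder(ctx context.Context, prefix string) error
}

// deregisterCluster soft-deletes the record: the row stays in the DB with
// deleted_at set, so the garbage collector can still find it later.
func deregisterCluster(ctx context.Context, db *sql.DB, clusterID string) error {
	_, err := db.ExecContext(ctx,
		`UPDATE clusters SET deleted_at = $1 WHERE id = $2 AND deleted_at IS NULL`,
		time.Now(), clusterID)
	return err
}

// collectDeletedClusters is one pass of the periodic GC: it picks up clusters
// soft-deleted longer than retention ago, removes their files, and only then
// drops the DB row for good.
func collectDeletedClusters(ctx context.Context, db *sql.DB, store ObjectStore, retention time.Duration) error {
	rows, err := db.QueryContext(ctx,
		`SELECT id FROM clusters WHERE deleted_at IS NOT NULL AND deleted_at < $1`,
		time.Now().Add(-retention))
	if err != nil {
		return err
	}
	defer rows.Close()

	// Collect IDs first so no statements run while the result set is open.
	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return err
		}
		ids = append(ids, id)
	}
	if err := rows.Err(); err != nil {
		return err
	}

	for _, id := range ids {
		if err := store.DeleteFolder(ctx, id+"/"); err != nil {
			return err
		}
		if _, err := db.ExecContext(ctx, `DELETE FROM clusters WHERE id = $1`, id); err != nil {
			return err
		}
	}
	return nil
}

func main() {}
```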