Bug 1959586 - [master] All resources not being cleaned up after clusterdeployment deletion
Summary: [master] All resources not being cleaned up after clusterdeployment deletion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Moti Asayag
QA Contact: bjacot
URL:
Whiteboard: AI-Team-Projects KNI-EDGE-4.8
Depends On:
Blocks: 1974743
Reported: 2021-05-11 20:15 UTC by Trey West
Modified: 2021-10-18 17:31 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1974743 (view as bug list)
Environment:
Last Closed: 2021-10-18 17:31:04 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift assisted-service pull 1801 | 0 | None | open | OCPBUGSM-29066 Remove cluster resources after clusterdeployment deletion | 2021-05-24 11:29:11 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:31:56 UTC

Internal Links: 1974743

Description Trey West 2021-05-11 20:15:11 UTC
Description of problem:

After creating a ClusterDeployment and an InfraEnv I can see the discovery iso generated as well as a directory with discovery.ign: 

discovery-image-da758f67-ece7-4fad-84a4-4232b56d2aed.iso
da758f67-ece7-4fad-84a4-4232b56d2aed/discovery.ign


However, after deleting the ClusterDeployment and InfraEnv, the discovery iso is deleted but the directory with discovery.ign still exists.

I am under the impression that the garbage collector is not configured to periodically clean up on the operator, so how will these files ever be removed?


How reproducible:
100%


Steps to Reproduce:
1. Create a ClusterDeployment and InfraEnv in order to generate the files
2. Delete both the ClusterDeployment and InfraEnv
3. Check that the files still exist

Actual results:

Directory with discovery.ign still exists


Expected results:

All files/directories related to the cluster are cleaned up once the ClusterDeployment is removed
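
Step 3 can be checked with a small helper. The following is only a sketch: `findLeftovers` and `reproduceLeftover` are hypothetical names, and the flat directory layout is assumed from the two file names quoted in the description, not taken from the assisted-service code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findLeftovers lists the entries directly under baseDir whose name contains
// clusterID. Hypothetical helper; the layout is assumed from the bug report.
func findLeftovers(baseDir, clusterID string) ([]string, error) {
	if clusterID == "" {
		return nil, fmt.Errorf("cluster ID must not be empty")
	}
	entries, err := os.ReadDir(baseDir) // entries come back sorted by name
	if err != nil {
		return nil, err
	}
	var found []string
	for _, e := range entries {
		if strings.Contains(e.Name(), clusterID) {
			found = append(found, e.Name())
		}
	}
	return found, nil
}

// reproduceLeftover recreates the reported state in a temp directory: the ISO
// is already gone, but the directory holding discovery.ign is still there.
func reproduceLeftover() ([]string, error) {
	base, err := os.MkdirTemp("", "assisted-leftover-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(base)

	const id = "da758f67-ece7-4fad-84a4-4232b56d2aed"
	if err := os.Mkdir(filepath.Join(base, id), 0o755); err != nil {
		return nil, err
	}
	for _, f := range []string{
		filepath.Join(base, id, "discovery.ign"),
		filepath.Join(base, "unrelated.txt"),
	} {
		if err := os.WriteFile(f, []byte("x"), 0o644); err != nil {
			return nil, err
		}
	}
	return findLeftovers(base, id)
}

func main() {
	left, err := reproduceLeftover()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("leftovers:", left)
}
```

An empty result after deleting both resources would correspond to the expected behaviour.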

Comment 2 Michael Filanov 2021-05-18 19:49:09 UTC
@atraeger @masayag I know that GC was disabled for kube-api, but what needs to be done for a manual cleanup?
We need to think of several cases, such as the ClusterDeployment no longer existing, in which case the cluster in the backend may not exist either. For example, the user deleted the ClusterDeployment and only then deleted the InfraEnv.
We can probably ignore this case for 4.8, but at least if the deletion is done in the right order we need to do some cleanup.
A similar flow can happen after the installation finishes: in 4.8 the InfraEnv is no longer relevant, and the cluster is deleted from the backend once installation is done.

Comment 3 Moti Asayag 2021-05-19 07:46:17 UTC
When the ClusterDeployment CR is deleted, the controller calls DeregisterClusterInternal, which is responsible for deleting and releasing some of the cluster's resources.
However, it doesn't delete all of the resources relevant to the cluster; the rest used to be deleted by the garbage collector's PermanentClustersDeletion.
We should remove the remaining resources that the GC deletes (manifests, logs, any files under the cluster's directory on fs/s3) when the ClusterDeployment is removed.
Even if we'd keep the GC service enabled, at that point the cluster is already removed from the DB and cannot be picked up for completing its removal.
The discovery-image is also removed as part of cluster deregistration.

Removing the InfraEnv doesn't make any changes to the cluster resources or to the discovery image.
This needs to be reconsidered when late binding is implemented.
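
As a rough illustration of the filesystem half of that cleanup, here is a minimal sketch. `removeClusterFiles` and `cleanupDemo` are hypothetical names, not functions from assisted-service, and the paths are assumed from the file names in the description. It does not cover S3, manifests or logs.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeClusterFiles deletes the discovery ISO and the per-cluster directory
// (which holds discovery.ign) for clusterID under baseDir.
// Hypothetical helper; the file layout is an assumption.
func removeClusterFiles(baseDir, clusterID string) error {
	// Reject IDs that could escape baseDir, such as "", ".." or "a/b".
	if clusterID == "" || clusterID == "." || clusterID == ".." ||
		clusterID != filepath.Base(clusterID) {
		return fmt.Errorf("invalid cluster ID %q", clusterID)
	}
	iso := filepath.Join(baseDir, "discovery-image-"+clusterID+".iso")
	// A missing ISO is fine: deregistration may already have removed it.
	if err := os.Remove(iso); err != nil && !errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("remove ISO: %w", err)
	}
	// RemoveAll returns nil when the directory is already gone.
	if err := os.RemoveAll(filepath.Join(baseDir, clusterID)); err != nil {
		return fmt.Errorf("remove cluster directory: %w", err)
	}
	return nil
}

// cleanupDemo builds a throwaway layout, runs the cleanup, and returns the
// names that remain in the base directory.
func cleanupDemo() ([]string, error) {
	base, err := os.MkdirTemp("", "assisted-cleanup-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(base)

	const id = "da758f67-ece7-4fad-84a4-4232b56d2aed"
	if err := os.Mkdir(filepath.Join(base, id), 0o755); err != nil {
		return nil, err
	}
	for _, f := range []string{
		filepath.Join(base, id, "discovery.ign"),
		filepath.Join(base, "discovery-image-"+id+".iso"),
		filepath.Join(base, "unrelated.txt"),
	} {
		if err := os.WriteFile(f, []byte("x"), 0o644); err != nil {
			return nil, err
		}
	}
	if err := removeClusterFiles(base, id); err != nil {
		return nil, err
	}
	entries, err := os.ReadDir(base)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

func main() {
	names, err := cleanupDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("remaining:", names)
}
```

The helper treats already-missing files as success, so it can be called again safely after a partial cleanup.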

Comment 4 Michael Filanov 2021-05-19 08:06:04 UTC
You are right regarding late binding, but in other cases we can change the GC logic a bit. For example, it can look for deleted clusters: for some time after deletion a cluster is only marked as deleted, so we can still get it from the DB.
In that case we can monitor deleted clusters and do the cleanup for the image, etc.

Comment 5 Avishay Traeger 2021-05-19 08:35:19 UTC
Once we have late binding:
Deleting the image/InfraEnv should delete the ISO
Deleting the cluster should delete the ignition and other files

Now you can't delete the image without deleting the cluster, so deleting the cluster should delete everything.

Comment 6 Moti Asayag 2021-06-01 14:49:41 UTC
When a cluster is deregistered, either by ClusterDeployment deletion or through the API, the cluster record is soft-deleted,
meaning the deleted_at attribute in the DB is set.
It is the responsibility of the garbage collector to run periodically and remove all of the cluster's resources.
For that purpose, garbage collection should be enabled for clearing cluster resources in the operator deployment as well.
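
The soft-delete plus periodic sweep pattern can be sketched as follows. This is an illustration under stated assumptions, not the assisted-service garbage collector: the in-memory `store`, the `sweep` method, the retention period and the `cleanup` callback are all hypothetical stand-ins for the real DB and resource removal.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// clusterRecord mimics a DB row with a deleted_at column (assumed shape).
type clusterRecord struct {
	ID        string
	DeletedAt *time.Time // nil while the cluster is live
}

// store is an in-memory stand-in for the clusters table.
type store struct {
	records map[string]*clusterRecord
}

// softDelete marks a record as deleted without removing it.
func (s *store) softDelete(id string, at time.Time) {
	if r, ok := s.records[id]; ok && r.DeletedAt == nil {
		r.DeletedAt = &at
	}
}

// sweep permanently removes records that were soft-deleted at least
// `retention` before `now`, calling cleanup for each one first. Records whose
// cleanup fails are kept so that a later sweep can retry them.
func (s *store) sweep(now time.Time, retention time.Duration, cleanup func(id string) error) []string {
	cutoff := now.Add(-retention)
	var purged []string
	for id, r := range s.records {
		if r.DeletedAt == nil || r.DeletedAt.After(cutoff) {
			continue // live, or deleted too recently
		}
		if err := cleanup(id); err != nil {
			continue // retry on the next sweep
		}
		delete(s.records, id) // deleting during range is safe in Go
		purged = append(purged, id)
	}
	sort.Strings(purged)
	return purged
}

// sweepDemo soft-deletes two of three clusters and runs one sweep.
func sweepDemo() (purged []string, remaining int) {
	t0 := time.Date(2021, 6, 1, 12, 0, 0, 0, time.UTC)
	s := &store{records: map[string]*clusterRecord{
		"a": {ID: "a"}, "b": {ID: "b"}, "c": {ID: "c"},
	}}
	s.softDelete("a", t0)
	s.softDelete("b", t0.Add(50*time.Minute))
	noop := func(string) error { return nil } // placeholder for file/S3 cleanup
	purged = s.sweep(t0.Add(time.Hour), 30*time.Minute, noop)
	return purged, len(s.records)
}

func main() {
	purged, remaining := sweepDemo()
	fmt.Println("purged:", purged, "remaining records:", remaining)
}
```

In a running service, a loop driven by `time.Ticker` would call `sweep` at a fixed interval. Because the record stays in the table until cleanup succeeds, the collector can still find the cluster after deregistration.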

Comment 11 Trey West 2021-07-14 21:31:46 UTC
Verified on 2.3.0-DOWNSTREAM-2021-07-06-15-17-20

Comment 15 errata-xmlrpc 2021-10-18 17:31:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

