Bug 1959586
| Summary: | [master] All resources not being cleaned up after clusterdeployment deletion | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Trey West <trwest> |
| Component: | assisted-installer | Assignee: | Moti Asayag <masayag> |
| Sub component: | Deployment Operator | QA Contact: | bjacot |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | Priority: | urgent |
| Version: | 4.8 | Target Release: | 4.9.0 |
| Target Milestone: | --- | Keywords: | Triaged |
| Hardware: | Unspecified | OS: | Unspecified |
| CC: | aos-bugs, atraeger, danili, masayag, mfilanov | Whiteboard: | AI-Team-Projects KNI-EDGE-4.8 |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Cloned To: | 1974743 |
| Bug Blocks: | 1974743 | Environment: | |
| Last Closed: | 2021-10-18 17:31:04 UTC | Type: | Bug |
Description
Trey West
2021-05-11 20:15:11 UTC
@atraeger @masayag I know that GC was disabled for kube-api, but what needs to be done for a manual cleanup? We need to think about several cases, such as the ClusterDeployment no longer existing while the cluster in the backend may be gone as well; for example, the user deleted the ClusterDeployment and only then deleted the InfraEnv. We can probably ignore this case for 4.8, but at least when the deletion is done in the right order we need to do some cleanup. A similar flow can happen after installation finishes: in 4.8 the InfraEnv is no longer relevant, and the cluster is deleted from the backend once installation is done.

When the ClusterDeployment CR is deleted, the controller calls DeregisterClusterInternal, which deletes and releases some of the cluster's resources. However, it does not delete all of the resources relevant to the cluster that used to be removed by the garbage collector's PermanentClustersDeletion. We should remove the rest of the resources that the GC deletes (manifests, logs, any files under the cluster's directory on fs/s3) when the ClusterDeployment is removed. Even if we kept the GC service enabled, at that point the cluster is already removed from the DB and cannot be picked up for completing its removal.

The discovery image is also removed as part of cluster deregistration. Removing the InfraEnv makes no changes to cluster resources nor to the discovery image. This needs to be reconsidered when late binding is implemented.

You are right regarding late binding, but in other cases we can change the GC logic a bit; for example, it can look for deleted clusters. When a cluster is deleted, for some time it is only marked as deleted, so we can still fetch it from the DB. In this way we can monitor deleted clusters and clean up the image and other resources.

Once we have late binding:
- Deleting the image/InfraEnv should delete the ISO.
- Deleting the cluster should delete the ignition and other files.

Today you can't delete the image without deleting the cluster, so deleting the cluster should delete everything.

When a cluster is deregistered, either by ClusterDeployment deletion or through the API, the cluster record is soft-deleted, meaning the deleted_at attribute in the DB is set. It is the responsibility of the garbage collector to run periodically and remove all of the cluster's resources. For that purpose, garbage collection should be enabled for clearing cluster resources in the operator deployment as well.

Verified on 2.3.0-DOWNSTREAM-2021-07-06-15-17-20

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
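For reference, a minimal sketch of the cleanup flow described above. DeregisterClusterInternal and PermanentClustersDeletion are the real names mentioned in this report, but the interfaces, signatures, and key layout here are illustrative assumptions, not assisted-service's actual API:

```go
package main

import (
	"context"
	"fmt"
)

// InstallerAPI stands in for the backend API the controller already calls;
// DeregisterClusterInternal is the entry point named in this report, with a
// simplified signature for illustration.
type InstallerAPI interface {
	DeregisterClusterInternal(ctx context.Context, clusterID string) error
}

// ObjectStore is a hypothetical abstraction over the fs/s3 storage that
// holds the per-cluster files (manifests, logs, ignition configs).
type ObjectStore interface {
	DeleteFolder(ctx context.Context, prefix string) error
}

// onClusterDeploymentDeleted shows the intended flow: deregister the cluster
// in the backend, then remove the files that the garbage collector's
// PermanentClustersDeletion used to clean up.
func onClusterDeploymentDeleted(ctx context.Context, api InstallerAPI, store ObjectStore, clusterID string) error {
	if err := api.DeregisterClusterInternal(ctx, clusterID); err != nil {
		return fmt.Errorf("deregister cluster %s: %w", clusterID, err)
	}
	// Everything under the cluster's directory on fs/s3; this prefix layout
	// is an assumption, not the service's actual key scheme.
	if err := store.DeleteFolder(ctx, clusterID+"/"); err != nil {
		return fmt.Errorf("delete files of cluster %s: %w", clusterID, err)
	}
	return nil
}

func main() {} // placeholder so the sketch compiles standalone
```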
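A sketch of the late-binding ownership rules discussed above: the discovery ISO follows the InfraEnv, while the ignition and other files follow the cluster. All names and prefixes are hypothetical:

```go
package main

import "context"

// ObjectStore as in the previous sketch (hypothetical).
type ObjectStore interface {
	DeleteFolder(ctx context.Context, prefix string) error
}

// With late binding, each resource is removed when its owner goes away.

// The discovery ISO belongs to the InfraEnv, so deleting the InfraEnv (or
// the image) removes only the ISO.
func onInfraEnvDeleted(ctx context.Context, store ObjectStore, infraEnvID string) error {
	return store.DeleteFolder(ctx, "infra-envs/"+infraEnvID+"/")
}

// The ignition configs and other files belong to the cluster, so deleting
// the cluster removes them.
func onClusterDeleted(ctx context.Context, store ObjectStore, clusterID string) error {
	return store.DeleteFolder(ctx, "clusters/"+clusterID+"/")
}

func main() {}
```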
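Finally, a sketch of the soft-delete contract the fix relies on, assuming a plain SQL table with a deleted_at column (the real service's schema and ORM may differ): deregistration only marks the row, and the periodic GC pass finishes the cleanup:

```go
package main

import (
	"context"
	"database/sql"
	"time"
)

// ObjectStore as in the earlier sketches (hypothetical).
type ObjectStore interface {
	DeleteFolder(ctx context.Context, prefix string) error
}

// deregisterCluster soft-deletes the record: the row stays in the DB with
// deleted_at set, so the garbage collector can still find it later.
func deregisterCluster(ctx context.Context, db *sql.DB, clusterID string) error {
	_, err := db.ExecContext(ctx,
		`UPDATE clusters SET deleted_at = $1 WHERE id = $2 AND deleted_at IS NULL`,
		time.Now(), clusterID)
	return err
}

// collectDeletedClusters is one pass of the periodic GC: it picks up clusters
// soft-deleted longer than retention ago, removes their files, and only then
// drops the DB row for good.
func collectDeletedClusters(ctx context.Context, db *sql.DB, store ObjectStore, retention time.Duration) error {
	rows, err := db.QueryContext(ctx,
		`SELECT id FROM clusters WHERE deleted_at IS NOT NULL AND deleted_at < $1`,
		time.Now().Add(-retention))
	if err != nil {
		return err
	}
	defer rows.Close()

	// Collect IDs first so no statements run while the result set is open.
	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return err
		}
		ids = append(ids, id)
	}
	if err := rows.Err(); err != nil {
		return err
	}

	for _, id := range ids {
		if err := store.DeleteFolder(ctx, id+"/"); err != nil {
			return err
		}
		if _, err := db.ExecContext(ctx, `DELETE FROM clusters WHERE id = $1`, id); err != nil {
			return err
		}
	}
	return nil
}

func main() {}
```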