The strategy used for destroying a cluster on the OpenStack platform is sub-optimal. Each destroy function exits on the first conflict and is expected to retry on the next iteration, in the hope that the conflict that prevented removal of the resource has been resolved in the meantime. This strategy can be quite expensive, as there is an exponential backoff between retries. It is also problematic with Kuryr deployments, where conflicts happen more frequently and often require manual intervention (due to OpenStack bugs). Until someone looks at the cluster and fixes the conflicts, there may be *a lot* of leftover resources.

Both issues can be solved by adopting a different strategy in which destroy functions try to delete all resources and ignore the ones that have conflicts. On the next iteration there will be fewer conflicts, and hopefully fewer iterations in total. It also means that, in the case of a stuck Kuryr deployment, the destroy command removes everything that it can remove, so fewer resources are left behind to be consumed needlessly.
Verified on Kuryr 1UPI:

[cloud-user@installer-host ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-28-221420   True        False         102m    Cluster version is 4.9.0-0.nightly-2021-06-28-221420

(shiftstack) [cloud-user@installer-host ~]$ openshift-install --log-level debug destroy cluster --dir ostest/
DEBUG OpenShift Installer 4.9.0-0.nightly-2021-06-28-221420
. . .
INFO Time elapsed: 14m40s

Log attached.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759