Bug 1953019 - [Installer][baremetal][metal3] The baremetal IPI installer fails on delete cluster with: failed to clean baremetal bootstrap storage pool
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Kiran Thyagaraja
QA Contact: Polina Rabinovich
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-23 18:04 UTC by Andreas Karis
Modified: 2021-11-01 19:44 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:03:30 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4897 | 0 | None | open | Bug 1953019: Baremetal: While deleting cluster, warn instead of exiting | 2021-05-03 12:58:37 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:03:47 UTC

Description Andreas Karis 2021-04-23 18:04:52 UTC
[Installer][baremetal][metal3] The baremetal IPI installer fails on delete cluster with: failed to clean baremetal bootstrap storage pool

{code}
[root@openshift-jumpserver-0 ~]# ./openshift-baremetal-install version
./openshift-baremetal-install 4.7.5
built from commit e15f17c958b4a04e770c0cfe758ca69452874508
release image quay.io/openshift-release-dev/ocp-release@sha256:0a4c44daf1666f069258aa983a66afa2f3998b78ced79faa6174e0a0f438f0a5
{code}

After a successful installation, the libvirt bootstrap VM and the storage pool are already deleted by the installer:
{code}

[root@openshift-jumpserver-0 ~]# virsh pool-list
 Name      State    Autostart
-------------------------------
 default   active   yes

[root@openshift-jumpserver-0 ~]# virsh list --all
 Id   Name   State
--------------------

{code}

Therefore, when someone tries to destroy the cluster, it fails:
{code}
[root@openshift-jumpserver-0 ~]# ./openshift-baremetal-install destroy cluster  --log-level=debug --dir=openshift-install
DEBUG OpenShift Installer 4.7.5                    
DEBUG Built from commit e15f17c958b4a04e770c0cfe758ca69452874508 
DEBUG Deleting bare metal resources                
DEBUG Deleting baremetal bootstrap volumes         
FATAL Failed to destroy cluster: failed to clean baremetal bootstrap storage pool: get storage pool "ipi-cluster-kfczp-bootstrap": virError(Code=49, Domain=18, Message='Storage pool not found: no storage pool with matching name 'ipi-cluster-kfczp-bootstrap'') 
{code}

https://github.com/openshift/installer/blob/fae650e24e7036b333b2b2d9dfb5a08a29cd07b1/pkg/destroy/baremetal/baremetal.go#L59

I do not think the installer should exit with a FATAL error when a resource that it wants to clean up cannot be found. If the pool cannot be found, the storage is presumably already gone. The installer should instead log a warning or info message and continue the operation.

Comment 7 Andreas Karis 2021-05-14 08:49:03 UTC
I tested with the latest nightly, and this looks much better now:

{code}
[root@openshift-jumpserver-0 ~]# ./openshift-baremetal-install-4.8.0-0.nightly-2021-05-13-222446 destroy cluster  --log-level=debug --dir=openshift-install
DEBUG OpenShift Installer 4.8.0-0.nightly-2021-05-13-222446
DEBUG Built from commit 7ba3f375977b7e2a0adc856db3f258f2c53b8aef
DEBUG Deleting bare metal resources
DEBUG Deleting baremetal bootstrap volumes
WARNING Unable to get storage pool ipi-cluster-lhmww-bootstrap: virError(Code=49, Domain=18, Message='Storage pool not found: no storage pool with matching name 'ipi-cluster-lhmww-bootstrap'')
DEBUG FIXME: delete resources!
DEBUG Purging asset "Metadata" from disk
DEBUG Purging asset "Master Ignition Customization Check" from disk
DEBUG Purging asset "Worker Ignition Customization Check" from disk
DEBUG Purging asset "Terraform Variables" from disk
DEBUG Purging asset "Kubeconfig Admin Client" from disk
DEBUG Purging asset "Kubeadmin Password" from disk
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk
DEBUG Purging asset "Cluster" from disk
INFO Time elapsed: 1s
{code}

Comment 8 Polina Rabinovich 2021-05-14 08:53:36 UTC
Verified with Andreas Karis

Comment 11 errata-xmlrpc 2021-07-27 23:03:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

