Bug 1889779 - error when destroying a vSphere installation that failed early
Summary: error when destroying a vSphere installation that failed early
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.6.z
Assignee: Patrick Dillon
QA Contact: jima
URL:
Whiteboard:
Depends On:
Blocks: 1920552 1925282
Reported: 2020-10-20 14:40 UTC by Joel Diaz
Modified: 2021-02-24 15:27 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the installer did not tag the vSphere OVF template until after the upload completed. Consequence: the template upload can take a long time, and if the installation terminated before the upload finished, the template was not tagged and therefore could not be removed. Because the template could not be removed, the folder was not empty and could not be deleted either. Fix: tagging was moved to the beginning of the OVF template upload. Result: a partial template left behind by an installation that quit early can be removed, and destroying the template and folder succeeds even when the installation quits in the middle of the template upload.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:26:59 UTC
Target Upstream Version:
Embargoed:
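The Doc Text describes an ordering problem: destroy removes only objects that carry the cluster tag, so a template that exists but is not yet tagged when the installer quits is invisible to it, and the folder holding that template cannot be emptied. The Go program below is a minimal, hypothetical model of that behavior, not the installer's code: `inventory`, `object`, `importTemplate`, `destroy`, and `scenario` are invented names, and vCenter is replaced by an in-memory map. It compares tagging after the upload with tagging at the start of the upload when the upload is interrupted.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stand-in for a vSphere inventory object.
type object struct {
	name   string
	parent string // containing folder; empty for the folder itself
	tags   map[string]bool
}

// inventory is a hypothetical in-memory model of a vCenter inventory.
type inventory struct {
	objects map[string]*object
}

var errInterrupted = errors.New("upload interrupted")

// importTemplate models the OVF import. With tagFirst the template is tagged
// as soon as it exists (the fixed behavior); otherwise only after upload.
func (inv *inventory) importTemplate(name, folder, tag string, tagFirst, interrupt bool) error {
	t := &object{name: name, parent: folder, tags: map[string]bool{}}
	inv.objects[name] = t
	if tagFirst {
		t.tags[tag] = true
	}
	if interrupt {
		return errInterrupted // the upload never completes
	}
	t.tags[tag] = true
	return nil
}

// destroy removes tagged objects inside folders first, then tagged folders,
// refusing to delete a folder that still has children.
func (inv *inventory) destroy(tag string) error {
	for name, o := range inv.objects {
		if o.parent != "" && o.tags[tag] {
			delete(inv.objects, name)
		}
	}
	for name, o := range inv.objects {
		if o.parent == "" && o.tags[tag] {
			for _, child := range inv.objects {
				if child.parent == name {
					return fmt.Errorf("folder %s is not empty", name)
				}
			}
			delete(inv.objects, name)
		}
	}
	return nil
}

// scenario interrupts the template upload, then runs destroy.
func scenario(tagFirst bool) error {
	inv := &inventory{objects: map[string]*object{
		"cluster": {name: "cluster", tags: map[string]bool{"cluster-tag": true}},
	}}
	_ = inv.importTemplate("cluster-rhcos", "cluster", "cluster-tag", tagFirst, true)
	return inv.destroy("cluster-tag")
}

func main() {
	fmt.Println("tag after upload:", scenario(false))
	fmt.Println("tag before upload:", scenario(true))
}
```

In this model, tagging after the upload leaves an untagged partial template, so destroy reports that the folder is not empty; tagging first lets destroy remove both the template and the folder.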


Attachments
openshift install log (74.40 KB, text/plain)
2020-10-20 14:40 UTC, Joel Diaz


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github openshift installer pull | 4388 | 0 | None | closed | Bug 1889779: vSphere destroy: handle failed clusters | 2021-02-18 20:27:09 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:27:26 UTC

Description Joel Diaz 2020-10-20 14:40:12 UTC
Created attachment 1722913
openshift install log


Version:

$ openshift-install version
[jdiaz@minigoomba os-install-4.6-rc4]$ ./openshift-install version
./openshift-install 4.6.0-rc.4
built from commit ebdbda57fc18d3b73e69f0f2cc499ddfca7e6593
release image registry.svc.ci.openshift.org/ocp/release@sha256:2c22e1c56831935a24efb827d2df572855ccd555c980070f77c39729526037d5


Platform: vSphere

Please specify:
* IPI

What happened?
Cluster installation failed while creating/importing the RHCOS image.
I then tried to destroy the resources created before the failure, so that a second installation could be attempted, and the destroy command failed with an error.


What did you expect to happen?

The destroy command should handle the partial installation gracefully and recognize when there is nothing to delete.

How to reproduce it (as minimally and precisely as possible)?

Set up an environment where the RHCOS import fails, or simply force-quit the installer during the RHCOS image import. With the installation left incomplete, run the destroy command:

$ ./openshift-install destroy cluster --dir vsphere --log-level=debug 

Anything else we need to know?

Here is the end of the log from the failed install:

DEBUG vsphereprivate_import_ova.import: Still creating... [1m40s elapsed] 
DEBUG vsphereprivate_import_ova.import: Still creating... [1m50s elapsed] 
DEBUG vsphereprivate_import_ova.import: Still creating... [2m0s elapsed] 
DEBUG vsphereprivate_import_ova.import: Still creating... [2m10s elapsed] 
ERROR                                              
ERROR Error: failed to upload: Post "https://10.3.32.7/nfc/528eb5b2-eca4-9d4a-3126-6c97584cb1fa/disk-0.vmdk": dial tcp 10.3.32.7:443: connect: connection timed out 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-278843614/main.tf line 43, in resource "vsphereprivate_import_ova" "import": 
ERROR   43: resource "vsphereprivate_import_ova" "import" { 
ERROR                                              
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change 


And here is the error when trying to 'destroy cluster':

[jdiaz@minigoomba os-install-4.6-rc4]$ ./openshift-install destroy cluster --dir vmc --log-level=debug
DEBUG OpenShift Installer 4.6.0-rc.4               
DEBUG Built from commit ebdbda57fc18d3b73e69f0f2cc499ddfca7e6593 
DEBUG find attached objects on tag                 
DEBUG find VirtualMachine objects                  
FATAL Failed to destroy cluster: object references is empty 
[jdiaz@minigoomba os-install-4.6-rc4]$ echo $?
1
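The "object references is empty" message suggests that destroy asked vSphere for details on a list of objects that turned out to be empty; presumably, in a cluster that failed during the RHCOS import, no VirtualMachine had been tagged yet. The Doc Text's fix moves tagging earlier; a complementary approach, in the spirit of the linked pull request "vSphere destroy: handle failed clusters", is to treat an empty result for an object kind as "nothing to delete". The Go sketch below is hypothetical: `ref`, `retrieve`, and `findByKind` are illustrative names, and `retrieve` only imitates the failing lookup; it is not the installer's or the vSphere client library's API.

```go
package main

import (
	"errors"
	"fmt"
)

// ref is a hypothetical stand-in for a vSphere managed object reference.
type ref struct{ kind, id string }

// retrieve is a hypothetical lookup that, like the failing call in the
// report, rejects an empty list of references.
func retrieve(refs []ref) ([]string, error) {
	if len(refs) == 0 {
		return nil, errors.New("object references is empty")
	}
	names := make([]string, 0, len(refs))
	for _, r := range refs {
		names = append(names, r.kind+"/"+r.id)
	}
	return names, nil
}

// findByKind filters the objects attached to the cluster tag by kind and
// returns an empty result, not an error, when none match.
func findByKind(attached []ref, kind string) ([]string, error) {
	var matches []ref
	for _, r := range attached {
		if r.kind == kind {
			matches = append(matches, r)
		}
	}
	if len(matches) == 0 {
		return nil, nil // nothing of this kind to destroy; not a failure
	}
	return retrieve(matches)
}

func main() {
	// A cluster that failed during the RHCOS import: only a folder is tagged.
	attached := []ref{{kind: "Folder", id: "group-v1"}}
	vms, err := findByKind(attached, "VirtualMachine")
	fmt.Println(len(vms), err)
	folders, err := findByKind(attached, "Folder")
	fmt.Println(folders, err)
}
```

With only a tagged folder present, the VirtualMachine lookup yields an empty list and a nil error, so destroy can move on to the folder instead of aborting.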

Comment 2 Patrick Dillon 2020-11-18 16:09:02 UTC
To test this, run a vSphere install with --log-level=debug. Once you start seeing "vsphereprivate_import_ova.import: Still creating...", interrupt the installer, then run the destroy command.

Comment 4 jima 2020-11-23 03:36:02 UTC
Verified on IPI on vSphere with 4.7.0-0.nightly-2020-11-22-204912; the test passed.

# ./openshift-install destroy cluster --dir ipi/ --log-level debug
DEBUG OpenShift Installer 4.7.0-0.nightly-2020-11-22-204912 
DEBUG Built from commit 68282c185253d4831514b20623b1717535c5e6f2 
DEBUG Find attached objects on tag                 
DEBUG Find VirtualMachine objects                  
DEBUG Delete VirtualMachines                       
INFO Destroyed                                     VirtualMachine=jimaipi-qwz86-rhcos
DEBUG Find Folder objects                          
DEBUG Delete Folder                                
INFO Destroyed                                     Folder=jimaipi-qwz86
DEBUG Delete tag                                   
DEBUG Delete tag category                          
DEBUG Purging asset "Metadata" from disk           
DEBUG Purging asset "Terraform Variables" from disk 
DEBUG Purging asset "Kubeconfig Admin Client" from disk 
DEBUG Purging asset "Kubeadmin Password" from disk 
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk 
INFO Time elapsed: 9m33s

Comment 7 errata-xmlrpc 2021-02-24 15:26:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

