Bug 1920552 - error when destroying a vSphere installation that failed early
Summary: error when destroying a vSphere installation that failed early
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.z
Assignee: Patrick Dillon
QA Contact: jima
URL:
Whiteboard:
Depends On: 1889779
Blocks: 1920554
Reported: 2021-01-26 14:55 UTC by Patrick Dillon
Modified: 2021-03-09 20:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-09 20:16:08 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4618 | 0 | None | open | [release-4.6] Bug 1920552: vSphere destroy: handle failed clusters | 2021-02-19 11:34:42 UTC
Red Hat Product Errata | RHBA-2021:0674 | 0 | None | None | None | 2021-03-09 20:16:26 UTC

Description Patrick Dillon 2021-01-26 14:55:58 UTC
This bug was initially created as a copy of Bug #1889779

I am copying this bug because: 




Version:

[jdiaz@minigoomba os-install-4.6-rc4]$ ./openshift-install version
./openshift-install 4.6.0-rc.4
built from commit ebdbda57fc18d3b73e69f0f2cc499ddfca7e6593
release image registry.svc.ci.openshift.org/ocp/release@sha256:2c22e1c56831935a24efb827d2df572855ccd555c980070f77c39729526037d5


Platform: vSphere

Please specify:
* IPI

What happened?
Cluster installation failed while creating/importing the RHCOS image.
I then tried to destroy the resources created before the failure so that a second installation could be attempted, but the destroy command exited with an error.


What did you expect to happen?

The destroy command should gracefully notice that there is nothing to delete/destroy.

How to reproduce it (as minimally and precisely as possible)?

Set up an environment where the RHCOS import fails, or force-quit the installer during the RHCOS image import. With the installation left incomplete, run the destroy command:

$ ./openshift-install destroy cluster --dir vsphere --log-level=debug 

Anything else we need to know?

Here are the logs of the end of the failed install:

DEBUG vsphereprivate_import_ova.import: Still creating... [1m40s elapsed] 
DEBUG vsphereprivate_import_ova.import: Still creating... [1m50s elapsed] 
DEBUG vsphereprivate_import_ova.import: Still creating... [2m0s elapsed] 
DEBUG vsphereprivate_import_ova.import: Still creating... [2m10s elapsed] 
ERROR                                              
ERROR Error: failed to upload: Post "https://10.3.32.7/nfc/528eb5b2-eca4-9d4a-3126-6c97584cb1fa/disk-0.vmdk": dial tcp 10.3.32.7:443: connect: connection timed out 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-278843614/main.tf line 43, in resource "vsphereprivate_import_ova" "import": 
ERROR   43: resource "vsphereprivate_import_ova" "import" { 
ERROR                                              
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change 


And here is the error when trying to 'destroy cluster':

[jdiaz@minigoomba os-install-4.6-rc4]$ ./openshift-install destroy cluster --dir vmc --log-level=debug
DEBUG OpenShift Installer 4.6.0-rc.4               
DEBUG Built from commit ebdbda57fc18d3b73e69f0f2cc499ddfca7e6593 
DEBUG find attached objects on tag                 
DEBUG find VirtualMachine objects                  
FATAL Failed to destroy cluster: object references is empty 
[jdiaz@minigoomba os-install-4.6-rc4]$ echo $?
1

Comment 2 Patrick Dillon 2021-01-26 15:14:24 UTC
Just needs a cherry-pick but depends on https://github.com/openshift/installer/pull/4579

Comment 4 jima 2021-03-01 02:01:47 UTC
Tested on 4.6.0-0.nightly-2021-02-26-224651 and passed.

Forced the installation to fail at the OVA template import step:
ERROR                                              
ERROR Error: failed to upload: Post "https://10.3.32.8/nfc/52239649-17a4-85da-43d4-c84c826e71a4/disk-0.vmdk": dial tcp 10.3.32.8:443: connect: connection timed out 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-638830781/main.tf line 43, in resource "vsphereprivate_import_ova" "import": 
ERROR   43: resource "vsphereprivate_import_ova" "import" { 
ERROR                                              
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change 

Then ran the destroy command to remove all resources created on vSphere:
# ./openshift-install destroy cluster --dir ipi/ --log-level debug
DEBUG OpenShift Installer 4.6.0-0.nightly-2021-02-26-224651 
DEBUG Built from commit 9c86c823fff234c104f574eaf25953485edfe4b1 
DEBUG Find attached objects on tag                 
DEBUG Find VirtualMachine objects                  
DEBUG Delete VirtualMachines                       
INFO Destroyed                                     VirtualMachine=jimavmc-cjctc-rhcos
DEBUG Find Folder objects                          
DEBUG Delete Folder                                
INFO Destroyed                                     Folder=jimavmc-cjctc
DEBUG Delete tag                                   
DEBUG Delete tag category                          
DEBUG Purging asset "Metadata" from disk           
DEBUG Purging asset "Terraform Variables" from disk 
DEBUG Purging asset "Kubeconfig Admin Client" from disk 
DEBUG Purging asset "Kubeadmin Password" from disk 
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk 
INFO Time elapsed: 2m37s

Comment 7 errata-xmlrpc 2021-03-09 20:16:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.20 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0674

