Bug 2099872

Summary: openshift-install destroy cluster needs to be run twice on powervs
Product: OpenShift Container Platform
Component: Installer
Installer sub component: openshift-installer
Reporter: Manoj Kumar <manokuma>
Assignee: OCP Installer <ocp-installer>
QA Contact: Gaoyun Pei <gpei>
CC: padillon
Status: CLOSED DEFERRED
Severity: high
Priority: unspecified
Version: 4.11
Hardware: ppc64le
OS: Linux
Type: Bug
Last Closed: 2023-03-09 01:21:59 UTC

Description Manoj Kumar 2022-06-21 21:03:58 UTC

Version:

$ openshift-install version
./openshift-install 4.11.0-0.nightly-ppc64le-2022-06-16-003709
built from commit f39a15787e72cd803e450500ce6be9ff846e5b81
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:1887990a385be4d835ccd83a0ff938bb3b9a117686364a10a6014c7f0170f180
release architecture ppc64le


Platform:

powervs
IPI

What happened?

Cluster creation failed. The first "openshift-install destroy cluster" run timed out; a second run completed successfully.

The first destroy run ends with this error:

DEBUG Waiting for job "78c9630b-55e2-451b-a5ab-1ff094d4c054" to delete (status is "running") 
DEBUG Waiting for job "78c9630b-55e2-451b-a5ab-1ff094d4c054" to delete (status is "running") 
DEBUG Waiting for job "78c9630b-55e2-451b-a5ab-1ff094d4c054" to delete (status is "running") 
DEBUG Waiting for job "78c9630b-55e2-451b-a5ab-1ff094d4c054" to delete (status is "running") 
DEBUG Waiting for job "78c9630b-55e2-451b-a5ab-1ff094d4c054" to delete (status is "running") 
DEBUG Waiting for job "78c9630b-55e2-451b-a5ab-1ff094d4c054" to delete (status is "running") 
DEBUG powervs.PolledRun: Failed destroyCluster                                                  
DEBUG powervs.Run: after wait.PollImmediateInfinite, err = failed to destroy cluster: destroyCluster: timed out 
FATAL Failed to destroy cluster: failed to destroy cluster: destroyCluster: timed out 
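
The log pattern above is what a poll loop with a fixed deadline produces when a job never leaves the "running" state: it keeps reporting the status and then gives up. The sketch below illustrates that failure mode only; it is not the installer's code. poll_until_gone is a hypothetical helper, and treating an empty status as "job deleted" is an assumption made for the example.

```shell
# poll_until_gone TIMEOUT_S INTERVAL_S STATUS_COMMAND [ARGS...]
# Hypothetical helper: runs STATUS_COMMAND repeatedly until it prints nothing
# (assumed here to mean the job is gone) or TIMEOUT_S seconds have elapsed.
poll_until_gone() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  local deadline status
  deadline=$(( $(date +%s) + timeout_s ))
  while true; do
    status="$("$@")"
    if [ -z "${status}" ]; then
      return 0                      # nothing left to wait for
    fi
    echo "waiting (status is \"${status}\")"
    if [ "$(date +%s)" -ge "${deadline}" ]; then
      echo "timed out" >&2          # job still present at the deadline
      return 1
    fi
    sleep "${interval_s}"
  done
}
```

For example, "poll_until_gone 900 10 some_status_command" would poll every 10 seconds for up to 15 minutes; some_status_command is a placeholder for whatever reports the job state.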

The second destroy run always succeeds. Its log ends as follows:

INFO Deleted job "7f19e4d5-7dc2-40a0-b4e3-48599d293d8a" 
DEBUG destroyCluster: <-wgDone                     
DEBUG executeStageFunction: Adding: DNS Records    
DEBUG executeStageFunction: Adding: SSH Keys       
DEBUG executeStageFunction: duration = 14m59.999996042s 
DEBUG executeStageFunction: duration = 14m59.999998285s 
DEBUG executeStageFunction: Adding: Cloud Object Storage Instances 
DEBUG executeStageFunction: Executing: DNS Records 
DEBUG executeStageFunction: duration = 14m59.999996356s 
DEBUG Listing DNS records                          
DEBUG executeStageFunction: Executing: Cloud Object Storage Instances 
DEBUG executeStageFunction: Executing: SSH Keys    
DEBUG Listing COS instances                        
DEBUG Listing SSHKeys                              
DEBUG listSSHKeys: FOUND: rdr-kumarmn-p57x5-key    
DEBUG listCOSInstances: FOUND rdr-kumarmn-p57x5-cos dd615706-7c65-4cde-8e7e-87d59caefbb5 
DEBUG listDNSRecords: FOUND: da860eb4509ac0379c3f792cb05c7009, api-int.rdr-kumarmn.scnl-ibm.com 
DEBUG listDNSRecords: FOUND: 68e20d7ec80e01bba689b3f14dbb56be, api.rdr-kumarmn.scnl-ibm.com 
DEBUG listDNSRecords: PerPage = 20, Page = 1, Count = 20 
DEBUG listDNSRecords: moreData = true              
DEBUG Deleting sshKey "rdr-kumarmn-p57x5-key"      
DEBUG listDNSRecords: PerPage = 20, Page = 2, Count = 20 
DEBUG listDNSRecords: moreData = true              
DEBUG listDNSRecords: PerPage = 20, Page = 3, Count = 4 
DEBUG listDNSRecords: moreData = false             
DEBUG Deleting DNS record "api.rdr-kumarmn.scnl-ibm.com" 
INFO Deleted sshKey "rdr-kumarmn-p57x5-key"       
DEBUG Deleting DNS record "api-int.rdr-kumarmn.scnl-ibm.com" 
DEBUG Deleting COS instance "rdr-kumarmn-p57x5-cos" 
INFO Deleted DNS record "api.rdr-kumarmn.scnl-ibm.com" 
INFO Deleted DNS record "api-int.rdr-kumarmn.scnl-ibm.com" 
INFO Deleted COS instance "rdr-kumarmn-p57x5-cos" 
DEBUG destroyCluster: <-wgDone                     
DEBUG powervs.Run: after wait.PollImmediateInfinite, err = <nil> 
DEBUG Purging asset "Metadata" from disk           
DEBUG Purging asset "Master Ignition Customization Check" from disk 
DEBUG Purging asset "Worker Ignition Customization Check" from disk 
DEBUG Purging asset "Terraform Variables" from disk 
DEBUG Purging asset "Kubeconfig Admin Client" from disk 
DEBUG Purging asset "Kubeadmin Password" from disk 
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk 
DEBUG Purging asset "Cluster" from disk            
INFO Time elapsed: 2m34s                          


What did you expect to happen?

Expected the destroy to succeed on the first run, without having to run it a second time.

How to reproduce it (as minimally and precisely as possible)?

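After a failed cluster creation, run destroy twice against the same directory. The first run times out; the second succeeds.

$ ./openshift-install destroy cluster --log-level=debug --dir=test
$ ./openshift-install destroy cluster --log-level=debug --dir=test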
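
Until the first run succeeds on its own, the manual re-run can be wrapped in a small retry loop. This is a minimal workaround sketch, not part of openshift-install: retry_destroy is a hypothetical helper name, and the choice of attempt limit is up to the caller.

```shell
# retry_destroy MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
retry_destroy() {
  local max_attempts="$1"
  shift
  local attempt=1
  while true; do
    if "$@"; then
      echo "succeeded on attempt ${attempt}"
      return 0
    fi
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
  done
}
```

With the flags from this report, the call would be "retry_destroy 2 ./openshift-install destroy cluster --log-level=debug --dir=test". A limit of 2 matches the behavior described here, where the second run succeeds.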

Comment 3 Shiftzilla 2023-03-09 01:21:59 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9330