Bug 2061947 - IBM Cloud: Uninstall does not succeed when there is nothing to clean up
Summary: IBM Cloud: Uninstall does not succeed when there is nothing to clean up
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: x86_64
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.12.0
Assignee: OCP Installer
QA Contact: MayXu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-08 18:44 UTC by Andrew Butcher
Modified: 2023-01-17 19:48 UTC
CC: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:47:48 UTC
Target Upstream Version:
Embargoed:




Links:
GitHub openshift/installer pull 6013 (open): Bug 2061947: IBMCloud: Skip DNS Record delete (last updated 2022-06-15 19:44:07 UTC)
GitHub openshift/installer pull 6152 (open): Bug 2061947: IBMCloud: Handle missing RG (last updated 2022-07-21 15:20:47 UTC)
Red Hat Product Errata RHSA-2022:7399 (last updated 2023-01-17 19:48:02 UTC)

Description Andrew Butcher 2022-03-08 18:44:25 UTC
Version:

quay.io/openshift-release-dev/ocp-release:4.10.0-x86_64

Platform:

IBM Cloud

What happened?

My installation failed early (due to a gateway timeout), before any resources were created in IBM Cloud.


time="2022-03-01T18:29:22Z" level=fatal msg="failed to fetch Terraform Variables: failed to fetch dependency of \"Terraform Variables\": failed to generate asset \"Platform Provisioning Check\": baseDomain: Internal error: failed to get cis instance: Gateway Timeout"


Hive attempts to clean up by running uninstall with the ClusterID/metadata of the failed install, including the default ResourceGroupName, which is the ClusterID for the cluster (e.g. abutcher-lj7bf).

Uninstall crashes because the ResourceGroup does not exist and cannot be found:


time="2022-03-01T18:38:24Z" level=debug msg="Listing virtual service instances"
time="2022-03-01T18:38:24Z" level=debug msg="Listing virtual service instances"
time="2022-03-01T18:38:24Z" level=debug msg="Listing load balancers"
time="2022-03-01T18:38:34Z" level=debug msg="Listing subnets"
time="2022-03-01T18:38:34Z" level=debug msg="Listing images"
time="2022-03-01T18:38:34Z" level=debug msg="Listing public gateways"
time="2022-03-01T18:38:34Z" level=info msg="Skipping deletion of security groups with generated VPC"
time="2022-03-01T18:38:35Z" level=debug msg="Listing floating IPs"
time="2022-03-01T18:38:36Z" level=debug msg="Listing dedicated hosts"
time="2022-03-01T18:38:36Z" level=debug msg="Listing VPCs"
E0301 18:38:37.741094       1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 63 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x537e2e0, 0xc000140168})
        k8s.io/apimachinery.3/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00007a240})
        k8s.io/apimachinery.3/pkg/util/runtime/runtime.go:48 +0x75
panic({0x537e2e0, 0xc000140168})
        runtime/panic.go:1038 +0x215
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).ResourceGroupID(0xc0005986c0)
        github.com/openshift/installer.0-master.0.20220118155007-ad535d3fdbf4/pkg/destroy/ibmcloud/ibmcloud.go:298 +0x388
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).listDedicatedHosts(0xc0005986c0)
        github.com/openshift/installer.0-master.0.20220118155007-ad535d3fdbf4/pkg/destroy/ibmcloud/dedicatedhost.go:24 +0xc5
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDedicatedHosts(0xc0005986c0)
        github.com/openshift/installer.0-master.0.20220118155007-ad535d3fdbf4/pkg/destroy/ibmcloud/dedicatedhost.go:174 +0x36
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1()
        github.com/openshift/installer.0-master.0.20220118155007-ad535d3fdbf4/pkg/destroy/ibmcloud/ibmcloud.go:159 +0x3f
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x7f44b41a5980, 0x0})
        k8s.io/apimachinery.3/pkg/util/wait/wait.go:220 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x5cdd610, 0xc000052040}, 0xc00006fe90)
        k8s.io/apimachinery.3/pkg/util/wait/wait.go:233 +0x7c
k8s.io/apimachinery/pkg/util/wait.poll({0x5cdd610, 0xc000052040}, 0xd0, 0x20e9225, 0x30)
        k8s.io/apimachinery.3/pkg/util/wait/wait.go:580 +0x38
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x5cdd610, 0xc000052040}, 0x1aee987, 0x28)
        k8s.io/apimachinery.3/pkg/util/wait/wait.go:566 +0x49
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x57c63e0, 0x0)
        k8s.io/apimachinery.3/pkg/util/wait/wait.go:555 +0x46
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc0005986c0, {{0x56365c4, 0xc00006ffd0}, 0xc000cfaf60}, 0xc000e314a0, 0xc000e314a0)
        github.com/openshift/installer.0-master.0.20220118155007-ad535d3fdbf4/pkg/destroy/ibmcloud/ibmcloud.go:156 +0x108
created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster
        github.com/openshift/installer.0-master.0.20220118155007-ad535d3fdbf4/pkg/destroy/ibmcloud/ibmcloud.go:130 +0xae5
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0
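
For context, the panic message ("index out of range [0] with length 0") from ClusterUninstaller.ResourceGroupID is consistent with code that looks up the resource group by name and indexes the first result without checking whether anything was returned. A minimal, illustrative Go sketch of that failure pattern (the type and helper names here are hypothetical, not the installer's actual code):

// Illustrative sketch only; names are hypothetical and do not reflect the
// installer's actual implementation.
package main

import "fmt"

type resourceGroup struct {
	Name string
	ID   string
}

// listResourceGroups stands in for the IBM Cloud Resource Manager lookup;
// when the group does not exist, the returned list is empty.
func listResourceGroups(name string) []resourceGroup {
	return nil
}

// resourceGroupID mirrors the failing pattern: indexing the first result
// without checking the list length panics when the group is missing.
func resourceGroupID(name string) string {
	groups := listResourceGroups(name)
	return groups[0].ID // panics: index out of range [0] with length 0
}

func main() {
	fmt.Println(resourceGroupID("abutcher-test-czpjs"))
}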


What did you expect to happen?

Uninstall succeeds because there is nothing to clean up.


How to reproduce it (as minimally and precisely as possible)?

I hit this running uninstall normally with generated metadata, but since the original failure was a gateway timeout, we have to simulate the uninstall failure by providing a metadata.json to openshift-install destroy.

Create a metadata.json file with a "resourceGroupName" that doesn't correspond to an existing ResourceGroup, place it within a directory, and run openshift-install destroy --dir=<dir with metadata.json>.

I was able to reproduce the crash by providing the following metadata.json file with a valid and existing accountID, cisInstanceCRN, and baseDomain.

{"clusterName":"abutcher-test","clusterID":"12345","infraID":"abutcher-test-czpjs","ibmcloud":{"accountID":"VALID_ACCOUNT_ID","baseDomain":"VALID_BASE_DOMAIN","cisInstanceCRN":"VALID_CIS_INSTANCE_CRN_FOR_BASEDOMAIN","region":"us-south","resourceGroupName":"abutcher-test-czpjs"}}

Anything else we need to know?

I'm a developer for the Hive team and encountered this issue testing hive.

Comment 1 Pedro Amoedo 2022-03-10 14:39:15 UTC
Thanks Andrew, I'll notify IBM devs to review this BZ ASAP.

Comment 3 MayXu 2022-07-13 07:24:53 UTC
Checked with
/home/fedora/n410/openshift-install 4.12.0-0.nightly-2022-07-13-062839
built from commit 09e92dc201d741615420eb004cd8021b44a25f67
release image registry.ci.openshift.org/ocp/release@sha256:63bc1950bb6e14a817d7b9415dde32dfd1a995a8aa07f5e5b6e7bbff2aae5bcf
release architecture amd64

Copied the metadata.json to a folder and ran 'openshift-install destroy cluster --dir ${1} --log-level debug'.
The resource group does not exist.

cat metadata.json: 
{"clusterName":"logci4123","clusterID":"44932a36-054d-41a4-8d5a-857a424a00c6","infraID":"logci4123-5z7j9","ibmcloud":{"accountID":"fdc2e14cf8bc4d53a67f972dc2e2c861","baseDomain":"ibmcloud.qe.devcluster.openshift.com","cisInstanceCRN":"crn:v1:bluemix:public:internet-svcs:global:a/fdc2e14cf8bc4d53a67f972dc2e2c861:e8ee6ca1-4b31-4307-8190-e67f6925f83b::","region":"us-east","resourceGroupName":"logci4123-5z7j9"}}

Destroying the cluster failed with the following output:

DEBUG OpenShift Installer 4.12.0-0.nightly-2022-07-13-062839 
DEBUG Built from commit 09e92dc201d741615420eb004cd8021b44a25f67 
DEBUG Listing virtual service instances            
DEBUG Listing virtual service instances            
INFO Listing disks                                
DEBUG All disks fetched                            
DEBUG Listing load balancers                       
DEBUG Listing subnets                              
DEBUG Listing public gateways                      
INFO Skipping deletion of security groups with generated VPC 
DEBUG Listing images                               
DEBUG Listing floating IPs                         
DEBUG Listing dedicated hosts                      
DEBUG Listing VPCs
DEBUG Listing VPCs                                 
E0713 07:10:19.870835  341712 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 115 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x41d8f80?, 0xc000ec27b0})
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x86
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00010c240?})
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x41d8f80, 0xc000ec27b0})
	/usr/lib/golang/src/runtime/panic.go:838 +0x207
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).ResourceGroupID(0xc000bc2360)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:343 +0x388
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).listDedicatedHosts(0xc000bc2360)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/dedicatedhost.go:24 +0xc5
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDedicatedHosts(0xc000bc2360)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/dedicatedhost.go:174 +0x36
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1()
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:186 +0x3f
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x18, 0xc000484800})
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:220 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x19f0ba28?, 0xc00012a000?}, 0xc00069f690?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:233 +0x57
k8s.io/apimachinery/pkg/util/wait.poll({0x19f0ba28, 0xc00012a000}, 0xd0?, 0x1108625?, 0x30?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:580 +0x38
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x19f0ba28, 0xc00012a000}, 0x40d687?, 0x28?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:566 +0x49
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x0?, 0x0?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:555 +0x46
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc000bc2360, {{0x46ac765?, 0xc00069f7d0?}, 0xc00090af10?}, 0xc0008fe660?, 0xc0008fe660?)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:183 +0x108
created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:157 +0xb3b
panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 115 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00010c240?})
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x41d8f80, 0xc000ec27b0})
	/usr/lib/golang/src/runtime/panic.go:838 +0x207
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).ResourceGroupID(0xc000bc2360)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:343 +0x388
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).listDedicatedHosts(0xc000bc2360)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/dedicatedhost.go:24 +0xc5
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDedicatedHosts(0xc000bc2360)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/dedicatedhost.go:174 +0x36
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1()
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:186 +0x3f
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x18, 0xc000484800})
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:220 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x19f0ba28?, 0xc00012a000?}, 0xc00069f690?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:233 +0x57
k8s.io/apimachinery/pkg/util/wait.poll({0x19f0ba28, 0xc00012a000}, 0xd0?, 0x1108625?, 0x30?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:580 +0x38
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x19f0ba28, 0xc00012a000}, 0x40d687?, 0x28?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:566 +0x49
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x0?, 0x0?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:555 +0x46
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc000bc2360, {{0x46ac765?, 0xc00069f7d0?}, 0xc00090af10?}, 0xc0008fe660?, 0xc0008fe660?)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:183 +0x108
created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:157 +0xb3b

Comment 4 Christopher J Schaefer 2022-07-21 15:31:51 UTC
I performed some additional investigation and testing and came up with the following PR
https://github.com/openshift/installer/pull/6152

In cases where the metadata.json contains an incorrect/non-existent ResourceGroupName, or an empty string, the installer will now exit safely with a failure, as in the examples below.


# bin/openshift-install destroy cluster --dir bz_2061947/bz2061947-no-rg-2
FATAL Failed to destroy cluster: No ResourceGroupName provided

# bin/openshift-install destroy cluster --dir bz_2061947/bz2061947-no-rg-2
FATAL Failed to destroy cluster: ResourceGroup '"bz2061947-no-rg-not-rg-tqcmc"' not found



Rather than returning successfully, which would end up removing the metadata.json, a failure was preferred.

This also prevents an error on the user side: if a user has the wrong IC_API_KEY or account configured, the installer will not find the ResourceGroup (which exists in another account, or is accessible via another IC_API_KEY). Without this change the installer would return successfully without destroying anything and then remove the metadata.json file, making it more difficult to reconstruct the metadata and perform the destroy against the proper account.
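
For illustration, the guard this behavior implies looks roughly like the following Go sketch. The names here are hypothetical and do not reflect the actual change in PR 6152; the point is that instead of indexing an empty result list, the destroyer fails fast with an actionable error.

// Sketch of a guarded resource group lookup; hypothetical names only.
package main

import (
	"fmt"
	"log"
)

type resourceGroup struct {
	Name string
	ID   string
}

// listResourceGroups stands in for the IBM Cloud Resource Manager lookup.
func listResourceGroups(name string) []resourceGroup { return nil }

// resourceGroupID returns an error instead of panicking when the named
// resource group is missing or no name was provided.
func resourceGroupID(name string) (string, error) {
	if name == "" {
		return "", fmt.Errorf("no ResourceGroupName provided")
	}
	groups := listResourceGroups(name)
	if len(groups) == 0 {
		return "", fmt.Errorf("ResourceGroup %q not found", name)
	}
	return groups[0].ID, nil
}

func main() {
	if _, err := resourceGroupID("bz2061947-no-rg-not-rg-tqcmc"); err != nil {
		log.Fatalf("Failed to destroy cluster: %v", err)
	}
}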

Comment 7 MayXu 2022-08-24 05:05:41 UTC
Tested with ./openshift-install 4.12.0-0.ci-2022-08-23-112842
DEBUG Built from commit f84ce649a1e8cba455fb2411ca9abc00050a1e01 

1. Tried to destroy with a metadata.json whose resource group does not exist; the installer exits with a failure:

$timeout 20m ./openshift-install destroy cluster --dir ${1} --log-level debug
DEBUG OpenShift Installer 4.12.0-0.ci-2022-08-23-112842 
DEBUG Built from commit f84ce649a1e8cba455fb2411ca9abc00050a1e01 
FATAL Failed to destroy cluster: ResourceGroup '"rioliu-20423-khqx8"' not found 

2. With a metadata.json whose resource group exists but contains no resources, the destroy succeeds and the resource group is deleted.

Comment 11 errata-xmlrpc 2023-01-17 19:47:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

