Bug 1888378 - [IPI on Azure] errors destroying cluster when Azure resource group was never created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: John Hixson
QA Contact: Etienne Simard
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-14 17:32 UTC by Joel Diaz
Modified: 2021-02-24 15:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The installer does not verify that the resource group exists before destroying the cluster.
Consequence: The installer loops repeatedly with errors.
Fix: Verify that the resource group exists before destroying the cluster.
Result: The installer successfully destroys the cluster.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:26:11 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4320 | 0 | None | closed | Bug 1888378: [Azure][Destroy] Check if resource group exists | 2021-02-17 16:20:22 UTC
Github | openshift installer pull 4325 | 0 | None | closed | Bug 1888378: Ignore error if resource group already deleted | 2021-02-17 16:20:22 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:26:40 UTC

Description Joel Diaz 2020-10-14 17:32:58 UTC
Version: OpenShift Installer 4.6.0-0.ci-2020-09-30-115254

$ openshift-install version
[jdiaz@minigoomba os-install-4.6ci-2020-09-30]$ ./openshift-install version
./openshift-install 4.6.0-0.ci-2020-09-30-115254
built from commit ed5dcf4bd22e857137ffcbf3e028e1fdbeb6593f
release image registry.svc.ci.openshift.org/ocp/release@sha256:9b78e23d1e71eefe170784303305bc961c6ccceee5c0b3bfe9cbf90ae59ca556


Platform:
azure

Please specify:
IPI

What happened?

Start an IPI Azure cluster installation. If it fails (or is aborted) before the Azure resource group is created, the 'destroy' subcommand loops endlessly because the resource group does not exist.


What did you expect to happen?

Since the install never made it far enough to even create the resource group, the 'destroy' command should just notice that it is already deleted/never created and move along.
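
The expected behaviour can be illustrated with a small sketch. This is not the installer's code (the installer is written in Go); it is a minimal Python model, assuming a hypothetical in-memory client FakeAzure and a hypothetical exception ResourceGroupNotFound standing in for the 404 response. Each destroy stage treats a missing resource group as "already deleted" instead of an error to retry.

```python
class ResourceGroupNotFound(Exception):
    """Hypothetical stand-in for a 404 ResourceGroupNotFound response."""


class FakeAzure:
    """Hypothetical in-memory client for illustration; not the Azure SDK."""

    def __init__(self, groups):
        self.groups = set(groups)

    def list_dns_zones(self, group):
        # Listing anything inside a missing resource group fails with "not found".
        if group not in self.groups:
            raise ResourceGroupNotFound(group)
        return []

    def delete_group(self, group):
        if group not in self.groups:
            raise ResourceGroupNotFound(group)
        self.groups.remove(group)


def destroy(client, group):
    """Run each destroy stage once; a missing resource group counts as done."""
    log = []
    stages = [
        ("deleting public records", client.list_dns_zones),
        ("deleting resource group", client.delete_group),
    ]
    for name, stage in stages:
        log.append(name)
        try:
            stage(group)
        except ResourceGroupNotFound:
            # Nothing left to clean up for this stage, so move on instead of retrying.
            log.append("already deleted")
    return log


if __name__ == "__main__":
    for line in destroy(FakeAzure([]), "jdtest-7b8cq-rg"):
        print("DEBUG", line)
```

With an empty FakeAzure, every stage records "already deleted" and the function returns instead of looping.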

How to reproduce it (as minimally and precisely as possible)?

$ ./openshift-install create cluster --dir azure
(A few seconds after all the interview questions are answered, abort the install, so that the resource group is not created but the metadata.json file is.)
$ ./openshift-install destroy cluster --dir azure --log-level=debug

Anything else we need to know?

Here's an example where the metadata.json file has valid data, followed by the failed cluster destroy:

[jdiaz@minigoomba os-install-4.6ci-2020-09-30]$ cat az/metadata.json 
{"clusterName":"jdtest","clusterID":"2c4b38fd-d727-45c6-b813-0dbf109f4f74","infraID":"jdtest-7b8cq","azure":{"cloudName":"AzurePublicCloud","region":"centralus","resourceGroupName":""}}

[jdiaz@minigoomba os-install-4.6ci-2020-09-30]$ ./openshift-install destroy cluster --dir az --log-level=debug
DEBUG OpenShift Installer 4.6.0-0.ci-2020-09-30-115254 
DEBUG Built from commit ed5dcf4bd22e857137ffcbf3e028e1fdbeb6593f 
INFO Credentials loaded from file "/home/jdiaz/.azure/osServicePrincipal.json" 
DEBUG deleting public records                      
DEBUG [failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'jdtest-7b8cq-rg' could not be found.", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'jdtest-7b8cq-rg' could not be found."] 
DEBUG deleting public records                      
DEBUG [failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'jdtest-7b8cq-rg' could not be found.", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'jdtest-7b8cq-rg' could not be found."] 
DEBUG deleting public records                      
DEBUG [failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'jdtest-7b8cq-rg' could not be found.", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'jdtest-7b8cq-rg' could not be found."] 
^C
[jdiaz@minigoomba os-install-4.6ci-2020-09-30]$
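
In the output above, metadata.json has an empty "resourceGroupName" while the errors name the resource group 'jdtest-7b8cq-rg', that is, the infraID with "-rg" appended. The sketch below reproduces that mapping in Python. The fallback rule is an inference from this output only, not confirmed installer behaviour, and resource_group_name is a hypothetical helper.

```python
import json


def resource_group_name(metadata):
    """Return the explicit resource group name, else infraID + "-rg" (assumed fallback)."""
    azure = metadata.get("azure", {})
    return azure.get("resourceGroupName") or metadata["infraID"] + "-rg"


# The metadata.json content shown in the transcript above.
metadata = json.loads(
    '{"clusterName":"jdtest","clusterID":"2c4b38fd-d727-45c6-b813-0dbf109f4f74",'
    '"infraID":"jdtest-7b8cq","azure":{"cloudName":"AzurePublicCloud",'
    '"region":"centralus","resourceGroupName":""}}'
)

print(resource_group_name(metadata))  # prints: jdtest-7b8cq-rg
```

Knowing the name makes it possible to check by hand whether the resource group exists before running the destroy.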

Comment 3 Etienne Simard 2020-11-09 19:42:18 UTC
Verified with:

./openshift-install 4.7.0-0.nightly-2020-11-09-190845
built from commit 519f21bfada103752bb78929f170580d505b92c0
release image registry.svc.ci.openshift.org/ocp/release@sha256:f7bf10489f11c05927e901310ad7d824599b385f69f12c2de61cf8c437c85fc4

./openshift-install destroy cluster --dir ./testdestroy --log-level debug
DEBUG OpenShift Installer 4.7.0-0.nightly-2020-11-09-190845 
DEBUG Built from commit 519f21bfada103752bb78929f170580d505b92c0 
INFO Credentials loaded from file "/home/esimard/.azure/osServicePrincipal.json" 
DEBUG deleting public records                      
DEBUG already deleted                              
DEBUG deleting resource group                      
DEBUG already deleted                               resource group=estestdestroy01-fjfts-rg
DEBUG deleting application registrations           
DEBUG Purging asset "Metadata" from disk           
DEBUG Purging asset "Terraform Variables" from disk 
DEBUG Purging asset "Kubeconfig Admin Client" from disk 
DEBUG Purging asset "Kubeadmin Password" from disk 
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk 
INFO Time elapsed: 2s

Comment 6 errata-xmlrpc 2021-02-24 15:26:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

