Bug 1866925 - openshift-install destroy cluster should fail quickly when provided with invalid credentials on Azure.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Patrick Dillon
QA Contact: Mike Gahagan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-06 20:20 UTC by Mike Gahagan
Modified: 2021-03-16 16:45 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: An attempt to delete a cluster in Azure with invalid credentials appears to succeed unless debug logs are enabled. A customer may encounter this situation if their service principal expires before they attempt to destroy the cluster.
Consequence: openshift-install destroy cluster --dir=/cluster/dir appears to succeed although it fails to delete the cluster. It also deletes the locally stored cluster metadata, so subsequent runs of openshift-install destroy cluster cannot remove the cluster.
Workaround (if any): Create a backup of the cluster metadata before attempting to delete the cluster. openshift-install can then use the backed-up contents to remove the cluster once the invalid credentials are corrected.
Result: Manual intervention may be required if any attempt to delete a cluster is made with expired or invalid credentials.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:15:21 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4025 | 0 | None | closed | bug 1866925: pkg/destroy/azure: fail fast if unable to list resources for any reason | 2021-02-19 16:23:18 UTC
Github | openshift installer pull 4331 | 0 | None | closed | Bug 1866925: display Azure destroy auth error | 2021-02-19 16:23:18 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:15:49 UTC

Comment 3 Mike Gahagan 2020-10-08 15:40:10 UTC
There seem to be two issues with this fix in 4.6.0-0.nightly-2020-10-07-093101:

1.) If you do not run the cluster deletion command with debug-level logging enabled, the error about invalid credentials is never displayed.
2.) More seriously, the deletion appears to succeed with no indication of the bad credentials (unless debug logs are used). The installer does not delete the cluster (it cannot, because of the bad credentials), yet it still removes the locally stored metadata, which makes removal of the cluster resources a manual process.

Attempt to delete with invalid credentials:
[m@localhost 46-azure-install]$ ./openshift-install destroy cluster --dir clusters/mgahagan-090810 --log-level=debug
DEBUG OpenShift Installer 4.6.0-0.nightly-2020-10-07-093101 
DEBUG Built from commit bff124c9941762d2532490774b3f910241bd63f6 
INFO Credentials loaded from file "/home/m/.azure/osServicePrincipal.json" 
DEBUG deleting public records                      
DEBUG azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mgahagan-090810-n4vnb-rg/providers/Microsoft.Network/dnsZones?%24top=100&api-version=2018-03-01-preview: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.\r\nTrace ID: b7bfa8d2-56c3-4ea0-be5f-71dea28f7b00\r\nCorrelation ID: cb123619-b0f5-4756-945d-e1c25c0e334f\r\nTimestamp: 2020-10-08 14:58:00Z","error_codes":[7000215],"timestamp":"2020-10-08 14:58:00Z","trace_id":"b7bfa8d2-56c3-4ea0-be5f-71dea28f7b00","correlation_id":"cb123619-b0f5-4756-945d-e1c25c0e334f","error_uri":"https://login.microsoftonline.com/error?code=7000215"} 
DEBUG deleting resource group                      
DEBUG azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourcegroups/mgahagan-090810-n4vnb-rg?api-version=2018-05-01: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.\r\nTrace ID: e119e1e8-c0fd-4b47-989d-4697d9527600\r\nCorrelation ID: 4ffe51d2-cdc0-4b74-aca3-210ba03230ca\r\nTimestamp: 2020-10-08 14:58:00Z","error_codes":[7000215],"timestamp":"2020-10-08 14:58:00Z","trace_id":"e119e1e8-c0fd-4b47-989d-4697d9527600","correlation_id":"4ffe51d2-cdc0-4b74-aca3-210ba03230ca","error_uri":"https://login.microsoftonline.com/error?code=7000215"} 
DEBUG deleting application registrations           
DEBUG Purging asset "Metadata" from disk           
DEBUG Purging asset "Terraform Variables" from disk 
DEBUG Purging asset "Kubeconfig Admin Client" from disk 
DEBUG Purging asset "Kubeadmin Password" from disk 
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk 
DEBUG Purging asset "Cluster" from disk            
INFO Time elapsed: 1s

If --log-level=debug is not specified, the user sees only the INFO lines, which may mislead them into believing the cluster was actually removed.

Attempt to delete after credentials are corrected:

[m@localhost 46-azure-install]$ ./openshift-install destroy cluster --dir clusters/mgahagan-090810 --log-level=debug
DEBUG OpenShift Installer 4.6.0-0.nightly-2020-10-07-093101 
DEBUG Built from commit bff124c9941762d2532490774b3f910241bd63f6 
FATAL Failed while preparing to destroy cluster: open clusters/mgahagan-090810/metadata.json: no such file or directory 

Successful attempt to delete cluster with correct credentials and the cluster installation directory restored from backup:

[m@localhost 46-azure-install]$ ./openshift-install destroy cluster --dir clusters/mgahagan-090810.bak
INFO Credentials loaded from file "/home/m/.azure/osServicePrincipal.json" 
INFO deleted                                       record=api.mgahagan-090810
INFO deleted                                       record="*.apps.mgahagan-090810"
INFO deleted                                       resource group=mgahagan-090810-n4vnb-rg
INFO deleted                                       appID=fbecc025-88e5-4447-9921-5ab1b0a9962c
INFO deleted                                       appID=777cfa32-c77a-4f60-bcd1-cee8abb877cc
INFO deleted                                       appID=285a4b7f-52f5-45f5-8611-17634640df7d
INFO Time elapsed: 17m0s
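
The backup workaround exercised above can be sketched as a small helper. This is only an illustration and is not part of openshift-install: the function name backup_cluster_dir is hypothetical, and the ".bak" suffix simply mirrors the directory name used in the transcript. It copies the whole installation directory (including metadata.json) to a sibling directory before any destroy attempt.

```python
import shutil
from pathlib import Path


def backup_cluster_dir(cluster_dir: str, suffix: str = ".bak") -> Path:
    """Copy the cluster installation directory to a sibling '<name><suffix>' directory.

    Hypothetical helper: the name and the default suffix are assumptions,
    not something the installer provides.
    """
    src = Path(cluster_dir).resolve()
    metadata = src / "metadata.json"
    # metadata.json is the file the installer reported missing after the failed destroy.
    if not metadata.is_file():
        raise FileNotFoundError(f"{metadata} not found; nothing to back up")
    dest = src.with_name(src.name + suffix)
    # Never overwrite an earlier backup; it may be the only good copy left.
    if dest.exists():
        raise FileExistsError(f"backup {dest} already exists; refusing to overwrite it")
    shutil.copytree(src, dest)
    return dest
```

Calling backup_cluster_dir("clusters/mgahagan-090810") before running the destroy command would leave a clusters/mgahagan-090810.bak directory that can be passed to --dir if the original metadata is purged.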

Comment 4 Scott Dodson 2020-10-08 16:04:12 UTC
This is not a 4.6.0 blocker, moving to 4.7.0.

Comment 8 Mike Gahagan 2020-11-05 19:40:07 UTC
Confirmed that the installer fails quickly when trying to delete a cluster with invalid credentials, using 4.7.0-0.nightly-2020-11-05-140313:

[m@localhost 47_azure-install]$ ./openshift-install destroy cluster --dir clusters/mgahagan-120511 
INFO Credentials loaded from file "/home/m/.azure/osServicePrincipal.json" 
FATAL Failed to destroy cluster: [unable to authenticate when deleting public DNS records: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to
<subscription and other information redacted>
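
The fail-fast behaviour confirmed above can be illustrated with a minimal sketch. It is not the installer's actual Go code from the linked pull requests; AuthError, destroy, and the stage and purge callables are all hypothetical names chosen for illustration. The point is the ordering: the first authentication failure aborts the run with a visible error, and the local metadata is purged only after every stage has succeeded.

```python
from typing import Callable, List, Tuple


class AuthError(Exception):
    """Hypothetical stand-in for a 401 response from the cloud API."""


def destroy(
    stages: List[Tuple[str, Callable[[], None]]],
    purge_metadata: Callable[[], None],
) -> None:
    """Run each destroy stage in order and purge local metadata only on full success."""
    for name, run in stages:
        try:
            run()
        except AuthError as err:
            # Fail fast: surface the credential problem instead of logging it
            # at debug level and carrying on to the next stage.
            raise RuntimeError(f"unable to authenticate when {name}: {err}") from err
    # Reached only if no stage raised, so the metadata survives a failed run.
    purge_metadata()
```

With this ordering, a run that fails on the first stage leaves metadata.json in place, so the same command can be retried once the credentials are corrected.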

Comment 11 errata-xmlrpc 2021-02-24 15:15:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

