Bug 1840705

Summary: Unclear error when Azure credentials have expired or are invalid
Product: OpenShift Container Platform Reporter: Dan Mace <dmace>
Component: Installer Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: Etienne Simard <esimard>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: high CC: adahiya, bleanhar, chuffman, dgrigore, esimard, kgarriso, sdodson, wking
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:01:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dan Mace 2020-05-27 13:05:21 UTC
Description of problem:

All Azure jobs in CI and periodics appear to have been failing since 5/23.


Run template e2e-azure - e2e-azure container setup	1s
Installing from release registry.svc.ci.openshift.org/ci-op-f0y2nnjm/release@sha256:cb584c61ad66cf4643c4cb37bf7efac69a2ccc4d93db971c74a2d8e41f71112c
Azure region: centralus
level=info msg="Credentials loaded from file \"/etc/openshift-installer/osServicePrincipal.json\""
level=fatal msg="failed to fetch Master Machines: failed to load asset \"Install Config\": platform.azure.region: Invalid value: \"centralus\": failed to retrieve available regions"


https://search.apps.build01.ci.devcluster.openshift.com/chart?search=failed+to+retrieve+available+regions&maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job


https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-ocp-installer-e2e-azure-4.5




Comment 1 Christian Huffman 2020-05-27 18:49:22 UTC
*** Bug 1840852 has been marked as a duplicate of this bug. ***

Comment 3 John Hixson 2020-05-28 18:29:13 UTC
The issue ended up being expired credentials and it has been addressed. I'm changing the target release and priority of this bug and will use it to track progress on updating the error message to actually print out the problem.
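
The change tracked here comes down to keeping the underlying cause attached when a higher-level validation error is reported. A minimal Python sketch of that idea follows. It is not the installer's code, and `list_locations`, `validate_region`, `describe`, and both exception classes are hypothetical names used only for illustration.

```python
class AuthError(Exception):
    """Hypothetical: raised when the cloud API rejects the credentials."""


class ValidationError(Exception):
    """Hypothetical: raised when an install-config field fails validation."""


def list_locations():
    # Stand-in for the real API call (assumption): always fails the way
    # an expired client secret would.
    raise AuthError("401: the provided client secret keys are expired")


def validate_region(region):
    try:
        locations = list_locations()
    except AuthError as err:
        # Carry the cause into the message instead of dropping it.
        raise ValidationError(
            f"platform.azure.region: failed to retrieve available regions: {err}"
        ) from err
    if region not in locations:
        raise ValidationError(f"platform.azure.region: invalid value {region!r}")


def describe(region):
    """Return the validation error text for a region, or "ok"."""
    try:
        validate_region(region)
    except ValidationError as err:
        return str(err)
    return "ok"
```

With this shape, `describe("centralus")` returns a message that names both the failing field and the credential problem, rather than only the field.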

Comment 9 John Hixson 2020-07-10 03:45:30 UTC
Since this is low priority, I didn't get to it this sprint. I'll update when there is time to work on this.

Comment 13 Etienne Simard 2020-08-04 20:55:52 UTC
Verified with:

./openshift-install 4.6.0-0.nightly-2020-08-01-172303
built from commit 7a5af8cddbd04a7c6af6006696141d8afe2fb027
release image registry.svc.ci.openshift.org/ocp/release@sha256:6d4b31af9959b02b8589bb4b804812c436f38a9726827fa5e5a0ea66d6d79cf4


Reproduction steps:

1) Generate a working install-config.yaml from a current Service Principal
2) Configure your osServicePrincipal.json to use an EXPIRED Service Principal
3) Try to install a cluster using the install-config.yaml generated in step 1

~~~
./openshift-install create cluster --dir ./install_config_folder
INFO Credentials loaded from file "/home/openshift-qe/.azure/osServicePrincipal.json" 
FATAL failed to fetch Metadata: failed to load asset "Install Config": platform.azure.region: Internal error: failed to retrieve available regions: failed to list locations: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/$SUBID/locations?api-version=2019-06-01: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000222: The provided client secret keys are expired. Visit the Azure Portal to create new keys for your app, or consider using certificate credentials for added security: https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials\r\nTrace ID: 3416e35d-5586-4646-ae28-d8a5c8ee3e00\r\nCorrelation ID: 4b8affac-99de-4536-8505-0f269b69d15f\r\nTimestamp: 2020-08-04 20:50:05Z","error_codes":[7000222],"timestamp":"2020-08-04 20:50:05Z","trace_id":"3416e35d-5586-4646-ae28-d8a5c8ee3e00","correlation_id":"4b8affac-99de-4536-8505-0f269b69d15f","error_uri":"https://login.microsoftonline.com/error?code=7000222"}
~~~
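
The FATAL line above embeds the Azure AD response as JSON after the text `Response body: `. As a rough sketch for anyone triaging such logs, the body can be pulled out and decoded as shown below. The function name `extract_aad_error` and the abbreviated `sample` string are assumptions made for illustration; only the marker text and field names come from the log above.

```python
import json

MARKER = "Response body: "


def extract_aad_error(log_line):
    """Return the decoded JSON response body from a log line, or None."""
    start = log_line.find(MARKER)
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        body, _ = json.JSONDecoder().raw_decode(log_line[start + len(MARKER):])
    except json.JSONDecodeError:
        return None
    return body


# Abbreviated form of the log line above (illustrative, not the full output).
sample = (
    "adal: Refresh request failed. Status Code = '401'. "
    'Response body: {"error":"invalid_client",'
    '"error_description":"AADSTS7000222: The provided client secret keys are expired.",'
    '"error_codes":[7000222]}'
)

info = extract_aad_error(sample)
if info is not None:
    print(info["error"], info["error_codes"])
```

The `error_codes` field is the simplest thing to match on, since the `error_description` text also carries trace IDs and timestamps that change per request.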

Comment 15 errata-xmlrpc 2020-10-27 16:01:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196