Bug 2090359

Summary: Nutanix mapi-controller: misleading error message when the failure is caused by wrong credentials
Product: OpenShift Container Platform Reporter: Yanhua Li <yanhli>
Component: Cloud Compute Assignee: Yanhua Li <yanhli>
Cloud Compute sub component: Other Providers QA Contact: Huali Liu <huliu>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: low    
Version: 4.11   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:14:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yanhua Li 2022-05-25 15:27:20 UTC
Description of problem:
Reported by Richard Vanderpool on 5/6/2022:
"I am running an install from 4.11.0-0.nightly-2022-05-06-060226 in the CI LTS environment and the compute nodes are failing to create with Message: Failed to check if machine exists: Error when finding VM by name rvanderp5-dev-w2mdn-worker-5mlnl. error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}

edit: This was due to an auth error, we need to make the error message more useful in the machine resource."

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-05-06-060226

How reproducible:
Always, when the Prism Central credentials in the nutanix-credentials secret are wrong.

Steps to Reproduce:
1.
2.
3.

Actual results:
The Nutanix mapi-machine-controller fails to create the machine VMs and reports a misleading error:
error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}

Expected results:
The Nutanix mapi-machine-controller fails to create the machine VMs and reports a meaningful error that identifies the cause as an authentication failure.

Additional info:

Comment 1 Yanhua Li 2022-05-26 20:26:02 UTC
The root cause of the issue is in the error-handling code of the prism-go-client library (https://github.com/nutanix-cloud-native/prism-go-client). It failed to unmarshal the error response of the Prism API call into an ErrorResponse structure because the structure assumed the wrong type for a field: message_list.details was declared as map[string]interface {}, but the API returned a string.

The bug is fixed in the version github.com/nutanix-cloud-native/prism-go-client.0-20220511213441-cc121d3d3c27
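
For illustration only, the failure mode can be reproduced outside the controller with a small Go program. This is a minimal sketch, not the prism-go-client code: the type names (strictMessage, tolerantMessage, and so on) and the helper functions are hypothetical, and the sample body is copied from the 401 response quoted in this bug. The strict types assume "details" is a JSON object, so decoding a string value fails; the tolerant types accept any JSON value for that field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sampleBody mirrors the 401 response body quoted in this bug:
// "details" is a JSON string, not an object.
const sampleBody = `{
  "api_version": "3.1",
  "code": 401,
  "message_list": [
    {
      "details": "Basic realm=\"Intent Gateway Login Required\"",
      "message": "Authentication required.",
      "reason": "AUTHENTICATION_REQUIRED"
    }
  ],
  "state": "ERROR"
}`

// strictMessage (hypothetical) assumes "details" is always a JSON object.
type strictMessage struct {
	Message string                 `json:"message"`
	Reason  string                 `json:"reason"`
	Details map[string]interface{} `json:"details"`
}

type strictResponse struct {
	Code        int             `json:"code"`
	MessageList []strictMessage `json:"message_list"`
	State       string          `json:"state"`
}

// tolerantMessage (hypothetical) accepts any JSON value for "details".
type tolerantMessage struct {
	Message string      `json:"message"`
	Reason  string      `json:"reason"`
	Details interface{} `json:"details"`
}

type tolerantResponse struct {
	Code        int               `json:"code"`
	MessageList []tolerantMessage `json:"message_list"`
	State       string            `json:"state"`
}

// decodeStrict returns the unmarshal error, if any, for the strict types.
func decodeStrict(body []byte) error {
	var r strictResponse
	return json.Unmarshal(body, &r)
}

// decodeTolerant decodes with the tolerant types and returns the reason
// of the first entry in message_list.
func decodeTolerant(body []byte) (string, error) {
	var r tolerantResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if len(r.MessageList) == 0 {
		return "", fmt.Errorf("empty message_list")
	}
	return r.MessageList[0].Reason, nil
}

func main() {
	// The strict decode fails with a type error that hides the real cause.
	fmt.Println("strict:", decodeStrict([]byte(sampleBody)))

	// The tolerant decode succeeds and exposes the authentication failure.
	reason, err := decodeTolerant([]byte(sampleBody))
	fmt.Println("tolerant:", reason, err)
}
```

With the strict types the caller only sees a JSON type error, which is what made the original message misleading; with the tolerant types the reason field from the response is available to report.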

Comment 3 Huali Liu 2022-06-01 10:12:05 UTC
Reproduced the issue on 4.11.0-0.nightly-2022-05-25-193227; waiting for the latest nightly build to verify the fix.

Steps:
1. Install a fresh cluster on Nutanix; the installation should succeed.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-25-193227   True        False         12m     Cluster version is 4.11.0-0.nightly-2022-05-25-193227
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-n10.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n10-brxw2-master-0       Running                          37m
huliu-n10-brxw2-master-1       Running                          37m
huliu-n10-brxw2-master-2       Running                          37m
huliu-n10-brxw2-worker-575dx   Running                          33m
huliu-n10-brxw2-worker-wxc97   Running                          33m

2. Change the password in the nutanix-credentials secret to a wrong value.

liuhuali@Lius-MacBook-Pro huali-test % oc edit secret nutanix-credentials                                     
secret/nutanix-credentials edited

3. Create a new machineset.

liuhuali@Lius-MacBook-Pro huali-test % oc create -f msnutanix.yaml 
machineset.machine.openshift.io/huliu-n10-brxw2-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n10-brxw2-1-slkv5                                         4s
huliu-n10-brxw2-master-0       Running                          75m
huliu-n10-brxw2-master-1       Running                          75m
huliu-n10-brxw2-master-2       Running                          75m
huliu-n10-brxw2-worker-575dx   Running                          70m
huliu-n10-brxw2-worker-wxc97   Running                          70m
liuhuali@Lius-MacBook-Pro huali-test % oc describe machine huliu-n10-brxw2-1-slkv5 
...
Status:
  Conditions:
    Last Transition Time:  2022-06-01T09:30:32Z
    Status:                True
    Type:                  Drainable
    Last Transition Time:  2022-06-01T09:30:32Z
    Message:               Failed to check if machine exists: Error when finding VM by name huliu-n10-brxw2-1-slkv5. error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}
    Reason:                ErrorCheckingProvider
    Status:                Unknown
    Type:                  InstanceExists
    Last Transition Time:  2022-06-01T09:30:32Z
    Status:                True
    Type:                  Terminable
  Last Updated:            2022-06-01T09:30:32Z
...

Comment 5 Huali Liu 2022-06-06 03:14:46 UTC
Verified on 4.11.0-0.nightly-2022-06-04-014713

1. Install a fresh cluster on Nutanix; the installation should succeed.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-04-014713   True        False         63m     Cluster version is 4.11.0-0.nightly-2022-06-04-014713
liuhuali@Lius-MacBook-Pro huali-test % oc get machine       
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n11-kcjhx-master-0       Running                          87m
huliu-n11-kcjhx-master-1       Running                          87m
huliu-n11-kcjhx-master-2       Running                          87m
huliu-n11-kcjhx-worker-bwbjf   Running                          84m
huliu-n11-kcjhx-worker-gvsfp   Running                          84m

2. Change the password in the nutanix-credentials secret to a wrong value.

liuhuali@Lius-MacBook-Pro huali-test % oc edit secret nutanix-credentials
secret/nutanix-credentials edited

3. Create a new machineset.

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
machineset.machine.openshift.io/huliu-n11-kcjhx-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n11-kcjhx-1-45fck                                         61s
huliu-n11-kcjhx-master-0       Running                          90m
huliu-n11-kcjhx-master-1       Running                          90m
huliu-n11-kcjhx-master-2       Running                          90m
huliu-n11-kcjhx-worker-bwbjf   Running                          86m
huliu-n11-kcjhx-worker-gvsfp   Running                          86m
liuhuali@Lius-MacBook-Pro huali-test % oc describe machine huliu-n11-kcjhx-1-45fck 
...
Status:
  Conditions:
    Last Transition Time:  2022-06-06T02:57:27Z
    Status:                True
    Type:                  Drainable
    Last Transition Time:  2022-06-06T02:57:27Z
    Message:               Failed to check if machine exists: Error when finding VM by name huliu-n11-kcjhx-1-45fck. error: status: 401 UNAUTHORIZED, error-response: {
  "api_version": "3.1",
  "code": 401,
  "message_list": [
    {
      "details": "Basic realm=\"Intent Gateway Login Required\"",
      "message": "Authentication required.",
      "reason": "AUTHENTICATION_REQUIRED"
    }
  ],
  "state": "ERROR"
}
    Reason:                ErrorCheckingProvider
    Status:                Unknown
    Type:                  InstanceExists
    Last Transition Time:  2022-06-06T02:57:27Z
    Status:                True
    Type:                  Terminable
  Last Updated:            2022-06-06T02:57:27Z
  Phase:                   
Events:                    <none>
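
The message above has the shape "status: <HTTP status>, error-response: <indented JSON body>". As a rough sketch of how a client could build a message of that shape, assuming the HTTP status line and the raw response body are available to the caller: formatAPIError is a hypothetical helper written for this comment, not the function used by prism-go-client.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// formatAPIError (hypothetical) builds an error that carries the HTTP status
// and the response body, indenting the body when it is valid JSON.
func formatAPIError(status string, body []byte) error {
	var pretty bytes.Buffer
	if err := json.Indent(&pretty, body, "", "  "); err != nil {
		// The body is not valid JSON: report it raw rather than hiding it.
		return fmt.Errorf("status: %s, error-response: %s", status, body)
	}
	return fmt.Errorf("status: %s, error-response: %s", status, pretty.String())
}

func main() {
	// A shortened body for the example; the real response has more fields.
	body := []byte(`{"code":401,"message_list":[{"reason":"AUTHENTICATION_REQUIRED"}],"state":"ERROR"}`)
	fmt.Println(formatAPIError("401 UNAUTHORIZED", body))
}
```

Falling back to the raw body when it is not JSON keeps the status visible even if the server returns an unexpected payload.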

Comment 7 errata-xmlrpc 2022-08-10 11:14:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069