Bug 2090359 - Nutanix mapi-controller: misleading error message when the failure is caused by wrong credentials
Summary: Nutanix mapi-controller: misleading error message when the failure is caused ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Yanhua Li
QA Contact: Huali Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-25 15:27 UTC by Yanhua Li
Modified: 2022-08-10 11:14 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:14:11 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub openshift/machine-api-provider-nutanix pull 15 (status: open): "Bug 2090359: Nutanix mapi-controller: misleading error message when the failure is caused by wrong credentials", last updated 2022-05-30 13:29:17 UTC
- Red Hat Product Errata RHSA-2022:5069, last updated 2022-08-10 11:14:22 UTC

Description Yanhua Li 2022-05-25 15:27:20 UTC
Description of problem:
Reported by Richard Vanderpool on 5/6/2022:
"I am running an install from 4.11.0-0.nightly-2022-05-06-060226 in the CI LTS environment and the compute nodes are failing to create with Message: Failed to check if machine exists: Error when finding VM by name rvanderp5-dev-w2mdn-worker-5mlnl. error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}

edit: This was due to an auth error, we need to make the error message more useful in the machine resource."

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-05-06-060226

How reproducible:
Whenever the Prism Central credentials in the nutanix-credentials secret are wrong.

Steps to Reproduce:
1.
2.
3.

Actual results:
The Nutanix mapi-machine-controller fails to create the machine VMs and reports a misleading error:
error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}

Expected results:
The Nutanix mapi-machine-controller still fails to create the machine VMs, but the error message makes clear that the cause is an authentication error.

Additional info:

Comment 1 Yanhua Li 2022-05-26 20:26:02 UTC
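The root cause of the issue is in the error-handling code of the prism-go-client library (https://github.com/nutanix-cloud-native/prism-go-client). When a Prism API call failed, the library tried to unmarshal the error response into an ErrorResponse structure that made a wrong type assumption: the details field of each message_list entry was declared as map[string]interface {}, but Prism Central returns it as a string (for example, Basic realm="Intent Gateway Login Required" on a 401). The resulting unmarshal error then replaced the real cause of the failure.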

The bug is fixed in version github.com/nutanix-cloud-native/prism-go-client.0-20220511213441-cc121d3d3c27

Comment 3 Huali Liu 2022-06-01 10:12:05 UTC
Reproduced the issue on 4.11.0-0.nightly-2022-05-25-193227; waiting for the latest nightly build to verify the fix.

Steps:
1. Install a fresh cluster on Nutanix; the installation should succeed.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-25-193227   True        False         12m     Cluster version is 4.11.0-0.nightly-2022-05-25-193227
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-n10.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n10-brxw2-master-0       Running                          37m
huliu-n10-brxw2-master-1       Running                          37m
huliu-n10-brxw2-master-2       Running                          37m
huliu-n10-brxw2-worker-575dx   Running                          33m
huliu-n10-brxw2-worker-wxc97   Running                          33m

2. Change the password in the nutanix-credentials secret to a wrong one

liuhuali@Lius-MacBook-Pro huali-test % oc edit secret nutanix-credentials                                     
secret/nutanix-credentials edited

3. Create a new MachineSet

liuhuali@Lius-MacBook-Pro huali-test % oc create -f msnutanix.yaml 
machineset.machine.openshift.io/huliu-n10-brxw2-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n10-brxw2-1-slkv5                                         4s
huliu-n10-brxw2-master-0       Running                          75m
huliu-n10-brxw2-master-1       Running                          75m
huliu-n10-brxw2-master-2       Running                          75m
huliu-n10-brxw2-worker-575dx   Running                          70m
huliu-n10-brxw2-worker-wxc97   Running                          70m
liuhuali@Lius-MacBook-Pro huali-test % oc describe machine huliu-n10-brxw2-1-slkv5 
...
Status:
  Conditions:
    Last Transition Time:  2022-06-01T09:30:32Z
    Status:                True
    Type:                  Drainable
    Last Transition Time:  2022-06-01T09:30:32Z
    Message:               Failed to check if machine exists: Error when finding VM by name huliu-n10-brxw2-1-slkv5. error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}
    Reason:                ErrorCheckingProvider
    Status:                Unknown
    Type:                  InstanceExists
    Last Transition Time:  2022-06-01T09:30:32Z
    Status:                True
    Type:                  Terminable
  Last Updated:            2022-06-01T09:30:32Z
...

Comment 5 Huali Liu 2022-06-06 03:14:46 UTC
Verified on 4.11.0-0.nightly-2022-06-04-014713

1. Install a fresh cluster on Nutanix; the installation should succeed.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-04-014713   True        False         63m     Cluster version is 4.11.0-0.nightly-2022-06-04-014713
liuhuali@Lius-MacBook-Pro huali-test % oc get machine       
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n11-kcjhx-master-0       Running                          87m
huliu-n11-kcjhx-master-1       Running                          87m
huliu-n11-kcjhx-master-2       Running                          87m
huliu-n11-kcjhx-worker-bwbjf   Running                          84m
huliu-n11-kcjhx-worker-gvsfp   Running                          84m

2. Change the password in the nutanix-credentials secret to a wrong one

liuhuali@Lius-MacBook-Pro huali-test % oc edit secret nutanix-credentials
secret/nutanix-credentials edited

3. Create a new MachineSet

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
machineset.machine.openshift.io/huliu-n11-kcjhx-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n11-kcjhx-1-45fck                                         61s
huliu-n11-kcjhx-master-0       Running                          90m
huliu-n11-kcjhx-master-1       Running                          90m
huliu-n11-kcjhx-master-2       Running                          90m
huliu-n11-kcjhx-worker-bwbjf   Running                          86m
huliu-n11-kcjhx-worker-gvsfp   Running                          86m
liuhuali@Lius-MacBook-Pro huali-test % oc describe machine huliu-n11-kcjhx-1-45fck 
...
Status:
  Conditions:
    Last Transition Time:  2022-06-06T02:57:27Z
    Status:                True
    Type:                  Drainable
    Last Transition Time:  2022-06-06T02:57:27Z
    Message:               Failed to check if machine exists: Error when finding VM by name huliu-n11-kcjhx-1-45fck. error: status: 401 UNAUTHORIZED, error-response: {
  "api_version": "3.1",
  "code": 401,
  "message_list": [
    {
      "details": "Basic realm=\"Intent Gateway Login Required\"",
      "message": "Authentication required.",
      "reason": "AUTHENTICATION_REQUIRED"
    }
  ],
  "state": "ERROR"
}
    Reason:                ErrorCheckingProvider
    Status:                Unknown
    Type:                  InstanceExists
    Last Transition Time:  2022-06-06T02:57:27Z
    Status:                True
    Type:                  Terminable
  Last Updated:            2022-06-06T02:57:27Z
  Phase:                   
Events:                    <none>

Comment 7 errata-xmlrpc 2022-08-10 11:14:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

