Description of problem:
Reported by Richard Vanderpool on 5/6/2022:
"I am running an install from 4.11.0-0.nightly-2022-05-06-060226 in the CI LTS environment and the compute nodes are failing to create with:
Message: Failed to check if machine exists: Error when finding VM by name rvanderp5-dev-w2mdn-worker-5mlnl. error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}
Edit: This was due to an auth error; we need to make the error message in the machine resource more useful."

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-05-06-060226

How reproducible:
Whenever the Prism Central credentials in the nutanix-credentials secret are wrong.

Steps to Reproduce:
1. Change the nutanix-credentials secret in the openshift-machine-api namespace to a wrong password.
2. Create a new machineset.
3. Run oc describe machine on the new machine and check the InstanceExists condition.

Actual results:
The Nutanix mapi-machine-controller fails to create the machine VMs and reports a misleading error:
error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}

Expected results:
The Nutanix mapi-machine-controller still fails to create the machine VMs, but the error message makes clear that the cause is an authentication error.

Additional info:
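The root cause is in the error-handling code of the prism-go-client library (https://github.com/nutanix-cloud-native/prism-go-client). When a Prism API call failed, the client tried to unmarshal the error response into an ErrorResponse structure that assumed the wrong type for the message_list details field (map[string]interface {}), while Prism Central can return details as a plain string. The resulting unmarshal error was reported instead of the real API error. The bug is fixed in version github.com/nutanix-cloud-native/prism-go-client.0-20220511213441-cc121d3d3c27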
Reproduced the issue on 4.11.0-0.nightly-2022-05-25-193227; waiting for the latest nightly build to verify the fix.
Steps:
1. Install a fresh cluster on Nutanix; the installation should succeed.
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-25-193227   True        False         12m     Cluster version is 4.11.0-0.nightly-2022-05-25-193227
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-n10.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n10-brxw2-master-0       Running                          37m
huliu-n10-brxw2-master-1       Running                          37m
huliu-n10-brxw2-master-2       Running                          37m
huliu-n10-brxw2-worker-575dx   Running                          33m
huliu-n10-brxw2-worker-wxc97   Running                          33m

2. Change the nutanix-credentials secret to a wrong password.
liuhuali@Lius-MacBook-Pro huali-test % oc edit secret nutanix-credentials
secret/nutanix-credentials edited

3. Create a new machineset.
liuhuali@Lius-MacBook-Pro huali-test % oc create -f msnutanix.yaml
machineset.machine.openshift.io/huliu-n10-brxw2-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n10-brxw2-1-slkv5                                         4s
huliu-n10-brxw2-master-0       Running                          75m
huliu-n10-brxw2-master-1       Running                          75m
huliu-n10-brxw2-master-2       Running                          75m
huliu-n10-brxw2-worker-575dx   Running                          70m
huliu-n10-brxw2-worker-wxc97   Running                          70m
liuhuali@Lius-MacBook-Pro huali-test % oc describe machine huliu-n10-brxw2-1-slkv5
...
Status:
  Conditions:
    Last Transition Time:  2022-06-01T09:30:32Z
    Status:                True
    Type:                  Drainable
    Last Transition Time:  2022-06-01T09:30:32Z
    Message:               Failed to check if machine exists: Error when finding VM by name huliu-n10-brxw2-1-slkv5. error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}
    Reason:                ErrorCheckingProvider
    Status:                Unknown
    Type:                  InstanceExists
    Last Transition Time:  2022-06-01T09:30:32Z
    Status:                True
    Type:                  Terminable
  Last Updated:            2022-06-01T09:30:32Z
...
Verified on 4.11.0-0.nightly-2022-06-04-014713: the machine condition now reports the 401 authentication error returned by Prism Central.
1. Install a fresh cluster on Nutanix; the installation should succeed.
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-04-014713   True        False         63m     Cluster version is 4.11.0-0.nightly-2022-06-04-014713
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n11-kcjhx-master-0       Running                          87m
huliu-n11-kcjhx-master-1       Running                          87m
huliu-n11-kcjhx-master-2       Running                          87m
huliu-n11-kcjhx-worker-bwbjf   Running                          84m
huliu-n11-kcjhx-worker-gvsfp   Running                          84m

2. Change the nutanix-credentials secret to a wrong password.
liuhuali@Lius-MacBook-Pro huali-test % oc edit secret nutanix-credentials
secret/nutanix-credentials edited

3. Create a new machineset.
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-n11-kcjhx-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n11-kcjhx-1-45fck                                         61s
huliu-n11-kcjhx-master-0       Running                          90m
huliu-n11-kcjhx-master-1       Running                          90m
huliu-n11-kcjhx-master-2       Running                          90m
huliu-n11-kcjhx-worker-bwbjf   Running                          86m
huliu-n11-kcjhx-worker-gvsfp   Running                          86m
liuhuali@Lius-MacBook-Pro huali-test % oc describe machine huliu-n11-kcjhx-1-45fck
...
Status:
  Conditions:
    Last Transition Time:  2022-06-06T02:57:27Z
    Status:                True
    Type:                  Drainable
    Last Transition Time:  2022-06-06T02:57:27Z
    Message:               Failed to check if machine exists: Error when finding VM by name huliu-n11-kcjhx-1-45fck. error: status: 401 UNAUTHORIZED, error-response: {
  "api_version": "3.1",
  "code": 401,
  "message_list": [
    {
      "details": "Basic realm=\"Intent Gateway Login Required\"",
      "message": "Authentication required.",
      "reason": "AUTHENTICATION_REQUIRED"
    }
  ],
  "state": "ERROR"
}
    Reason:                ErrorCheckingProvider
    Status:                Unknown
    Type:                  InstanceExists
    Last Transition Time:  2022-06-06T02:57:27Z
    Status:                True
    Type:                  Terminable
  Last Updated:            2022-06-06T02:57:27Z
  Phase:
Events:                    <none>
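The verified message carries the HTTP status line and the raw response body, so a decoding problem can no longer hide the real cause. A minimal sketch of that pattern, assuming nothing about prism-go-client internals: apiError and IsAuthError are hypothetical names, the format string only mirrors the "status: ..., error-response: ..." text seen above, and "401 UNAUTHORIZED" is copied from that output. IsAuthError decodes only the code and reason fields, so the type of details does not matter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError (hypothetical) keeps the status line and the raw body of a failed call.
type apiError struct {
	Status string
	Body   []byte
}

// Error mirrors the message format observed in the verified machine condition.
func (e *apiError) Error() string {
	return fmt.Sprintf("status: %s, error-response: %s", e.Status, e.Body)
}

// IsAuthError decodes only code and reason; details is ignored, so a
// string-valued details field cannot make decoding fail.
func (e *apiError) IsAuthError() bool {
	var resp struct {
		Code        int `json:"code"`
		MessageList []struct {
			Reason string `json:"reason"`
		} `json:"message_list"`
	}
	if err := json.Unmarshal(e.Body, &resp); err != nil {
		return false // not JSON: cannot classify
	}
	if resp.Code == 401 {
		return true
	}
	for _, m := range resp.MessageList {
		if m.Reason == "AUTHENTICATION_REQUIRED" {
			return true
		}
	}
	return false
}

func main() {
	body := []byte(`{"api_version": "3.1", "code": 401, "message_list": [{"details": "Basic realm=\"Intent Gateway Login Required\"", "message": "Authentication required.", "reason": "AUTHENTICATION_REQUIRED"}], "state": "ERROR"}`)
	err := &apiError{Status: "401 UNAUTHORIZED", Body: body}
	fmt.Println(err)
	fmt.Println("authentication error:", err.IsAuthError())
}
```

A controller could use a check like IsAuthError to add a short hint about the nutanix-credentials secret while still logging the full response.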
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069