Bug 2090359
| Summary: | Nutanix mapi-controller: misleading error message when the failure is caused by wrong credentials | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yanhua Li <yanhli> |
| Component: | Cloud Compute | Assignee: | Yanhua Li <yanhli> |
| Cloud Compute sub component: | Other Providers | QA Contact: | Huali Liu <huliu> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | ||
| Version: | 4.11 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:14:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
The root cause of the issue is in the error-handling code of the prism-go-client library (https://github.com/nutanix-cloud-native/prism-go-client). It failed to unmarshal the error response of a prism-api call into an ErrorResponse structure because of a wrong type assumption. The bug is fixed in version github.com/nutanix-cloud-native/prism-go-client.0-20220511213441-cc121d3d3c27
Reproduced the issue on 4.11.0-0.nightly-2022-05-25-193227; waiting for the latest nightly build to verify the fix.
Steps:
1.Install a fresh cluster on nutanix, should succeed
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-05-25-193227 True False 12m Cluster version is 4.11.0-0.nightly-2022-05-25-193227
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-n10.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-n10-brxw2-master-0 Running 37m
huliu-n10-brxw2-master-1 Running 37m
huliu-n10-brxw2-master-2 Running 37m
huliu-n10-brxw2-worker-575dx Running 33m
huliu-n10-brxw2-worker-wxc97 Running 33m
2.Change nutanix-credentials secret to a wrong password
liuhuali@Lius-MacBook-Pro huali-test % oc edit secret nutanix-credentials
secret/nutanix-credentials edited
3.Create a new machineset
liuhuali@Lius-MacBook-Pro huali-test % oc create -f msnutanix.yaml
machineset.machine.openshift.io/huliu-n10-brxw2-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-n10-brxw2-1-slkv5 4s
huliu-n10-brxw2-master-0 Running 75m
huliu-n10-brxw2-master-1 Running 75m
huliu-n10-brxw2-master-2 Running 75m
huliu-n10-brxw2-worker-575dx Running 70m
huliu-n10-brxw2-worker-wxc97 Running 70m
liuhuali@Lius-MacBook-Pro huali-test % oc describe machine huliu-n10-brxw2-1-slkv5
...
Status:
Conditions:
Last Transition Time: 2022-06-01T09:30:32Z
Status: True
Type: Drainable
Last Transition Time: 2022-06-01T09:30:32Z
Message: Failed to check if machine exists: Error when finding VM by name huliu-n10-brxw2-1-slkv5. error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}
Reason: ErrorCheckingProvider
Status: Unknown
Type: InstanceExists
Last Transition Time: 2022-06-01T09:30:32Z
Status: True
Type: Terminable
Last Updated: 2022-06-01T09:30:32Z
...
Verified on 4.11.0-0.nightly-2022-06-04-014713
1.Install a fresh cluster on nutanix, should succeed
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-06-04-014713 True False 63m Cluster version is 4.11.0-0.nightly-2022-06-04-014713
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-n11-kcjhx-master-0 Running 87m
huliu-n11-kcjhx-master-1 Running 87m
huliu-n11-kcjhx-master-2 Running 87m
huliu-n11-kcjhx-worker-bwbjf Running 84m
huliu-n11-kcjhx-worker-gvsfp Running 84m
2.Change nutanix-credentials secret to a wrong password
liuhuali@Lius-MacBook-Pro huali-test % oc edit secret nutanix-credentials
secret/nutanix-credentials edited
3.Create a new machineset
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-n11-kcjhx-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-n11-kcjhx-1-45fck 61s
huliu-n11-kcjhx-master-0 Running 90m
huliu-n11-kcjhx-master-1 Running 90m
huliu-n11-kcjhx-master-2 Running 90m
huliu-n11-kcjhx-worker-bwbjf Running 86m
huliu-n11-kcjhx-worker-gvsfp Running 86m
liuhuali@Lius-MacBook-Pro huali-test % oc describe machine huliu-n11-kcjhx-1-45fck
...
Status:
Conditions:
Last Transition Time: 2022-06-06T02:57:27Z
Status: True
Type: Drainable
Last Transition Time: 2022-06-06T02:57:27Z
Message: Failed to check if machine exists: Error when finding VM by name huliu-n11-kcjhx-1-45fck. error: status: 401 UNAUTHORIZED, error-response: {
"api_version": "3.1",
"code": 401,
"message_list": [
{
"details": "Basic realm=\"Intent Gateway Login Required\"",
"message": "Authentication required.",
"reason": "AUTHENTICATION_REQUIRED"
}
],
"state": "ERROR"
}
Reason: ErrorCheckingProvider
Status: Unknown
Type: InstanceExists
Last Transition Time: 2022-06-06T02:57:27Z
Status: True
Type: Terminable
Last Updated: 2022-06-06T02:57:27Z
Phase:
Events: <none>
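The type mismatch behind the original message can be reproduced with the 401 body shown above. The following is only a minimal sketch: the struct names (`strictMessage`, `tolerantMessage`, and so on) are hypothetical and are not the actual prism-go-client types, and the tolerant variant is one possible way to accept a string in `details`, not necessarily how the library was fixed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// payload is the 401 error response body from the machine condition above.
const payload = `{
  "api_version": "3.1",
  "code": 401,
  "message_list": [
    {
      "details": "Basic realm=\"Intent Gateway Login Required\"",
      "message": "Authentication required.",
      "reason": "AUTHENTICATION_REQUIRED"
    }
  ],
  "state": "ERROR"
}`

// strictMessage (hypothetical) assumes "details" is always a JSON object.
type strictMessage struct {
	Details map[string]interface{} `json:"details"`
	Message string                 `json:"message"`
	Reason  string                 `json:"reason"`
}

type strictResponse struct {
	Code        int             `json:"code"`
	MessageList []strictMessage `json:"message_list"`
	State       string          `json:"state"`
}

// tolerantMessage (hypothetical) accepts any JSON value in "details".
type tolerantMessage struct {
	Details interface{} `json:"details"`
	Message string      `json:"message"`
	Reason  string      `json:"reason"`
}

type tolerantResponse struct {
	Code        int               `json:"code"`
	MessageList []tolerantMessage `json:"message_list"`
	State       string            `json:"state"`
}

// decodeStrict returns the unmarshal error caused by the string-valued "details".
func decodeStrict(body string) error {
	var r strictResponse
	return json.Unmarshal([]byte(body), &r)
}

// decodeTolerant decodes the same body without a type error.
func decodeTolerant(body string) (tolerantResponse, error) {
	var r tolerantResponse
	err := json.Unmarshal([]byte(body), &r)
	return r, err
}

// describe summarizes the status code and the reasons in the response.
func describe(r tolerantResponse) string {
	reasons := make([]string, 0, len(r.MessageList))
	for _, m := range r.MessageList {
		reasons = append(reasons, m.Reason)
	}
	return fmt.Sprintf("status %d: %s", r.Code, strings.Join(reasons, ", "))
}

func main() {
	fmt.Println("strict:", decodeStrict(payload))
	r, err := decodeTolerant(payload)
	if err != nil {
		fmt.Println("tolerant:", err)
		return
	}
	fmt.Println("tolerant:", describe(r))
}
```

With the strict types the decode fails with a "cannot unmarshal string" error and the authentication failure is hidden; with the tolerant types the summary reads `status 401: AUTHENTICATION_REQUIRED`.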
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069
Description of problem:
Reported by Richard Vanderpool on 5/6/2022:
"I am running an install from 4.11.0-0.nightly-2022-05-06-060226 in the CI LTS environment and the compute nodes are failing to create with
Message: Failed to check if machine exists: Error when finding VM by name rvanderp5-dev-w2mdn-worker-5mlnl. error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}
edit: This was due to an auth error, we need to make the error message more useful in the machine resource."
Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-05-06-060226
How reproducible:
When the Prism Central credentials in the nutanix-credentials secret data are wrong.
Steps to Reproduce:
1.
2.
3.
Actual results:
The Nutanix mapi-machine-controller failed to create machine VMs with the misleading error log:
error: json: cannot unmarshal string into Go struct field MessageResource.message_list.details of type map[string]interface {}
Expected results:
The Nutanix mapi-machine-controller fails to create machine VMs with a meaningful error log indicating that the cause is an authentication error.
Additional info:
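As a rough illustration of the expected result, an error can carry the HTTP status and the response body regardless of the body's shape, in the same "status: ..., error-response: ..." form seen in the verified output. This is a sketch under assumptions: `formatAPIError` is a hypothetical helper, not part of prism-go-client or the mapi controller.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// formatAPIError (hypothetical helper) builds an error that always includes
// the HTTP status and the response body, so an authentication failure stays
// visible even when the body does not match any expected struct.
func formatAPIError(status string, body []byte) error {
	var pretty bytes.Buffer
	// json.Indent only validates and re-indents; no struct definition is needed.
	if err := json.Indent(&pretty, body, "", "  "); err != nil {
		// Body is not valid JSON: fall back to the raw text.
		return fmt.Errorf("status: %s, error-response: %s", status, body)
	}
	return fmt.Errorf("status: %s, error-response: %s", status, pretty.String())
}

func main() {
	body := []byte(`{"code":401,"message_list":[{"details":"Basic realm=\"Intent Gateway Login Required\"","reason":"AUTHENTICATION_REQUIRED"}],"state":"ERROR"}`)
	fmt.Println(formatAPIError("401 UNAUTHORIZED", body))
	fmt.Println(formatAPIError("502 Bad Gateway", []byte("upstream unavailable")))
}
```

Because the helper never decodes into a typed struct, a change in the type of `details` cannot mask the status line.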