Bug 1884247
| Summary: | Master node machine's gone Phase Failed | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Lorenzo Dalrio <lorenzo.dalrio> |
| Component: | Cloud Compute | Assignee: | Alberto <agarcial> |
| Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | unspecified | ||
| Version: | 4.5 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-01 12:53:37 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This seems to be pretty much identical to https://bugzilla.redhat.com/show_bug.cgi?id=1882169. Are you happy to mark this as a duplicate? (I think the phase transition happens because we are seeing a state other than the ones that are currently allowed.)

I agree with you; closing as a duplicate of #1882169.

*** This bug has been marked as a duplicate of bug 1882169 ***
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36

Build Identifier:

After a failed VM restart from the Azure console, the machine-api reports one of the master nodes in Phase Failed:

    $ oc get machine -n openshift-machine-api
    NAME                                            PHASE     TYPE              REGION       ZONE   AGE
    ocp-dev-westeu-9qm9r-master-0                   Failed    Standard_D4s_v3   westeurope   1      309d
    ocp-dev-westeu-9qm9r-master-1                   Running   Standard_D4s_v3   westeurope   3      309d
    ocp-dev-westeu-9qm9r-master-2                   Running   Standard_D4s_v3   westeurope   2      309d
    ocp-dev-westeu-9qm9r-worker-westeurope1-mwft7   Running   Standard_D4s_v3   westeurope   1      309d
    ocp-dev-westeu-9qm9r-worker-westeurope2-qk2rc   Running   Standard_D4s_v3   westeurope   2      309d
    ocp-dev-westeu-9qm9r-worker-westeurope3-9npcw   Running   Standard_D4s_v3   westeurope   3      309d

In the log of the machine-controller container in the openshift-machine-api namespace, we found the following:

    I1001 11:59:15.041452 1 controller.go:169] ocp-dev-westeu-9qm9r-master-0: reconciling Machine
    W1001 11:59:15.041551 1 controller.go:266] ocp-dev-westeu-9qm9r-master-0: machine has gone "Failed" phase. It won't reconcile
    I1001 11:59:15.041776 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"ocp-dev-westeu-9qm9r-master-0"}

The node itself is working as expected, though:

    $ oc get node
    NAME                                            STATUS   ROLES    AGE    VERSION
    ocp-dev-westeu-9qm9r-master-0                   Ready    master   309d   v1.18.3+47c0e71
    ocp-dev-westeu-9qm9r-master-1                   Ready    master   309d   v1.18.3+47c0e71
    ocp-dev-westeu-9qm9r-master-2                   Ready    master   309d   v1.18.3+47c0e71
    ocp-dev-westeu-9qm9r-worker-westeurope1-mwft7   Ready    worker   309d   v1.18.3+47c0e71
    ocp-dev-westeu-9qm9r-worker-westeurope2-qk2rc   Ready    worker   309d   v1.18.3+47c0e71
    ocp-dev-westeu-9qm9r-worker-westeurope3-9npcw   Ready    worker   309d   v1.18.3+47c0e71

Reproducible: Always

IPI cluster in the Azure westeurope region.
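The symptom in this report is a mismatch: the Machine object sits in phase Failed, so the controller skips it, while the matching Node is still Ready. As a minimal sketch, assuming the default tabular `oc get` output is saved to files and that each Machine name equals its Node name (as it does in the output above), the hypothetical shell functions below flag that mismatch. The names `failed_machines`, `node_status`, and `report_failed_but_ready` are illustrative only and are not part of `oc`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of oc) for spotting Machines that the
# machine-api reports as "Failed" while the matching Node is still Ready.
# Assumes the default tabular output of `oc get machine` / `oc get node`
# and that each Machine name equals its Node name, as in this report.

# Print the NAME of every row whose PHASE column (2nd field) is "Failed".
# Reads `oc get machine -n openshift-machine-api` output on stdin.
failed_machines() {
  awk 'NR > 1 && $2 == "Failed" { print $1 }'
}

# Print the STATUS column (2nd field) for the node named in $1.
# Reads `oc get node` output on stdin.
node_status() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}

# Usage: report_failed_but_ready MACHINES_FILE NODES_FILE
report_failed_but_ready() {
  for m in $(failed_machines < "$1"); do
    # Match "Ready" or e.g. "Ready,SchedulingDisabled"; "NotReady" does not match.
    case "$(node_status "$m" < "$2")" in
      Ready | Ready,*) echo "$m: Machine phase Failed, Node Ready" ;;
    esac
  done
}
```

For example, after `oc get machine -n openshift-machine-api > machines.txt` and `oc get node > nodes.txt`, running `report_failed_but_ready machines.txt nodes.txt` against the cluster state shown in this report would flag `ocp-dev-westeu-9qm9r-master-0`. Matching on names is a shortcut; the Machine's `status.nodeRef` field is the more robust link between a Machine and its Node.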