Description of problem:
IPI on Azure: a stopped machine still shows "Running" when running "oc get machines -o wide".

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-02-27-230850

How reproducible:
Always

Steps to Reproduce:
1. From the Azure console, terminate a running instance.
2. Check the machine status.

Actual results:
$ oc get machine -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
zhsun1-bjb2j-worker-centralus3-s92vx Failed Standard_D2s_v3 centralus 3 7h52m zhsun1-bjb2j-worker-centralus3-s92vx azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun1-bjb2j-rg/providers/Microsoft.Compute/virtualMachines/zhsun1-bjb2j-worker-centralus3-s92vx Running

Expected results:
The STATE column should match the instance state shown in the Azure console.

Additional info:
Should be fixed by https://github.com/openshift/cluster-api-provider-azure/commit/06be56f2c5a11019b1c700de4ef72028f014b5e2
Failed QA.
clusterversion: 4.4.0-0.nightly-2020-03-09-060825

After terminating a running instance from the Azure console, "oc get machines -o wide" still shows the state as "Running".

$ oc get machine -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
zhsun77-fxwrh-worker-centralus3-qjv4f Failed Standard_D2s_v3 centralus 3 3h57m zhsun77-fxwrh-worker-centralus3-qjv4f azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun77-fxwrh-rg/providers/Microsoft.Compute/virtualMachines/zhsun77-fxwrh-worker-centralus3-qjv4f Running
Alexander Demicev, sorry, I missed this bug.

1. I stopped the VM in Azure; the next reconciliation cycle updated the machine state to "Deallocated".

$ oc get machine -o wide
zhsunazure428-jvwvn-worker-westus-xxjdx Running Standard_D2s_v3 westus 7h17m zhsunazure428-jvwvn-worker-westus-xxjdx azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure428-jvwvn-rg/providers/Microsoft.Compute/virtualMachines/zhsunazure428-jvwvn-worker-westus-xxjdx Deallocated

2. I deleted the VM in Azure; the machine went into the "Failed" phase. It won't reconcile, so the machine state is stuck at "Running".

$ oc get machine -o wide
zhsunazure428-jvwvn-worker-westus-i-nzsmq Failed Standard_D2s_v3 westus 170m zhsunazure428-jvwvn-worker-westus-i-nzsmq azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure428-jvwvn-rg/providers/Microsoft.Compute/virtualMachines/zhsunazure428-jvwvn-worker-westus-i-nzsmq Running

I0428 10:00:08.302934 1 controller.go:166] zhsunazure428-jvwvn-worker-westus-i-nzsmq: reconciling Machine
I0428 10:00:08.302961 1 actuator.go:197] Checking if machine zhsunazure428-jvwvn-worker-westus-i-nzsmq exists
I0428 10:00:08.318906 1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunazure428-jvwvn-worker-westus-i" "namespace"="openshift-machine-api"
I0428 10:00:08.338158 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure428-jvwvn-worker-westus-i"}
I0428 10:00:13.895161 1 reconciler.go:352] Found vm for machine zhsunazure428-jvwvn-worker-westus-i-nzsmq
I0428 10:00:13.895191 1 controller.go:421] zhsunazure428-jvwvn-worker-westus-i-nzsmq: going into phase "Failed"
I0428 10:00:13.909739 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure428-jvwvn-worker-westus-i-nzsmq"}
I0428 10:00:13.909795 1 controller.go:166] zhsunazure428-jvwvn-worker-westus-i-nzsmq: reconciling Machine
W0428 10:00:13.909811 1 controller.go:263] zhsunazure428-jvwvn-worker-westus-i-nzsmq: machine has gone "Failed" phase. It won't reconcile
I0428 10:00:13.909830 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure428-jvwvn-worker-westus-i-nzsmq"}
PR to fix: https://github.com/openshift/machine-api-operator/pull/575

When a machine goes into the Failed phase, reconciliation stops. This prevents the VM state annotation from being updated, so the "Running" state is stuck forever. The PR addresses this by setting the displayed VM instance state to "Unknown" when the machine fails.
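For context, a minimal sketch of the idea behind that PR (this is an illustration, not the actual PR code): the annotation key below is, to the best of my knowledge, what the STATE printer column of "oc get machine -o wide" reads, and the trimmed machine struct plus the markFailed helper are stand-ins introduced here purely for illustration.

package main

import "fmt"

const (
	// Assumption: the STATE column is rendered from this annotation.
	instanceStateAnnotation = "machine.openshift.io/instance-state"
	unknownInstanceState    = "Unknown"
	phaseFailed             = "Failed"
)

// machine is a trimmed stand-in for the machine.openshift.io/v1beta1 Machine type.
type machine struct {
	Annotations map[string]string
	Phase       string
}

// markFailed (hypothetical helper) moves a machine into the Failed phase and
// overwrites its last reported instance state with "Unknown", because a Failed
// machine is no longer reconciled and the stale cloud state would otherwise
// stay frozen at whatever was last observed (e.g. "Running").
func markFailed(m *machine) {
	if m.Annotations == nil {
		m.Annotations = map[string]string{}
	}
	m.Annotations[instanceStateAnnotation] = unknownInstanceState
	m.Phase = phaseFailed
}

func main() {
	m := &machine{Annotations: map[string]string{instanceStateAnnotation: "Running"}}
	markFailed(m)
	fmt.Println(m.Phase, m.Annotations[instanceStateAnnotation]) // prints: Failed Unknown
}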
Failed QA.
clusterversion: 4.5.0-0.nightly-2020-05-07-144853

1. I stopped the VM in Azure; the next reconciliation cycle updated the machine state to "Deallocated".

$ oc get machine -o wide
zhsunazure506-8647p-worker-westus-svnb7 Running Standard_D2s_v3 westus 161m zhsunazure506-8647p-worker-westus-svnb7 azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure506-8647p-rg/providers/Microsoft.Compute/virtualMachines/zhsunazure506-8647p-worker-westus-svnb7 Updating

2. I deleted the VM in Azure; the machine went into the "Failed" phase. It won't reconcile, so the machine state is stuck at "Running".

$ oc get machine -o wide
zhsunazure506-8647p-worker-westus-qvwts Failed Standard_D2s_v3 westus 47h zhsunazure506-8647p-worker-westus-qvwts azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure506-8647p-rg/providers/Microsoft.Compute/virtualMachines/zhsunazure506-8647p-worker-westus-qvwts Running

I0508 05:24:35.561210 1 controller.go:166] zhsunazure506-8647p-worker-westus-svnb7: reconciling Machine
I0508 05:24:35.563594 1 actuator.go:201] Checking if machine zhsunazure506-8647p-worker-westus-svnb7 exists
I0508 05:24:35.594448 1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunazure506-8647p-worker-westus" "namespace"="openshift-machine-api"
I0508 05:24:35.626051 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus"}
I0508 05:24:41.092399 1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunazure506-8647p-worker-westus" "namespace"="openshift-machine-api"
I0508 05:24:41.115076 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus"}
I0508 05:24:46.155333 1 reconciler.go:353] Found vm for machine zhsunazure506-8647p-worker-westus-svnb7
I0508 05:24:46.155373 1 reconciler.go:375] Machine 64c4f50f-1f25-4a88-8980-318ba53b4d6a is updating
I0508 05:24:46.155389 1 controller.go:274] zhsunazure506-8647p-worker-westus-svnb7: reconciling machine triggers idempotent update
I0508 05:24:46.155397 1 actuator.go:168] Updating machine zhsunazure506-8647p-worker-westus-svnb7
I0508 05:24:46.602311 1 machine_scope.go:157] zhsunazure506-8647p-worker-westus-svnb7: patching machine
I0508 05:24:46.626594 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus-svnb7"}
I0508 05:24:46.626742 1 recorder.go:52] controller-runtime/manager/events "msg"="Normal" "message"="Updated machine \"zhsunazure506-8647p-worker-westus-svnb7\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsunazure506-8647p-worker-westus-svnb7","uid":"32c357f6-bfdf-4c90-8f09-13bf553f3f12","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"1076619"} "reason"="Updated"
I0508 05:24:46.626779 1 controller.go:166] zhsunazure506-8647p-worker-westus-qvwts: reconciling Machine
I0508 05:24:46.626890 1 actuator.go:201] Checking if machine zhsunazure506-8647p-worker-westus-qvwts exists
I0508 05:24:46.975076 1 reconciler.go:353] Found vm for machine zhsunazure506-8647p-worker-westus-qvwts
I0508 05:24:46.975106 1 controller.go:421] zhsunazure506-8647p-worker-westus-qvwts: going into phase "Failed"
I0508 05:24:46.987850 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus-qvwts"}
I0508 05:24:46.987999 1 controller.go:166] zhsunazure506-8647p-worker-westus-svnb7: reconciling Machine
I0508 05:24:46.988012 1 actuator.go:201] Checking if machine zhsunazure506-8647p-worker-westus-svnb7 exists
I0508 05:24:47.412379 1 reconciler.go:353] Found vm for machine zhsunazure506-8647p-worker-westus-svnb7
I0508 05:24:47.412407 1 reconciler.go:375] Machine 64c4f50f-1f25-4a88-8980-318ba53b4d6a is updating
I0508 05:24:47.412417 1 controller.go:274] zhsunazure506-8647p-worker-westus-svnb7: reconciling machine triggers idempotent update
I0508 05:24:47.412422 1 actuator.go:168] Updating machine zhsunazure506-8647p-worker-westus-svnb7
I0508 05:24:48.113432 1 machine_scope.go:141] zhsunazure506-8647p-worker-westus-svnb7: status unchanged
I0508 05:24:48.113725 1 machine_scope.go:141] zhsunazure506-8647p-worker-westus-svnb7: status unchanged
I0508 05:24:48.113751 1 machine_scope.go:157] zhsunazure506-8647p-worker-westus-svnb7: patching machine
I0508 05:24:48.130862 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus-svnb7"}
I0508 05:24:48.130912 1 controller.go:166] zhsunazure506-8647p-worker-westus-qvwts: reconciling Machine
W0508 05:24:48.130924 1 controller.go:263] zhsunazure506-8647p-worker-westus-qvwts: machine has gone "Failed" phase. It won't reconcile
I0508 05:24:48.130938 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus-qvwts"}
I0508 05:24:48.130971 1 recorder.go:52] controller-runtime/manager/events "msg"="Normal" "message"="Updated machine \"zhsunazure506-8647p-worker-westus-svnb7\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsunazure506-8647p-worker-westus-svnb7","uid":"32c357f6-bfdf-4c90-8f09-13bf553f3f12","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"1076830"} "reason"="Updated"
Waiting for PR to merge: https://github.com/openshift/cluster-api-provider-azure/pull/130
https://github.com/openshift/cluster-api-provider-azure/pull/130 is merged; moving to MODIFIED manually, as the PR was incorrectly labelled and broke the Bugzilla bot workflow.
Verified. Tested on AWS, Azure, and GCP.

After deleting an instance from the web console, the machine state shows "Unknown". After stopping an instance from the web console, the machine state shows "stopped" (AWS), "TERMINATED" (GCP), and "Deallocated" (Azure). However, after deleting an instance from the web console, the machine phase should show "Failed"; will open a new bug to track that.

AWS:
$ oc get machine -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
zhsunaws525-qtlbn-master-0 Running m4.xlarge us-east-2 us-east-2a 23h ip-10-0-132-252.us-east-2.compute.internal aws:///us-east-2a/i-0853c407eef01db2d running
zhsunaws525-qtlbn-master-1 Running m4.xlarge us-east-2 us-east-2b 23h ip-10-0-172-96.us-east-2.compute.internal aws:///us-east-2b/i-04f8bd514ff1bfa86 running
zhsunaws525-qtlbn-master-2 Running m4.xlarge us-east-2 us-east-2c 23h ip-10-0-215-247.us-east-2.compute.internal aws:///us-east-2c/i-07cfd6d19592182b6 running
zhsunaws525-qtlbn-worker-us-east-2a-wbkws Running m4.large us-east-2 us-east-2a 23h ip-10-0-152-19.us-east-2.compute.internal aws:///us-east-2a/i-0b2f1f8b6b1fdc6a6 running
zhsunaws525-qtlbn-worker-us-east-2b-h8pq2 Running m4.large us-east-2 us-east-2b 23h ip-10-0-179-126.us-east-2.compute.internal aws:///us-east-2b/i-0f1ea8865fd3e68f5 Unknown
zhsunaws525-qtlbn-worker-us-east-2c-h7ftz Running m4.large us-east-2 us-east-2c 23h ip-10-0-211-47.us-east-2.compute.internal aws:///us-east-2c/i-0af1b18fd5a2ba18c stopped

GCP:
$ oc get machine -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
zhsun4526gcp-b2clw-master-0 Running n1-standard-4 us-central1 us-central1-a 129m zhsun4526gcp-b2clw-master-0.c.openshift-qe.internal gce://openshift-qe/us-central1-a/zhsun4526gcp-b2clw-master-0 RUNNING
zhsun4526gcp-b2clw-master-1 Running n1-standard-4 us-central1 us-central1-b 129m zhsun4526gcp-b2clw-master-1.c.openshift-qe.internal gce://openshift-qe/us-central1-b/zhsun4526gcp-b2clw-master-1 RUNNING
zhsun4526gcp-b2clw-master-2 Running n1-standard-4 us-central1 us-central1-c 129m zhsun4526gcp-b2clw-master-2.c.openshift-qe.internal gce://openshift-qe/us-central1-c/zhsun4526gcp-b2clw-master-2 RUNNING
zhsun4526gcp-b2clw-worker-a-264sv Running n1-standard-4 us-central1 us-central1-a 120m zhsun4526gcp-b2clw-worker-a-264sv.c.openshift-qe.internal gce://openshift-qe/us-central1-a/zhsun4526gcp-b2clw-worker-a-264sv RUNNING
zhsun4526gcp-b2clw-worker-b-s52b7 Running n1-standard-4 us-central1 us-central1-b 120m zhsun4526gcp-b2clw-worker-b-s52b7.c.openshift-qe.internal gce://openshift-qe/us-central1-b/zhsun4526gcp-b2clw-worker-b-s52b7 Unknown
zhsun4526gcp-b2clw-worker-c-6f7ss Running n1-standard-4 us-central1 us-central1-c 120m zhsun4526gcp-b2clw-worker-c-6f7ss.c.openshift-qe.internal gce://openshift-qe/us-central1-c/zhsun4526gcp-b2clw-worker-c-6f7ss TERMINATED

Azure:
$ oc get machine -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
zhsun4526azure-7pxft-master-0 Running Standard_D8s_v3 westus 114m zhsun4526azure-7pxft-master-0 azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-master-0 Running
zhsun4526azure-7pxft-master-1 Running Standard_D8s_v3 westus 114m zhsun4526azure-7pxft-master-1 azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-master-1 Running
zhsun4526azure-7pxft-master-2 Running Standard_D8s_v3 westus 114m zhsun4526azure-7pxft-master-2 azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-master-2 Running
zhsun4526azure-7pxft-worker-westus-bpbbr Running Standard_D2s_v3 westus 101m zhsun4526azure-7pxft-worker-westus-bpbbr azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-worker-westus-bpbbr Running
zhsun4526azure-7pxft-worker-westus-d6npp Running Standard_D2s_v3 westus 101m zhsun4526azure-7pxft-worker-westus-d6npp azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-worker-westus-d6npp Unknown
zhsun4526azure-7pxft-worker-westus-vwml7 Running Standard_D2s_v3 westus 101m zhsun4526azure-7pxft-worker-westus-vwml7 azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-worker-westus-vwml7 Deallocated
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409