Bug 1808971 - Machine status shows "running" when an instance was terminated
Summary: Machine status shows "running" when an instance was terminated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Danil Grigorev
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-02 07:18 UTC by sunzhaohua
Modified: 2020-08-04 18:02 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-04 18:02:47 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-api-provider-aws pull 321 0 None closed Bug 1808971: Machine status shows "running" when an instance was terminated 2021-01-21 21:08:26 UTC
Github openshift cluster-api-provider-azure pull 130 0 None closed Bug 1808971: Machine status shows "running" when an instance was terminated 2021-01-21 21:07:44 UTC
Github openshift cluster-api-provider-gcp pull 89 0 None closed Bug 1808971: Machine status shows "running" when an instance was terminated 2021-01-21 21:07:44 UTC
Github openshift machine-api-operator pull 575 0 None closed Bug 1808971: Machine status shows "running" when an instance was terminated 2021-01-21 21:08:25 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-08-04 18:02:48 UTC

Description sunzhaohua 2020-03-02 07:18:06 UTC
Description of problem:
IPI on Azure: a stopped machine still shows "running" in the output of "oc get machines -o wide".

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-02-27-230850

How reproducible:
Always

Steps to Reproduce:
1. From the Azure console, terminate a running instance
2. Check machine status


Actual results:
$ oc get machine -o wide
NAME                                   PHASE     TYPE              REGION      ZONE   AGE     NODE                                   PROVIDERID                                                                                                                                                                    STATE
zhsun1-bjb2j-worker-centralus3-s92vx   Failed    Standard_D2s_v3   centralus   3      7h52m   zhsun1-bjb2j-worker-centralus3-s92vx   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun1-bjb2j-rg/providers/Microsoft.Compute/virtualMachines/zhsun1-bjb2j-worker-centralus3-s92vx   Running


Expected results:
The STATE column should match the instance state shown in the Azure console.

Additional info:

Comment 4 sunzhaohua 2020-03-09 15:03:24 UTC
Failed QA
clusterversion: 4.4.0-0.nightly-2020-03-09-060825

After terminating a running instance from the Azure console, "oc get machines -o wide" still shows the state as "Running":
$ oc get machine -o wide
NAME                                    PHASE     TYPE              REGION      ZONE   AGE     NODE                                    PROVIDERID                                                                                                                                                                      STATE
zhsun77-fxwrh-worker-centralus3-qjv4f   Failed    Standard_D2s_v3   centralus   3      3h57m   zhsun77-fxwrh-worker-centralus3-qjv4f   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun77-fxwrh-rg/providers/Microsoft.Compute/virtualMachines/zhsun77-fxwrh-worker-centralus3-qjv4f   Running

Comment 8 sunzhaohua 2020-04-28 10:49:03 UTC
Alexander Demicev, sorry, I missed this bug.
1. I stopped the VM in Azure; the next reconciliation cycle updated the machine state to "Deallocated".
$ oc get machine -o wide
zhsunazure428-jvwvn-worker-westus-xxjdx      Running    Standard_D2s_v3   westus          7h17m   zhsunazure428-jvwvn-worker-westus-xxjdx     azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure428-jvwvn-rg/providers/Microsoft.Compute/virtualMachines/zhsunazure428-jvwvn-worker-westus-xxjdx     Deallocated
2. I deleted the VM in Azure; the machine went into the "Failed" phase. It is no longer reconciled, so the machine state is stuck at "Running" (see the sketch after the log excerpt below).
$ oc get machine -o wide
zhsunazure428-jvwvn-worker-westus-i-nzsmq    Failed     Standard_D2s_v3   westus          170m    zhsunazure428-jvwvn-worker-westus-i-nzsmq   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure428-jvwvn-rg/providers/Microsoft.Compute/virtualMachines/zhsunazure428-jvwvn-worker-westus-i-nzsmq   Running

I0428 10:00:08.302934       1 controller.go:166] zhsunazure428-jvwvn-worker-westus-i-nzsmq: reconciling Machine
I0428 10:00:08.302961       1 actuator.go:197] Checking if machine zhsunazure428-jvwvn-worker-westus-i-nzsmq exists
I0428 10:00:08.318906       1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunazure428-jvwvn-worker-westus-i" "namespace"="openshift-machine-api" 
I0428 10:00:08.338158       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure428-jvwvn-worker-westus-i"}
I0428 10:00:13.895161       1 reconciler.go:352] Found vm for machine zhsunazure428-jvwvn-worker-westus-i-nzsmq
I0428 10:00:13.895191       1 controller.go:421] zhsunazure428-jvwvn-worker-westus-i-nzsmq: going into phase "Failed"
I0428 10:00:13.909739       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure428-jvwvn-worker-westus-i-nzsmq"}
I0428 10:00:13.909795       1 controller.go:166] zhsunazure428-jvwvn-worker-westus-i-nzsmq: reconciling Machine
W0428 10:00:13.909811       1 controller.go:263] zhsunazure428-jvwvn-worker-westus-i-nzsmq: machine has gone "Failed" phase. It won't reconcile
I0428 10:00:13.909830       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure428-jvwvn-worker-westus-i-nzsmq"}
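
For context, the "won't reconcile" warning in the log above comes from an early-return guard in the machine controller's reconcile loop. A minimal Go sketch of that guard (illustrative names, not the exact machine-api-operator source):

	// Once a machine has entered the Failed phase the controller skips it
	// entirely, so the instance-state annotation that backs the STATE
	// column of "oc get machine -o wide" is never refreshed again.
	if machine.Status.Phase != nil && *machine.Status.Phase == phaseFailed {
		klog.Warningf("%v: machine has gone %q phase. It won't reconcile", machine.Name, phaseFailed)
		return reconcile.Result{}, nil
	}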

Comment 10 Danil Grigorev 2020-04-29 13:35:05 UTC
PR to fix: https://github.com/openshift/machine-api-operator/pull/575

When a machine goes into the Failed phase, reconciliation stops. This prevents the VM state annotation from being updated, so the displayed state stays "Running" forever. The PR fixes this by setting the displayed VM instance state to "Unknown" when the machine fails.
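
A minimal Go sketch of that approach, assuming the STATE column is rendered from the machine.openshift.io/instance-state annotation (the annotation key, helper name, and types here are illustrative, not the exact PR code):

	// unknownInstanceState is what the STATE column should display once the
	// cloud instance can no longer be looked up for a Failed machine.
	const unknownInstanceState = "Unknown"

	// setUnknownInstanceState overwrites the instance-state annotation when a
	// machine is moved into the Failed phase, so the column stops showing a
	// stale "Running" value.
	func setUnknownInstanceState(machine *machinev1.Machine) {
		if machine.Annotations == nil {
			machine.Annotations = map[string]string{}
		}
		machine.Annotations["machine.openshift.io/instance-state"] = unknownInstanceState
	}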

Comment 13 sunzhaohua 2020-05-08 05:44:55 UTC
Failed QA
clusterversion: 4.5.0-0.nightly-2020-05-07-144853

1. I stopped the VM in Azure; the next reconciliation cycle updated the machine state to "Deallocated".
$ oc get machine -o wide
zhsunazure506-8647p-worker-westus-svnb7   Running   Standard_D2s_v3   westus          161m   zhsunazure506-8647p-worker-westus-svnb7   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure506-8647p-rg/providers/Microsoft.Compute/virtualMachines/zhsunazure506-8647p-worker-westus-svnb7   Updating

2. I deleted the VM in Azure; the machine went into the "Failed" phase. It is no longer reconciled, so the machine state is stuck at "Running".
$ oc get machine -o wide
zhsunazure506-8647p-worker-westus-qvwts   Failed    Standard_D2s_v3   westus          47h    zhsunazure506-8647p-worker-westus-qvwts   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure506-8647p-rg/providers/Microsoft.Compute/virtualMachines/zhsunazure506-8647p-worker-westus-qvwts   Running

I0508 05:24:35.561210       1 controller.go:166] zhsunazure506-8647p-worker-westus-svnb7: reconciling Machine
I0508 05:24:35.563594       1 actuator.go:201] Checking if machine zhsunazure506-8647p-worker-westus-svnb7 exists
I0508 05:24:35.594448       1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunazure506-8647p-worker-westus" "namespace"="openshift-machine-api" 
I0508 05:24:35.626051       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus"}
I0508 05:24:41.092399       1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunazure506-8647p-worker-westus" "namespace"="openshift-machine-api" 
I0508 05:24:41.115076       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus"}
I0508 05:24:46.155333       1 reconciler.go:353] Found vm for machine zhsunazure506-8647p-worker-westus-svnb7
I0508 05:24:46.155373       1 reconciler.go:375] Machine 64c4f50f-1f25-4a88-8980-318ba53b4d6a is updating
I0508 05:24:46.155389       1 controller.go:274] zhsunazure506-8647p-worker-westus-svnb7: reconciling machine triggers idempotent update
I0508 05:24:46.155397       1 actuator.go:168] Updating machine zhsunazure506-8647p-worker-westus-svnb7
I0508 05:24:46.602311       1 machine_scope.go:157] zhsunazure506-8647p-worker-westus-svnb7: patching machine
I0508 05:24:46.626594       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus-svnb7"}
I0508 05:24:46.626742       1 recorder.go:52] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"zhsunazure506-8647p-worker-westus-svnb7\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsunazure506-8647p-worker-westus-svnb7","uid":"32c357f6-bfdf-4c90-8f09-13bf553f3f12","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"1076619"} "reason"="Updated"
I0508 05:24:46.626779       1 controller.go:166] zhsunazure506-8647p-worker-westus-qvwts: reconciling Machine
I0508 05:24:46.626890       1 actuator.go:201] Checking if machine zhsunazure506-8647p-worker-westus-qvwts exists
I0508 05:24:46.975076       1 reconciler.go:353] Found vm for machine zhsunazure506-8647p-worker-westus-qvwts
I0508 05:24:46.975106       1 controller.go:421] zhsunazure506-8647p-worker-westus-qvwts: going into phase "Failed"
I0508 05:24:46.987850       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus-qvwts"}
I0508 05:24:46.987999       1 controller.go:166] zhsunazure506-8647p-worker-westus-svnb7: reconciling Machine
I0508 05:24:46.988012       1 actuator.go:201] Checking if machine zhsunazure506-8647p-worker-westus-svnb7 exists
I0508 05:24:47.412379       1 reconciler.go:353] Found vm for machine zhsunazure506-8647p-worker-westus-svnb7
I0508 05:24:47.412407       1 reconciler.go:375] Machine 64c4f50f-1f25-4a88-8980-318ba53b4d6a is updating
I0508 05:24:47.412417       1 controller.go:274] zhsunazure506-8647p-worker-westus-svnb7: reconciling machine triggers idempotent update
I0508 05:24:47.412422       1 actuator.go:168] Updating machine zhsunazure506-8647p-worker-westus-svnb7
I0508 05:24:48.113432       1 machine_scope.go:141] zhsunazure506-8647p-worker-westus-svnb7: status unchanged
I0508 05:24:48.113725       1 machine_scope.go:141] zhsunazure506-8647p-worker-westus-svnb7: status unchanged
I0508 05:24:48.113751       1 machine_scope.go:157] zhsunazure506-8647p-worker-westus-svnb7: patching machine
I0508 05:24:48.130862       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus-svnb7"}
I0508 05:24:48.130912       1 controller.go:166] zhsunazure506-8647p-worker-westus-qvwts: reconciling Machine
W0508 05:24:48.130924       1 controller.go:263] zhsunazure506-8647p-worker-westus-qvwts: machine has gone "Failed" phase. It won't reconcile
I0508 05:24:48.130938       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure506-8647p-worker-westus-qvwts"}
I0508 05:24:48.130971       1 recorder.go:52] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"zhsunazure506-8647p-worker-westus-svnb7\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsunazure506-8647p-worker-westus-svnb7","uid":"32c357f6-bfdf-4c90-8f09-13bf553f3f12","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"1076830"} "reason"="Updated"

Comment 15 sunzhaohua 2020-05-18 08:35:06 UTC
Waiting for PR https://github.com/openshift/cluster-api-provider-azure/pull/130 to merge.

Comment 16 Danil Grigorev 2020-05-20 16:26:30 UTC
https://github.com/openshift/cluster-api-provider-azure/pull/130 is merged; moving to MODIFIED manually, as the PR was incorrectly labelled and broke the BZ robot workflow.

Comment 18 sunzhaohua 2020-05-26 02:05:01 UTC
Verified
Tested on AWS, Azure, and GCP.
After deleting an instance in the web console, the machine state shows "Unknown".
After stopping an instance in the web console, the machine state shows "stopped", "TERMINATED", and "Deallocated" on AWS, GCP, and Azure respectively.
However, after deleting an instance in the web console the machine phase should show "Failed"; will open a new bug to track this.

$ oc get machine -o wide
NAME                                        PHASE     TYPE        REGION      ZONE         AGE   NODE                                         PROVIDERID                              STATE
zhsunaws525-qtlbn-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   23h   ip-10-0-132-252.us-east-2.compute.internal   aws:///us-east-2a/i-0853c407eef01db2d   running
zhsunaws525-qtlbn-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   23h   ip-10-0-172-96.us-east-2.compute.internal    aws:///us-east-2b/i-04f8bd514ff1bfa86   running
zhsunaws525-qtlbn-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   23h   ip-10-0-215-247.us-east-2.compute.internal   aws:///us-east-2c/i-07cfd6d19592182b6   running
zhsunaws525-qtlbn-worker-us-east-2a-wbkws   Running   m4.large    us-east-2   us-east-2a   23h   ip-10-0-152-19.us-east-2.compute.internal    aws:///us-east-2a/i-0b2f1f8b6b1fdc6a6   running
zhsunaws525-qtlbn-worker-us-east-2b-h8pq2   Running   m4.large    us-east-2   us-east-2b   23h   ip-10-0-179-126.us-east-2.compute.internal   aws:///us-east-2b/i-0f1ea8865fd3e68f5   Unknown
zhsunaws525-qtlbn-worker-us-east-2c-h7ftz   Running   m4.large    us-east-2   us-east-2c   23h   ip-10-0-211-47.us-east-2.compute.internal    aws:///us-east-2c/i-0af1b18fd5a2ba18c   stopped

$ oc get machine -o wide
NAME                                PHASE     TYPE            REGION        ZONE            AGE    NODE                                                        PROVIDERID                                                           STATE
zhsun4526gcp-b2clw-master-0         Running   n1-standard-4   us-central1   us-central1-a   129m   zhsun4526gcp-b2clw-master-0.c.openshift-qe.internal         gce://openshift-qe/us-central1-a/zhsun4526gcp-b2clw-master-0         RUNNING
zhsun4526gcp-b2clw-master-1         Running   n1-standard-4   us-central1   us-central1-b   129m   zhsun4526gcp-b2clw-master-1.c.openshift-qe.internal         gce://openshift-qe/us-central1-b/zhsun4526gcp-b2clw-master-1         RUNNING
zhsun4526gcp-b2clw-master-2         Running   n1-standard-4   us-central1   us-central1-c   129m   zhsun4526gcp-b2clw-master-2.c.openshift-qe.internal         gce://openshift-qe/us-central1-c/zhsun4526gcp-b2clw-master-2         RUNNING
zhsun4526gcp-b2clw-worker-a-264sv   Running   n1-standard-4   us-central1   us-central1-a   120m   zhsun4526gcp-b2clw-worker-a-264sv.c.openshift-qe.internal   gce://openshift-qe/us-central1-a/zhsun4526gcp-b2clw-worker-a-264sv   RUNNING
zhsun4526gcp-b2clw-worker-b-s52b7   Running   n1-standard-4   us-central1   us-central1-b   120m   zhsun4526gcp-b2clw-worker-b-s52b7.c.openshift-qe.internal   gce://openshift-qe/us-central1-b/zhsun4526gcp-b2clw-worker-b-s52b7   Unknown
zhsun4526gcp-b2clw-worker-c-6f7ss   Running   n1-standard-4   us-central1   us-central1-c   120m   zhsun4526gcp-b2clw-worker-c-6f7ss.c.openshift-qe.internal   gce://openshift-qe/us-central1-c/zhsun4526gcp-b2clw-worker-c-6f7ss   TERMINATED


$ oc get machine -o wide
NAME                                       PHASE     TYPE              REGION   ZONE   AGE    NODE                                       PROVIDERID                                                                                                                                                                                STATE
zhsun4526azure-7pxft-master-0              Running   Standard_D8s_v3   westus          114m   zhsun4526azure-7pxft-master-0              azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-master-0              Running
zhsun4526azure-7pxft-master-1              Running   Standard_D8s_v3   westus          114m   zhsun4526azure-7pxft-master-1              azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-master-1              Running
zhsun4526azure-7pxft-master-2              Running   Standard_D8s_v3   westus          114m   zhsun4526azure-7pxft-master-2              azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-master-2              Running
zhsun4526azure-7pxft-worker-westus-bpbbr   Running   Standard_D2s_v3   westus          101m   zhsun4526azure-7pxft-worker-westus-bpbbr   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-worker-westus-bpbbr   Running
zhsun4526azure-7pxft-worker-westus-d6npp   Running   Standard_D2s_v3   westus          101m   zhsun4526azure-7pxft-worker-westus-d6npp   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-worker-westus-d6npp   Unknown
zhsun4526azure-7pxft-worker-westus-vwml7   Running   Standard_D2s_v3   westus          101m   zhsun4526azure-7pxft-worker-westus-vwml7   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsun4526azure-7pxft-rg/providers/Microsoft.Compute/virtualMachines/zhsun4526azure-7pxft-worker-westus-vwml7   Deallocated

Comment 20 errata-xmlrpc 2020-08-04 18:02:47 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

