Description of problem:
Terminate a running instance from aws/azure/gcp web console, then check its machine phase shows "running"

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-05-25-052746

How reproducible:
Always

Steps to Reproduce:
1. Terminate a running instance from aws/azure/gcp web console
2. Check machine phase
3.

Actual results:
Machine phase still is Running.

$ oc get machine -o wide
NAME                                        PHASE     TYPE        REGION      ZONE         AGE   NODE                                         PROVIDERID                                STATE
zhsunaws525-qtlbn-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   23h   ip-10-0-132-252.us-east-2.compute.internal   aws:///us-east-2a/i-0853c407eef01db2d     running
zhsunaws525-qtlbn-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   23h   ip-10-0-172-96.us-east-2.compute.internal    aws:///us-east-2b/i-04f8bd514ff1bfa86     running
zhsunaws525-qtlbn-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   23h   ip-10-0-215-247.us-east-2.compute.internal   aws:///us-east-2c/i-07cfd6d19592182b6     running
zhsunaws525-qtlbn-worker-us-east-2a-wbkws   Running   m4.large    us-east-2   us-east-2a   23h   ip-10-0-152-19.us-east-2.compute.internal    aws:///us-east-2a/i-0b2f1f8b6b1fdc6a6     running
zhsunaws525-qtlbn-worker-us-east-2b-h8pq2   Running   m4.large    us-east-2   us-east-2b   23h   ip-10-0-179-126.us-east-2.compute.internal   aws:///us-east-2b/i-0f1ea8865fd3e68f5     Unknown

I0526 01:19:28.695088 1 controller.go:169] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: reconciling Machine
I0526 01:19:28.695101 1 actuator.go:100] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: actuator checking if machine exists
W0526 01:19:28.756428 1 reconciler.go:364] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: Failed to find existing instance by id i-0f1ea8865fd3e68f5: instance i-0f1ea8865fd3e68f5 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E0526 01:19:28.810651 1 utils.go:166] Excluding instance matching zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: instance i-0f1ea8865fd3e68f5 state "terminated" is not in running, pending, stopped, stopping, shutting-down
I0526 01:19:28.810674 1 reconciler.go:210] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: Instance does not exist
I0526 01:19:28.810682 1 controller.go:424] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: going into phase "Failed"
I0526 01:19:28.842111 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunaws525-qtlbn-worker-us-east-2b-h8pq2"}
I0526 01:19:28.842158 1 controller.go:169] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: reconciling Machine
I0526 01:19:28.842166 1 actuator.go:100] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: actuator checking if machine exists
W0526 01:19:28.898814 1 reconciler.go:364] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: Failed to find existing instance by id i-0f1ea8865fd3e68f5: instance i-0f1ea8865fd3e68f5 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E0526 01:19:28.953888 1 utils.go:166] Excluding instance matching zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: instance i-0f1ea8865fd3e68f5 state "terminated" is not in running, pending, stopped, stopping, shutting-down
I0526 01:19:28.953921 1 reconciler.go:210] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: Instance does not exist
I0526 01:19:28.953932 1 controller.go:424] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: going into phase "Failed"

status:
  addresses:
  - address: 10.0.179.126
    type: InternalIP
  - address: ip-10-0-179-126.us-east-2.compute.internal
    type: InternalDNS
  - address: ip-10-0-179-126.us-east-2.compute.internal
    type: Hostname
  errorMessage: Can't find created instance.
  lastUpdated: "2020-05-26T01:14:07Z"
  nodeRef:
    kind: Node
    name: ip-10-0-179-126.us-east-2.compute.internal
    uid: 43cee894-bb51-4dcc-a304-28a948fe6e67
  phase: Running
  providerStatus:
    conditions:
    - lastProbeTime: "2020-05-25T01:38:59Z"
      lastTransitionTime: "2020-05-25T01:38:59Z"
      message: Machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-0f1ea8865fd3e68f5
    instanceState: running

Expected results:
Machine status.phase should become 'Failed'

Additional info:
This is expected. Once a machine has been given a node, it is considered to be in the "Running" phase. The actual cloud state is reflected in the STATE column (here: Unknown). See https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/machine-instance-lifecycle.md

We should come up with a more meaningful name to show for the phase, similar to what we do for the console. This is not trivial to do without disrupting existing clients.
Please ignore my comment in https://bugzilla.redhat.com/show_bug.cgi?id=1839952#c1. I misread the description. The machine should indeed go to the Failed phase if the underlying instance is deleted. This should be fixed by https://github.com/openshift/cluster-api-provider-aws/pull/325
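To make the expected behaviour concrete, here is a minimal sketch of the phase decision discussed above. It is not the controller's actual code: the function name, its parameters, and the exact ordering of checks are assumptions made for illustration, and only the phase names come from the machine lifecycle enhancement linked in the previous comment.

```python
from enum import Enum


class Phase(str, Enum):
    """Machine phases named in the machine-instance-lifecycle enhancement."""
    PROVISIONING = "Provisioning"
    PROVISIONED = "Provisioned"
    RUNNING = "Running"
    DELETING = "Deleting"
    FAILED = "Failed"


def expected_phase(instance_exists, has_node_ref, was_provisioned,
                   being_deleted=False):
    """Hypothetical helper: return the phase a machine is expected to show.

    instance_exists: the cloud provider still reports a usable instance.
    has_node_ref:    the machine has been linked to a node.
    was_provisioned: the machine already carries a provider ID or addresses,
                     i.e. an instance was created for it at some point.
    """
    if being_deleted:
        return Phase.DELETING
    if instance_exists:
        # A backing instance with a node is Running, without one Provisioned.
        return Phase.RUNNING if has_node_ref else Phase.PROVISIONED
    if was_provisioned:
        # The instance disappeared outside the machine API, as in this bug.
        return Phase.FAILED
    return Phase.PROVISIONING
```

For the worker in the description (terminated instance, nodeRef still set, provider ID present), `expected_phase(False, True, True)` yields `Phase.FAILED`, which matches the expected results of this report.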
Verified. Tested on Azure, clusterversion: 4.5.0-0.nightly-2020-06-03-013823, after deleting an instance from the Azure web console.

$ oc get machine
NAME                                     PHASE     TYPE              REGION   ZONE   AGE
zhsun63azure-7h44z-master-0              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-master-1              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-master-2              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-worker-westus-4cmjd   Running   Standard_D2s_v3   westus          17h
zhsun63azure-7h44z-worker-westus-hv647   Running   Standard_D2s_v3   westus          17h
zhsun63azure-7h44z-worker-westus-wtz6j   Failed    Standard_D2s_v3   westus          17h
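For repeating this check on other platforms, a small script can pick out the machines in a given phase from the `oc get machine` table. This is only a sketch: `machines_in_phase` is a hypothetical helper, and it assumes NAME and PHASE are always the first two whitespace-separated columns and are never empty, which holds for the output above even though ZONE is blank.

```python
def machines_in_phase(table_text, phase):
    """Return the names of machines whose PHASE column equals `phase`.

    `table_text` is the plain-text output of `oc get machine`; the first
    line is treated as the header and skipped.
    """
    names = []
    for line in table_text.strip().splitlines()[1:]:
        fields = line.split()
        # fields[0] is NAME, fields[1] is PHASE (assumed non-empty).
        if len(fields) >= 2 and fields[1] == phase:
            names.append(fields[0])
    return names


# Output copied from the verification run above.
SAMPLE = """\
NAME                                     PHASE     TYPE              REGION   ZONE   AGE
zhsun63azure-7h44z-master-0              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-master-1              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-master-2              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-worker-westus-4cmjd   Running   Standard_D2s_v3   westus          17h
zhsun63azure-7h44z-worker-westus-hv647   Running   Standard_D2s_v3   westus          17h
zhsun63azure-7h44z-worker-westus-wtz6j   Failed    Standard_D2s_v3   westus          17h
"""

if __name__ == "__main__":
    print(machines_in_phase(SAMPLE, "Failed"))
```

In practice the table would be captured from the CLI (for example by piping the command output into the script) rather than pasted in as a constant.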
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196