
Bug 1839952

Summary:	Machines phase should become 'Failed' when its instance is deleted
Product:	OpenShift Container Platform	Reporter:	sunzhaohua <zhsun>
Component:	Cloud Compute	Assignee:	Alberto <agarcial>
Cloud Compute sub component:	Other Providers	QA Contact:	sunzhaohua <zhsun>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	medium
Version:	4.5
Target Milestone:	---
Target Release:	4.6.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-10-27 16:01:02 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description sunzhaohua 2020-05-26 05:51:23 UTC

Description of problem:
After terminating a running instance from the AWS, Azure, or GCP web console, the corresponding machine's phase still shows "Running".

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-05-25-052746

How reproducible:
Always

Steps to Reproduce:
1. Terminate a running instance from aws/azure/gcp web console
2. Check machine phase

Actual results:
Machine phase is still Running.
$ oc get machine -o wide
NAME                                        PHASE     TYPE        REGION      ZONE         AGE   NODE                                         PROVIDERID                              STATE
zhsunaws525-qtlbn-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   23h   ip-10-0-132-252.us-east-2.compute.internal   aws:///us-east-2a/i-0853c407eef01db2d   running
zhsunaws525-qtlbn-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   23h   ip-10-0-172-96.us-east-2.compute.internal    aws:///us-east-2b/i-04f8bd514ff1bfa86   running
zhsunaws525-qtlbn-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   23h   ip-10-0-215-247.us-east-2.compute.internal   aws:///us-east-2c/i-07cfd6d19592182b6   running
zhsunaws525-qtlbn-worker-us-east-2a-wbkws   Running   m4.large    us-east-2   us-east-2a   23h   ip-10-0-152-19.us-east-2.compute.internal    aws:///us-east-2a/i-0b2f1f8b6b1fdc6a6   running
zhsunaws525-qtlbn-worker-us-east-2b-h8pq2   Running   m4.large    us-east-2   us-east-2b   23h   ip-10-0-179-126.us-east-2.compute.internal   aws:///us-east-2b/i-0f1ea8865fd3e68f5   Unknown


I0526 01:19:28.695088       1 controller.go:169] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: reconciling Machine
I0526 01:19:28.695101       1 actuator.go:100] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: actuator checking if machine exists
W0526 01:19:28.756428       1 reconciler.go:364] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: Failed to find existing instance by id i-0f1ea8865fd3e68f5: instance i-0f1ea8865fd3e68f5 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E0526 01:19:28.810651       1 utils.go:166] Excluding instance matching zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: instance i-0f1ea8865fd3e68f5 state "terminated" is not in running, pending, stopped, stopping, shutting-down
I0526 01:19:28.810674       1 reconciler.go:210] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: Instance does not exist
I0526 01:19:28.810682       1 controller.go:424] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: going into phase "Failed"
I0526 01:19:28.842111       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunaws525-qtlbn-worker-us-east-2b-h8pq2"}
I0526 01:19:28.842158       1 controller.go:169] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: reconciling Machine
I0526 01:19:28.842166       1 actuator.go:100] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: actuator checking if machine exists
W0526 01:19:28.898814       1 reconciler.go:364] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: Failed to find existing instance by id i-0f1ea8865fd3e68f5: instance i-0f1ea8865fd3e68f5 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E0526 01:19:28.953888       1 utils.go:166] Excluding instance matching zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: instance i-0f1ea8865fd3e68f5 state "terminated" is not in running, pending, stopped, stopping, shutting-down
I0526 01:19:28.953921       1 reconciler.go:210] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: Instance does not exist
I0526 01:19:28.953932       1 controller.go:424] zhsunaws525-qtlbn-worker-us-east-2b-h8pq2: going into phase "Failed"
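
The warnings above indicate that an instance counts as existing only when its EC2 state is one of running, pending, stopped, stopping, or shutting-down. A minimal Go sketch of such a filter follows. The names `existingStates` and `instanceExists` are hypothetical and are not taken from the provider code; only the list of states comes from the log message.

```go
package main

import "fmt"

// existingStates holds the EC2 instance states named in the log message.
// Any other state (for example "terminated") is treated as "instance not found".
var existingStates = map[string]bool{
	"running":       true,
	"pending":       true,
	"stopped":       true,
	"stopping":      true,
	"shutting-down": true,
}

// instanceExists is a hypothetical helper that reports whether an instance
// in the given state should be considered to exist.
func instanceExists(state string) bool {
	return existingStates[state]
}

func main() {
	for _, state := range []string{"running", "shutting-down", "terminated"} {
		fmt.Printf("state=%q exists=%t\n", state, instanceExists(state))
	}
}
```

Under this assumption, a terminated instance is reported as missing, which matches the "Instance does not exist" line in the log.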


status:
  addresses:
  - address: 10.0.179.126
    type: InternalIP
  - address: ip-10-0-179-126.us-east-2.compute.internal
    type: InternalDNS
  - address: ip-10-0-179-126.us-east-2.compute.internal
    type: Hostname
  errorMessage: Can't find created instance.
  lastUpdated: "2020-05-26T01:14:07Z"
  nodeRef:
    kind: Node
    name: ip-10-0-179-126.us-east-2.compute.internal
    uid: 43cee894-bb51-4dcc-a304-28a948fe6e67
  phase: Running
  providerStatus:
    conditions:
    - lastProbeTime: "2020-05-25T01:38:59Z"
      lastTransitionTime: "2020-05-25T01:38:59Z"
      message: Machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-0f1ea8865fd3e68f5
    instanceState: running

Expected results:
Machine status.phase should become 'Failed'


Additional info:

Comment 1 Alberto 2020-05-26 08:06:13 UTC

This is expected. Once a machine has been given a node, it is considered to be in the "Running" phase. The state of the cloud instance is reflected separately in the STATE column, here as Unknown. See https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/machine-instance-lifecycle.md
We should come up with a more meaningful name to show for the phase, similar to what we do for the console. This is not trivial to do without disrupting existing clients.

Comment 2 Alberto 2020-05-29 11:16:47 UTC

Please ignore my comment in https://bugzilla.redhat.com/show_bug.cgi?id=1839952#c1. I misread the description.

The machine should indeed go to the Failed phase if the underlying instance is deleted. This should be fixed by https://github.com/openshift/cluster-api-provider-aws/pull/325
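
As a rough illustration of the expected behavior, the phase decision could be sketched in Go as below. This is a simplified sketch under stated assumptions, not the code from the linked pull request: the `machine` struct, its fields, and `nextPhase` are hypothetical, and the Deleting phase is left out.

```go
package main

import "fmt"

// Phase names as they appear in the `oc get machine` PHASE column.
const (
	phaseProvisioning = "Provisioning"
	phaseProvisioned  = "Provisioned"
	phaseRunning      = "Running"
	phaseFailed       = "Failed"
)

// machine is a hypothetical, simplified view of a Machine object.
type machine struct {
	ProviderID string // non-empty once a cloud instance has been created
	HasNodeRef bool   // true once the machine has been linked to a node
}

// nextPhase returns the phase the machine should report, given whether the
// cloud provider still considers the backing instance to exist.
func nextPhase(m machine, instanceExists bool) string {
	switch {
	case !instanceExists && m.ProviderID != "":
		// The instance was created earlier but can no longer be found.
		return phaseFailed
	case !instanceExists:
		// No instance has been created yet.
		return phaseProvisioning
	case m.HasNodeRef:
		return phaseRunning
	default:
		return phaseProvisioned
	}
}

func main() {
	m := machine{ProviderID: "aws:///us-east-2b/i-0f1ea8865fd3e68f5", HasNodeRef: true}
	fmt.Println(nextPhase(m, true))  // instance still present
	fmt.Println(nextPhase(m, false)) // instance terminated from the web console
}
```

The point of the sketch is that a missing instance takes precedence over the node reference, so a machine that already has a node still moves to Failed once its instance is gone.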

Comment 5 sunzhaohua 2020-06-04 02:35:30 UTC

Verified.
Tested on Azure with clusterversion 4.5.0-0.nightly-2020-06-03-013823 by deleting an instance from the Azure web console.
$ oc get machine
NAME                                     PHASE     TYPE              REGION   ZONE   AGE
zhsun63azure-7h44z-master-0              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-master-1              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-master-2              Running   Standard_D8s_v3   westus          18h
zhsun63azure-7h44z-worker-westus-4cmjd   Running   Standard_D2s_v3   westus          17h
zhsun63azure-7h44z-worker-westus-hv647   Running   Standard_D2s_v3   westus          17h
zhsun63azure-7h44z-worker-westus-wtz6j   Failed    Standard_D2s_v3   westus          17h
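
For repeated verification runs, the check above can be scripted. The following Go sketch scans the plain-text output of `oc get machine` and returns the machines whose PHASE column is Failed. The function name `failedMachines` is hypothetical, and the sketch assumes PHASE is the second whitespace-separated column, as in the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// failedMachines returns the names of machines whose PHASE column reads
// "Failed". It expects the table printed by `oc get machine`, with a header
// row followed by one row per machine.
func failedMachines(output string) []string {
	var names []string
	lines := strings.Split(strings.TrimSpace(output), "\n")
	for _, line := range lines[1:] { // skip the header row
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == "Failed" {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	// Two rows copied from the verification output above.
	sample := `NAME                                     PHASE     TYPE              REGION   ZONE   AGE
zhsun63azure-7h44z-worker-westus-hv647   Running   Standard_D2s_v3   westus          17h
zhsun63azure-7h44z-worker-westus-wtz6j   Failed    Standard_D2s_v3   westus          17h`
	fmt.Println(failedMachines(sample))
}
```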

Comment 7 errata-xmlrpc 2020-10-27 16:01:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196