Bug 1697278

Summary: [cloud] Machine didn't recreate instance after an instance was deleted at web console
Product: OpenShift Container Platform
Reporter: sunzhaohua <zhsun>
Component: Cloud Compute
Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA
QA Contact: sunzhaohua <zhsun>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.1.0
CC: agarcial, jhou
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:47:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host: ---
Cloudforms Team: ---
Target Upstream Version: ---
Embargoed:

Description sunzhaohua 2019-04-08 09:28:28 UTC
Description of problem:
After manually deleting an instance from the AWS web console, the machine controller did not re-create the instance for the corresponding machine.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.11   True        False         5h12m   Cluster version is 4.0.0-0.11

How reproducible:
Always

Steps to Reproduce:
1. Manually delete a worker instance from the AWS web console.
2. Check the machine, the node, and the machine-controller logs.

Actual results:
$ oc get machine
NAME                                       INSTANCE              STATE     TYPE        REGION           ZONE              AGE
zhsun-gjs5l-master-0                       i-03e9c7387bb937d6d   running   m4.xlarge   ap-southeast-1   ap-southeast-1a   6h6m
zhsun-gjs5l-master-1                       i-06ddeec845dd62802   running   m4.xlarge   ap-southeast-1   ap-southeast-1b   6h6m
zhsun-gjs5l-master-2                       i-055cce4d0f77dbdb4   running   m4.xlarge   ap-southeast-1   ap-southeast-1c   6h6m
zhsun-gjs5l-worker-ap-southeast-1a-6b8qn   i-0164cf92a8c5b9999   running   m4.large    ap-southeast-1   ap-southeast-1a   6h5m
zhsun-gjs5l-worker-ap-southeast-1b-42qwf   i-099572c57364349e0   running   m4.large    ap-southeast-1   ap-southeast-1b   6h5m
zhsun-gjs5l-worker-ap-southeast-1c-7gmtg   i-00778309293954634   running   m4.large    ap-southeast-1   ap-southeast-1c   39m

$ oc get node
NAME                                                STATUS   ROLES    AGE     VERSION
ip-172-31-136-254.ap-southeast-1.compute.internal   Ready    master   6h8m    v1.12.4+509916ce1
ip-172-31-143-190.ap-southeast-1.compute.internal   Ready    worker   5h56m   v1.12.4+509916ce1
ip-172-31-151-144.ap-southeast-1.compute.internal   Ready    worker   5h57m   v1.12.4+509916ce1
ip-172-31-151-74.ap-southeast-1.compute.internal    Ready    master   6h8m    v1.12.4+509916ce1
ip-172-31-162-203.ap-southeast-1.compute.internal   Ready    master   6h8m    v1.12.4+509916ce1

$ oc describe machine zhsun-gjs5l-worker-ap-southeast-1c-7gmtg
Status:
  Addresses:
    Address:     172.31.167.140
    Type:        InternalIP
    Address:     
    Type:        ExternalDNS
    Address:     ip-172-31-167-140.ap-southeast-1.compute.internal
    Type:        InternalDNS
  Last Updated:  2019-04-08T08:16:15Z
  Node Ref:
    Kind:  Node
    Name:  ip-172-31-167-140.ap-southeast-1.compute.internal
    UID:   22e420d6-59d5-11e9-aef5-0a687a9ce044
  Provider Status:
    API Version:  awsproviderconfig.openshift.io/v1beta1
    Conditions:
      Last Probe Time:       2019-04-08T08:16:15Z
      Last Transition Time:  2019-04-08T07:57:45Z
      Message:               error determining if machine is master: failed to get node from machine zhsun-gjs5l-worker-ap-southeast-1c-7gmtg
      Reason:                MachineCreationFailed
      Status:                True
      Type:                  MachineCreation
    Instance Id:             i-00778309293954634
    Instance State:          running
    Kind:                    AWSMachineProviderStatus
Events:
  Type     Reason        Age                   From            Message
  ----     ------        ----                  ----            -------
  Normal   Created       37m                   aws-controller  Created Machine zhsun-gjs5l-worker-ap-southeast-1c-7gmtg
  Normal   Updated       25m (x7 over 37m)     aws-controller  Updated machine zhsun-gjs5l-worker-ap-southeast-1c-7gmtg
  Warning  FailedCreate  2m24s (x20 over 18m)  aws-controller  CreateError



$ oc logs -f clusterapi-manager-controllers-cc46df86c-ll296 -c machine-controller

I0408 08:32:46.361347       1 controller.go:128] Reconciling Machine "zhsun-gjs5l-worker-ap-southeast-1c-7gmtg"
I0408 08:32:46.361386       1 controller.go:300] Machine "zhsun-gjs5l-worker-ap-southeast-1c-7gmtg" in namespace "cluster.k8s.io/cluster-name" doesn't specify "openshift-machine-api" label, assuming nil cluster
I0408 08:32:46.361404       1 actuator.go:371] Checking if machine exists
I0408 08:32:46.472641       1 actuator.go:379] Instance does not exist
I0408 08:32:46.472667       1 controller.go:249] Reconciling machine object zhsun-gjs5l-worker-ap-southeast-1c-7gmtg triggers idempotent create.
I0408 08:32:46.472675       1 actuator.go:107] creating machine
E0408 08:32:46.472961       1 actuator.go:101] Machine error: error determining if machine is master: failed to get node from machine zhsun-gjs5l-worker-ap-southeast-1c-7gmtg
E0408 08:32:46.472979       1 actuator.go:110] error creating machine: error determining if machine is master: failed to get node from machine zhsun-gjs5l-worker-ap-southeast-1c-7gmtg
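The log above shows the failure mode: the controller's existence check finds no instance and triggers an idempotent create, but the create path first asks whether the machine is a master by looking up the machine's backing Node. Since the node was deleted along with the instance, that lookup errors out and aborts the create, so the machine is never re-provisioned. A minimal Go sketch of that logic, with illustrative function and error names (not the actual machine-api code), and of a tolerant variant that treats a missing node as "not a master" and proceeds:

```go
package main

import (
	"errors"
	"fmt"
)

// errNodeNotFound stands in for the Kubernetes "node not found" error.
var errNodeNotFound = errors.New("node not found")

// getNodeIsMaster simulates looking up the Node backing a machine. After the
// instance is deleted out-of-band, the node object is gone and the lookup fails.
func getNodeIsMaster(nodes map[string]bool, machine string) (bool, error) {
	isMaster, ok := nodes[machine]
	if !ok {
		return false, errNodeNotFound
	}
	return isMaster, nil
}

// createBuggy mirrors the reported behaviour: any error from the master
// check, including a missing node, aborts the create.
func createBuggy(nodes map[string]bool, machine string) error {
	if _, err := getNodeIsMaster(nodes, machine); err != nil {
		return fmt.Errorf("error determining if machine is master: %w", err)
	}
	return nil // instance would be provisioned here
}

// createFixed treats a missing node as "not a master" and proceeds with the
// create, which is what allows the instance to be re-provisioned.
func createFixed(nodes map[string]bool, machine string) error {
	if _, err := getNodeIsMaster(nodes, machine); err != nil && !errors.Is(err, errNodeNotFound) {
		return fmt.Errorf("error determining if machine is master: %w", err)
	}
	return nil // instance provisioned (or re-provisioned)
}

func main() {
	nodes := map[string]bool{} // the node was deleted along with the instance
	m := "zhsun-gjs5l-worker-ap-southeast-1c-7gmtg"
	fmt.Println("buggy create:", createBuggy(nodes, m))
	fmt.Println("fixed create:", createFixed(nodes, m))
}
```

With an empty node map, `createBuggy` returns the same "error determining if machine is master" error seen in the actuator log on every reconcile, while `createFixed` succeeds and would let the create go through.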


Expected results:
The instance should be re-provisioned successfully.

Additional info:

Comment 3 sunzhaohua 2019-04-22 10:01:58 UTC
Verified.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-22-005054   True        False         3h24m   Cluster version is 4.1.0-0.nightly-2019-04-22-005054

Comment 5 errata-xmlrpc 2019-06-04 10:47:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758