Description of problem:
GCP only. On an IPI cluster on GCP, stopping the instance behind a machine moves the machine phase to 'Failed'. This differs from AWS and Azure, where the machine phase stays 'Running'.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-12-204120

How reproducible:
Always

Steps to Reproduce:
1. Stop one worker machine's instance.
2. The machine phase becomes "Failed" with "Error Message: Can't find created instance".

A machine cannot recover once it enters the 'Failed' phase, so restarting the instance brings back only the node, not the machine.

Actual results:
After step 2:

oc describe machine qe-jho-mdv27-w-a-x69bp
Name:         qe-jho-mdv27-w-a-x69bp
Namespace:    openshift-machine-api
Labels:       machine.openshift.io/cluster-api-cluster=qe-jho-mdv27
              machine.openshift.io/cluster-api-machine-role=worker
              machine.openshift.io/cluster-api-machine-type=worker
              machine.openshift.io/cluster-api-machineset=qe-jho-mdv27-w-a
              machine.openshift.io/instance-type=n1-standard-4
              machine.openshift.io/region=us-central1
              machine.openshift.io/zone=us-central1-a
Annotations:  machine.openshift.io/instance-state: STOPPING
API Version:  machine.openshift.io/v1beta1
Kind:         Machine
Metadata:
  Creation Timestamp:  2019-11-13T03:56:38Z
  Finalizers:
    machine.machine.openshift.io
  Generate Name:  qe-jho-mdv27-w-a-
  Generation:     2
  Owner References:
    API Version:           machine.openshift.io/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  MachineSet
    Name:                  qe-jho-mdv27-w-a
    UID:                   e3639f27-3eb3-4aad-9441-c486781596f2
  Resource Version:        97040
  Self Link:               /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/qe-jho-mdv27-w-a-x69bp
  UID:                     c4dbe615-37e3-4ecc-b7f3-60eda291e399
Spec:
  Metadata:
    Creation Timestamp:  <nil>
  Provider ID:           gce://openshift-qe/us-central1-a/qe-jho-mdv27-w-a-x69bp
  Provider Spec:
    Value:
      API Version:     gcpprovider.openshift.io/v1beta1
      Can IP Forward:  false
      Credentials Secret:
        Name:               gcp-cloud-credentials
      Deletion Protection:  false
      Disks:
        Auto Delete:  true
        Boot:         true
        Image:        qe-jho-mdv27-rhcos-image
        Labels:       <nil>
        Size Gb:      128
        Type:         pd-ssd
      Kind:           GCPMachineProviderSpec
      Machine Type:   n1-standard-4
      Metadata:
        Creation Timestamp:  <nil>
      Network Interfaces:
        Network:     qe-jho-mdv27-network
        Subnetwork:  qe-jho-mdv27-worker-subnet
      Project ID:    openshift-qe
      Region:        us-central1
      Service Accounts:
        Email:  qe-jho-mdv27-w.gserviceaccount.com
        Scopes:
          https://www.googleapis.com/auth/cloud-platform
      Tags:
        qe-jho-mdv27-worker
      User Data Secret:
        Name:  worker-user-data
      Zone:    us-central1-a
Status:
  Addresses:
    Address:      10.0.32.3
    Type:         InternalIP
    Address:      qe-jho-mdv27-w-a-x69bp.us-central1-a.c.openshift-qe.internal
    Type:         InternalDNS
    Address:      qe-jho-mdv27-w-a-x69bp.c.openshift-qe.internal
    Type:         InternalDNS
  Error Message:  Can't find created instance.
  Last Updated:   2019-11-13T08:58:28Z
  Node Ref:
    Kind:  Node
    Name:  qe-jho-mdv27-w-a-x69bp.c.openshift-qe.internal
    UID:   e899ecae-147c-4c19-b964-17ac32090d29
  Phase:   Failed
  Provider Status:
    Conditions:
      Last Probe Time:       2019-11-13T03:56:46Z
      Last Transition Time:  2019-11-13T03:56:46Z
      Message:               machine successfully created
      Reason:                MachineCreationSucceeded
      Status:                True
      Type:                  MachineCreated
    Instance Id:             qe-jho-mdv27-w-a-x69bp
    Instance State:          STOPPING
    Metadata:
      Creation Timestamp:  <nil>
Events:
  Type     Reason        Age                  From           Message
  ----     ------        ----                 ----           -------
  Warning  FailedUpdate  32m (x13 over 5h3m)  gcpcontroller  requeue in: 20s

Expected results:
Machine phase does not become 'Failed' on GCP.

Additional info:
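For anyone re-checking this, the condition reported above (instance stopped or stopping while the machine phase is 'Failed') can be sketched as a small script. This is only an illustration, not part of the machine-api code: the sample JSON is trimmed by hand from the describe output above, the JSON field names (status.phase, status.errorMessage) are assumed from that output, the function names are hypothetical, and the set of "stopped" GCE states is an assumption. On a live cluster the input would come from something like `oc get machine <name> -n openshift-machine-api -o json`.

```python
import json

# Trimmed Machine object built by hand from the describe output in this report.
# Field names are assumed to be the JSON forms of the fields oc describe prints.
MACHINE_JSON = """
{
  "metadata": {
    "name": "qe-jho-mdv27-w-a-x69bp",
    "annotations": {"machine.openshift.io/instance-state": "STOPPING"}
  },
  "status": {
    "phase": "Failed",
    "errorMessage": "Can't find created instance."
  }
}
"""

INSTANCE_STATE_ANNOTATION = "machine.openshift.io/instance-state"

# Assumption: GCE states that mean the instance was stopped by the user.
STOPPED_STATES = {"STOPPING", "TERMINATED"}


def summarize_machine(machine):
    """Pull out the fields relevant to this bug (hypothetical helper)."""
    meta = machine.get("metadata", {})
    status = machine.get("status", {})
    return {
        "name": meta.get("name"),
        "instance_state": meta.get("annotations", {}).get(INSTANCE_STATE_ANNOTATION),
        "phase": status.get("phase"),
        "error": status.get("errorMessage"),
    }


def hit_by_bug(machine):
    """True when a stopped/stopping instance has pushed the machine to 'Failed'."""
    summary = summarize_machine(machine)
    return summary["phase"] == "Failed" and summary["instance_state"] in STOPPED_STATES


if __name__ == "__main__":
    machine = json.loads(MACHINE_JSON)
    print(summarize_machine(machine))
    print("affected:", hit_by_bug(machine))
```

With the expected behavior, the same machine would report a phase other than 'Failed' and the check would return False.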
Verified this is fixed in 4.3.0-0.nightly-2019-11-13-233341.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062