Bug 1771919 - [GCP] Stopped machine has 'Failed' phase
Summary: [GCP] Stopped machine has 'Failed' phase
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.3.0
Assignee: Alberto
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-13 09:07 UTC by Jianwei Hou
Modified: 2020-01-23 11:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:12:28 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github openshift cluster-api-provider-gcp pull 66 'None' closed bug 1771919: Consider an instance to exist regardless the status 2020-05-26 18:52:58 UTC
Red Hat Product Errata RHBA-2020:0062 None None None 2020-01-23 11:12:53 UTC

Description Jianwei Hou 2019-11-13 09:07:20 UTC
Description of problem:
GCP only. On an IPI cluster on GCP, stopping the instance that backs a machine causes the machine phase to become 'Failed'. This differs from AWS and Azure, where the machine phase stays 'Running'.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-12-204120

How reproducible:
Always

Steps to Reproduce:
1. Stop one worker machine's instance.
2. Check the machine phase. It becomes 'Failed' with "Error Message:  Can't find created instance." A machine cannot recover once it enters the 'Failed' phase, so restarting the instance only brings back the node.
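
To illustrate why restarting the instance does not help, here is a minimal Go sketch of a phase decision in which 'Failed' is terminal. It is not the actual machine controller code: the machine type, the nextPhase function and the example provider ID are hypothetical, and the set of phases is reduced to what this report mentions.

```go
package main

import "fmt"

// machine is a hypothetical, simplified view of a Machine object.
type machine struct {
	ProviderID string // set once the instance has been created
	Phase      string
}

// nextPhase is an assumed, simplified model of the reconcile decision.
// instanceExists is whatever the cloud provider's existence check reports.
func nextPhase(m machine, instanceExists bool) string {
	if m.Phase == "Failed" {
		// Terminal: nothing moves a machine out of 'Failed'.
		return "Failed"
	}
	if instanceExists {
		return "Running"
	}
	if m.ProviderID != "" {
		// The instance was created earlier but is reported missing now.
		return "Failed"
	}
	return "Provisioning"
}

func main() {
	// Hypothetical provider ID, for illustration only.
	m := machine{ProviderID: "gce://example-project/example-zone/example-worker", Phase: "Running"}

	// Instance stopped and the provider reports it as missing.
	m.Phase = nextPhase(m, false)
	fmt.Println(m.Phase)

	// Instance restarted and found again; the phase does not change.
	m.Phase = nextPhase(m, true)
	fmt.Println(m.Phase)
}
```

Under these assumptions, both lines print Failed: once the existence check has answered "missing" a single time for a created machine, later successful checks no longer matter.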

Actual results:
After step 2
oc describe machine qe-jho-mdv27-w-a-x69bp
Name:         qe-jho-mdv27-w-a-x69bp
Namespace:    openshift-machine-api
Labels:       machine.openshift.io/cluster-api-cluster=qe-jho-mdv27
              machine.openshift.io/cluster-api-machine-role=worker
              machine.openshift.io/cluster-api-machine-type=worker
              machine.openshift.io/cluster-api-machineset=qe-jho-mdv27-w-a
              machine.openshift.io/instance-type=n1-standard-4
              machine.openshift.io/region=us-central1
              machine.openshift.io/zone=us-central1-a
Annotations:  machine.openshift.io/instance-state: STOPPING
API Version:  machine.openshift.io/v1beta1
Kind:         Machine
Metadata:
  Creation Timestamp:  2019-11-13T03:56:38Z
  Finalizers:
    machine.machine.openshift.io
  Generate Name:  qe-jho-mdv27-w-a-
  Generation:     2
  Owner References:
    API Version:           machine.openshift.io/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  MachineSet
    Name:                  qe-jho-mdv27-w-a
    UID:                   e3639f27-3eb3-4aad-9441-c486781596f2
  Resource Version:        97040
  Self Link:               /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/qe-jho-mdv27-w-a-x69bp
  UID:                     c4dbe615-37e3-4ecc-b7f3-60eda291e399
Spec:
  Metadata:
    Creation Timestamp:  <nil>
  Provider ID:           gce://openshift-qe/us-central1-a/qe-jho-mdv27-w-a-x69bp
  Provider Spec:
    Value:
      API Version:     gcpprovider.openshift.io/v1beta1
      Can IP Forward:  false
      Credentials Secret:
        Name:               gcp-cloud-credentials
      Deletion Protection:  false
      Disks:
        Auto Delete:  true
        Boot:         true
        Image:        qe-jho-mdv27-rhcos-image
        Labels:       <nil>
        Size Gb:      128
        Type:         pd-ssd
      Kind:           GCPMachineProviderSpec
      Machine Type:   n1-standard-4
      Metadata:
        Creation Timestamp:  <nil>
      Network Interfaces:
        Network:     qe-jho-mdv27-network
        Subnetwork:  qe-jho-mdv27-worker-subnet
      Project ID:    openshift-qe
      Region:        us-central1
      Service Accounts:
        Email:  qe-jho-mdv27-w@openshift-qe.iam.gserviceaccount.com
        Scopes:
          https://www.googleapis.com/auth/cloud-platform
      Tags:
        qe-jho-mdv27-worker
      User Data Secret:
        Name:  worker-user-data
      Zone:    us-central1-a
Status:
  Addresses:
    Address:      10.0.32.3
    Type:         InternalIP
    Address:      qe-jho-mdv27-w-a-x69bp.us-central1-a.c.openshift-qe.internal
    Type:         InternalDNS
    Address:      qe-jho-mdv27-w-a-x69bp.c.openshift-qe.internal
    Type:         InternalDNS
  Error Message:  Can't find created instance.
  Last Updated:   2019-11-13T08:58:28Z
  Node Ref:
    Kind:  Node
    Name:  qe-jho-mdv27-w-a-x69bp.c.openshift-qe.internal
    UID:   e899ecae-147c-4c19-b964-17ac32090d29
  Phase:   Failed
  Provider Status:
    Conditions:
      Last Probe Time:       2019-11-13T03:56:46Z
      Last Transition Time:  2019-11-13T03:56:46Z
      Message:               machine successfully created
      Reason:                MachineCreationSucceeded
      Status:                True
      Type:                  MachineCreated
    Instance Id:             qe-jho-mdv27-w-a-x69bp
    Instance State:          STOPPING
    Metadata:
      Creation Timestamp:  <nil>
Events:
  Type     Reason        Age                  From           Message
  ----     ------        ----                 ----           -------
  Warning  FailedUpdate  32m (x13 over 5h3m)  gcpcontroller  requeue in: 20s


Expected results:
Machine phase does not become 'Failed' on GCP
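
The linked pull request is titled "Consider an instance to exist regardless the status". The Go sketch below contrasts an existence check that depends on the instance status with one that does not. It is only an illustration under assumptions: the instance and lookup types and both function names are hypothetical, and the status-dependent variant is a guess at the kind of check that would produce the reported behavior, not a copy of the provider's code.

```go
package main

import "fmt"

// instance is a hypothetical, simplified stand-in for a GCP compute instance.
type instance struct {
	Name   string
	Status string // for example "RUNNING" or "STOPPING"
}

// lookup is a hypothetical stand-in for the cloud API call.
// The bool is false when no instance with that name is found.
type lookup func(name string) (instance, bool)

// existsOnlyIfRunning models the reported behavior (assumed):
// a stopped or stopping instance is treated as if it did not exist.
func existsOnlyIfRunning(get lookup, name string) bool {
	inst, ok := get(name)
	return ok && inst.Status == "RUNNING"
}

// existsRegardlessOfStatus models the expected behavior:
// the instance exists whenever the lookup finds it.
func existsRegardlessOfStatus(get lookup, name string) bool {
	_, ok := get(name)
	return ok
}

func main() {
	// A fake cloud holding one instance that is being stopped.
	cloud := map[string]instance{
		"example-worker": {Name: "example-worker", Status: "STOPPING"},
	}
	get := func(name string) (instance, bool) {
		inst, ok := cloud[name]
		return inst, ok
	}

	fmt.Println(existsOnlyIfRunning(get, "example-worker"))
	fmt.Println(existsRegardlessOfStatus(get, "example-worker"))
}
```

With the status-independent check, a stopped instance is still reported as existing, so the machine would not be marked 'Failed' for that reason alone.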

Additional info:

Comment 2 Jianwei Hou 2019-11-15 06:10:00 UTC
Verified this is fixed in 4.3.0-0.nightly-2019-11-13-233341.

Comment 4 errata-xmlrpc 2020-01-23 11:12:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

