Bug 1825290 - [GCE] Machine status should be "Failed" with an invalid configuration
Summary: [GCE] Machine status should be "Failed" with an invalid configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On: 1824497
Blocks:
Reported: 2020-04-17 15:02 UTC by Joel Speed
Modified: 2020-07-13 17:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Errors returned from the cloud-provider actuator no longer matched the expected type due to being wrapped using github.com/pkg/errors.
Consequence: The Machine controller could not determine that the Machine should be marked as Failed.
Fix: Use error wrapping from the standard library to check the error types.
Result: The Machine controller can now determine when Machines should be marked Failed.
Clone Of: 1824497
Environment:
Last Closed: 2020-07-13 17:28:35 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-api-provider-gcp pull 84 0 None closed BUG 1825290: Switch to Go errors instead of github.com/pkg/errors 2021-02-10 04:49:34 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:28:58 UTC

Description Joel Speed 2020-04-17 15:02:33 UTC
+++ This bug was initially created as a clone of Bug #1824497 +++

Description of problem:
Machine status should be "Failed" when creating a spot instance with a maximum price lower than the current spot price
 
Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-15-223247

How reproducible:
Always

Steps to Reproduce:
1. Create a spot instance with a maxPrice lower than the current spot price
  providerSpec:
    value:
      spotMarketOptions:
        maxPrice: "0.01"
2. Check machines and logs
  
Actual results:
Machine stuck in Provisioning status

$ oc get machine
NAME                                        PHASE          TYPE        REGION      ZONE         AGE
zhsun416aws-rg88g-master-0                  Running        m4.xlarge   us-east-2   us-east-2a   6h53m
zhsun416aws-rg88g-master-1                  Running        m4.xlarge   us-east-2   us-east-2b   6h53m
zhsun416aws-rg88g-master-2                  Running        m4.xlarge   us-east-2   us-east-2c   6h53m
zhsun416aws-rg88g-worker-us-east-2a-txx9k   Running        m4.large    us-east-2   us-east-2a   6h39m
zhsun416aws-rg88g-worker-us-east-2b-9r4rx   Running        m4.large    us-east-2   us-east-2b   6h39m
zhsun416aws-rg88g-worker-us-east-2c-fxxdm   Provisioning                                        4m57s

  lastUpdated: "2020-04-16T09:43:52Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-04-16T09:44:10Z"
      lastTransitionTime: "2020-04-16T09:44:10Z"
      message: 'error launching instance: Your Spot request price of 0.01 is lower
        than the minimum required Spot request fulfillment price of 0.0257.'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation

I0416 09:46:31.346461       1 actuator.go:74] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: actuator creating machine
I0416 09:46:31.347178       1 reconciler.go:38] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: creating machine
E0416 09:46:31.347197       1 reconciler.go:221] NodeRef not found in machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372053       1 instances.go:47] No stopped instances found for machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372096       1 instances.go:145] Using AMI ami-0e888b699fa6e37e7
I0416 09:46:31.372108       1 instances.go:77] Describing security groups based on filters
I0416 09:46:31.583386       1 instances.go:122] Describing subnets based on filters
I0416 09:46:32.438067       1 instances.go:331] Error launching instance: SpotMaxPriceTooLow: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
        status code: 400, request id: 3e5331d8-d1e9-4034-833c-15f10ce599f4
E0416 09:46:32.438171       1 reconciler.go:69] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: error creating machine: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
I0416 09:46:32.438187       1 machine_scope.go:134] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: Updating status
I0416 09:46:32.438195       1 machine_scope.go:155] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: finished calculating AWS status
I0416 09:46:32.438215       1 machine_scope.go:80] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: patching machine
E0416 09:46:32.453533       1 actuator.go:65] zhsun416aws-rg88g-worker-us-east-2c-fxxdm error: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
W0416 09:46:32.453594       1 controller.go:311] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: failed to create machine: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
E0416 09:46:32.453654       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257."  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm"}
I0416 09:46:32.453784       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm","uid":"6dedaf4b-12db-4741-8a26-5555ca8dd11e","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"134275"} "reason"="FailedCreate"


Expected results:
The machine phase is set to "Failed"

Additional info:

--- Additional comment from Joel Speed on 2020-04-16 16:37:10 UTC ---

I've tested this with the same build and have been unable to reproduce. Is there any more information you can provide?

--- Additional comment from Joel Speed on 2020-04-17 09:57:00 UTC ---

I believe this issue was introduced by a refactor of the Cluster-API-Provider-AWS in 

Machines will only go into the Failed phase when the returned error is an `InvalidMachineConfigurationError` (see: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317)

The error you are seeing here does return this (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/instances.go#L261), but it is then wrapped (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/reconciler.go#L73), so it no longer matches the expected type.

The check for whether the error is an InvalidMachineConfigurationError (implemented: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317) does not currently support this wrapping, so it will need to be updated to handle wrapped errors.
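
For illustration, here is a minimal sketch of the failure mode described above, using only the Go standard library. The type `InvalidConfigError` and the functions `launch`, `create`, and `phaseFor` are hypothetical stand-ins written for this example; they are not the actual machine-api-operator or provider code. The sketch assumes the controller's check amounts to a direct type assertion, which only inspects the outermost error, whereas `errors.As` walks the wrap chain.

```go
package main

import (
	"errors"
	"fmt"
)

// InvalidConfigError is a hypothetical stand-in for the actuator's
// InvalidMachineConfigurationError; it is not the real machine-api type.
type InvalidConfigError struct {
	Reason string
}

func (e *InvalidConfigError) Error() string { return e.Reason }

// launch simulates an actuator call that rejects the machine configuration.
func launch() error {
	return &InvalidConfigError{Reason: "error launching instance: invalid machine type"}
}

// create adds context to the actuator error using the standard library's %w verb.
func create() error {
	if err := launch(); err != nil {
		return fmt.Errorf("failed to create machine: %w", err)
	}
	return nil
}

// isInvalidConfigByAssertion inspects only the outermost error, so it
// misses an InvalidConfigError that has been wrapped.
func isInvalidConfigByAssertion(err error) bool {
	_, ok := err.(*InvalidConfigError)
	return ok
}

// isInvalidConfig walks the wrap chain with errors.As, so it still finds
// the InvalidConfigError underneath the added context.
func isInvalidConfig(err error) bool {
	var target *InvalidConfigError
	return errors.As(err, &target)
}

// phaseFor mimics the controller's decision in simplified form: only an
// invalid configuration moves the machine to "Failed".
func phaseFor(err error) string {
	if isInvalidConfig(err) {
		return "Failed"
	}
	return "Provisioning"
}

func main() {
	err := create()
	fmt.Println(isInvalidConfigByAssertion(err)) // false: the wrapper hides the type
	fmt.Println(isInvalidConfig(err))            // true: errors.As unwraps
	fmt.Println(phaseFor(err))                   // Failed
}
```

With the assertion-based check, the wrapped error is treated like any other create failure and the machine stays in Provisioning, which matches the behaviour reported here.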

Comment 1 Alberto 2020-04-27 12:56:09 UTC
All PRs are merged. It seems automation failed to update the status.
Setting to MODIFIED.

Comment 5 sunzhaohua 2020-05-08 02:25:44 UTC
verified
clusterversion: 4.5.0-0.nightly-2020-05-07-144853

$ oc get machine
NAME                     PHASE     TYPE            REGION        ZONE            AGE
zhsung-b5bff-m-0         Running   n1-standard-4   us-central1   us-central1-a   39m
zhsung-b5bff-m-1         Running   n1-standard-4   us-central1   us-central1-b   39m
zhsung-b5bff-m-2         Running   n1-standard-4   us-central1   us-central1-c   39m
zhsung-b5bff-w-a-p8zzv   Running   n1-standard-4   us-central1   us-central1-a   24m
zhsung-b5bff-w-b-2wj9k   Running   n1-standard-4   us-central1   us-central1-b   24m
zhsung-b5bff-w-c-l7mxd   Running   n1-standard-4   us-central1   us-central1-c   24m
zhsung-b5bff-w-f-gw25b   Failed                                                  91s

I0508 02:23:41.339369       1 controller.go:82] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsung-b5bff-w-f" "namespace"="openshift-machine-api" 
I0508 02:23:41.386005       1 controller.go:166] zhsung-b5bff-w-f-gw25b: reconciling Machine
I0508 02:23:41.399656       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsung-b5bff-w-f-gw25b"}
I0508 02:23:41.400457       1 controller.go:166] zhsung-b5bff-w-f-gw25b: reconciling Machine
I0508 02:23:41.400818       1 actuator.go:75] zhsung-b5bff-w-f-gw25b: Checking if machine exists
E0508 02:23:41.610579       1 controller.go:105] controllers/MachineSet "msg"="Failed to reconcile MachineSet" "error"="error fetching machine type \"n1-standard-4-invalid\": error fetching machine type \"n1-standard-4-invalid\" in zone \"us-central1-f\": googleapi: Error 404: The resource 'projects/openshift-qe/zones/us-central1-f/machineTypes/n1-standard-4-invalid' was not found, notFound" "machineset"="zhsung-b5bff-w-f" "namespace"="openshift-machine-api" 
I0508 02:23:41.611905       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="error fetching machine type \"n1-standard-4-invalid\": error fetching machine type \"n1-standard-4-invalid\" in zone \"us-central1-f\": googleapi: Error 404: The resource 'projects/openshift-qe/zones/us-central1-f/machineTypes/n1-standard-4-invalid' was not found, notFound" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"zhsung-b5bff-w-f","uid":"afce054e-4822-4116-9011-fe5be3fdb093","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"28001"} "reason"="ReconcileError"
I0508 02:23:41.618821       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsung-b5bff-w-f"}
I0508 02:23:41.618890       1 controller.go:82] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsung-b5bff-w-f" "namespace"="openshift-machine-api" 
E0508 02:23:41.744165       1 controller.go:105] controllers/MachineSet "msg"="Failed to reconcile MachineSet" "error"="error fetching machine type \"n1-standard-4-invalid\": error fetching machine type \"n1-standard-4-invalid\" in zone \"us-central1-f\": googleapi: Error 404: The resource 'projects/openshift-qe/zones/us-central1-f/machineTypes/n1-standard-4-invalid' was not found, notFound" "machineset"="zhsung-b5bff-w-f" "namespace"="openshift-machine-api" 
I0508 02:23:41.744676       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="error fetching machine type \"n1-standard-4-invalid\": error fetching machine type \"n1-standard-4-invalid\" in zone \"us-central1-f\": googleapi: Error 404: The resource 'projects/openshift-qe/zones/us-central1-f/machineTypes/n1-standard-4-invalid' was not found, notFound" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"zhsung-b5bff-w-f","uid":"afce054e-4822-4116-9011-fe5be3fdb093","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"28005"} "reason"="ReconcileError"
I0508 02:23:41.752495       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsung-b5bff-w-f"}
I0508 02:23:41.848534       1 reconciler.go:302] zhsung-b5bff-w-f-gw25b: Machine does not exist
I0508 02:23:41.848745       1 controller.go:421] zhsung-b5bff-w-f-gw25b: going into phase "Provisioning"
I0508 02:23:41.860465       1 controller.go:310] zhsung-b5bff-w-f-gw25b: reconciling machine triggers idempotent create
I0508 02:23:41.860497       1 actuator.go:57] zhsung-b5bff-w-f-gw25b: Creating machine
I0508 02:23:42.418022       1 reconciler.go:168] zhsung-b5bff-w-f-gw25b: Reconciling machine object with cloud state
I0508 02:23:42.418057       1 reconciler.go:147] Error launching instance: googleapi: Error 400: Invalid value for field 'resource.machineType': 'zones/us-central1-f/machineTypes/n1-standard-4-invalid'. Machine type with name 'n1-standard-4-invalid' does not exist in zone 'us-central1-f'., invalid
I0508 02:23:42.418234       1 machine_scope.go:161] "zhsung-b5bff-w-f-gw25b": patching machine
W0508 02:23:42.437368       1 controller.go:312] zhsung-b5bff-w-f-gw25b: failed to create machine: error launching instance: googleapi: Error 400: Invalid value for field 'resource.machineType': 'zones/us-central1-f/machineTypes/n1-standard-4-invalid'. Machine type with name 'n1-standard-4-invalid' does not exist in zone 'us-central1-f'., invalid
I0508 02:23:42.437395       1 controller.go:412] Actuator returned invalid configuration error: error launching instance: googleapi: Error 400: Invalid value for field 'resource.machineType': 'zones/us-central1-f/machineTypes/n1-standard-4-invalid'. Machine type with name 'n1-standard-4-invalid' does not exist in zone 'us-central1-f'., invalid
I0508 02:23:42.437402       1 controller.go:421] zhsung-b5bff-w-f-gw25b: going into phase "Failed"
I0508 02:23:42.438291       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="error launching instance: googleapi: Error 400: Invalid value for field 'resource.machineType': 'zones/us-central1-f/machineTypes/n1-standard-4-invalid'. Machine type with name 'n1-standard-4-invalid' does not exist in zone 'us-central1-f'., invalid" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsung-b5bff-w-f-gw25b","uid":"ed508997-091a-4ad7-90f3-1099d5342ac6","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"28011"} "reason"="FailedCreate"
I0508 02:23:42.446819       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsung-b5bff-w-f-gw25b"}
I0508 02:23:42.447467       1 controller.go:166] zhsung-b5bff-w-f-gw25b: reconciling Machine
W0508 02:23:42.447500       1 controller.go:263] zhsung-b5bff-w-f-gw25b: machine has gone "Failed" phase. It won't reconcile

Comment 6 errata-xmlrpc 2020-07-13 17:28:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

