Bug 1824497

Summary: [aws]Machine status should be "Failed" with an invalid configuration
Product: OpenShift Container Platform Reporter: sunzhaohua <zhsun>
Component: Cloud Compute Assignee: Joel Speed <jspeed>
Cloud Compute sub component: Other Providers QA Contact: Jianwei Hou <jhou>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: agarcial, jspeed
Version: 4.5   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Errors returned from the cloud-provider actuator no longer matched the expected type because they were wrapped using github.com/pkg/errors.
Consequence: The Machine controller could not determine that the Machine should be marked as Failed.
Fix: Use error wrapping from the standard library to check the error types.
Result: The Machine controller can now determine when Machines should be marked as Failed.
Story Points: ---
Clone Of:
: 1825235 1825290 1826017 Environment:
Last Closed: 2020-07-13 17:27:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1825235, 1825290, 1826017    

Description sunzhaohua 2020-04-16 10:08:05 UTC
Description of problem:
Machine status should be "Failed" when creating a spot instance with a maxPrice lower than the current spot instance price
 
Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-15-223247

How reproducible:
Always

Steps to Reproduce:
1. Create a spot instance with a maxPrice lower than the current spot instance price:
  providerSpec:
    value:
      spotMarketOptions:
        maxPrice: "0.01"
2. Check machines and logs
  
Actual results:
Machine stuck in Provisioning status

$ oc get machine
NAME                                        PHASE          TYPE        REGION      ZONE         AGE
zhsun416aws-rg88g-master-0                  Running        m4.xlarge   us-east-2   us-east-2a   6h53m
zhsun416aws-rg88g-master-1                  Running        m4.xlarge   us-east-2   us-east-2b   6h53m
zhsun416aws-rg88g-master-2                  Running        m4.xlarge   us-east-2   us-east-2c   6h53m
zhsun416aws-rg88g-worker-us-east-2a-txx9k   Running        m4.large    us-east-2   us-east-2a   6h39m
zhsun416aws-rg88g-worker-us-east-2b-9r4rx   Running        m4.large    us-east-2   us-east-2b   6h39m
zhsun416aws-rg88g-worker-us-east-2c-fxxdm   Provisioning                                        4m57s

  lastUpdated: "2020-04-16T09:43:52Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-04-16T09:44:10Z"
      lastTransitionTime: "2020-04-16T09:44:10Z"
      message: 'error launching instance: Your Spot request price of 0.01 is lower
        than the minimum required Spot request fulfillment price of 0.0257.'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation

I0416 09:46:31.346461       1 actuator.go:74] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: actuator creating machine
I0416 09:46:31.347178       1 reconciler.go:38] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: creating machine
E0416 09:46:31.347197       1 reconciler.go:221] NodeRef not found in machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372053       1 instances.go:47] No stopped instances found for machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372096       1 instances.go:145] Using AMI ami-0e888b699fa6e37e7
I0416 09:46:31.372108       1 instances.go:77] Describing security groups based on filters
I0416 09:46:31.583386       1 instances.go:122] Describing subnets based on filters
I0416 09:46:32.438067       1 instances.go:331] Error launching instance: SpotMaxPriceTooLow: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
        status code: 400, request id: 3e5331d8-d1e9-4034-833c-15f10ce599f4
E0416 09:46:32.438171       1 reconciler.go:69] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: error creating machine: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
I0416 09:46:32.438187       1 machine_scope.go:134] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: Updating status
I0416 09:46:32.438195       1 machine_scope.go:155] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: finished calculating AWS status
I0416 09:46:32.438215       1 machine_scope.go:80] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: patching machine
E0416 09:46:32.453533       1 actuator.go:65] zhsun416aws-rg88g-worker-us-east-2c-fxxdm error: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
W0416 09:46:32.453594       1 controller.go:311] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: failed to create machine: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
E0416 09:46:32.453654       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257."  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm"}
I0416 09:46:32.453784       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm","uid":"6dedaf4b-12db-4741-8a26-5555ca8dd11e","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"134275"} "reason"="FailedCreate"


Expected results:
The machine phase is set to "Failed"

Additional info:

Comment 1 Joel Speed 2020-04-16 16:37:10 UTC
I've tested this with the same build and have been unable to reproduce. Is there any more information you can provide?

Comment 2 Joel Speed 2020-04-17 09:57:00 UTC
I believe this issue was introduced by a refactor of the Cluster-API-Provider-AWS in 

Machines will only go into the Failed phase when the returned error is an `InvalidMachineConfigurationError` (see: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317)

The error you are seeing here is returned as this type (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/instances.go#L261), but it is then wrapped (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/reconciler.go#L73), so it no longer matches the expected type.

The check to see if the error is an InvalidMachineConfigurationError (implemented: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317) does not currently support this wrapping, so it will need to be updated to handle wrapped errors.
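
To illustrate the mismatch, here is a minimal sketch using only the Go standard library. `InvalidConfigError`, `launchInstance`, `createMachine` and the two check functions are hypothetical stand-ins, not the real machine-api-operator or provider code, and the actual fix may look different. It only shows why a check against the outermost error misses a wrapped typed error, while `errors.As` walks the wrap chain and finds it.

```go
package main

import (
	"errors"
	"fmt"
)

// InvalidConfigError is a hypothetical stand-in for the controller's
// invalid-configuration error type; it is not the real definition.
type InvalidConfigError struct {
	Message string
}

func (e *InvalidConfigError) Error() string { return e.Message }

// launchInstance simulates the provider rejecting the request with a typed error.
func launchInstance() error {
	return &InvalidConfigError{Message: "error launching instance: spot price too low"}
}

// createMachine adds context to the provider error. The %w verb keeps the
// original error reachable through errors.Unwrap.
func createMachine() error {
	if err := launchInstance(); err != nil {
		return fmt.Errorf("failed to launch instance: %w", err)
	}
	return nil
}

// isInvalidConfigDirect inspects only the outermost error, so it misses a
// typed error that has been wrapped.
func isInvalidConfigDirect(err error) bool {
	_, ok := err.(*InvalidConfigError)
	return ok
}

// isInvalidConfig walks the wrap chain with errors.As.
func isInvalidConfig(err error) bool {
	var target *InvalidConfigError
	return errors.As(err, &target)
}

func main() {
	err := createMachine()
	fmt.Println("direct type assertion:", isInvalidConfigDirect(err)) // false
	fmt.Println("errors.As:", isInvalidConfig(err))                   // true
}
```

With a check like `isInvalidConfig`, the controller could still classify the error after the actuator has added context to it.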

Comment 3 sunzhaohua 2020-04-26 07:00:05 UTC
clusterversion: 4.5.0-0.nightly-2020-04-25-170442
Machine status does not become "Failed" for some invalid configurations, for example an invalid credentialsSecret.
1. Create a machine with an invalid AMI; the machine goes to Failed.
$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsunaws426-rn574-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   5h10m
zhsunaws426-rn574-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   5h10m
zhsunaws426-rn574-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   5h10m
zhsunaws426-rn574-worker-us-east-2a-kcpw2   Failed                                         137m
zhsunaws426-rn574-worker-us-east-2b-xr7m6   Running   m4.large    us-east-2   us-east-2b   3h16m
zhsunaws426-rn574-worker-us-east-2c-fttw7   Running   m4.large    us-east-2   us-east-2c   4h57m

2. Create a machine with an invalid credentialsSecret; the machine PHASE is empty.
          credentialsSecret:
            name: aws-cloud-credentials-invalid

$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsunaws426-rn574-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   5h50m
zhsunaws426-rn574-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   5h50m
zhsunaws426-rn574-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   5h50m
zhsunaws426-rn574-worker-us-east-2a-cpsrj                                                  91s
zhsunaws426-rn574-worker-us-east-2b-xr7m6   Running   m4.large    us-east-2   us-east-2b   3h56m
zhsunaws426-rn574-worker-us-east-2c-fttw7   Running   m4.large    us-east-2   us-east-2c   5h36m

I0426 06:51:48.974004       1 actuator.go:97] zhsunaws426-rn574-worker-us-east-2a-cpsrj: actuator checking if machine exists
E0426 06:51:48.974666       1 controller.go:269] zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to check if machine exists: zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to create scope for machine: failed to create aws client: aws credentials secret openshift-machine-api/aws-cloud-credentials-invalid: Secret "aws-cloud-credentials-invalid" not found not found
E0426 06:51:48.974751       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to create scope for machine: failed to create aws client: aws credentials secret openshift-machine-api/aws-cloud-credentials-invalid: Secret \"aws-cloud-credentials-invalid\" not found not found"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunaws426-rn574-worker-us-east-2a-cpsrj"}

Comment 4 Alberto 2020-04-27 07:08:43 UTC
>Machine status didn't become "Failed" with some invalid configuration, for example "invalid credentialsSecret".

Exists() will never succeed in that scenario, so the object is requeued right away. This is known and tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1805639
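
A rough sketch of why the two scenarios differ, assuming a much simplified reconcile flow. The `actuator` interface, `reconcile`, `fakeActuator`, `errInvalidConfig` and the helper constructors are all hypothetical and do not mirror the real controller code. The point is only that when Exists() returns an error, the create path (where an invalid configuration can be turned into the Failed phase) is never reached.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidConfig is a hypothetical sentinel standing in for the
// controller's invalid-configuration error type.
var errInvalidConfig = errors.New("invalid machine configuration")

// actuator is a simplified, hypothetical view of the provider actuator.
type actuator interface {
	Exists() (bool, error)
	Create() error
}

// reconcile returns the resulting phase and whether the object is requeued.
func reconcile(a actuator, phase string) (string, bool) {
	exists, err := a.Exists()
	if err != nil {
		// Exists failed: the error is not classified, the phase stays
		// unchanged, and the object is simply retried.
		return phase, true
	}
	if exists {
		return phase, false
	}
	if err := a.Create(); err != nil {
		if errors.Is(err, errInvalidConfig) {
			// Only here can an invalid configuration lead to Failed.
			return "Failed", false
		}
		return "Provisioning", true
	}
	// Instance requested; later reconciles would advance the phase.
	return "Provisioning", false
}

// fakeActuator returns preconfigured errors for the sketch.
type fakeActuator struct {
	existsErr error
	createErr error
}

func (f fakeActuator) Exists() (bool, error) { return false, f.existsErr }
func (f fakeActuator) Create() error         { return f.createErr }

// badSecret models the invalid credentialsSecret case: Exists itself fails.
func badSecret() actuator {
	return fakeActuator{existsErr: errors.New("failed to create aws client")}
}

// lowSpotPrice models the case from the description: Create fails with a
// wrapped invalid-configuration error.
func lowSpotPrice() actuator {
	return fakeActuator{createErr: fmt.Errorf("failed to launch instance: %w", errInvalidConfig)}
}

func main() {
	phase, requeue := reconcile(badSecret(), "")
	fmt.Printf("bad secret: phase=%q requeue=%v\n", phase, requeue)
	phase, requeue = reconcile(lowSpotPrice(), "")
	fmt.Printf("low spot price: phase=%q requeue=%v\n", phase, requeue)
}
```

In this sketch the bad-secret machine keeps its empty phase and is requeued, matching the empty PHASE column above, while the low-price machine reaches the create path and can be marked Failed.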

For this BZ we need to reproduce the scenario in the description: "message: 'error launching instance: Your Spot request price of 0.01 is lower
        than the minimum required Spot request fulfillment price of 0.0257.'"

Comment 5 sunzhaohua 2020-04-28 07:06:03 UTC
Verified

clusterversion: 4.5.0-0.nightly-2020-04-28-023400

Created a spot instance with a maxPrice lower than the current spot instance price:
$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsun428aws-5x69z-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   57m
zhsun428aws-5x69z-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   57m
zhsun428aws-5x69z-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   57m
zhsun428aws-5x69z-worker-us-east-2a-zczxx   Running   m4.large    us-east-2   us-east-2a   43m
zhsun428aws-5x69z-worker-us-east-2b-8w79c   Running   m4.large    us-east-2   us-east-2b   43m
zhsun428aws-5x69z-worker-us-east-2c-b2w82   Failed                                         17s


E0428 06:58:23.712222       1 reconciler.go:68] zhsun428aws-5x69z-worker-us-east-2c-b2w82: error creating machine: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.712238       1 machine_scope.go:134] zhsun428aws-5x69z-worker-us-east-2c-b2w82: Updating status
I0428 06:58:23.712246       1 machine_scope.go:155] zhsun428aws-5x69z-worker-us-east-2c-b2w82: finished calculating AWS status
I0428 06:58:23.712261       1 machine_scope.go:80] zhsun428aws-5x69z-worker-us-east-2c-b2w82: patching machine
E0428 06:58:23.729450       1 actuator.go:65] zhsun428aws-5x69z-worker-us-east-2c-b2w82 error: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
W0428 06:58:23.729500       1 controller.go:312] zhsun428aws-5x69z-worker-us-east-2c-b2w82: failed to create machine: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.729520       1 controller.go:412] Actuator returned invalid configuration error: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.729531       1 controller.go:421] zhsun428aws-5x69z-worker-us-east-2c-b2w82: going into phase "Failed"
I0428 06:58:23.729907       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82","uid":"feaf35a3-2ca9-4a31-88e8-5bc145ca6d24","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"32428"} "reason"="FailedCreate"
I0428 06:58:23.741457       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82"}
I0428 06:58:23.741521       1 controller.go:166] zhsun428aws-5x69z-worker-us-east-2c-b2w82: reconciling Machine
W0428 06:58:23.741534       1 controller.go:263] zhsun428aws-5x69z-worker-us-east-2c-b2w82: machine has gone "Failed" phase. It won't reconcile
I0428 06:58:23.741552       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82"}

Comment 7 errata-xmlrpc 2020-07-13 17:27:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409