Bug 1824497 - [aws]Machine status should be "Failed" with an invalid configuration
Summary: [aws]Machine status should be "Failed" with an invalid configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Joel Speed
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks: 1825235 1825290 1826017
Reported: 2020-04-16 10:08 UTC by sunzhaohua
Modified: 2020-07-13 17:28 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Errors returned from the cloud-provider actuator no longer matched the expected type because they were wrapped using github.com/pkg/errors.
Consequence: The Machine controller could not determine that the Machine should be marked as Failed.
Fix: Use error wrapping from the Go standard library, and unwrap errors when checking their type.
Result: The Machine controller can now determine when Machines should be marked as Failed.
Clone Of:
Clones: 1825235 1825290 1826017
Environment:
Last Closed: 2020-07-13 17:27:59 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/cluster-api-provider-aws pull 316 (closed): "BUG 1824497: Use Go errors instead of github.com/pkg/errors to wrap errors", last updated 2021-01-21 21:08:26 UTC
Github openshift/machine-api-operator pull 559 (closed): "BUG 1824497: Enable error checks to unwrap errors", last updated 2021-01-21 21:07:46 UTC
Red Hat Product Errata RHBA-2020:2409, last updated 2020-07-13 17:28:23 UTC

Description sunzhaohua 2020-04-16 10:08:05 UTC
Description of problem:
Machine status should be "Failed" when creating a Spot instance whose maximum price is lower than the current Spot price.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-15-223247

How reproducible:
Always

Steps to Reproduce:
1. Create a Spot instance with a maxPrice lower than the current Spot price:
  providerSpec:
    value:
      spotMarketOptions:
        maxPrice: "0.01"
2. Check the machines and the logs.

Actual results:
The machine is stuck in the Provisioning phase.

$ oc get machine
NAME                                        PHASE          TYPE        REGION      ZONE         AGE
zhsun416aws-rg88g-master-0                  Running        m4.xlarge   us-east-2   us-east-2a   6h53m
zhsun416aws-rg88g-master-1                  Running        m4.xlarge   us-east-2   us-east-2b   6h53m
zhsun416aws-rg88g-master-2                  Running        m4.xlarge   us-east-2   us-east-2c   6h53m
zhsun416aws-rg88g-worker-us-east-2a-txx9k   Running        m4.large    us-east-2   us-east-2a   6h39m
zhsun416aws-rg88g-worker-us-east-2b-9r4rx   Running        m4.large    us-east-2   us-east-2b   6h39m
zhsun416aws-rg88g-worker-us-east-2c-fxxdm   Provisioning                                        4m57s

  lastUpdated: "2020-04-16T09:43:52Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-04-16T09:44:10Z"
      lastTransitionTime: "2020-04-16T09:44:10Z"
      message: 'error launching instance: Your Spot request price of 0.01 is lower
        than the minimum required Spot request fulfillment price of 0.0257.'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation

I0416 09:46:31.346461       1 actuator.go:74] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: actuator creating machine
I0416 09:46:31.347178       1 reconciler.go:38] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: creating machine
E0416 09:46:31.347197       1 reconciler.go:221] NodeRef not found in machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372053       1 instances.go:47] No stopped instances found for machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372096       1 instances.go:145] Using AMI ami-0e888b699fa6e37e7
I0416 09:46:31.372108       1 instances.go:77] Describing security groups based on filters
I0416 09:46:31.583386       1 instances.go:122] Describing subnets based on filters
I0416 09:46:32.438067       1 instances.go:331] Error launching instance: SpotMaxPriceTooLow: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
        status code: 400, request id: 3e5331d8-d1e9-4034-833c-15f10ce599f4
E0416 09:46:32.438171       1 reconciler.go:69] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: error creating machine: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
I0416 09:46:32.438187       1 machine_scope.go:134] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: Updating status
I0416 09:46:32.438195       1 machine_scope.go:155] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: finished calculating AWS status
I0416 09:46:32.438215       1 machine_scope.go:80] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: patching machine
E0416 09:46:32.453533       1 actuator.go:65] zhsun416aws-rg88g-worker-us-east-2c-fxxdm error: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
W0416 09:46:32.453594       1 controller.go:311] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: failed to create machine: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
E0416 09:46:32.453654       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257."  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm"}
I0416 09:46:32.453784       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm","uid":"6dedaf4b-12db-4741-8a26-5555ca8dd11e","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"134275"} "reason"="FailedCreate"


Expected results:
The machine phase is set to "Failed".

Additional info:

Comment 1 Joel Speed 2020-04-16 16:37:10 UTC
I've tested this with the same build and have been unable to reproduce. Is there any more information you can provide?

Comment 2 Joel Speed 2020-04-17 09:57:00 UTC
I believe this issue was introduced by a refactor of the Cluster-API-Provider-AWS in 

Machines will only go into the Failed phase when the returned error is an `InvalidMachineConfigurationError` (see: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317).

The error you are seeing here does return this (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/instances.go#L261); however, it is then wrapped (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/reconciler.go#L73), so it no longer matches the expected type.
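The following Go sketch reproduces the type mismatch in isolation. It wraps a typed error with `fmt.Errorf` and `%w` rather than `github.com/pkg/errors` so that it runs with the standard library alone; in both cases a plain type assertion fails, because the outermost value becomes the wrapper rather than the original error. `launchInstance` and `createMachine` are hypothetical names standing in for the linked instances.go and reconciler.go code paths.

```go
package main

import "fmt"

// InvalidMachineConfigurationError is a simplified, hypothetical stand-in type.
type InvalidMachineConfigurationError struct{ Msg string }

func (e *InvalidMachineConfigurationError) Error() string { return e.Msg }

// launchInstance (hypothetical) returns the typed error directly.
func launchInstance() error {
	return &InvalidMachineConfigurationError{Msg: "error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257."}
}

// createMachine (hypothetical) adds context to the error, as the reconciler does.
func createMachine() error {
	if err := launchInstance(); err != nil {
		return fmt.Errorf("failed to launch instance: %w", err)
	}
	return nil
}

func main() {
	direct := launchInstance()
	wrapped := createMachine()

	_, directOK := direct.(*InvalidMachineConfigurationError)
	_, wrappedOK := wrapped.(*InvalidMachineConfigurationError)

	fmt.Println(directOK)  // true
	fmt.Println(wrappedOK) // false: the outermost value is now the wrapper
	fmt.Println(wrapped)   // same "failed to launch instance: error launching instance: ..." text as the logs
}
```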

The check that determines whether the error is an InvalidMachineConfigurationError (implemented in: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317) does not currently unwrap errors, so it needs to be updated to support wrapping.

Comment 3 sunzhaohua 2020-04-26 07:00:05 UTC
clusterversion: 4.5.0-0.nightly-2020-04-25-170442
Machine status didn't become "Failed" with some invalid configurations, for example an invalid credentialsSecret.
1. Create a machine with an invalid AMI: the machine became Failed.
$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsunaws426-rn574-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   5h10m
zhsunaws426-rn574-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   5h10m
zhsunaws426-rn574-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   5h10m
zhsunaws426-rn574-worker-us-east-2a-kcpw2   Failed                                         137m
zhsunaws426-rn574-worker-us-east-2b-xr7m6   Running   m4.large    us-east-2   us-east-2b   3h16m
zhsunaws426-rn574-worker-us-east-2c-fttw7   Running   m4.large    us-east-2   us-east-2c   4h57m

2. Create a machine with an invalid credentialsSecret: the machine PHASE is empty.
          credentialsSecret:
            name: aws-cloud-credentials-invalid

$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsunaws426-rn574-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   5h50m
zhsunaws426-rn574-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   5h50m
zhsunaws426-rn574-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   5h50m
zhsunaws426-rn574-worker-us-east-2a-cpsrj                                                  91s
zhsunaws426-rn574-worker-us-east-2b-xr7m6   Running   m4.large    us-east-2   us-east-2b   3h56m
zhsunaws426-rn574-worker-us-east-2c-fttw7   Running   m4.large    us-east-2   us-east-2c   5h36m

I0426 06:51:48.974004       1 actuator.go:97] zhsunaws426-rn574-worker-us-east-2a-cpsrj: actuator checking if machine exists
E0426 06:51:48.974666       1 controller.go:269] zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to check if machine exists: zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to create scope for machine: failed to create aws client: aws credentials secret openshift-machine-api/aws-cloud-credentials-invalid: Secret "aws-cloud-credentials-invalid" not found not found
E0426 06:51:48.974751       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to create scope for machine: failed to create aws client: aws credentials secret openshift-machine-api/aws-cloud-credentials-invalid: Secret \"aws-cloud-credentials-invalid\" not found not found"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunaws426-rn574-worker-us-east-2a-cpsrj"}

Comment 4 Alberto 2020-04-27 07:08:43 UTC
>Machine status didn't become "Failed" with some invalid configuration, for example "invalid credentialsSecret".

Exists() will never succeed in that scenario, so the object is requeued right away. This is a known issue, tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1805639
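The control flow described above can be sketched as follows. The `actuator` interface, the `reconcile` function, the phase strings, and both fake actuators are hypothetical simplifications (the real actuator methods take a context and a Machine). Because `Exists()` fails before `Create()` is ever called, the reconciler requeues with no phase set, so the invalid-configuration check that leads to "Failed" is never reached.

```go
package main

import (
	"errors"
	"fmt"
)

// InvalidMachineConfigurationError is a simplified, hypothetical stand-in type.
type InvalidMachineConfigurationError struct{ Msg string }

func (e *InvalidMachineConfigurationError) Error() string { return e.Msg }

// actuator is a hypothetical, minimal interface for this sketch.
type actuator interface {
	Exists() (bool, error)
	Create() error
}

// reconcile is a hypothetical, simplified reconcile step returning the
// resulting phase and whether the request should be requeued.
func reconcile(a actuator) (phase string, requeue bool) {
	exists, err := a.Exists()
	if err != nil {
		// Returns here on every attempt, so no phase is ever set.
		return "", true
	}
	if exists {
		return "Provisioned", false
	}
	if err := a.Create(); err != nil {
		var invalid *InvalidMachineConfigurationError
		if errors.As(err, &invalid) {
			return "Failed", false
		}
		return "Provisioning", true
	}
	return "Provisioned", false
}

// missingSecretActuator simulates the invalid credentialsSecret case:
// the AWS client cannot be built, so even Exists() fails.
type missingSecretActuator struct{}

func (missingSecretActuator) Exists() (bool, error) {
	return false, errors.New(`failed to create aws client: Secret "aws-cloud-credentials-invalid" not found`)
}
func (missingSecretActuator) Create() error { return nil } // never reached

// lowSpotPriceActuator simulates the Spot price case from the description.
type lowSpotPriceActuator struct{}

func (lowSpotPriceActuator) Exists() (bool, error) { return false, nil }
func (lowSpotPriceActuator) Create() error {
	return fmt.Errorf("failed to launch instance: %w",
		&InvalidMachineConfigurationError{Msg: "error launching instance: Spot price too low"})
}

func main() {
	phase, requeue := reconcile(missingSecretActuator{})
	fmt.Printf("phase=%q requeue=%v\n", phase, requeue) // phase="" requeue=true
	phase, requeue = reconcile(lowSpotPriceActuator{})
	fmt.Printf("phase=%q requeue=%v\n", phase, requeue) // phase="Failed" requeue=false
}
```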

For this BZ we need to reproduce the scenario in the description: "message: 'error launching instance: Your Spot request price of 0.01 is lower
        than the minimum required Spot request fulfillment price of 0.0257.'"

Comment 5 sunzhaohua 2020-04-28 07:06:03 UTC
Verified

clusterversion: 4.5.0-0.nightly-2020-04-28-023400

Created a Spot instance with a maxPrice lower than the current Spot price:
$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsun428aws-5x69z-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   57m
zhsun428aws-5x69z-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   57m
zhsun428aws-5x69z-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   57m
zhsun428aws-5x69z-worker-us-east-2a-zczxx   Running   m4.large    us-east-2   us-east-2a   43m
zhsun428aws-5x69z-worker-us-east-2b-8w79c   Running   m4.large    us-east-2   us-east-2b   43m
zhsun428aws-5x69z-worker-us-east-2c-b2w82   Failed                                         17s


E0428 06:58:23.712222       1 reconciler.go:68] zhsun428aws-5x69z-worker-us-east-2c-b2w82: error creating machine: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.712238       1 machine_scope.go:134] zhsun428aws-5x69z-worker-us-east-2c-b2w82: Updating status
I0428 06:58:23.712246       1 machine_scope.go:155] zhsun428aws-5x69z-worker-us-east-2c-b2w82: finished calculating AWS status
I0428 06:58:23.712261       1 machine_scope.go:80] zhsun428aws-5x69z-worker-us-east-2c-b2w82: patching machine
E0428 06:58:23.729450       1 actuator.go:65] zhsun428aws-5x69z-worker-us-east-2c-b2w82 error: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
W0428 06:58:23.729500       1 controller.go:312] zhsun428aws-5x69z-worker-us-east-2c-b2w82: failed to create machine: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.729520       1 controller.go:412] Actuator returned invalid configuration error: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.729531       1 controller.go:421] zhsun428aws-5x69z-worker-us-east-2c-b2w82: going into phase "Failed"
I0428 06:58:23.729907       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82","uid":"feaf35a3-2ca9-4a31-88e8-5bc145ca6d24","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"32428"} "reason"="FailedCreate"
I0428 06:58:23.741457       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82"}
I0428 06:58:23.741521       1 controller.go:166] zhsun428aws-5x69z-worker-us-east-2c-b2w82: reconciling Machine
W0428 06:58:23.741534       1 controller.go:263] zhsun428aws-5x69z-worker-us-east-2c-b2w82: machine has gone "Failed" phase. It won't reconcile
I0428 06:58:23.741552       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82"}

Comment 7 errata-xmlrpc 2020-07-13 17:27:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

