Bug 1825235 - [azure] Machine status should be "Failed" with an invalid configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Joel Speed
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On: 1824497
Blocks:
 
Reported: 2020-04-17 13:06 UTC by Joel Speed
Modified: 2020-07-13 17:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Errors returned from the cloud-provider actuator no longer matched the expected type because they were wrapped using github.com/pkg/errors.
Consequence: The Machine controller could not determine that the Machine should be marked as Failed.
Fix: Use error wrapping from the standard library to check the error types.
Result: The Machine controller can now determine when Machines should be marked Failed.
Clone Of: 1824497
Environment:
Last Closed: 2020-07-13 17:28:32 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-api-provider-azure pull 124 | 0 | None | closed | BUG 1825235: Switch to Go errors instead of github.com/pkg/errors | 2020-07-16 18:00:02 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:28:54 UTC

Description Joel Speed 2020-04-17 13:06:10 UTC
+++ This bug was initially created as a clone of Bug #1824497 +++

Description of problem:
Machine status should be "Failed" when a spot instance is created with a maximum price lower than the current spot instance price.
 
Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-15-223247

How reproducible:
Always

Steps to Reproduce:
1. Create a spot instance with a maxPrice lower than the current spot instance price:
  providerSpec:
    value:
      spotMarketOptions:
        maxPrice: "0.01"
2. Check machines and logs
  
Actual results:
The machine is stuck in the Provisioning phase.

$ oc get machine
NAME                                        PHASE          TYPE        REGION      ZONE         AGE
zhsun416aws-rg88g-master-0                  Running        m4.xlarge   us-east-2   us-east-2a   6h53m
zhsun416aws-rg88g-master-1                  Running        m4.xlarge   us-east-2   us-east-2b   6h53m
zhsun416aws-rg88g-master-2                  Running        m4.xlarge   us-east-2   us-east-2c   6h53m
zhsun416aws-rg88g-worker-us-east-2a-txx9k   Running        m4.large    us-east-2   us-east-2a   6h39m
zhsun416aws-rg88g-worker-us-east-2b-9r4rx   Running        m4.large    us-east-2   us-east-2b   6h39m
zhsun416aws-rg88g-worker-us-east-2c-fxxdm   Provisioning                                        4m57s

  lastUpdated: "2020-04-16T09:43:52Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-04-16T09:44:10Z"
      lastTransitionTime: "2020-04-16T09:44:10Z"
      message: 'error launching instance: Your Spot request price of 0.01 is lower
        than the minimum required Spot request fulfillment price of 0.0257.'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation

I0416 09:46:31.346461       1 actuator.go:74] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: actuator creating machine
I0416 09:46:31.347178       1 reconciler.go:38] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: creating machine
E0416 09:46:31.347197       1 reconciler.go:221] NodeRef not found in machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372053       1 instances.go:47] No stopped instances found for machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372096       1 instances.go:145] Using AMI ami-0e888b699fa6e37e7
I0416 09:46:31.372108       1 instances.go:77] Describing security groups based on filters
I0416 09:46:31.583386       1 instances.go:122] Describing subnets based on filters
I0416 09:46:32.438067       1 instances.go:331] Error launching instance: SpotMaxPriceTooLow: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
        status code: 400, request id: 3e5331d8-d1e9-4034-833c-15f10ce599f4
E0416 09:46:32.438171       1 reconciler.go:69] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: error creating machine: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
I0416 09:46:32.438187       1 machine_scope.go:134] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: Updating status
I0416 09:46:32.438195       1 machine_scope.go:155] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: finished calculating AWS status
I0416 09:46:32.438215       1 machine_scope.go:80] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: patching machine
E0416 09:46:32.453533       1 actuator.go:65] zhsun416aws-rg88g-worker-us-east-2c-fxxdm error: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
W0416 09:46:32.453594       1 controller.go:311] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: failed to create machine: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
E0416 09:46:32.453654       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257."  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm"}
I0416 09:46:32.453784       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm","uid":"6dedaf4b-12db-4741-8a26-5555ca8dd11e","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"134275"} "reason"="FailedCreate"


Expected results:
The machine phase is set to "Failed".

Additional info:

--- Additional comment from Joel Speed on 2020-04-16 16:37:10 UTC ---

I've tested this with the same build and have been unable to reproduce. Is there any more information you can provide?

--- Additional comment from Joel Speed on 2020-04-17 09:57:00 UTC ---

I believe this issue was introduced by a refactor of the Cluster-API-Provider-AWS in 

Machines will only go into the Failed phase when the returned error is an `InvalidMachineConfigurationError` (see: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317).

The code path that produces the error you are seeing does return this type (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/instances.go#L261); however, the error is then wrapped (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/reconciler.go#L73), so it no longer matches the expected type.

The check to see if the error is an InvalidMachineConfigurationError (implemented: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317) does not currently support this wrapping, so it will need to be updated to handle wrapped errors.
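
To illustrate the failure mode described above, here is a minimal, self-contained Go sketch. It is not the actual controller or actuator code: the type InvalidConfigError and the functions launchInstance, createMachine, isInvalidConfigByAssertion and isInvalidConfigByAs are hypothetical stand-ins. It assumes the check inspects only the outermost error, and shows how wrapping with the standard library's %w verb together with errors.As lets the check see through the wrapper.

```go
package main

import (
	"errors"
	"fmt"
)

// InvalidConfigError is a hypothetical stand-in for the controller's
// InvalidMachineConfigurationError; it is not the real definition.
type InvalidConfigError struct {
	Message string
}

func (e *InvalidConfigError) Error() string { return e.Message }

// isInvalidConfigByAssertion only inspects the outermost error, so it
// stops matching as soon as the error is wrapped by a caller.
func isInvalidConfigByAssertion(err error) bool {
	_, ok := err.(*InvalidConfigError)
	return ok
}

// isInvalidConfigByAs walks the Unwrap chain using the standard library,
// so it still matches after the error has been wrapped with %w.
func isInvalidConfigByAs(err error) bool {
	var target *InvalidConfigError
	return errors.As(err, &target)
}

// launchInstance (hypothetical) returns the typed error directly.
func launchInstance() error {
	return &InvalidConfigError{Message: "error launching instance: spot price too low"}
}

// createMachine (hypothetical) adds context by wrapping the typed error.
func createMachine() error {
	if err := launchInstance(); err != nil {
		return fmt.Errorf("failed to launch instance: %w", err)
	}
	return nil
}

func main() {
	err := createMachine()
	fmt.Println(isInvalidConfigByAssertion(err)) // false: wrapper hides the type
	fmt.Println(isInvalidConfigByAs(err))        // true: errors.As unwraps
}
```

With the assertion-style check the wrapped error looks like a generic failure, which would leave the machine to be retried rather than marked Failed; the errors.As variant recognises the underlying type regardless of how many %w wrappers sit on top of it.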

Comment 1 Joel Speed 2020-04-17 13:07:01 UTC
This issue is also present in the Azure provider. I will block this bug on the AWS provider bug.

Comment 2 Alberto 2020-04-27 12:55:21 UTC
All PRs are merged. Seems automation failed to update status.
Setting to modified

Comment 5 sunzhaohua 2020-04-28 06:48:37 UTC
Verified
Created a machineset with an invalid image; the machine goes into phase "Failed".
$ oc get machine
NAME                                              PHASE     TYPE              REGION   ZONE   AGE
zhsunazure428-jvwvn-master-0                      Running   Standard_D8s_v3   westus          3h53m
zhsunazure428-jvwvn-master-1                      Running   Standard_D8s_v3   westus          3h53m
zhsunazure428-jvwvn-master-2                      Running   Standard_D8s_v3   westus          3h53m
zhsunazure428-jvwvn-worker-westus-invalid-kw2sc   Failed                                      59s
zhsunazure428-jvwvn-worker-westus-wz74f           Running   Standard_D2s_v3   westus          3h43m
zhsunazure428-jvwvn-worker-westus-xxjdx           Running   Standard_D2s_v3   westus          3h43m

E0428 06:36:28.616435       1 actuator.go:79] Machine error: failed to reconcile machine "zhsunazure428-jvwvn-worker-westus-invalid-kw2sc": compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="NotFound" Message="The Image '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure428-jvwvn-rg/providers/Microsoft.Compute/images/zhsunazure428-jvwvn-invalid' cannot be found in 'westus' region."
W0428 06:36:28.616526       1 controller.go:312] zhsunazure428-jvwvn-worker-westus-invalid-kw2sc: failed to create machine: failed to reconcile machine "zhsunazure428-jvwvn-worker-westus-invalid-kw2sc": compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="NotFound" Message="The Image '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure428-jvwvn-rg/providers/Microsoft.Compute/images/zhsunazure428-jvwvn-invalid' cannot be found in 'westus' region."
I0428 06:36:28.616564       1 controller.go:412] Actuator returned invalid configuration error: failed to reconcile machine "zhsunazure428-jvwvn-worker-westus-invalid-kw2sc": compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="NotFound" Message="The Image '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure428-jvwvn-rg/providers/Microsoft.Compute/images/zhsunazure428-jvwvn-invalid' cannot be found in 'westus' region."
I0428 06:36:28.616581       1 controller.go:421] zhsunazure428-jvwvn-worker-westus-invalid-kw2sc: going into phase "Failed"
I0428 06:36:28.616544       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="InvalidConfiguration: failed to reconcile machine \"zhsunazure428-jvwvn-worker-westus-invalid-kw2sc\": compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code=\"NotFound\" Message=\"The Image '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunazure428-jvwvn-rg/providers/Microsoft.Compute/images/zhsunazure428-jvwvn-invalid' cannot be found in 'westus' region.\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsunazure428-jvwvn-worker-westus-invalid-kw2sc","uid":"eff0a38b-2b3d-4398-b3a8-34e026d75fbc","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"105848"} "reason"="FailedCreate"
I0428 06:36:28.637427       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure428-jvwvn-worker-westus-invalid-kw2sc"}
I0428 06:36:28.637499       1 controller.go:166] zhsunazure428-jvwvn-worker-westus-invalid-kw2sc: reconciling Machine
W0428 06:36:28.637517       1 controller.go:263] zhsunazure428-jvwvn-worker-westus-invalid-kw2sc: machine has gone "Failed" phase. It won't reconcile
I0428 06:36:28.637540       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure428-jvwvn-worker-westus-invalid-kw2sc"}

Comment 6 errata-xmlrpc 2020-07-13 17:28:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

