Bug 1860229 - [azure]Machine status should be "Failed" when the provided max price is lower than the current spot price
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Alexander Demicev
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-24 04:18 UTC by sunzhaohua
Modified: 2020-10-27 16:17 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:16:56 UTC
Target Upstream Version:


Links
System                   ID                                              Private   Priority   Status   Summary                                                       Last Updated
Github                   openshift cluster-api-provider-azure pull 155   0         None       closed   Bug 1860229: Fail machine if create request doesn't succeed   2020-12-16 12:42:55 UTC
Red Hat Product Errata   RHBA-2020:4196                                  0         None       None     None                                                          2020-10-27 16:17:14 UTC

Description sunzhaohua 2020-07-24 04:18:10 UTC
Description of problem:
Machine status should be "Failed" when the provided max price is lower than the current spot price
 
Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-07-23-055513

How reproducible:
Always

Steps to Reproduce:
1. Create a spot instance with a max price lower than the current spot price:
          spotVMOptions:
            maxPrice: 0.01

2. Check the machines and the machine controller logs.
  
Actual results:
The machine is stuck in the Provisioning phase.

$ oc get machine
NAME                                              PHASE          TYPE              REGION           ZONE   AGE
zhsun723azure-krf9d-master-0                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-master-1                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-master-2                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-worker-northcentralus-kzdv9   Failed                                                   10m
zhsun723azure-krf9d-worker-northcentralus-vzhrp   Running        Standard_D2s_v3   northcentralus          15h
zhsun723azure-krf9d-worker-spot-mqwkn             Provisioning                                             25m

status:
  lastUpdated: "2020-07-24T03:08:19Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-07-24T03:08:24Z"
      lastTransitionTime: "2020-07-24T03:08:24Z"
      message: 'failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed
        to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate:
        Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service
        returned an error. Status=<nil> Code="OperationNotAllowed" Message="Unable
        to perform operation ''Create VM'' since the provided max price ''0.01 USD''
        is lower than the current spot price ''0.01808 USD'' for Azure Spot VM size
        ''Standard_D2s_v3''. For more information, see http://aka.ms/AzureSpot/errormessages."'
      reason: MachineCreationFailed
      status: "True"
      type: MachineCreated
    metadata: {}

E0724 03:33:57.777242       1 actuator.go:78] Machine error: failed to reconcile machine "zhsun723azure-krf9d-worker-spot-mqwkn"s: failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Unable to perform operation 'Create VM' since the provided max price '0.01 USD' is lower than the current spot price '0.01808 USD' for Azure Spot VM size 'Standard_D2s_v3'. For more information, see http://aka.ms/AzureSpot/errormessages."
W0724 03:33:57.777286       1 controller.go:315] zhsun723azure-krf9d-worker-spot-mqwkn: failed to create machine: requeue in: 20s
I0724 03:33:57.777298       1 controller.go:405] Actuator returned requeue-after error: requeue in: 20s
I0724 03:33:57.777381       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="CreateError: failed to reconcile machine \"zhsun723azure-krf9d-worker-spot-mqwkn\"s: failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=\u003cnil\u003e Code=\"OperationNotAllowed\" Message=\"Unable to perform operation 'Create VM' since the provided max price '0.01 USD' is lower than the current spot price '0.01808 USD' for Azure Spot VM size 'Standard_D2s_v3'. For more information, see http://aka.ms/AzureSpot/errormessages.\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun723azure-krf9d-worker-spot-mqwkn","uid":"dbf8ba35-7ff8-4e23-86b0-c4d016b2b4a8","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"937764"} "reason"="FailedCreate"
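
For triage, the rejection details can be pulled out of the actuator log line above. The following is a minimal Go sketch, not code from the provider: the type spotPriceError and the function parseSpotPriceError are hypothetical names, and the regular expressions assume the message wording stays as shown in this report (single quotes in the log, doubled quotes in the YAML status).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// spotPriceError holds the fields extracted from the Azure rejection message.
// Hypothetical type for this sketch.
type spotPriceError struct {
	Code      string  // Azure error code, e.g. OperationNotAllowed
	MaxPrice  float64 // max price from the providerSpec, in USD
	SpotPrice float64 // current spot price reported by Azure, in USD
	VMSize    string  // Azure Spot VM size named in the message
}

var (
	// Matches the unescaped Code="..." form seen in the actuator log line.
	codeRe = regexp.MustCompile(`Code="([^"]+)"`)
	// '+ accepts both the single quotes of the log and the doubled quotes of the YAML status.
	priceRe = regexp.MustCompile(`provided max price '+([0-9.]+) USD'+ is lower than the current spot price '+([0-9.]+) USD'+ for Azure Spot VM size '+([A-Za-z0-9_]+)'+`)
)

// parseSpotPriceError returns the parsed fields and true when msg contains
// both an error code and the "max price is lower" sentence.
func parseSpotPriceError(msg string) (spotPriceError, bool) {
	code := codeRe.FindStringSubmatch(msg)
	price := priceRe.FindStringSubmatch(msg)
	if code == nil || price == nil {
		return spotPriceError{}, false
	}
	maxPrice, err := strconv.ParseFloat(price[1], 64)
	if err != nil {
		return spotPriceError{}, false
	}
	spotPrice, err := strconv.ParseFloat(price[2], 64)
	if err != nil {
		return spotPriceError{}, false
	}
	return spotPriceError{Code: code[1], MaxPrice: maxPrice, SpotPrice: spotPrice, VMSize: price[3]}, true
}

func main() {
	// Fragment of the actuator log line from this report.
	msg := `Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Unable to perform operation 'Create VM' since the provided max price '0.01 USD' is lower than the current spot price '0.01808 USD' for Azure Spot VM size 'Standard_D2s_v3'."`
	if e, ok := parseSpotPriceError(msg); ok {
		fmt.Printf("%s: max price %.5f USD < spot price %.5f USD for %s\n", e.Code, e.MaxPrice, e.SpotPrice, e.VMSize)
	}
}
```

The escaped form in the event recorder line (Code=\"...\") is deliberately not handled; unescape that line first if it is the only one available.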

Expected results:
The machine phase is set to "Failed".

Additional info:

Comment 2 Joel Speed 2020-08-11 10:12:57 UTC
Out of interest, what do we do here for AWS? Since it would exhibit the same sort of issue, I think at present we hot loop there and keep trying until the spot price is allowed. This may actually be the desired behaviour.

Comment 3 Joel Speed 2020-08-20 16:44:30 UTC
We need to clarify the behaviour on AWS and make a decision on this; will do so next sprint.

Comment 4 Joel Speed 2020-09-07 10:05:25 UTC
On AWS we fail immediately, so we should mimic that behaviour here, as is done in the linked PR.
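
To illustrate the difference between the two behaviours discussed here (requeue every 20s versus fail immediately), here is a minimal Go sketch. It is not the code from the linked PR. The serviceError type, the terminalCodes set, and the nextStep function are hypothetical, and treating only OperationNotAllowed as terminal is an assumption made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Phase names as shown by `oc get machine`.
const (
	phaseProvisioning = "Provisioning"
	phaseFailed       = "Failed"
)

// requeueDelay matches the "requeue in: 20s" seen in the controller log.
const requeueDelay = 20 * time.Second

// serviceError is a stand-in for the structured error returned by the cloud API.
// Hypothetical type for this sketch, not the Azure SDK's own error type.
type serviceError struct {
	Code    string
	Message string
}

func (e *serviceError) Error() string {
	return fmt.Sprintf("Code=%q Message=%q", e.Code, e.Message)
}

// terminalCodes lists the codes this sketch treats as not worth retrying.
// Assumption: only the code observed in this bug is included.
var terminalCodes = map[string]bool{"OperationNotAllowed": true}

// isTerminal reports whether err wraps a serviceError with a terminal code.
func isTerminal(err error) bool {
	var se *serviceError
	return errors.As(err, &se) && terminalCodes[se.Code]
}

// nextStep maps a create error to the phase to record and the requeue delay.
// A zero delay means the machine is not requeued.
func nextStep(err error) (string, time.Duration) {
	if isTerminal(err) {
		return phaseFailed, 0
	}
	return phaseProvisioning, requeueDelay
}

func main() {
	rejected := fmt.Errorf("cannot create vm: %w", &serviceError{
		Code:    "OperationNotAllowed",
		Message: "the provided max price '0.01 USD' is lower than the current spot price '0.01808 USD'",
	})
	// Illustrative retryable error; the code name is made up for this example.
	throttled := fmt.Errorf("cannot create vm: %w", &serviceError{Code: "TooManyRequests"})

	for _, err := range []error{rejected, throttled} {
		phase, delay := nextStep(err)
		fmt.Printf("phase=%s requeueAfter=%s\n", phase, delay)
	}
}
```

With this split, a max price below the spot price ends in Failed on the first attempt, while errors outside the terminal set keep the existing requeue path.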

Comment 7 sunzhaohua 2020-09-10 04:55:57 UTC
Verified
clusterversion: 4.6.0-0.nightly-2020-09-10-031249

      providerSpec:
        value:
          spotVMOptions:
            maxPrice: 0.01

$ oc get machine
NAME                                              PHASE     TYPE              REGION           ZONE   AGE
az-zhsun-0910-grrfc-master-0                      Running   Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-master-1                      Running   Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-master-2                      Running   Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-worker-northcentralus-dlhdt   Running   Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-northcentralus-gdrjr   Running   Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-northcentralus-gg7qd   Running   Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-spot-gjf7v             Failed                                              10s
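
The check above can be scripted. The following is a minimal Go sketch, assuming the default table layout of `oc get machine` shown here; machinePhases is a hypothetical helper, and it relies on NAME and PHASE being the first two columns and PHASE being non-empty.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// machinePhases maps machine name to phase, reading the table printed by
// `oc get machine`. Only the first two whitespace-separated fields are used,
// so rows with empty TYPE/REGION/ZONE columns are still parsed correctly.
func machinePhases(output string) map[string]string {
	phases := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines, short rows, and the header row.
		if len(fields) < 2 || fields[0] == "NAME" {
			continue
		}
		phases[fields[0]] = fields[1]
	}
	return phases
}

func main() {
	// Rows copied from the verification output in this comment.
	out := `NAME                                              PHASE     TYPE              REGION           ZONE   AGE
az-zhsun-0910-grrfc-master-0                      Running   Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-worker-spot-gjf7v             Failed                                              10s`

	phases := machinePhases(out)
	fmt.Println(phases["az-zhsun-0910-grrfc-worker-spot-gjf7v"])
}
```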

Comment 9 errata-xmlrpc 2020-10-27 16:16:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

