Bug 1860229

Summary: [azure]Machine status should be "Failed" when the provided max price is lower than the current spot price
Product: OpenShift Container Platform Reporter: sunzhaohua <zhsun>
Component: Cloud Compute Assignee: Alexander Demicev <ademicev>
Cloud Compute sub component: Other Providers QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium    
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-10-27 16:16:56 UTC Type: Bug

Description sunzhaohua 2020-07-24 04:18:10 UTC
Description of problem:
Machine status should be "Failed" when the provided max price is lower than the current spot price
 
Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-07-23-055513

How reproducible:
Always

Steps to Reproduce:
1. Create a spot instance with a max price lower than the current spot price:
          spotVMOptions:
            maxPrice: 0.01

2. Check machines and logs
  
Actual results:
The machine is stuck in the Provisioning phase.

$ oc get machine
NAME                                              PHASE          TYPE              REGION           ZONE   AGE
zhsun723azure-krf9d-master-0                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-master-1                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-master-2                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-worker-northcentralus-kzdv9   Failed                                                   10m
zhsun723azure-krf9d-worker-northcentralus-vzhrp   Running        Standard_D2s_v3   northcentralus          15h
zhsun723azure-krf9d-worker-spot-mqwkn             Provisioning                                             25m

status:
  lastUpdated: "2020-07-24T03:08:19Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-07-24T03:08:24Z"
      lastTransitionTime: "2020-07-24T03:08:24Z"
      message: 'failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed
        to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate:
        Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service
        returned an error. Status=<nil> Code="OperationNotAllowed" Message="Unable
        to perform operation ''Create VM'' since the provided max price ''0.01 USD''
        is lower than the current spot price ''0.01808 USD'' for Azure Spot VM size
        ''Standard_D2s_v3''. For more information, see http://aka.ms/AzureSpot/errormessages."'
      reason: MachineCreationFailed
      status: "True"
      type: MachineCreated
    metadata: {}

E0724 03:33:57.777242       1 actuator.go:78] Machine error: failed to reconcile machine "zhsun723azure-krf9d-worker-spot-mqwkn"s: failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Unable to perform operation 'Create VM' since the provided max price '0.01 USD' is lower than the current spot price '0.01808 USD' for Azure Spot VM size 'Standard_D2s_v3'. For more information, see http://aka.ms/AzureSpot/errormessages."
W0724 03:33:57.777286       1 controller.go:315] zhsun723azure-krf9d-worker-spot-mqwkn: failed to create machine: requeue in: 20s
I0724 03:33:57.777298       1 controller.go:405] Actuator returned requeue-after error: requeue in: 20s
I0724 03:33:57.777381       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="CreateError: failed to reconcile machine \"zhsun723azure-krf9d-worker-spot-mqwkn\"s: failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=\u003cnil\u003e Code=\"OperationNotAllowed\" Message=\"Unable to perform operation 'Create VM' since the provided max price '0.01 USD' is lower than the current spot price '0.01808 USD' for Azure Spot VM size 'Standard_D2s_v3'. For more information, see http://aka.ms/AzureSpot/errormessages.\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun723azure-krf9d-worker-spot-mqwkn","uid":"dbf8ba35-7ff8-4e23-86b0-c4d016b2b4a8","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"937764"} "reason"="FailedCreate"

Expected results:
The machine phase is set to "Failed".

Additional info:

Comment 2 Joel Speed 2020-08-11 10:12:57 UTC
Out of interest, what do we do here for AWS? Since it would exhibit the same sort of issue, I think at present we hot loop there and keep trying until the spot price is allowed. This may actually be the desired behaviour.

Comment 3 Joel Speed 2020-08-20 16:44:30 UTC
We need to clarify the behaviour on AWS and make a decision on this; will do so next sprint.

Comment 4 Joel Speed 2020-09-07 10:05:25 UTC
On AWS, we fail immediately; therefore we should mimic that behaviour here, as is done in the linked PR.

Comment 7 sunzhaohua 2020-09-10 04:55:57 UTC
Verified
clusterversion: 4.6.0-0.nightly-2020-09-10-031249

      providerSpec:
        value:
          spotVMOptions:
            maxPrice: 0.01

$ oc get machine
NAME                                              PHASE     TYPE              REGION           ZONE   AGE
az-zhsun-0910-grrfc-master-0                      Running   Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-master-1                      Running   Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-master-2                      Running   Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-worker-northcentralus-dlhdt   Running   Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-northcentralus-gdrjr   Running   Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-northcentralus-gg7qd   Running   Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-spot-gjf7v             Failed                                              10s

Comment 9 errata-xmlrpc 2020-10-27 16:16:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196