Description of problem:
Machine status should be "Failed" when the provided max price is lower than the current spot price

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-07-23-055513

How reproducible:
Always

Steps to Reproduce:
1. Creating a spot instance with price lower than spot instance price
   spotVMOptions:
     maxPrice: 0.01
2. Check machines and logs

Actual results:
Machine stuck in Provisioning status

$ oc get machine
NAME                                              PHASE          TYPE              REGION           ZONE   AGE
zhsun723azure-krf9d-master-0                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-master-1                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-master-2                      Running        Standard_D8s_v3   northcentralus          15h
zhsun723azure-krf9d-worker-northcentralus-kzdv9   Failed                                                   10m
zhsun723azure-krf9d-worker-northcentralus-vzhrp   Running        Standard_D2s_v3   northcentralus          15h
zhsun723azure-krf9d-worker-spot-mqwkn             Provisioning                                             25m

status:
  lastUpdated: "2020-07-24T03:08:19Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-07-24T03:08:24Z"
      lastTransitionTime: "2020-07-24T03:08:24Z"
      message: 'failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Unable to perform operation ''Create VM'' since the provided max price ''0.01 USD'' is lower than the current spot price ''0.01808 USD'' for Azure Spot VM size ''Standard_D2s_v3''. For more information, see http://aka.ms/AzureSpot/errormessages."'
      reason: MachineCreationFailed
      status: "True"
      type: MachineCreated
    metadata: {}

E0724 03:33:57.777242 1 actuator.go:78] Machine error: failed to reconcile machine "zhsun723azure-krf9d-worker-spot-mqwkn"s: failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Unable to perform operation 'Create VM' since the provided max price '0.01 USD' is lower than the current spot price '0.01808 USD' for Azure Spot VM size 'Standard_D2s_v3'. For more information, see http://aka.ms/AzureSpot/errormessages."
W0724 03:33:57.777286 1 controller.go:315] zhsun723azure-krf9d-worker-spot-mqwkn: failed to create machine: requeue in: 20s
I0724 03:33:57.777298 1 controller.go:405] Actuator returned requeue-after error: requeue in: 20s
I0724 03:33:57.777381 1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"zhsun723azure-krf9d-worker-spot-mqwkn\"s: failed to create vm zhsun723azure-krf9d-worker-spot-mqwkn: failed to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=\u003cnil\u003e Code=\"OperationNotAllowed\" Message=\"Unable to perform operation 'Create VM' since the provided max price '0.01 USD' is lower than the current spot price '0.01808 USD' for Azure Spot VM size 'Standard_D2s_v3'. For more information, see http://aka.ms/AzureSpot/errormessages.\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun723azure-krf9d-worker-spot-mqwkn","uid":"dbf8ba35-7ff8-4e23-86b0-c4d016b2b4a8","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"937764"} "reason"="FailedCreate"

Expected results:
The machine phase is set "Failed"

Additional info:
Out of interest, what do we do here for AWS? Since it would exhibit the same sort of issue, I think at present we hot loop there and keep trying until the spot price is allowed. This may actually be the desired behaviour.
We need to clarify the behaviour on AWS and make a decision on this. Will do so next sprint.
On AWS we fail immediately, so we should mimic that behaviour here, as is done in the linked PR.
Verified
clusterversion: 4.6.0-0.nightly-2020-09-10-031249

providerSpec:
  value:
    spotVMOptions:
      maxPrice: 0.01

$ oc get machine
NAME                                              PHASE          TYPE              REGION           ZONE   AGE
az-zhsun-0910-grrfc-master-0                      Running        Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-master-1                      Running        Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-master-2                      Running        Standard_D8s_v3   northcentralus          59m
az-zhsun-0910-grrfc-worker-northcentralus-dlhdt   Running        Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-northcentralus-gdrjr   Running        Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-northcentralus-gg7qd   Running        Standard_D2s_v3   northcentralus          45m
az-zhsun-0910-grrfc-worker-spot-gjf7v             Failed                                                   10s
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196