+++ This bug was initially created as a clone of Bug #1824497 +++

Description of problem:
Machine status should be "Failed" when creating a spot instance with a max price lower than the current spot price.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-15-223247

How reproducible:
Always

Steps to Reproduce:
1. Create a spot instance with a max price lower than the current spot price:

  providerSpec:
    value:
      spotMarketOptions:
        maxPrice: "0.01"

2. Check the machines and logs.

Actual results:
Machine stuck in Provisioning status.

$ oc get machine
NAME                                        PHASE          TYPE        REGION      ZONE         AGE
zhsun416aws-rg88g-master-0                  Running        m4.xlarge   us-east-2   us-east-2a   6h53m
zhsun416aws-rg88g-master-1                  Running        m4.xlarge   us-east-2   us-east-2b   6h53m
zhsun416aws-rg88g-master-2                  Running        m4.xlarge   us-east-2   us-east-2c   6h53m
zhsun416aws-rg88g-worker-us-east-2a-txx9k   Running        m4.large    us-east-2   us-east-2a   6h39m
zhsun416aws-rg88g-worker-us-east-2b-9r4rx   Running        m4.large    us-east-2   us-east-2b   6h39m
zhsun416aws-rg88g-worker-us-east-2c-fxxdm   Provisioning                                        4m57s

  lastUpdated: "2020-04-16T09:43:52Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-04-16T09:44:10Z"
      lastTransitionTime: "2020-04-16T09:44:10Z"
      message: 'error launching instance: Your Spot request price of 0.01 is lower
        than the minimum required Spot request fulfillment price of 0.0257.'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation

I0416 09:46:31.346461       1 actuator.go:74] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: actuator creating machine
I0416 09:46:31.347178       1 reconciler.go:38] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: creating machine
E0416 09:46:31.347197       1 reconciler.go:221] NodeRef not found in machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372053       1 instances.go:47] No stopped instances found for machine zhsun416aws-rg88g-worker-us-east-2c-fxxdm
I0416 09:46:31.372096       1 instances.go:145] Using AMI ami-0e888b699fa6e37e7
I0416 09:46:31.372108       1 instances.go:77] Describing security groups based on filters
I0416 09:46:31.583386       1 instances.go:122] Describing subnets based on filters
I0416 09:46:32.438067       1 instances.go:331] Error launching instance: SpotMaxPriceTooLow: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257. status code: 400, request id: 3e5331d8-d1e9-4034-833c-15f10ce599f4
E0416 09:46:32.438171       1 reconciler.go:69] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: error creating machine: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
I0416 09:46:32.438187       1 machine_scope.go:134] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: Updating status
I0416 09:46:32.438195       1 machine_scope.go:155] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: finished calculating AWS status
I0416 09:46:32.438215       1 machine_scope.go:80] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: patching machine
E0416 09:46:32.453533       1 actuator.go:65] zhsun416aws-rg88g-worker-us-east-2c-fxxdm error: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
W0416 09:46:32.453594       1 controller.go:311] zhsun416aws-rg88g-worker-us-east-2c-fxxdm: failed to create machine: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.
E0416 09:46:32.453654       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257." "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm"}
I0416 09:46:32.453784       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun416aws-rg88g-worker-us-east-2c-fxxdm","uid":"6dedaf4b-12db-4741-8a26-5555ca8dd11e","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"134275"} "reason"="FailedCreate"

Expected results:
The machine phase is set to "Failed".

Additional info:

--- Additional comment from Joel Speed on 2020-04-16 16:37:10 UTC ---

I've tested this with the same build and have been unable to reproduce. Is there any more information you can provide?

--- Additional comment from Joel Speed on 2020-04-17 09:57:00 UTC ---

I believe this issue was introduced by a refactor of the Cluster-API-Provider-AWS. Machines will only go into the Failed phase when the returned error is an `InvalidMachineConfigurationError` (see: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317).

The error you are seeing here does return this type (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/instances.go#L261), however it is then wrapped (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/reconciler.go#L73) so that it no longer matches the correct type.

The check that decides whether the error is an InvalidMachineConfigurationError (implemented at https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317) does not currently support this wrapping, so it will need to be updated to support it.
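To illustrate the wrapping problem described above, here is a minimal, self-contained Go sketch. InvalidMachineConfigurationError is a hypothetical stand-in for the real machine-api error type, and the standard-library errors.As shown here is one way to match an error through a wrap chain; the actual repositories may resolve this differently.

package main

import (
	"errors"
	"fmt"
)

// InvalidMachineConfigurationError is a stand-in for the machine-api
// error type; the real definition lives in machine-api-operator.
type InvalidMachineConfigurationError struct {
	Reason string
}

func (e *InvalidMachineConfigurationError) Error() string { return e.Reason }

func main() {
	// The actuator returns the typed error for SpotMaxPriceTooLow...
	var err error = &InvalidMachineConfigurationError{Reason: "spot max price too low"}

	// ...but the reconciler wraps it before it reaches the controller.
	wrapped := fmt.Errorf("failed to launch instance: %w", err)

	// A direct type assertion no longer matches the wrapped error,
	// so the machine never moves to the Failed phase.
	_, ok := wrapped.(*InvalidMachineConfigurationError)
	fmt.Println("type assertion matches:", ok) // false

	// errors.As walks the wrap chain, so the check still finds the
	// typed error and the controller can set the phase to Failed.
	var target *InvalidMachineConfigurationError
	fmt.Println("errors.As matches:", errors.As(wrapped, &target)) // true
}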
Description of problem:
Machine status should be "Failed" when creating a machineset with an invalid configuration.

Version-Release number of selected component (if applicable):
Cluster version is 4.5.0-0.nightly-2020-05-08-015855

How reproducible:
Always

Steps to Reproduce:
1. Create a machineset with invalid specs:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    autoscaling.openshift.io/machineautoscaler: openshift-machine-api/machineautoscaler
    machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "3"
    machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "1"
  creationTimestamp: "2020-05-12T09:18:42Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: miyadav-12ipi-vhcs5
  name: miyadav-12ipi-vhcs5-worker-invalid
  namespace: openshift-machine-api
  resourceVersion: "161152"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/miyadav-12ipi-vhcs5-worker-invalid
  uid: 0254a43b-637c-45ff-8b83-f910b1ebda9e
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: miyadav-12ipi-vhcs5
      machine.openshift.io/cluster-api-machineset: miyadav-12ipi-vhcs5-worker
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: miyadav-12ipi-vhcs5
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: miyadav-12ipi-vhcs5-worker
    spec:
      metadata: {}
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 50
          kind: VSphereMachineProviderSpec
          memoryMiB: 8192
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: VM Network
          numCPUs: 4
          numCoresPerSocket: 1
          snapshot: ""
          spotMarketOptions:
            maxPrice: "0.01"
          template: miyadav-12ipi-vhcs5-rhcos
          userDataSecret:
            name: worker-user-data
          workspace:
            datacenter: dc1
            datastore: nvme-ds1
            folder: /dc1/vm/miyadav-12ipi-vhcs5
            server: vcsa-qe.vmware.devcluster.openshift.com

2. oc create -f invalid-machineset.yml

Actual: machineset created successfully.

[miyadav@miyadav ManualRun]$ oc get machineset
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-12ipi-vhcs5-worker           2         2         2       2           6h22m
miyadav-12ipi-vhcs5-worker-invalid   2         2                             5m22s

3. Check the machines.

[miyadav@miyadav ManualRun]$ oc get machines
NAME                                       PHASE          TYPE   REGION   ZONE   AGE
miyadav-12ipi-vhcs5-master-0               Running                               6h32m
miyadav-12ipi-vhcs5-master-1               Running                               6h32m
miyadav-12ipi-vhcs5-master-2               Running                               6h32m
miyadav-12ipi-vhcs5-worker-g79rl           Running                               6h20m
miyadav-12ipi-vhcs5-worker-invalid-n29vv   Provisioning                          15m
miyadav-12ipi-vhcs5-worker-qfj77           Running                               6h20m

Actual results: machines are not in the Failed status, but stuck in Provisioning.

Expected results: the machine should be in the Failed status.

Additional info:
machine-controller logs:
...
ta:{}} DeviceName:VM Network UseAutoDetect:<nil>} Network:<nil> InPassthroughMode:<nil>}
I0512 09:47:05.144773       1 reconciler.go:523] miyadav-12ipi-vhcs5-worker-invalid-n29vv: running task: task-135147
I0512 09:47:05.144861       1 reconciler.go:620] miyadav-12ipi-vhcs5-worker-invalid-n29vv: Updating provider status
I0512 09:47:05.144907       1 machine_scope.go:99] miyadav-12ipi-vhcs5-worker-invalid-n29vv: patching machine
I0512 09:47:05.156276       1 controller.go:325] miyadav-12ipi-vhcs5-worker-invalid-n29vv: created instance, requeuing
I0512 09:47:05.156429       1 controller.go:169] miyadav-12ipi-vhcs5-worker-invalid-n29vv: reconciling Machine
I0512 09:47:05.156462       1 actuator.go:80] miyadav-12ipi-vhcs5-worker-invalid-n29vv: actuator checking if machine exists
I0512 09:47:05.163046       1 session.go:114] Find template by instance uuid: a051b036-fffd-4848-bf0f-7dcd5a0e2f7a
I0512 09:47:05.181298       1 reconciler.go:155] miyadav-12ipi-vhcs5-worker-invalid-n29vv: does not exist
I0512 09:47:05.181405       1 controller.go:313] miyadav-12ipi-vhcs5-worker-invalid-n29vv: reconciling machine triggers idempotent create
I0512 09:47:05.181430       1 actuator.go:59] miyadav-12ipi-vhcs5-worker-invalid-n29vv: actuator creating machine
I0512 09:47:05.190393       1 reconciler.go:604] task: task-135148, state: error, description-id: VirtualMachine.clone
I0512 09:47:05.190475       1 session.go:114] Find template by instance uuid: a051b036-fffd-4848-bf0f-7dcd5a0e2f7a
I0512 09:47:05.208697       1 reconciler.go:83] miyadav-12ipi-vhcs5-worker-invalid-n29vv: cloning
I0512 09:47:05.208738       1 session.go:111] Invalid UUID for VM "miyadav-12ipi-vhcs5-rhcos": , trying to find by name
I0512 09:47:05.227797       1 reconciler.go:399] miyadav-12ipi-vhcs5-worker-invalid-n29vv: no snapshot name provided, getting snapshot using template
I0512 09:47:05.249183       1 reconciler.go:478] Getting network devices
I0512 09:47:05.249283       1 reconciler.go:555] Adding device: VM Network
I0512 09:47:05.254066       1 reconciler.go:584] Adding device: eth card type: vmxnet3, network spec: &{NetworkName:VM Network}, device info: &{VirtualDeviceDeviceBackingInfo:{VirtualDeviceBackingInfo:{DynamicData:{}} DeviceName:VM Network UseAutoDetect:<nil>} Network:<nil> InPassthroughMode:<nil>}
I0512 09:47:05.257745       1 reconciler.go:523] miyadav-12ipi-vhcs5-worker-invalid-n29vv: running task: task-135148
I0512 09:47:05.257811       1 reconciler.go:620] miyadav-12ipi-vhcs5-worker-invalid-n29vv: Updating provider status
I0512 09:47:05.257844       1 machine_scope.go:99] miyadav-12ipi-vhcs5-worker-invalid-n29vv: patching machine
I0512 09:47:05.269717       1 controller.go:325] miyadav-12ipi-vhcs5-worker-invalid-n29vv: created instance, requeuing
I0512 09:47:05.269816       1 controller.go:169] miyadav-12ipi-vhcs5-worker-invalid-n29vv: reconciling Machine
I0512 09:47:05.269842       1 actuator.go:80] miyadav-12ipi-vhcs5-worker-invalid-n29vv: actuator checking if machine exists
I0512 09:47:05.277706       1 session.go:114] Find template by instance uuid: a051b036-fffd-4848-bf0f-7dcd5a0e2f7a
I0512 09:47:05.297275       1 reconciler.go:155] miyadav-12ipi-vhcs5-worker-invalid-n29vv: does not exist
...
This bug does not seem to be directly related to the bug from which it is cloned. Not everything that fails to provision is necessarily going to trigger a 'failure.' Please describe what it is you're attempting to do in more detail.
As in the description, I tried to put in an invalid configuration, but it seems the invalid options are dropped gracefully and the machines are provisioned anyway. I checked with Joel and did the following: removed the machine.openshift.io/cluster-api-cluster label from spec.template.metadata.labels. After that, no machines were created and nothing appeared in the logs.

[miyadav@miyadav bugvsphere]$ oc get machineset
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-11-rdnr9-worker           2         2         2       2           134m
miyadav-11-rdnr9-worker-invalid   1                                       56s
miyadav-11-rdnr9-worker-rz        1         1         1       1           26m

I then manually created a machine without the label, rather than using a machineset, and got this:

...
E0513 08:47:29.127626       1 controller.go:173] miyadav-11-rdnr9-worker-sam: machine validation failed: spec.labels: Invalid value: map[string]string{"machine.openshift.io/cluster-api-machine-role":"worker", "machine.openshift.io/cluster-api-machine-type":"worker", "machine.openshift.io/cluster-api-machineset":"miyadav-11-rdnr9-worker", "machine.openshift.io/region":"", "machine.openshift.io/zone":""}: missing machine.openshift.io/cluster-api-cluster label

So I would need more help to understand the reason this bug was created.
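The graceful dropping described above is consistent with how default Go JSON decoding handles unknown fields: they are silently ignored. A minimal sketch under that assumption (vsphereSpec is a hypothetical cut-down stand-in, not the real VSphereMachineProviderSpec), showing that the AWS-only spotMarketOptions field decodes away without any error, so nothing ever marks the configuration as invalid:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// vsphereSpec is a cut-down stand-in for VSphereMachineProviderSpec;
// the real type has no spotMarketOptions field, which is the point.
type vsphereSpec struct {
	Template string `json:"template"`
	NumCPUs  int32  `json:"numCPUs"`
}

func main() {
	// A providerSpec value carrying an AWS-only field.
	raw := []byte(`{"template":"miyadav-12ipi-vhcs5-rhcos","numCPUs":4,"spotMarketOptions":{"maxPrice":"0.01"}}`)

	// Default behaviour: unknown fields are silently dropped, so the
	// "invalid" machineset decodes cleanly and machines are provisioned.
	var spec vsphereSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Printf("decoded without error: %+v\n", spec)

	// A strict decoder would surface the problem instead.
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	var strict vsphereSpec
	fmt.Println("strict decode:", dec.Decode(&strict)) // json: unknown field "spotMarketOptions"
}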
This will be easier to validate once BZ#1833256 is merged. Perhaps we could add that as a dependency of this bug and hold off on verifying it for now. If that BZ can be verified, then this one is also working.
Making this bug depend on BZ#1833256, as we need it before we can verify this one.
[miyadav@miyadav bugvsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-26-051016   True        False         37m

Cluster version is 4.5.0-0.nightly-2020-05-26-051016

Validated both scenarios as mentioned above; the machines never got stuck in the Provisioning phase.

After checking with Joel: the invalid configuration is reproduced by the dependent bug mentioned in this one, https://bugzilla.redhat.com/show_bug.cgi?id=1833256. With that machine going to Failed given a genuinely invalid configuration, this bug is validated as well, since the invalid configurations described here as reproduction steps are not something that would be created during installation or on a running cluster.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409