Bug 1824497
Summary: | [aws]Machine status should be "Failed" with an invalid configuration | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | sunzhaohua <zhsun> | |
Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> | |
Cloud Compute sub component: | Other Providers | QA Contact: | Jianwei Hou <jhou> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | medium | CC: | agarcial, jspeed | |
Version: | 4.5 | |||
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: Errors returned from the cloud-provider actuator no longer matched the expected type due to being wrapped using github.com/pkg/errors
Consequence: The Machine controller could not determine that the Machine should be marked as failed
Fix: Use error wrapping from the standard library to check the error types
Result: Machine controller can now determine when Machines should be marked Failed
|
Story Points: | --- | |
Clone Of: | ||||
: | 1825235 1825290 1826017 | Environment: | |
Last Closed: | 2020-07-13 17:27:59 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1825235, 1825290, 1826017 |
Description
sunzhaohua
2020-04-16 10:08:05 UTC
I've tested this with the same build and have been unable to reproduce. Is there any more information you can provide?

I believe this issue was introduced by a refactor of the Cluster-API-Provider-AWS in

Machines will only go into the failed phase when the returned error is an `InvalidMachineConfigurationError` (see: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317)

The error you are seeing here does return this (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/instances.go#L261), however it is then wrapped (https://github.com/openshift/cluster-api-provider-aws/blob/025ec74aa743c3834020f4f6a45ac19c1acb76d2/pkg/actuators/machine/reconciler.go#L73) so that it no longer matches the correct type.

The check to see if the error is an InvalidMachineConfigurationError (implemented: https://github.com/openshift/machine-api-operator/blob/b9b4aaea428abe021d84477bd62a99f806fb64f2/pkg/controller/machine/controller.go#L312-L317) does not currently support this wrapping, so it will need to be updated to support it.

clusterversion: 4.5.0-0.nightly-2020-04-25-170442

Machine status didn't become "Failed" with some invalid configuration, for example "invalid credentialsSecret".

1. Create a machine with an invalid ami; the machine went to Failed.

$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsunaws426-rn574-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   5h10m
zhsunaws426-rn574-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   5h10m
zhsunaws426-rn574-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   5h10m
zhsunaws426-rn574-worker-us-east-2a-kcpw2   Failed                                         137m
zhsunaws426-rn574-worker-us-east-2b-xr7m6   Running   m4.large    us-east-2   us-east-2b   3h16m
zhsunaws426-rn574-worker-us-east-2c-fttw7   Running   m4.large    us-east-2   us-east-2c   4h57m

2. Create a machine with an invalid credentialsSecret; the machine PHASE is empty.
credentialsSecret:
  name: aws-cloud-credentials-invalid

$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsunaws426-rn574-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   5h50m
zhsunaws426-rn574-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   5h50m
zhsunaws426-rn574-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   5h50m
zhsunaws426-rn574-worker-us-east-2a-cpsrj                                                  91s
zhsunaws426-rn574-worker-us-east-2b-xr7m6   Running   m4.large    us-east-2   us-east-2b   3h56m
zhsunaws426-rn574-worker-us-east-2c-fttw7   Running   m4.large    us-east-2   us-east-2c   5h36m

I0426 06:51:48.974004 1 actuator.go:97] zhsunaws426-rn574-worker-us-east-2a-cpsrj: actuator checking if machine exists
E0426 06:51:48.974666 1 controller.go:269] zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to check if machine exists: zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to create scope for machine: failed to create aws client: aws credentials secret openshift-machine-api/aws-cloud-credentials-invalid: Secret "aws-cloud-credentials-invalid" not found not found
E0426 06:51:48.974751 1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="zhsunaws426-rn574-worker-us-east-2a-cpsrj: failed to create scope for machine: failed to create aws client: aws credentials secret openshift-machine-api/aws-cloud-credentials-invalid: Secret \"aws-cloud-credentials-invalid\" not found not found" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunaws426-rn574-worker-us-east-2a-cpsrj"}

> Machine status didn't become "Failed" with some invalid configuration, for example "invalid credentialsSecret".

Exists() will never succeed in that scenario, therefore requeueing the object right away.
This is known and tracked here https://bugzilla.redhat.com/show_bug.cgi?id=1805639

For this BZ we need to reproduce the scenario in the description:
"message: 'error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0257.'"

Verified
clusterversion: 4.5.0-0.nightly-2020-04-28-023400

Creating a spot instance with price lower than spot instance price

$ oc get machine
NAME                                        PHASE     TYPE        REGION      ZONE         AGE
zhsun428aws-5x69z-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   57m
zhsun428aws-5x69z-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   57m
zhsun428aws-5x69z-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   57m
zhsun428aws-5x69z-worker-us-east-2a-zczxx   Running   m4.large    us-east-2   us-east-2a   43m
zhsun428aws-5x69z-worker-us-east-2b-8w79c   Running   m4.large    us-east-2   us-east-2b   43m
zhsun428aws-5x69z-worker-us-east-2c-b2w82   Failed                                         17s

E0428 06:58:23.712222 1 reconciler.go:68] zhsun428aws-5x69z-worker-us-east-2c-b2w82: error creating machine: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.712238 1 machine_scope.go:134] zhsun428aws-5x69z-worker-us-east-2c-b2w82: Updating status
I0428 06:58:23.712246 1 machine_scope.go:155] zhsun428aws-5x69z-worker-us-east-2c-b2w82: finished calculating AWS status
I0428 06:58:23.712261 1 machine_scope.go:80] zhsun428aws-5x69z-worker-us-east-2c-b2w82: patching machine
E0428 06:58:23.729450 1 actuator.go:65] zhsun428aws-5x69z-worker-us-east-2c-b2w82 error: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
W0428 06:58:23.729500 1 controller.go:312] zhsun428aws-5x69z-worker-us-east-2c-b2w82: failed to create machine: failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.729520 1 controller.go:412] Actuator returned invalid configuration error: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241.
I0428 06:58:23.729531 1 controller.go:421] zhsun428aws-5x69z-worker-us-east-2c-b2w82: going into phase "Failed"
I0428 06:58:23.729907 1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="failed to launch instance: error launching instance: Your Spot request price of 0.01 is lower than the minimum required Spot request fulfillment price of 0.0241." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82","uid":"feaf35a3-2ca9-4a31-88e8-5bc145ca6d24","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"32428"} "reason"="FailedCreate"
I0428 06:58:23.741457 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82"}
I0428 06:58:23.741521 1 controller.go:166] zhsun428aws-5x69z-worker-us-east-2c-b2w82: reconciling Machine
W0428 06:58:23.741534 1 controller.go:263] zhsun428aws-5x69z-worker-us-east-2c-b2w82: machine has gone "Failed" phase. It won't reconcile
I0428 06:58:23.741552 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun428aws-5x69z-worker-us-east-2c-b2w82"}

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
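
The wrapping problem discussed in the comments above can be illustrated with a minimal Go sketch. It is not the machine-api-operator or cluster-api-provider-aws code: `InvalidConfigError`, `launchInstance`, `createMachine`, and the two check functions are hypothetical names invented for this illustration, and the sketch assumes the wrapping is done with the standard library's `fmt.Errorf` and `%w`. It shows why a direct type assertion stops matching once an error is wrapped, while `errors.As` still finds the typed error in the chain.

```go
package main

import (
	"errors"
	"fmt"
)

// InvalidConfigError is a hypothetical stand-in for the machine API's
// invalid-configuration error type. It is not the real definition.
type InvalidConfigError struct {
	Reason string
}

func (e *InvalidConfigError) Error() string { return e.Reason }

// launchInstance simulates an actuator call that fails with a typed error.
func launchInstance() error {
	return &InvalidConfigError{Reason: "error launching instance: spot price too low"}
}

// createMachine simulates a caller that adds context by wrapping the error.
// The %w verb keeps the original error reachable through Unwrap.
func createMachine() error {
	if err := launchInstance(); err != nil {
		return fmt.Errorf("failed to launch instance: %w", err)
	}
	return nil
}

// matchesByAssertion inspects only the outermost error value, so it
// misses a typed error that has been wrapped.
func matchesByAssertion(err error) bool {
	_, ok := err.(*InvalidConfigError)
	return ok
}

// matchesByUnwrapping walks the wrap chain with errors.As and reports
// whether any error in it is an *InvalidConfigError.
func matchesByUnwrapping(err error) bool {
	var target *InvalidConfigError
	return errors.As(err, &target)
}

func main() {
	err := createMachine()
	fmt.Println("type assertion matches:", matchesByAssertion(err))
	fmt.Println("errors.As matches:", matchesByUnwrapping(err))
}
```

Run as written, the type assertion reports false and `errors.As` reports true. A controller that decides on the "Failed" phase with the first kind of check would keep requeueing the machine, which appears consistent with the behaviour reported here.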