Bug 1805639
Summary: | Machine status should be "Failed" when creating a machine with invalid machine configuration | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | sunzhaohua <zhsun> |
Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> |
Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | dgrigore, mimccune, wking |
Version: | 4.4 | ||
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The Machine API provided no feedback to users when credentials secrets were invalid
Consequence: It was hard to diagnose when there were issues with the cloud provider credentials
Fix: Provide a warning if the credentials secret does not exist or is in the wrong format
Result: Users are now warned when creating/updating MachineSets that there may be an issue with their credentials
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:10:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
sunzhaohua
2020-02-21 09:19:52 UTC
This is a very particular scenario for invalidConfig, as it causes the reconcile loop to fail before it can check whether the instance exists. The same failure could occur when the spec of an existing instance is modified after creation. We don't want to fail machines in that scenario, as that is unrecoverable. We only fail them on creation or when the backing instance is deleted out of band.

As we are working on defaulting/validation for machine providerSpecs, we will explore ways to mitigate this.

We haven't prioritised investigating this, but would still like to keep it open for now. Tagging upcomingSprint.

This will be mitigated by https://github.com/openshift/machine-api-operator/pull/615. Still won't fix it, as per https://bugzilla.redhat.com/show_bug.cgi?id=1805639#c1

https://github.com/openshift/machine-api-operator/pull/660#issue-457817017

Adding the upcomingSprint label, as this will need more testing and PR review before merging.

The proposed fix is still under discussion; we will hopefully make some progress on this next sprint.

Tagging to try to reprioritise and either close or fix this during the next sprint.

Bumping this to 4.7.

We didn't manage to reach an agreement on how to solve this issue; will review next sprint.

*** Bug 1881865 has been marked as a duplicate of this bug. ***

We discussed this recently. We are going to send a warning when the secret doesn't exist; otherwise we could end up with a race during cluster bootstrap. Moving back to assigned to remind myself to update the PR.

I've made the adjustments on the PR and that's now up for review; moving this back to post.

There has been some back-and-forth discussion on the PR for this. It is still under review but should be resolved in the next sprint.

Verified: users are warned when a credentials secret does not exist.
clusterversion: 4.7.0-0.nightly-2020-12-14-165231

$ oc get machine
NAME                                       PHASE     TYPE        REGION      ZONE         AGE
zhsunaws16-pqhbg-master-0                  Running   m5.xlarge   us-east-2   us-east-2a   7h41m
zhsunaws16-pqhbg-master-1                  Running   m5.xlarge   us-east-2   us-east-2b   7h41m
zhsunaws16-pqhbg-master-2                  Running   m5.xlarge   us-east-2   us-east-2c   7h41m
zhsunaws16-pqhbg-worker-us-east-2a-98bpq   Running   m5.large    us-east-2   us-east-2a   7h31m
zhsunaws16-pqhbg-worker-us-east-2b-mgmdl   Running   m5.large    us-east-2   us-east-2b   7h31m
zhsunaws16-pqhbg-worker-us-east-2c-cdlg5                                                  84s
zhsunaws16-pqhbg-worker-us-east-2c-szt75   Running   m5.large    us-east-2   us-east-2c   7h31m

I1216 08:57:41.713358 1 controller.go:171] zhsunaws16-pqhbg-worker-us-east-2c-cdlg5: reconciling Machine
I1216 08:57:41.713379 1 actuator.go:100] zhsunaws16-pqhbg-worker-us-east-2c-cdlg5: actuator checking if machine exists
E1216 08:57:41.713839 1 controller.go:274] zhsunaws16-pqhbg-worker-us-east-2c-cdlg5: failed to check if machine exists: zhsunaws16-pqhbg-worker-us-east-2c-cdlg5: failed to create scope for machine: failed to create aws client: aws credentials secret openshift-machine-api/aws-cloud-credentials-invalid: Secret "aws-cloud-credentials-invalid" not found not found
E1216 08:57:41.713913 1 controller.go:237] controller "msg"="Reconciler error" "error"="zhsunaws16-pqhbg-worker-us-east-2c-cdlg5: failed to create scope for machine: failed to create aws client: aws credentials secret openshift-machine-api/aws-cloud-credentials-invalid: Secret \"aws-cloud-credentials-invalid\" not found not found" "controller"="machine_controller" "name"="zhsunaws16-pqhbg-worker-us-east-2c-cdlg5" "namespace"="openshift-machine-api"
I1216 08:58:18.868090 1 controller.go:58] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunaws16-pqhbg-worker-us-east-2c" "namespace"="openshift-machine-api"
W1216 08:58:18.893532 1 warnings.go:67] providerSpec.credentialsSecret: Invalid value: "aws-cloud-credentials-invalid": not found. Expected CredentialsSecret to exist

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
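
For readers who want to see the shape of the agreed behaviour, here is a minimal sketch of a validation step that warns about a missing or malformed credentials secret instead of rejecting the resource. It is not the machine-api-operator code (which is written in Go). The names `ValidationResult`, `validate_credentials_secret`, and `REQUIRED_KEYS` are hypothetical, and so are the required key names and the wording of the "missing required keys" message. Only the "not found" message mirrors the warning in the log above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: the key names a well-formed AWS credentials secret should carry.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")
FIELD_PATH = "providerSpec.credentialsSecret"


@dataclass
class ValidationResult:
    """Errors reject the resource; warnings are reported but do not block it."""
    warnings: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    @property
    def admitted(self) -> bool:
        return not self.errors


def validate_credentials_secret(
    secret_name: str, secrets: Dict[str, Dict[str, str]]
) -> ValidationResult:
    """Check the referenced secret and emit warnings only.

    `secrets` is a stand-in for a lookup against the cluster: it maps a
    secret name to that secret's data.
    """
    result = ValidationResult()
    secret = secrets.get(secret_name)
    if secret is None:
        # A warning rather than an error: the secret may legitimately not
        # exist yet, for example during cluster bootstrap.
        result.warnings.append(
            f'{FIELD_PATH}: Invalid value: "{secret_name}": not found. '
            "Expected CredentialsSecret to exist"
        )
        return result

    missing = [key for key in REQUIRED_KEYS if key not in secret]
    if missing:
        missing_list = ", ".join(missing)
        result.warnings.append(
            f'{FIELD_PATH}: Invalid value: "{secret_name}": '
            f"missing required keys: {missing_list}"
        )
    return result
```

Calling `validate_credentials_secret("aws-cloud-credentials-invalid", {})` returns a result that is still admitted but carries one warning. That matches the outcome verified above: the MachineSet is accepted and the user is told the secret was not found.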