Bug 1815219
| Summary: | Encrypted AWS EBS seemingly causing OCP4.3 worker node deployments to fail | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Rob Gregory <rgregory> |
| Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> |
| Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | unspecified | CC: | agarcial, aygarg, jspeed, yunjiang, zhsun |
| Version: | 4.3.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-08-04 18:06:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:
Cause: The IAM role assigned to worker nodes did not have sufficient permissions to access the KMS key needed to decrypt the EBS volume on mount.
Consequence: EC2 instances would be accepted but then fail when starting, because they could not read from their root drive.
Fix: Grant the permissions EC2 instances require to decrypt KMS-encrypted EBS volumes that use Customer Managed Keys.
Result: When a Customer Managed Key is used to encrypt EBS volumes, instances now have the required permissions and start as expected.
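To make the Fix line more concrete, the following is a minimal sketch of the kind of IAM policy document that lets the principal launching instances use a customer managed KMS key for EBS encryption. In the error quoted in the comments, that principal is the machine-api IAM user. The action list is an assumption modelled on AWS's guidance for encrypted EBS volumes (including the `kms:GenerateDataKeyWithoutPlaintext` action named in that error). It is not the content of the actual machine-api-operator change. The helper `build_kms_policy` and its `key_arn` parameter are hypothetical.

```python
import json

# Assumed action list, based on AWS guidance for CMK-encrypted EBS volumes;
# NOT copied from the machine-api-operator commit referenced in this bug.
KMS_CRYPTO_ACTIONS = [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:GenerateDataKey",
    "kms:GenerateDataKeyWithoutPlaintext",  # the action named in the reported error
    "kms:DescribeKey",
]


def build_kms_policy(key_arn: str = "*") -> dict:
    """Hypothetical helper: build an IAM policy document for using a KMS key with EBS.

    key_arn: ARN of the customer managed key; "*" allows any key (broader than needed).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Cryptographic operations needed to create and read encrypted volumes.
                "Effect": "Allow",
                "Action": list(KMS_CRYPTO_ACTIONS),
                "Resource": key_arn,
            },
            {
                # Grants let EC2 use the key on the caller's behalf; the condition
                # limits CreateGrant to grants made for AWS services.
                "Effect": "Allow",
                "Action": ["kms:CreateGrant"],
                "Resource": key_arn,
                "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, e.g. to attach it to the IAM user or role.
    print(json.dumps(build_kms_policy(), indent=2))
```

The grant permission sits in its own statement so that the `kms:GrantIsForAWSResource` condition applies only to `kms:CreateGrant` and not to the cryptographic operations.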
>ambiguous "Client.InternalError: Client error on launch"

Where is that coming from? Could you please share must-gather logs and the user config input so we can recreate the scenario?

The installer doesn't provide any way to add extra permissions to a user that the cloud credentials operator creates. That said, I managed to reproduce the problem, and I have tested https://github.com/JoelSpeed/machine-api-operator/commit/ed9f94b37d0c9584f9c13098d4b2303fee2cdf4f, which does solve the issue. That could be a possible solution to the problem if we are happy deploying those IAM permissions on all clusters.

> >ambiguous "Client.InternalError: Client error on launch"

This is the error that AWS gives when it tries to launch the instance and fails. From this troubleshooting guide, https://docs.amazonaws.cn/en_us/autoscaling/ec2/userguide/ts-as-instancelaunchfailure.html#ts-as-instancelaunchfailure-12:

> This error can be caused when an Auto Scaling group attempts to launch an instance that has an encrypted EBS volume, but the service-linked role does not have access to the customer managed CMK used to encrypt it.

Also worth noting: I believe https://bugzilla.redhat.com/show_bug.cgi?id=1820484 duplicates this bug, as they seem to have the same root cause.

The one thing I didn't manage to work out is where they saw the following message; I couldn't see any messages like this anywhere:

> Upon investigating further, I noticed that the error was as given below :
> User: arn:aws:iam::<REDACTED>:user/ocp-69v4b-openshift-machine-api-aws-knq6c is not authorized to perform: kms:GenerateDataKeyWithoutPlaintext on resource: arn:aws:kms:us-east-1:<REDACTED>:key/<KEY-ID>

*** Bug 1820484 has been marked as a duplicate of this bug. ***

Not sure why automation did not set this to MODIFIED as per https://github.com/openshift/machine-api-operator/pull/557#issuecomment-618310762, so I am moving it manually.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409