Bug 1815219 - Encrypted AWS EBS seemingly causing OCP4.3 worker node deployments to fail
Summary: Encrypted AWS EBS seemingly causing OCP4.3 worker node deployments to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Joel Speed
QA Contact: Milind Yadav
URL:
Whiteboard:
Duplicates: 1820484
Depends On:
Blocks:
Reported: 2020-03-19 19:15 UTC by Rob Gregory
Modified: 2023-10-06 19:27 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The IAM role assigned to worker nodes did not have sufficient permissions to access the KMS key needed to decrypt the EBS volume on mount.
Consequence: EC2 instances would be accepted, but then fail when starting because they could not read from their root drive.
Fix: Grant the permissions required for EC2 instances to decrypt EBS volumes encrypted with a KMS Customer Managed Key.
Result: When a Customer Managed Key is used to encrypt EBS volumes, instances now have the required permissions and start as expected.
Clone Of:
Environment:
Last Closed: 2020-08-04 18:06:12 UTC
Target Upstream Version:
Embargoed:
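The Fix in the Doc Text amounts to granting KMS permissions on the Customer Managed Key. The Python sketch that follows builds one possible IAM policy document for that. It is an assumption modelled on AWS's general guidance for EBS encryption with customer managed keys, not the exact permission set merged in machine-api-operator pull 557; the key ARN, the account ID, and the helper name `ebs_cmk_policy` are hypothetical. The second statement's `Condition` block is the kind of element that cloud-credential-operator pull 181 ("allow defining Conditions in AWS CredentialsRequest") appears to make expressible.

```python
import json

# Hypothetical placeholder ARN (example account ID); substitute your own CMK ARN.
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

# Assumed set of key-usage actions EC2/EBS needs to encrypt and decrypt
# volume data with the CMK (modelled on AWS's EBS/KMS guidance).
KEY_USAGE_ACTIONS = [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",  # also covers kms:GenerateDataKeyWithoutPlaintext
    "kms:DescribeKey",
]

# Grant management actions, restricted by a condition in the policy below.
GRANT_ACTIONS = ["kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"]


def ebs_cmk_policy(key_arn: str) -> dict:
    """Return an IAM policy document allowing use of `key_arn` for EBS encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": KEY_USAGE_ACTIONS,
                "Resource": key_arn,
            },
            {
                # Only allow grants that an AWS service (such as EC2/EBS)
                # creates on the caller's behalf.
                "Effect": "Allow",
                "Action": GRANT_ACTIONS,
                "Resource": key_arn,
                "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ebs_cmk_policy(EXAMPLE_KEY_ARN), indent=2))
```

Scoping `Resource` to a single key ARN limits the permissions to one CMK; a cluster-wide operator that cannot know the user's key in advance would likely need `"Resource": "*"` instead, relying on the key policy for further restriction.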



Links
System | ID | Private | Priority | Status | Summary | Last Updated
GitHub | openshift cloud-credential-operator pull 181 | 0 | None | closed | Bug 1815219: CO-876: allow defining Conditions in AWS CredentialsRequest | 2021-02-12 09:34:51 UTC
GitHub | openshift machine-api-operator pull 557 | 0 | None | closed | BUG 1815219: Allow machines to have encrypted EBS volumes with non-default key | 2021-02-12 09:34:51 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-08-04 18:06:14 UTC

Comment 2 Alberto 2020-04-07 10:59:44 UTC
>ambiguous "Client.InternalError: Client error on launch"
Where is that coming from?

Could you please share the must-gather logs and the user's config input so we can recreate the scenario?

Comment 4 Joel Speed 2020-04-09 13:23:02 UTC
The installer doesn't provide any way to add extra permissions to a user that the cloud credential operator creates. That said, I managed to reproduce the problem and tested https://github.com/JoelSpeed/machine-api-operator/commit/ed9f94b37d0c9584f9c13098d4b2303fee2cdf4f, which does solve the issue. That could be a possible solution if we are happy deploying those IAM permissions on all clusters.

> >ambiguous "Client.InternalError: Client error on launch"

This is the error AWS returns when it tries to launch an instance and fails. From this troubleshooting guide, https://docs.amazonaws.cn/en_us/autoscaling/ec2/userguide/ts-as-instancelaunchfailure.html#ts-as-instancelaunchfailure-12:
> This error can be caused when an Auto Scaling group attempts to launch an instance that has an encrypted EBS volume, but the service-linked role does not have access to the customer managed CMK used to encrypt it.

It's also worth noting that I believe https://bugzilla.redhat.com/show_bug.cgi?id=1820484 duplicates this bug; they seem to have the same root cause.

The one thing I didn't manage to work out is where they saw the following error; I couldn't find any messages like this anywhere:
> Upon investigating further, I noticed that the error was as given below :
> User: arn:aws:iam::<REDACTED>:user/ocp-69v4b-openshift-machine-api-aws-knq6c is not authorized to perform: kms:GenerateDataKeyWithoutPlaintext on resource: arn:aws:kms:us-east-1:<REDACTED>:key/<KEY-ID>
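The denied action in the quoted message, `kms:GenerateDataKeyWithoutPlaintext`, is a distinct KMS action from `kms:GenerateDataKey`, so a policy listing only the latter would not cover it, whereas the wildcard `kms:GenerateDataKey*` would. The minimal Python sketch that follows (hypothetical helper names `action_matches` and `is_allowed`) illustrates IAM's case-insensitive `*`/`?` action matching under that simplification. It deliberately ignores resources, conditions, explicit denies, and the KMS key policy, all of which also take part in real IAM evaluation.

```python
import re
from typing import Iterable

# Action named in the AccessDenied message quoted above.
DENIED_ACTION = "kms:GenerateDataKeyWithoutPlaintext"


def action_matches(pattern: str, action: str) -> bool:
    """Simplified IAM action match (hypothetical helper): '*' matches any run
    of characters, '?' matches exactly one, and comparison ignores case."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, action, flags=re.IGNORECASE) is not None


def is_allowed(policy_actions: Iterable[str], action: str) -> bool:
    """True if any Allow action pattern covers `action`.
    Resources, conditions, and explicit denies are not modelled."""
    return any(action_matches(p, action) for p in policy_actions)


if __name__ == "__main__":
    candidates = (
        ["kms:Decrypt", "kms:Encrypt"],
        ["kms:GenerateDataKey"],
        ["kms:GenerateDataKey*"],
    )
    for actions in candidates:
        print(actions, is_allowed(actions, DENIED_ACTION))
```

With these inputs only the `kms:GenerateDataKey*` list covers the denied action, which is consistent with the permission gap reported above.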

Comment 8 Joel Speed 2020-04-20 11:13:10 UTC
*** Bug 1820484 has been marked as a duplicate of this bug. ***

Comment 9 Alberto 2020-04-23 11:41:54 UTC
Not sure why automation did not set this to MODIFIED as per https://github.com/openshift/machine-api-operator/pull/557#issuecomment-618310762, so moving it manually.

Comment 19 errata-xmlrpc 2020-08-04 18:06:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

