Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1815219

Summary: Encrypted AWS EBS seemingly causing OCP4.3 worker node deployments to fail
Product: OpenShift Container Platform
Reporter: Rob Gregory <rgregory>
Component: Cloud Compute
Assignee: Joel Speed <jspeed>
Cloud Compute sub component: Other Providers
QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: unspecified
CC: agarcial, aygarg, jspeed, yunjiang, zhsun
Version: 4.3.0
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The IAM role assigned to worker nodes did not have sufficient permissions to use the KMS key needed to decrypt the EBS volume on mount.
Consequence: EC2 instances were accepted but then failed on startup because they could not read from their root drive.
Fix: Grant the permissions EC2 instances require to decrypt KMS-encrypted EBS volumes that use Customer Managed Keys.
Result: When a Customer Managed Key is used to encrypt EBS volumes, instances now have the required permissions and start as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-04 18:06:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 2 Alberto 2020-04-07 10:59:44 UTC
>ambiguous "Client.InternalError: Client error on launch"
Where is that coming from?

Could you please share the must-gather logs and the user-provided install config so we can recreate the scenario?

Comment 4 Joel Speed 2020-04-09 13:23:02 UTC
The installer doesn't provide any way to add extra permissions to a user that the Cloud Credential Operator creates. That said, I managed to reproduce the problem and have tested https://github.com/JoelSpeed/machine-api-operator/commit/ed9f94b37d0c9584f9c13098d4b2303fee2cdf4f, which does solve the issue. That could be a possible solution if we are happy deploying those IAM permissions on all clusters.
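For illustration only, the kind of IAM permissions involved can be sketched as a policy document. This is a minimal sketch, assuming the KMS action list below and the hypothetical helper name `build_kms_policy`; it is not copied from the commit above. The grant actions are restricted with the real `kms:GrantIsForAWSResource` condition key so that grants can only be created for AWS services such as EBS.

```python
import json

# ASSUMPTION: action list chosen to cover the KMS calls EBS makes for a
# customer managed key; it is not taken verbatim from the linked commit.
KEY_USE_ACTIONS = [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:GenerateDataKey",
    "kms:GenerateDataKeyWithoutPlaintext",
    "kms:DescribeKey",
]
GRANT_ACTIONS = ["kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"]


def build_kms_policy(key_arn: str = "*") -> dict:
    """Hypothetical helper: return an IAM policy dict for using a KMS key.

    Pass a specific key ARN to narrow the scope instead of the "*" default.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Direct use of the key to create and decrypt volume data keys.
                "Effect": "Allow",
                "Action": KEY_USE_ACTIONS,
                "Resource": key_arn,
            },
            {
                # Grants let EBS/EC2 use the key on the caller's behalf; the
                # condition limits them to grants for AWS resources.
                "Effect": "Allow",
                "Action": GRANT_ACTIONS,
                "Resource": key_arn,
                "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_kms_policy(), indent=2))
```

Scoping `Resource` to a single key ARN is the tighter option; a `*` resource is what makes "deploying those IAM permissions on all clusters" a broader decision.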

> >ambiguous "Client.InternalError: Client error on launch"

This is the error AWS returns when it tries to launch the instance and fails. From the troubleshooting guide at https://docs.amazonaws.cn/en_us/autoscaling/ec2/userguide/ts-as-instancelaunchfailure.html#ts-as-instancelaunchfailure-12:
> This error can be caused when an Auto Scaling group attempts to launch an instance that has an encrypted EBS volume, but the service-linked role does not have access to the customer managed CMK used to encrypt it.
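A small sketch for recognizing this failure on an instance that never came up. It assumes the `StateReason` shape that EC2 `DescribeInstances` reports (a dict with `Code` and `Message`); the function name `diagnose_launch_failure` and the hint text are hypothetical.

```python
from typing import Optional

# The state-reason code AWS reports for the launch failure quoted above.
KMS_LAUNCH_ERROR = "Client.InternalError"


def diagnose_launch_failure(state_reason: Optional[dict]) -> Optional[str]:
    """Hypothetical helper: return a hint if the StateReason matches the
    encrypted-EBS launch failure, otherwise None.

    ASSUMPTION: state_reason mirrors DescribeInstances' StateReason,
    e.g. {"Code": "Client.InternalError", "Message": "..."}.
    """
    if not state_reason:
        return None
    if state_reason.get("Code") != KMS_LAUNCH_ERROR:
        return None
    return (
        "Instance failed on launch; if its EBS volumes use a customer managed "
        "KMS key, check that the launching principal can use that key."
    )


if __name__ == "__main__":
    reason = {
        "Code": "Client.InternalError",
        "Message": "Client.InternalError: Client error on launch",
    }
    print(diagnose_launch_failure(reason))
```

The code alone does not prove a KMS problem (the troubleshooting guide says it "can be caused" by one), so the helper only returns a hint to check key permissions.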

Also worth noting: I believe https://bugzilla.redhat.com/show_bug.cgi?id=1820484 duplicates this bug; they seem to have the same root cause.

The one thing I didn't manage to work out is where they saw the following message; I couldn't find any messages like it anywhere:
> Upon investigating further, I noticed that the error was as given below :
> User: arn:aws:iam::<REDACTED>:user/ocp-69v4b-openshift-machine-api-aws-knq6c is not authorized to perform: kms:GenerateDataKeyWithoutPlaintext on resource: arn:aws:kms:us-east-1:<REDACTED>:key/<KEY-ID>
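A message of this form names the principal, the missing action, and the resource, which is enough to tell which IAM permission is lacking. A minimal sketch that extracts those three fields with a regular expression, assuming the message layout quoted above; `parse_access_denied` and `AccessDenied` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class AccessDenied(NamedTuple):
    principal: str
    action: str
    resource: str


# ASSUMPTION: layout taken from the message quoted in this bug:
# "User: <arn> is not authorized to perform: <action> on resource: <arn>"
_PATTERN = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


def parse_access_denied(message: str) -> Optional[AccessDenied]:
    """Hypothetical helper: split an AWS authorization error into its parts."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return AccessDenied(**match.groupdict())


if __name__ == "__main__":
    msg = (
        "User: arn:aws:iam::<REDACTED>:user/ocp-69v4b-openshift-machine-api-aws-knq6c "
        "is not authorized to perform: kms:GenerateDataKeyWithoutPlaintext "
        "on resource: arn:aws:kms:us-east-1:<REDACTED>:key/<KEY-ID>"
    )
    print(parse_access_denied(msg))
```

Applied to the quoted message, the `action` field identifies `kms:GenerateDataKeyWithoutPlaintext` as the permission the Machine API user was missing.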

Comment 8 Joel Speed 2020-04-20 11:13:10 UTC
*** Bug 1820484 has been marked as a duplicate of this bug. ***

Comment 9 Alberto 2020-04-23 11:41:54 UTC
Not sure why automation did not set this to MODIFIED as per https://github.com/openshift/machine-api-operator/pull/557#issuecomment-618310762, so moving it manually.

Comment 19 errata-xmlrpc 2020-08-04 18:06:12 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409