Bug 1945910 - [aws] support byo iam roles for instances
Summary: [aws] support byo iam roles for instances
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Matthew Staebler
QA Contact: Yunfei Jiang
URL:
Whiteboard:
Depends On:
Blocks: 1945907
 
Reported: 2021-04-02 16:15 UTC by Matthew Staebler
Modified: 2021-07-27 22:57 UTC
CC List: 2 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
This is new functionality in 4.8 covered by an epic.
Clone Of: 1945907
Environment:
Last Closed: 2021-07-27 22:57:19 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 4790 0 None closed Byo IAM roles for IPI install 2021-04-02 16:18:40 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:57:46 UTC

Description Matthew Staebler 2021-04-02 16:15:05 UTC
+++ This bug was initially created as a clone of Bug #1945907 +++

This is a clone of https://issues.redhat.com/browse/CORS-1653.

1. Proposed title of this feature request
    Permit using existing IAM roles for bootstrap, worker, and control plane nodes in the installer. The implementation should support AWS, Azure, and GCP.
2. What is the nature and description of the request?
    Enhance the installer to allow the customer to pre-create the IAM roles used by the bootstrap, worker and control plane nodes and supply those roles to the installer in IPI mode.
3. Why does the customer need this? (List the business requirements here)
    It is currently impossible to perform an IPI-mode installation of OCP in the public cloud when additional restrictions apply to IAM role creation. For instance, some customers require that all roles match a specific naming scheme and/or include a predefined permissions boundary in the role creation process.
4. List any affected packages or components.
    Installer

Comment 2 Yunfei Jiang 2021-04-08 01:18:45 UTC
verified. FAILED.

OCP version: 4.8.0-0.nightly-2021-04-05-174735

The installer skipped the user-provided IAM role in `platform.aws.iamRole` and created its own IAM roles (just like a normal IPI install) to finish the installation process.

Per [1], it should use the existing IAM role specified in `platform.aws.iamRole` in install-config.yaml.
[1] https://github.com/smrowley/installer/blob/7c54988f0be7cb44822a14cf2d4708adcf72abcb/data/data/install.openshift.io_installconfigs.yaml#L919-L923

Steps to Reproduce:
1. Create install-config.yaml

apiVersion: v1
baseDomain: qe.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    aws: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: yunjiang-eplat
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-2
    iamRole: existing_iam_master
publish: External
pullSecret: <HIDDEN>
sshKey: <HIDDEN>

2. Create cluster.

Actual results:

1. After checking the AWS web console and the install log, policies and roles were created for bootstrap, master, and worker:

time="2021-04-06T04:43:07-04:00" level=debug msg="module.masters.aws_iam_role_policy.master_policy[0]: Creation complete after 1s [id=yunjiang-eplat-cgdm2-master-role:yunjiang-eplat-cgdm2-master-policy]"
time="2021-04-06T04:43:07-04:00" level=debug msg="module.masters.aws_iam_role.master_role[0]: Creation complete after 0s [id=yunjiang-eplat-cgdm2-master-role]"
time="2021-04-06T04:43:07-04:00" level=debug msg="module.masters.aws_iam_role_policy.master_policy[0]: Creation complete after 1s [id=yunjiang-eplat-cgdm2-master-role:yunjiang-eplat-cgdm2-master-policy]"
time="2021-04-06T04:43:07-04:00" level=debug msg="module.iam.aws_iam_role_policy.worker_policy[0]: Creation complete after 1s [id=yunjiang-eplat-cgdm2-worker-role:yunjiang-eplat-cgdm2-worker-policy]"
time="2021-04-06T04:43:07-04:00" level=debug msg="module.iam.aws_iam_role.worker_role[0]: Creation complete after 0s [id=yunjiang-eplat-cgdm2-worker-role]"
time="2021-04-06T04:43:07-04:00" level=debug msg="module.iam.aws_iam_role_policy.worker_policy[0]: Creation complete after 1s [id=yunjiang-eplat-cgdm2-worker-role:yunjiang-eplat-cgdm2-worker-policy]"
time="2021-04-06T04:43:07-04:00" level=debug msg="module.bootstrap.aws_iam_role_policy.bootstrap[0]: Creation complete after 1s [id=yunjiang-eplat-cgdm2-bootstrap-role:yunjiang-eplat-cgdm2-bootstrap-policy]"

2. The profiles attached to cluster instances contain the newly created roles instead of the existing role existing_iam_master.

Expected results:
* No new roles are created.
* The profiles attached to cluster instances contain the existing role existing_iam_master.

Additional info:

The following config works as expected:

apiVersion: v1
baseDomain: qe.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      iamRole: existing_iam_worker
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      iamRole: existing_iam_master
  replicas: 3
metadata:
  creationTimestamp: null
  name: yunjiang-e1
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-2
publish: External
pullSecret: <HIDDEN>
sshKey: <HIDDEN>

Comment 3 Matthew Staebler 2021-04-08 01:53:15 UTC
@yunjiang You are setting the incorrect field. You need to set the `iamRole` field on the machine pool, so `.controlPlane.platform.aws.iamRole` and `.compute[0].platform.aws.iamRole`.


$ openshift-install explain installconfig.controlPlane.platform.aws.iamRole
KIND:     InstallConfig
VERSION:  v1

RESOURCE: <string>
  IAMRole is the name of the IAM Role to use for the instance profile of the machine. Leave unset to have the installer create the IAM Role on your behalf.


$ openshift-install explain installconfig.compute.platform.aws.iamRole
KIND:     InstallConfig
VERSION:  v1

RESOURCE: <string>
  IAMRole is the name of the IAM Role to use for the instance profile of the machine. Leave unset to have the installer create the IAM Role on your behalf.

Comment 4 Matthew Staebler 2021-04-08 01:55:13 UTC
If you want to use the same role for both the control plane and compute nodes, you can set the `.platform.aws.defaultMachinePlatform.iamRole` field.


$ openshift-install explain installconfig.platform.aws.defaultMachinePlatform.iamRole
KIND:     InstallConfig
VERSION:  v1

RESOURCE: <string>
  IAMRole is the name of the IAM Role to use for the instance profile of the machine. Leave unset to have the installer create the IAM Role on your behalf.
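
A minimal install-config fragment using this field might look like the following sketch (the role name existing_iam_shared is only a placeholder):

platform:
  aws:
    region: us-east-2
    defaultMachinePlatform:
      iamRole: existing_iam_shared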

Comment 5 Yunfei Jiang 2021-04-08 02:17:37 UTC
Thanks Matthew. Yes, `.controlPlane.platform.aws.iamRole` and `.compute[0].platform.aws.iamRole` work as expected. I will try installconfig.platform.aws.defaultMachinePlatform.iamRole.

I also have a question about a user providing one role for both master and worker: this means the workers have many permissions they don't require, since the master role contains all of the permissions the workers need. Should we add a warning message to let the user know that the worker instances are using the same role as the masters?

Comment 6 Matthew Staebler 2021-04-08 04:24:57 UTC
I, personally, don't think that we need warnings. The user is in charge of giving the permissions to the roles. The installer is not checking the permissions given to the roles. So if the user gives a role more permissions than are required, we will not warn the user. Along those same lines, I don't see why we would warn the user if they used the same role for the masters and workers, whether by giving the same name to the individual machine pools or by using the default machine platform. I would not recommend that users set the IAM role in the default machine platform, and I do not think it is something that we should go out of our way to document. But we need to support it, since any field in the default machine platform must be honored; we don't have a way to exclude fields that are in the machine pool platforms from also being in the default machine platform.

Comment 7 Yunfei Jiang 2021-04-08 06:56:30 UTC
(In reply to Matthew Staebler from comment #6)
> The installer is not checking the permissions given to the roles. 
The installer does do a permission check for the IAM user used by `openshift-install create cluster`. If some required permissions are missing from the user-provided roles, it will cause an install failure. I think the user experience would be better if we could check whether any required permissions are missing before installing the cluster, just like the IAM user permission check.

Another issue: after the cluster is destroyed, the shared tag on the IAM role is not removed. It looks like we hit the same issue as in bug 1926547; will it be fixed by bug 1926547?

Comment 8 Matthew Staebler 2021-04-08 13:30:24 UTC
(In reply to Yunfei Jiang from comment #7)
> (In reply to Matthew Staebler from comment #6)
> > The installer is not checking the permissions given to the roles. 
> The installer does do a permission check for the IAM user used by
> `openshift-install create cluster`. If some required permissions are
> missing from the user-provided roles, it will cause an install failure. I
> think the user experience would be better if we could check whether any
> required permissions are missing before installing the cluster, just like
> the IAM user permission check.

Presumably, if the user is supplying their own IAM roles for instances, they are also going to be using manual credentials mode. In that case, the installer will not perform any permissions checking, even of the IAM entity used by the installer.
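
For reference, manual credentials mode is selected with the top-level credentialsMode field in install-config.yaml; a minimal sketch:

# top-level field in install-config.yaml
credentialsMode: Manual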

 
> Another issue: after the cluster is destroyed, the shared tag on the IAM
> role is not removed. It looks like we hit the same issue as in bug 1926547;
> will it be fixed by bug 1926547?

No, it will not be addressed by that BZ. Please open a new BZ, or fail this BZ. The BZ that you linked will address the fact that the destroyer erroneously completes successfully despite not being able to remove the shared tag. But the underlying issue of not being able to remove the shared tag is not addressed. Out of curiosity, does the IAM user that you are using have the `iam:UntagRole` permission? That permission is needed to remove the shared tag from the IAM role.
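
For illustration only, the relevant statement in the installing IAM user's policy would need something along these lines; iam:UntagRole is the permission called out above, while iam:TagRole (used when the tag is added in the first place) and the wildcard resource are assumptions in this sketch:

{
    "Effect": "Allow",
    "Action": [
        "iam:TagRole",
        "iam:UntagRole"
    ],
    "Resource": "*"
}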

Comment 9 Yunfei Jiang 2021-04-12 03:25:54 UTC
Thanks Matthew. Tested against the following four configurations; all clusters were installed successfully with the correct IAM roles.
For the issue mentioned in comment 8, I created bug 1948359 to track it, and I am marking this bug as VERIFIED.

OCP Version: 4.8.0-0.nightly-2021-04-08-043959


> config 1 - master and worker:
compute:
- architecture: amd64
  name: worker
  platform:
    aws:
      iamRole: existing_iam_worker2
controlPlane:
  name: master
  platform:
    aws:
      iamRole: existing_iam_master2
platform:
  aws:
    region: us-east-2

> config 2 - master only:
compute:
- architecture: amd64
  name: worker
  platform: {}
controlPlane:
  name: master
  platform:
    aws:
      iamRole: existing_iam_master2
platform:
  aws:
    region: us-east-2

> config 3 - worker only:
compute:
- architecture: amd64
  name: worker
  platform:
    aws:
      iamRole: existing_iam_worker2
controlPlane:
  name: master
  platform: {}
platform:
  aws:
    region: us-east-2

> config 4 - master and worker (with defaultMachinePlatform):
compute:
- architecture: amd64
  name: worker
  platform: {}
controlPlane:
  name: master
  platform: {}
platform:
  aws:
    region: us-east-2
    defaultMachinePlatform:
      iamRole: existing_iam_master2

Comment 12 errata-xmlrpc 2021-07-27 22:57:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

