Bug 1829101 - OCP issues with AWS Organizations SCPs
Summary: OCP issues with AWS Organizations SCPs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Credential Operator
Version: 4.3.z
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Joel Diaz
QA Contact: wang lin
URL:
Whiteboard:
Duplicates: 1832640
Depends On:
Blocks: 1757244 1868350
Reported: 2020-04-28 21:31 UTC by Juanjo Floristan
Modified: 2023-12-15 17:47 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Support bypassing the dynamic cloud credentials permissions checking.
Reason: In AWS accounts with Service Control Policies (SCP) enabled, the results of the permissions simulations performed to determine the capabilities of AWS cloud credentials are unreliable.
Result: Telling the OpenShift installer and the in-cluster cloud-credential-operator what capabilities the cloud credentials support allows bypassing the dynamic permissions checking that can return unreliable results.
Clone Of:
Clones: 1868350
Environment:
Last Closed: 2020-10-27 15:58:32 UTC
Target Upstream Version:
Embargoed:
lwan: needinfo-
mworthin: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift api pull 692 | 0 | None | closed | create config for cloud-credential-operator | 2021-02-04 18:09:20 UTC
Github | openshift cloud-credential-operator pull 227 | 0 | None | closed | start using the CCO config object | 2021-02-04 18:09:20 UTC
Github | openshift cloud-credential-operator pull 228 | 0 | None | closed | handle bootstrap user-defined mode | 2021-02-04 18:09:20 UTC
Github | openshift installer pull 3919 | 0 | None | closed | types: add field to InstallConfig to force credentials mode | 2021-02-04 18:09:20 UTC
Github | openshift installer pull 3968 | 0 | None | closed | types: capitalize CredentialsMode values | 2021-02-04 18:09:19 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 15:58:56 UTC

Description Juanjo Floristan 2020-04-28 21:31:42 UTC
Description of problem:

 OpenShift does not support AWS IPI or UPI installations when AWS
 Organizations Service Control Policies (SCPs) contain a rule that denies all
 actions, or a specific required permission, using a global condition such as
 “for all regions except us-east-1 and us-west-2” or “for all roles except
 role A”. The installation fails even if the credentials actually have
 permission to perform that action.

 This happens because OpenShift depends on an AWS policy simulator API that
 fails to evaluate conditions in AWS Organizations SCPs, producing false
 negatives, and the installation cannot proceed if the validation fails.

 BZ 1750338 addresses passing the region to the simulator when policies
 include global conditions based on the region, but that fix only works with
 IAM policies. It does not work with AWS Organizations SCPs because, as
 explained above, the API does not evaluate global conditions against SCPs.
 The simulator therefore sees only the rule, applied unconditionally.
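
 The difference between the two evaluations can be sketched in Go. This is a
 deliberately simplified model and not AWS's policy engine: the Statement
 type and both functions are hypothetical names, and only a single
 StringNotEquals condition on aws:RequestedRegion is modelled, matching the
 DenyOtherRegions statement in the reproduction steps.

```go
package main

import "fmt"

// Statement is a hypothetical, heavily simplified model of one SCP statement.
// It understands only a StringNotEquals condition on aws:RequestedRegion.
type Statement struct {
	Effect         string
	AllowedRegions []string // regions listed under StringNotEquals / aws:RequestedRegion
}

// deniesWithCondition evaluates the Deny the way a real request would see it:
// the Deny applies only when the requested region is NOT in the list.
func deniesWithCondition(s Statement, region string) bool {
	if s.Effect != "Deny" {
		return false
	}
	for _, r := range s.AllowedRegions {
		if r == region {
			return false
		}
	}
	return true
}

// deniesIgnoringCondition models the behaviour reported in this bug:
// the condition is dropped, so the Deny looks unconditional.
func deniesIgnoringCondition(s Statement) bool {
	return s.Effect == "Deny"
}

func main() {
	scp := Statement{Effect: "Deny", AllowedRegions: []string{"us-east-1", "us-west-2"}}
	fmt.Println(deniesWithCondition(scp, "us-east-1")) // real request in an allowed region
	fmt.Println(deniesWithCondition(scp, "eu-west-1")) // real request in a blocked region
	fmt.Println(deniesIgnoringCondition(scp))          // simulated check without the condition
}
```

 In this model a request in us-east-1 is not denied when the condition is
 honoured, yet the condition-blind check reports a denial for every region,
 which is the false negative described above.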

 This problem is hit by both the openshift-installer and the
 cloud-credential-operator.


How reproducible:

 Set up an AWS environment that uses AWS Organizations and SCPs to control
 permissions. In any of the SCPs applied to the user or account whose
 credentials will be provided to OCP, include a statement like this:

      {
         "Sid":"DenyOtherRegions",
         "Effect":"Deny",
         "Resource":"*",
         "Action":"*",
         "Condition":{
            "StringNotEquals":{
               "aws:RequestedRegion":[
                  "us-east-1",
                  "us-west-2"
               ]
            }
         }
      }

 Note: The AWS credentials provided have all the permissions needed by OpenShift.

 Try to install OpenShift IPI on AWS following the official documentation:

 $ openshift-install create cluster --dir=ocp

Actual results:

 The installer fails because permission validation fails, even when the user
 has all the permissions needed to perform the actual actions. AWS validation
 fails because it ignores the region condition and sees only the "Deny: *"
 statement (a false negative).

 $ openshift-install create cluster --dir=ocp
 ...
 WARNING Action not allowed with tested creds action="ec2:CreateDhcpOptions"
 WARNING Action not allowed with tested creds action="ec2:CreateInternetGateway"
 WARNING Action not allowed with tested creds action="ec2:CreateNatGateway"
 WARNING Action not allowed with tested creds action="ec2:CreateRoute"
 WARNING Action not allowed with tested creds action="ec2:CreateRouteTable"
 WARNING Action not allowed with tested creds action="ec2:CreateSecurityGroup"
 WARNING Action not allowed with tested creds action="ec2:CreateSubnet"
 WARNING Action not allowed with tested creds action="ec2:CreateTags"
 
 I patched the installer following the workaround suggested in
 https://bugzilla.redhat.com/show_bug.cgi?id=1750338#c25, but the installer
 was still failing with the error:
 
 “AWS credentials cannot be used to either create new creds or use as-is”
 
 Then I had to manually patch pkg/asset/installconfig/aws/permissions.go to
 bypass the mint and passthrough credential checks (neither of them passes)
 and force a successful return at the end of the module:

        // return nil added after failing to verify mint or passthrough checks;
        // the original return below is now unreachable
        return nil
        return errors.New("AWS credentials cannot be used to either create new creds or use as-is")

 I could make the installer progress, but then the cloud-credential-operator
 also fails to validate permissions, preventing any CredentialsRequest from
 being granted and therefore blocking completion of the cluster deployment.

 $ oc logs cloud-credential-operator-69479545fc-mlcn7 -n openshift-cloud-credential-operator -f

 time="2020-04-16T00:52:03Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
 time="2020-04-16T00:52:03Z" level=info msg="reconcile complete" controller=metrics elapsed=1.660646ms
 time="2020-04-16T00:52:08Z" level=info msg="validating cloud cred secret" controller=secretannotator
 time="2020-04-16T00:52:08Z" level=debug msg="Loading infrastructure name: oc4poc-fw6q6" controller=secretannotator
 time="2020-04-16T00:52:08Z" level=warning msg="Action not allowed with tested creds" action="iam:CreateAccessKey" controller=secretannotator
 time="2020-04-16T00:52:08Z" level=warning msg="Action not allowed with tested creds" action="iam:CreateUser" controller=secretannotator
 time="2020-04-16T00:52:08Z" level=warning msg="Action not allowed with tested creds" action="iam:DeleteAccessKey" controller=secretannotator

 After this worker nodes are not deployed and there are no pending CSRs. The 
 cluster stops deploying with operators depending on cloud credentials blocked

Expected results:

 OpenShift should install normally, perhaps with a mechanism to manually skip
 validation of permissions for the provided AWS credentials.

Comment 2 Abhinav Dahiya 2020-04-29 22:30:36 UTC
Moving to credm-minter. I think we need to fix this in the cluster before we fix it in the installer.

Comment 3 James Harrington 2020-04-30 12:55:41 UTC
This is the same issue that I raised here https://bugzilla.redhat.com/show_bug.cgi?id=1815331

Comment 4 Joel Diaz 2020-05-05 14:40:21 UTC
We are presently pursuing access to an SCP-enabled environment to investigate/replicate.

Comment 5 Joel Diaz 2020-05-14 19:17:42 UTC
For interested parties, the enhancement proposal to allow bypassing pre-flight permissions checks is posted at https://github.com/openshift/enhancements/pull/324

Comment 6 Devan Goodwin 2020-05-15 17:46:49 UTC
*** Bug 1832640 has been marked as a duplicate of this bug. ***

Comment 8 Joel Diaz 2020-05-22 17:50:49 UTC
The investigation confirmed that the permissions simulation is indeed unreliable when SCPs are in use in an account. The enhancement proposal https://github.com/openshift/enhancements/pull/324 allows instructing the OpenShift installer not to perform permissions checks before the installation, and indicating to the in-cluster cloud-credential-operator that it too should skip permissions simulations (CCO must then be told whether to run in 'mint' or 'passthrough' mode).
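
The bypass idea in this comment can be sketched roughly as follows. This is an illustration only, not code from the installer or CCO: the CredentialsMode type, the constants, and resolveMode are hypothetical names, and the simulate callback stands in for whatever permissions simulation would otherwise run.

```go
package main

import (
	"errors"
	"fmt"
)

// CredentialsMode is an illustrative type for this sketch; it is not the
// real API type used by the installer or CCO.
type CredentialsMode string

const (
	ModeMint        CredentialsMode = "Mint"
	ModePassthrough CredentialsMode = "Passthrough"
)

// resolveMode returns the explicitly requested mode when one is given, and
// falls back to the (possibly unreliable) simulation only when none is set.
func resolveMode(requested CredentialsMode, simulate func() (CredentialsMode, error)) (CredentialsMode, error) {
	if requested != "" {
		return requested, nil
	}
	return simulate()
}

func main() {
	// Stand-in for a simulation that reports a false negative under SCPs.
	failing := func() (CredentialsMode, error) {
		return "", errors.New("AWS credentials cannot be used to either create new creds or use as-is")
	}

	mode, err := resolveMode(ModeMint, failing) // explicit mode: simulation is never called
	fmt.Println(mode, err)

	_, err = resolveMode("", failing) // no mode given: the simulation result decides
	fmt.Println(err)
}
```

With an explicit mode the unreliable simulation is never consulted, which is the behaviour the proposal asks for; without one, the caller is still exposed to the false negative.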

Comment 9 Greg Sheremeta 2020-06-18 18:21:09 UTC
waiting on enhancement to be approved

Comment 12 Devan Goodwin 2020-07-16 14:21:14 UTC
This bugfix is actively being worked on and we expect to complete it this sprint.

Comment 16 wang lin 2020-08-13 03:01:39 UTC
The bug has been fixed.
test payload: 4.6.0-0.nightly-2020-08-09-151434

Test steps:
1. Prepare an account with SCPs enabled.
2. Create install-config.yaml, choosing a region (such as us-east-1) where your credentials have all the permissions needed for an installation, and add "credentialsMode: Mint" to the install-config.yaml file to force Mint mode and bypass permissions checking:
./openshift-install create install-config --dir demo
3. Install a cluster:
./openshift-install create cluster --dir demo --log-level debug

The cluster can install successfully.
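
For reference, a minimal sketch of where the setting from step 2 might sit in install-config.yaml. The field name credentialsMode follows the spelling used in comment 24 and in the linked installer pull request titles; the base domain and cluster name are placeholders, and required fields such as pullSecret are omitted for brevity.

```yaml
apiVersion: v1
baseDomain: example.com      # placeholder, not from this bug
credentialsMode: Mint        # forces Mint mode and skips the permissions simulation
metadata:
  name: demo                 # placeholder cluster name
platform:
  aws:
    region: us-east-1        # a region the SCP allows
```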

Comment 22 errata-xmlrpc 2020-10-27 15:58:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 23 Kaify 2022-10-06 10:45:59 UTC
We struggled for weeks trying to figure out what the issue was, because the installer never reported that it was a permission issue. We have an SCP at the org level that denies any action from any IAM user, even one created with full admin permission, unless it has MFA enabled or carries a specific exception tag. We had to do a PhD on what every installer component does (Red Hat support was of no help), and AWS support could not figure it out either, as the installer runs for 30+ minutes and creates a lot of logs and CloudTrail entries. But we knew we were doing it right because it worked on an independent account.

Long story short: please make the installer report permission errors, even if you choose to ignore them (something like "SCP Permission Issue - Access Denied - Ignoring..Continuing"). Secondly, please help me raise a bug asking how we can make the installer add a specific tag to the 6 IAM users that OpenShift IPI creates, so that the org SCP doesn't restrict the installation.

Comment 24 wang lin 2022-10-08 02:50:52 UTC
In my opinion, if the cluster is under a global SCP policy, we must explicitly set credentialsMode to Mint in the install-config file; the installer will then skip permissions checks for this scenario under the current code logic. This is more like an enhancement, so I will forward the requirement to the Hive/CCO team.

cc mworthin efried , could you help take a look?

Comment 25 wang lin 2022-10-08 03:25:39 UTC
The installer also supports a field in install-config to add tags to all created resources; I'm not sure whether it meets your needs.

#####
platform.aws.userTags: A map of keys and values that the installation program adds as tags to all resources that it creates.

docs here: https://docs.openshift.com/container-platform/4.11/installing/installing_aws/installing-aws-customizations.html#installation-configuration-parameters-optional-aws_installing-aws-customizations
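
A minimal sketch of how that field might look in install-config.yaml. The tag key and value are hypothetical stand-ins for whatever exception tag an organization's SCP expects, and the fragment omits the other required fields.

```yaml
platform:
  aws:
    region: us-east-1
    userTags:
      scp-exception: "true"   # hypothetical tag key and value
```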

Comment 26 Kaify 2022-10-09 18:23:37 UTC
(In reply to wang lin from comment #25)
> And installer supports a field in install-config to add tags to all created
> resources, not sure if it meets your needs.  
> 
> #####
> platform.aws.userTags: A map of keys and values that the installation
> program adds as tags to all resources that it creates.
> 
> docs here:
> https://docs.openshift.com/container-platform/4.11/installing/installing_aws/
> installing-aws-customizations.html#installation-configuration-parameters-
> optional-aws_installing-aws-customizations

@lwan Thank you so much for addressing this, this is exactly what I was looking for.

Comment 27 Kaify 2022-10-10 05:38:41 UTC
(In reply to wang lin from comment #25)
> And installer supports a field in install-config to add tags to all created
> resources, not sure if it meets your needs.  
> 
> #####
> platform.aws.userTags: A map of keys and values that the installation
> program adds as tags to all resources that it creates.
> 
> docs here:
> https://docs.openshift.com/container-platform/4.11/installing/installing_aws/
> installing-aws-customizations.html#installation-configuration-parameters-
> optional-aws_installing-aws-customizations

Hi @lwan, I jumped the gun there. After updating the install-config, the installer does apply the user tag I specified, but it applies it to all resources except the ones I wanted, which are these IAM users:

kaifyoct10-d2tbw-aws-ebs-csi-driver-operator-sfcvz
kaifyoct10-d2tbw-cloud-credential-operator-iam-ro-qql72
kaifyoct10-d2tbw-openshift-cloud-network-config-contro-c6w7n
kaifyoct10-d2tbw-openshift-image-registry-zrmh6
kaifyoct10-d2tbw-openshift-ingress-pdf8j
kaifyoct10-d2tbw-openshift-machine-api-aws-c8drd

What can I do to make the installer add the userTags automatically to these IAM users? Without this, the IAM users never get access to do anything because of the global SCP, and the installation stalls at
DEBUG Still waiting for the cluster to initialize: Working towards 4.11.1: 770 of 802 done (96% complete)
The worker node instances never come up, and the installation fails because several operators fail to initialize.



______________________________________________________________________________________________________________________________________________________________________________
______________________________________________________________________________________________________________________________________________________________________________


Optional extra information in case it comes in handy:

For the current installation I added the tags to these IAM users manually, because otherwise I would have had to wait 40 minutes for the installer to realize it cannot progress and fail. Here is the installer debug log from a past failure:

ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthClientsController_SyncError::OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_SyncError::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server 
ERROR OAuthClientsControllerDegraded: no ingress for host oauth-openshift.apps.kaifyoct10.openshift2.XXXXXXengineering.net in route oauth-openshift in namespace openshift-authentication 
ERROR OAuthServerDeploymentDegraded: waiting for the oauth-openshift route to contain an admitted ingress: no admitted ingress for route oauth-openshift in namespace openshift-authentication 
ERROR OAuthServerDeploymentDegraded:               
ERROR OAuthServerRouteEndpointAccessibleControllerDegraded: route "openshift-authentication/oauth-openshift": status does not have a valid host address 
ERROR OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://172.30.46.33:443/healthz": dial tcp 172.30.46.33:443: connect: connection refused 
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready 
ERROR WellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this) 
INFO Cluster operator authentication Available is False with OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_ResourceNotFound::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::OAuthServerServiceEndpointsEndpointAccessibleController_ResourceNotFound::ReadyIngressNodes_NoReadyIngressNodes::WellKnown_NotReady: OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found 
INFO OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.46.33:443/healthz": dial tcp 172.30.46.33:443: connect: connection refused 
INFO OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints "oauth-openshift" not found 
INFO ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods). 
INFO WellKnownAvailable: The well-known endpoint is not yet available: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this) 
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform 
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected 
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected 
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected 
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected 
ERROR Cluster operator console Degraded is True with DefaultRouteSync_FailedAdmitDefaultRoute::RouteHealth_RouteNotAdmitted::SyncLoopRefresh_FailedIngress: DefaultRouteSyncDegraded: no ingress for host downloads-kaifyoct10.openshift2.XXXXXXengineering.net in route downloads in namespace openshift-console 
ERROR RouteHealthDegraded: console route is not admitted 
ERROR SyncLoopRefreshDegraded: no ingress for host console-kaifyoct10.openshift2.XXXXXXengineering.net in route console in namespace openshift-console 
INFO Cluster operator console Available is False with RouteHealth_RouteNotAdmitted: RouteHealthAvailable: console route is not admitted 
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required 
INFO Cluster operator image-registry Available is False with DeploymentNotFound: Available: The deployment does not exist 
INFO NodeCADaemonAvailable: The daemon set node-ca has available replicas 
INFO ImagePrunerAvailable: Pruner CronJob has been created 
INFO Cluster operator image-registry Progressing is True with Error: Progressing: Unable to apply resources: unable to sync storage configuration: AccessDenied: Access Denied 
INFO Progressing:       status code: 403, request id: D7V0MW8YZYMPVKKZ, host id: fHTFjql1qr7CdRk4CeQN0SuRTWYI9COjXEgHQeM8wg3ymLMQGFcTl1qtytiR/zBkP0AEWJMYZmg= 
ERROR Cluster operator image-registry Degraded is True with Unavailable: Degraded: The deployment does not exist 
INFO Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DNSReady=False (NoZones: The record isn't present in any zones.) 
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available. 
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-78f7456b4-c7xpx" cannot be scheduled: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. Pod "router-default-78f7456b4-4ppf2" cannot be scheduled: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1), DNSReady=False (NoZones: The record isn't present in any zones.), CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller) 
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer 
INFO Cluster operator insights Disabled is False with AsExpected:  
INFO Cluster operator insights SCAAvailable is False with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"code":"ACCT-MGMT-7","href":"/api/accounts_mgmt/v1/errors/7","id":"7","kind":"Error","operation_id":"d6f0af7b-11aa-442c-b50b-1e06e938f78b","reason":"The organization (id= 2DCYQ3e4fX8DBIQv9fR1ncF5rTR) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management."} 
INFO Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error. 
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack. 
ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: Failed to rollout the stack. Error: updating prometheus operator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas 
INFO Cluster operator network ManagementStateDegraded is False with :  
INFO Cluster operator network Progressing is True with Deploying: Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready 
ERROR Cluster initialization failed because one or more operators are not functioning properly. 
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below, 
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html 
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation 
ERROR failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring

Comment 28 Eric Fried 2022-10-10 15:46:27 UTC
re https://bugzilla.redhat.com/show_bug.cgi?id=1829101#c24 I'm not as savvy with CCO as @abutcher, so I'm tagging him and stepping out.

Comment 29 Kaify 2022-10-13 06:39:39 UTC
Hi @lwan, could you please check my update above? Would there be any way to add tags to the IAM users that the installer creates?

Comment 30 Kaify 2022-10-13 08:02:55 UTC
I see that each of these 6 IAM users have this tag on them added by installer:

Tag key                               |  Tag value
kubernetes.io/cluster/kaifyoct13-l7x5c    owned

which I think comes from the shared tag in the file github openshift/installer/pkg/asset/cluster/aws/aws.go (lines 23 and 95):

func sharedTag(clusterID string) (string, string) {
	return fmt.Sprintf("kubernetes.io/cluster/%s", clusterID), "shared"
}

The point is that if the installer can add this tag to the IAM users, it can definitely add userTags as well. A further enhancement could be an option in the install-config to specify which tags apply to which resources (though that could be lengthy work); for now, it would be great if the installer just added the userTags to the IAM users as well. Thanks!
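
The request above can be sketched as a small merge of the cluster tag with the user-supplied tags. This is a hypothetical illustration, not code from the installer or CCO: mergeTags is an invented helper, the "owned" value is the one observed on the IAM users in this comment, and the scp-exception key is a made-up example.

```go
package main

import "fmt"

// mergeTags is a hypothetical helper, not code from the installer or CCO.
// It combines the cluster tag observed on the IAM users with user-supplied
// tags; the cluster tag wins if a key collides.
func mergeTags(clusterID string, userTags map[string]string) map[string]string {
	merged := make(map[string]string, len(userTags)+1)
	for k, v := range userTags {
		merged[k] = v
	}
	merged[fmt.Sprintf("kubernetes.io/cluster/%s", clusterID)] = "owned"
	return merged
}

func main() {
	tags := mergeTags("kaifyoct13-l7x5c", map[string]string{"scp-exception": "true"})
	for k, v := range tags {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```

Whichever component creates the IAM users would need to pass a merged set like this when creating them; where that would belong is not established in this thread.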

Comment 31 wang lin 2022-10-13 08:37:56 UTC
I launched a cluster with userTags and no tags were added to the IAM users. I suspect this is because the IAM users are created by CCO (Cloud Credential Operator), not the installer, and I can't find a place to add tags for CCO. dgoodwin and I have moved off the CCO team.

@jshu Could you help to check with CCO devs in slack to see any updates or solution? thanks

Comment 32 Jianping SHu 2022-10-13 09:56:51 UTC
Hi, I'm the QE for CCO.
This BZ was closed and the above sounds like a new request for CCO, i.e. tagging the IAM users.
Can you create a new ticket with all the required info in the CCO project to get timely feedback?
https://issues.redhat.com/projects/CCO/issues

Comment 33 Andrew Butcher 2022-10-13 13:16:52 UTC
As far as I can tell CCO does not currently support setting user defined tags for the IAM users created in CCO's mint operation mode. I found this RFE https://issues.redhat.com/browse/CCO-78 which seems like the same ask.

Comment 34 Red Hat Bugzilla 2023-09-18 00:20:59 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

