Description of problem:

OpenShift does not support AWS IPI or UPI installations when AWS Organizations and Service Control Policies (SCPs) are in use, if an SCP contains a rule that denies all actions, or a specific required permission, under a global condition (e.g. "for all regions except us-east-1 and us-west-2" or "for all roles except role A"), even when the credentials actually have the permission to perform that action.

This happens because OpenShift depends on an AWS policy simulator API that fails to evaluate conditions in AWS Organizations SCPs, producing false negatives, and the installation cannot proceed when that validation fails. BZ 1750338 addressed passing the region when policies include global conditions based on the region, but that only works with IAM policies. It does not work with AWS Organizations SCPs because, for the reason explained, the API does not evaluate global conditions against SCPs; it only sees the Deny rule and treats it as unconditional.

Both the openshift-installer and the cloud-credential-operator hit this problem.

How reproducible:

Set up an environment in AWS that uses AWS Organizations and SCPs to control permissions. In any of the SCPs applied to the user or account whose credentials will be provided to OCP, include a statement like this:

{
    "Sid": "DenyOtherRegions",
    "Effect": "Deny",
    "Resource": "*",
    "Action": "*",
    "Condition": {
        "StringNotEquals": {
            "aws:RequestedRegion": [
                "us-east-1",
                "us-west-2"
            ]
        }
    }
}

Note: the AWS credentials provided have all the permissions needed by OpenShift.

Try to install OpenShift IPI in AWS following the official documentation:

$ openshift-install create cluster --dir=ocp

Actual results:

The installer fails the permissions validation even though the user has all the permissions needed to perform the actual actions. The validation fails because the simulator ignores the region condition and sees only the "Deny: *" statement (a false negative).

$ openshift-install create cluster --dir=ocp
...
WARNING Action not allowed with tested creds action="ec2:CreateDhcpOptions"
WARNING Action not allowed with tested creds action="ec2:CreateInternetGateway"
WARNING Action not allowed with tested creds action="ec2:CreateNatGateway"
WARNING Action not allowed with tested creds action="ec2:CreateRoute"
WARNING Action not allowed with tested creds action="ec2:CreateRouteTable"
WARNING Action not allowed with tested creds action="ec2:CreateSecurityGroup"
WARNING Action not allowed with tested creds action="ec2:CreateSubnet"
WARNING Action not allowed with tested creds action="ec2:CreateTags"

I patched the installer following the workaround suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1750338#c25. The installer still failed with the error:

"AWS credentials cannot be used to either create new creds or use as-is"

I then had to manually patch pkg/asset/installconfig/aws/permissions.go to suppress the mint and passthrough credential errors (neither mode passes validation) and force a successful return at the end of the function, which makes the error return below it unreachable:

    // return nil added after failing to verify mint or passthrough checks
    return nil
    return errors.New("AWS credentials cannot be used to either create new creds or use as-is")

With that, the installer progressed, but then the cloud-credential-operator fails its permissions validation too, preventing any CredentialsRequest from being granted and blocking completion of the cluster deployment:

$ oc logs cloud-credential-operator-69479545fc-mlcn7 -n openshift-cloud-credential-operator -f
time="2020-04-16T00:52:03Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2020-04-16T00:52:03Z" level=info msg="reconcile complete" controller=metrics elapsed=1.660646ms
time="2020-04-16T00:52:08Z" level=info msg="validating cloud cred secret" controller=secretannotator
time="2020-04-16T00:52:08Z" level=debug msg="Loading infrastructure name: oc4poc-fw6q6" controller=secretannotator
time="2020-04-16T00:52:08Z" level=warning msg="Action not allowed with tested creds" action="iam:CreateAccessKey" controller=secretannotator
time="2020-04-16T00:52:08Z" level=warning msg="Action not allowed with tested creds" action="iam:CreateUser" controller=secretannotator
time="2020-04-16T00:52:08Z" level=warning msg="Action not allowed with tested creds" action="iam:DeleteAccessKey" controller=secretannotator

After this, worker nodes are not deployed and there are no pending CSRs. The cluster deployment stalls, with the operators that depend on cloud credentials blocked.

Expected results:

OpenShift installs normally, perhaps via a mechanism to manually allow ignoring the permissions validation for the provided AWS credentials.
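For reference, a minimal Go sketch (using aws-sdk-go; the user ARN is hypothetical) of the kind of SimulatePrincipalPolicy call the pre-flight check relies on, including the aws:RequestedRegion context entry added for BZ 1750338. With the SCP above in place, the simulator still reports a deny decision even for an allowed region, because it does not apply the region condition when evaluating the SCP:

package main

import (
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/iam"
)

func main() {
    sess := session.Must(session.NewSession())
    client := iam.New(sess)

    // Simulate one required action for the installer credentials,
    // passing the requested region as a simulation context key.
    out, err := client.SimulatePrincipalPolicy(&iam.SimulatePrincipalPolicyInput{
        PolicySourceArn: aws.String("arn:aws:iam::123456789012:user/ocp-installer"), // hypothetical ARN
        ActionNames:     []*string{aws.String("ec2:CreateVpc")},
        ContextEntries: []*iam.ContextEntry{{
            ContextKeyName:   aws.String("aws:RequestedRegion"),
            ContextKeyValues: []*string{aws.String("us-east-1")},
            ContextKeyType:   aws.String("string"),
        }},
    })
    if err != nil {
        log.Fatal(err)
    }
    for _, r := range out.EvaluationResults {
        // Under the SCP above this prints a deny decision, even though a
        // real ec2:CreateVpc call in us-east-1 would succeed: the simulator
        // ignores the region condition inside the SCP (the false negative).
        fmt.Printf("%s => %s\n", aws.StringValue(r.EvalActionName), aws.StringValue(r.EvalDecision))
    }
}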
Moving to cred-minter. I think we need to fix this in the cluster before we fix it in the installer.
This is the same issue that I raised here https://bugzilla.redhat.com/show_bug.cgi?id=1815331
We are presently pursuing access to an SCP-enabled environment to investigate/replicate.
For interested parties, the enhancement proposal to allow bypassing pre-flight permissions checks has been posted at https://github.com/openshift/enhancements/pull/324
*** Bug 1832640 has been marked as a duplicate of this bug. ***
Summary of the investigation was that the permissions simulation is indeed unreliable when SCPs are in use in an account. The enhancement proposal https://github.com/openshift/enhancements/pull/324 is written to allow instructing the OpenShift installer not to perform permissions checks before the installation, and to indicate to the in-cluster cloud-credential-operator that it too should not perform permissions simulations (CCO must be told whether to run in 'mint' or 'passthrough' mode, though).
Waiting on the enhancement to be approved.
This bugfix is actively being worked on, and we expect to complete it this sprint.
The bug has been fixed. Test payload: 4.6.0-0.nightly-2020-08-09-151434

Test steps:

1. Prepare an SCP account.

2. Create install-config.yaml, choose a region (like us-east-1) where your credential has all the permissions needed for an installation, and put "credentialsMode: Mint" in the install-config.yaml file to force the mode to Mint and bypass the permissions check:

./openshift-install create install-config --dir demo

3. Install a cluster:

./openshift-install create cluster --dir demo --log-level debug

The cluster installs successfully.
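For reference, a minimal install-config.yaml sketch showing where the top-level credentialsMode field goes (baseDomain, cluster name, pull secret, and SSH key below are placeholders):

apiVersion: v1
baseDomain: example.com            # placeholder
metadata:
  name: demo                       # placeholder cluster name
credentialsMode: Mint              # forces Mint mode and bypasses the permissions simulation
platform:
  aws:
    region: us-east-1              # a region the SCP allows
pullSecret: '<pull-secret>'        # placeholder
sshKey: '<ssh-public-key>'         # placeholder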
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
So we struggled for weeks trying to figure out what the issue was, because the installer never threw an error saying it was a permission issue. We have an SCP at the org level that denies any action from any IAM user created, even one with full admin permissions, unless it has MFA enabled or has a specific exception tag. We had to do a PhD on what every installer component does (Red Hat support was of no help), and AWS support was not able to figure it out either, as the installer runs for 30+ minutes and creates a lot of logs and CloudTrail events. But we knew we were doing it right because it worked on an independent account.

Long story short: please make the installer throw an error about permissions, even if you choose to ignore it (something like "SCP Permission Issue - Access Denied - Ignoring... Continuing"). Secondly, please help me raise a bug for how we can make the installer add a specific tag to the 6 IAM users OpenShift IPI creates, so the org SCP doesn't restrict the installation.
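For context, a hedged sketch of the kind of org-level SCP statement described above (the tag key scp-exception is hypothetical; the real policy will differ). The two condition operators are ANDed, so the Deny applies only to callers that have no MFA and also lack the exception tag; this is exactly the kind of global condition the policy simulator does not evaluate for SCPs:

{
    "Sid": "DenyWithoutMFAOrExceptionTag",
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
        "BoolIfExists": {
            "aws:MultiFactorAuthPresent": "false"
        },
        "StringNotEquals": {
            "aws:PrincipalTag/scp-exception": "true"
        }
    }
}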
In my opinion, if the cluster is subject to a global SCP policy, we must mandatorily/explicitly set credentialsMode to Mint in the install-config file; the installer will then not do any permissions check for such a scenario under the current code logic. The tagging request is more like an enhancement; I will forward the requirement to the Hive/CCO team. cc mworthin efried, could you help take a look?
And the installer supports a field in install-config to add tags to all created resources; not sure if it meets your needs (see the sketch below).

platform.aws.userTags: A map of keys and values that the installation program adds as tags to all resources that it creates.

Docs here: https://docs.openshift.com/container-platform/4.11/installing/installing_aws/installing-aws-customizations.html#installation-configuration-parameters-optional-aws_installing-aws-customizations
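A short sketch of what that looks like in install-config.yaml (the tag key and value are placeholders; use whatever exception your SCP actually checks):

platform:
  aws:
    region: us-east-1
    userTags:
      scp-exception: "true"    # hypothetical tag key your SCP exempts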
(In reply to wang lin from comment #25)
> And installer supports a field in install-config to add tags to all created
> resources, not sure if it meets your needs.
>
> platform.aws.userTags: A map of keys and values that the installation
> program adds as tags to all resources that it creates.
>
> docs here:
> https://docs.openshift.com/container-platform/4.11/installing/installing_aws/installing-aws-customizations.html#installation-configuration-parameters-optional-aws_installing-aws-customizations

@lwan Thank you so much for addressing this, this is exactly what I was looking for.
(In reply to wang lin from comment #25)
> And installer supports a field in install-config to add tags to all created
> resources, not sure if it meets your needs.
>
> platform.aws.userTags: A map of keys and values that the installation
> program adds as tags to all resources that it creates.
>
> docs here:
> https://docs.openshift.com/container-platform/4.11/installing/installing_aws/installing-aws-customizations.html#installation-configuration-parameters-optional-aws_installing-aws-customizations

Hi @lwan, I jumped the gun there. Upon updating the install-config, it does apply the specific user tag I wanted, but it applies it to all resources except the ones I actually needed it on: these IAM users:

kaifyoct10-d2tbw-aws-ebs-csi-driver-operator-sfcvz
kaifyoct10-d2tbw-cloud-credential-operator-iam-ro-qql72
kaifyoct10-d2tbw-openshift-cloud-network-config-contro-c6w7n
kaifyoct10-d2tbw-openshift-image-registry-zrmh6
kaifyoct10-d2tbw-openshift-ingress-pdf8j
kaifyoct10-d2tbw-openshift-machine-api-aws-c8drd

What can I do to make the installer add the userTags automatically to these IAM users? Without this, the IAM users never get access to do anything because of the global SCP; the install stalls at

DEBUG Still waiting for the cluster to initialize: Working towards 4.11.1: 770 of 802 done (96% complete)

the worker node instances never turn up, and it fails as several operators fail to initialize.

Optional extra information in case it comes in handy: for the current installation I went ahead and added the tags to these IAM users manually, as I didn't want them to fail (and then have to wait 40 minutes for the installer to realize it can't progress). Here is installer debug log output from past failures:

ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthClientsController_SyncError::OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_SyncError::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
ERROR OAuthClientsControllerDegraded: no ingress for host oauth-openshift.apps.kaifyoct10.openshift2.XXXXXXengineering.net in route oauth-openshift in namespace openshift-authentication
ERROR OAuthServerDeploymentDegraded: waiting for the oauth-openshift route to contain an admitted ingress: no admitted ingress for route oauth-openshift in namespace openshift-authentication
ERROR OAuthServerDeploymentDegraded:
ERROR OAuthServerRouteEndpointAccessibleControllerDegraded: route "openshift-authentication/oauth-openshift": status does not have a valid host address
ERROR OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://172.30.46.33:443/healthz": dial tcp 172.30.46.33:443: connect: connection refused
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready
ERROR WellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
INFO Cluster operator authentication Available is False with OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_ResourceNotFound::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::OAuthServerServiceEndpointsEndpointAccessibleController_ResourceNotFound::ReadyIngressNodes_NoReadyIngressNodes::WellKnown_NotReady: OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found
INFO OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.46.33:443/healthz": dial tcp 172.30.46.33:443: connect: connection refused
INFO OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints "oauth-openshift" not found
INFO ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).
INFO WellKnownAvailable: The well-known endpoint is not yet available: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
ERROR Cluster operator console Degraded is True with DefaultRouteSync_FailedAdmitDefaultRoute::RouteHealth_RouteNotAdmitted::SyncLoopRefresh_FailedIngress: DefaultRouteSyncDegraded: no ingress for host downloads-kaifyoct10.openshift2.XXXXXXengineering.net in route downloads in namespace openshift-console
ERROR RouteHealthDegraded: console route is not admitted
ERROR SyncLoopRefreshDegraded: no ingress for host console-kaifyoct10.openshift2.XXXXXXengineering.net in route console in namespace openshift-console
INFO Cluster operator console Available is False with RouteHealth_RouteNotAdmitted: RouteHealthAvailable: console route is not admitted
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
INFO Cluster operator image-registry Available is False with DeploymentNotFound: Available: The deployment does not exist
INFO NodeCADaemonAvailable: The daemon set node-ca has available replicas
INFO ImagePrunerAvailable: Pruner CronJob has been created
INFO Cluster operator image-registry Progressing is True with Error: Progressing: Unable to apply resources: unable to sync storage configuration: AccessDenied: Access Denied
INFO Progressing: status code: 403, request id: D7V0MW8YZYMPVKKZ, host id: fHTFjql1qr7CdRk4CeQN0SuRTWYI9COjXEgHQeM8wg3ymLMQGFcTl1qtytiR/zBkP0AEWJMYZmg=
ERROR Cluster operator image-registry Degraded is True with Unavailable: Degraded: The deployment does not exist
INFO Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DNSReady=False (NoZones: The record isn't present in any zones.)
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-78f7456b4-c7xpx" cannot be scheduled: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. Pod "router-default-78f7456b4-4ppf2" cannot be scheduled: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1), DNSReady=False (NoZones: The record isn't present in any zones.), CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCAAvailable is False with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"code":"ACCT-MGMT-7","href":"/api/accounts_mgmt/v1/errors/7","id":"7","kind":"Error","operation_id":"d6f0af7b-11aa-442c-b50b-1e06e938f78b","reason":"The organization (id= 2DCYQ3e4fX8DBIQv9fR1ncF5rTR) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management."}
INFO Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: Failed to rollout the stack. Error: updating prometheus operator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
INFO Cluster operator network ManagementStateDegraded is False with :
INFO Cluster operator network Progressing is True with Deploying: Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
ERROR failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
re https://bugzilla.redhat.com/show_bug.cgi?id=1829101#c24 I'm not as savvy with CCO as @abutcher, so I'm tagging him and stepping out.
Hi @lwan, could you please check my update above? Would there be any way to add tags to the IAM users that the installer creates?
I see that each of these 6 IAM users has this tag on it, added by the installer:

Tag key: kubernetes.io/cluster/kaifyoct13-l7x5c
Tag value: owned

which is what I think you have as the shared tag in github openshift/installer/pkg/asset/cluster/aws/aws.go (lines 23 and 95):

func sharedTag(clusterID string) (string, string) {
    return fmt.Sprintf("kubernetes.io/cluster/%s", clusterID), "shared"
}

The point is, if the installer can add this tag to the IAM users, then it can definitely add the userTags as well. A further enhancement could be an option in the install-config to specify which resources get which tags (but that could be lengthy work); for now, even if the installer just adds the userTags to the IAM users as well, that would be great. Thanks!
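Until this is automated, a possible stopgap for the manual workaround mentioned earlier is to tag each minted IAM user with the AWS CLI as soon as CCO creates it (the user name is one from the list above; the tag key/value are placeholders matching whatever exception your SCP checks):

aws iam tag-user \
  --user-name kaifyoct10-d2tbw-openshift-machine-api-aws-c8drd \
  --tags Key=scp-exception,Value=true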
I launched a cluster with userTags and there is no tag added on the IAM users. I suspect it's because the IAM users are created by CCO (Cloud Credential Operator), not the installer, and I don't find a place to add tags in CCO. dgoodwin and I have moved off the CCO team. @jshu Could you help check with the CCO devs in Slack to see if there are any updates or a solution? Thanks.
Hi, I'm the QE for CCO. This BZ was closed, and the above sounds like a new request for CCO, i.e. tagging the IAM users. Can you create a new ticket with all the required info on the CCO project to get timely feedback? https://issues.redhat.com/projects/CCO/issues
As far as I can tell, CCO does not currently support setting user-defined tags on the IAM users created in CCO's mint operation mode. I found the RFE https://issues.redhat.com/browse/CCO-78, which seems like the same ask.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days