Configure CCO in manual mode and install a private cluster in the us-gov-east-1 region:

level=fatal msg="failed to fetch Master Machines: failed to generate asset \"Master Machines\": creating AWS session: fetching availability zones: AuthFailure: AWS was not able to validate the provided access credentials\n\tstatus code: 401, request id: 5e8be396-a673-4e6d-9225-d0a08ded83fe"

A cluster with the same configuration can be installed in us-gov-west-1 successfully.

Version-Release number of the following components:
4.6.0-0.nightly-2020-09-21-030155

How reproducible:
Always

Steps to Reproduce:
1. Create a 4.6 cluster with CCO in manual mode in us-gov-east-1; refer to https://github.com/openshift/cloud-credential-operator/blob/master/docs/mode-manual-creds.md

Actual results:
AuthFailure: AWS was not able to validate the provided access credentials

Expected results:
Cluster can be installed successfully

Additional info:
======= update on Sep. 30
For the latest description and errors, please refer to Comment 14.
Manual creds for CCO doesn't mean the user doesn't need to provide credentials to the installer to communicate with AWS APIs. Manual mode is only for cluster operators.

```
level=fatal msg="failed to fetch Master Machines: failed to generate asset \"Master Machines\": creating AWS session: fetching availability zones: AuthFailure: AWS was not able to validate the provided access credentials\n\tstatus code: 401, request id: 5e8be396-a673-4e6d-9225-d0a08ded83fe"
```

Please make sure the installer has access to credentials with appropriate permissions to call AWS APIs. Moving to the CCO team to triage/update the docs.
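As a rough pre-flight sketch of the point above: before running the installer, one can check that some AWS credential source is at least configured on the workstation. The function name `installer_creds_source` is hypothetical, and the lookup order (environment variables, then a named profile, then the shared credentials file) is only an approximation of the standard AWS SDK credential chain. It reports presence only; it does not prove the credentials are valid or sufficient.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-install or the aws CLI):
# print which credential source appears to be configured, or "none".
# Assumption: this only approximates the AWS SDK credential chain.
installer_creds_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # Static keys exported in the environment.
    echo "env"
  elif [ -n "${AWS_PROFILE:-}" ]; then
    # A named profile, as with "export AWS_PROFILE=xxx".
    echo "profile:${AWS_PROFILE}"
  elif [ -f "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}" ]; then
    # Shared credentials file exists; the default profile would be used.
    echo "profile:default"
  else
    echo "none"
    return 1
  fi
}
```

If this prints `none`, the installer has nothing to authenticate with regardless of the CCO mode. Checking that the credential is actually accepted still requires a real API call against the target region.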
Abhinav: CCO docs don't really cover what the installer needs or doesn't, is there something in particular that is inaccurate in the CCO docs? Is this just a request to clarify in the manual mode docs in our repo, and product documentation, that while you can put the CCO into manual mode, you still need to provide the installer with a credential?
I followed this document [1] to configure CCO in manual mode and install the cluster. This works in the `us-east-2` and `us-gov-west-1` regions, as I mentioned in the description: the cluster with the same configuration can be installed in us-gov-west-1 successfully. As I understand it, `us-gov-west-1` and `us-gov-east-1` use the same IAM service, so if it works in `us-gov-west-1`, it should work in `us-gov-east-1`.

>> FYI, Secrets I provided:

openshift-cloud-credential-operator/cloud-credential-operator-iam-ro-creds
"iam:GetUserPolicy", "iam:GetUser", "iam:ListAccessKeys"

openshift-ingress-operator/cloud-credentials
"elasticloadbalancing:DescribeLoadBalancers", "route53:ListHostedZones", "route53:ChangeResourceRecordSets", "tag:GetResources"

openshift-machine-api/aws-cloud-credentials
"ec2:CreateTags", "ec2:DescribeAvailabilityZones", "ec2:DescribeDhcpOptions", "ec2:DescribeImages", "ec2:DescribeInstances", "ec2:DescribeSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeVpcs", "ec2:RunInstances", "ec2:TerminateInstances", "elasticloadbalancing:DescribeLoadBalancers", "elasticloadbalancing:DescribeTargetGroups", "elasticloadbalancing:RegisterInstancesWithLoadBalancer", "elasticloadbalancing:RegisterTargets", "iam:PassRole", "iam:CreateServiceLinkedRole"

openshift-image-registry/installer-cloud-credentials
"s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging", "s3:GetBucketTagging", "s3:PutBucketPublicAccessBlock", "s3:GetBucketPublicAccessBlock", "s3:PutEncryptionConfiguration", "s3:GetEncryptionConfiguration", "s3:PutLifecycleConfiguration", "s3:GetLifecycleConfiguration", "s3:GetBucketLocation", "s3:ListBucket", "s3:HeadBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload"

openshift-cloud-credential-operator/cloud-credential-operator-s3
"s3:PutObject", "s3:PutBucketTagging", "s3:CreateBucket", "s3:PutObjectAcl"

openshift-cluster-csi-drivers/ebs-cloud-credentials
"ec2:DetachVolume", "ec2:AttachVolume", "ec2:ModifyVolume", "ec2:DeleteSnapshot", "ec2:DescribeInstances", "ec2:DeleteTags", "ec2:DescribeTags", "ec2:CreateTags", "ec2:DescribeVolumesModifications", "ec2:DescribeSnapshots", "ec2:CreateVolume", "ec2:DeleteVolume", "ec2:DescribeVolumes", "ec2:CreateSnapshot"

[1] https://github.com/openshift/cloud-credential-operator/blob/master/docs/mode-manual-creds.md
Corrected Secrets for Comment 3; sorry for the confusing info.

>> openshift-cloud-credential-operator/cloud-credential-operator-iam-ro-creds
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "iam:GetUserPolicy", "iam:GetUser", "iam:ListAccessKeys" ], "Resource": "*" } ] }

>> openshift-ingress-operator/cloud-credentials
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "elasticloadbalancing:DescribeLoadBalancers", "route53:ListHostedZones", "route53:ChangeResourceRecordSets", "tag:GetResources" ], "Effect": "Allow", "Resource": "*" } ] }

>> openshift-machine-api/aws-cloud-credentials
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ec2:CreateTags", "ec2:DescribeAvailabilityZones", "ec2:DescribeDhcpOptions", "ec2:DescribeImages", "ec2:DescribeInstances", "ec2:DescribeSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeVpcs", "ec2:RunInstances", "ec2:TerminateInstances", "elasticloadbalancing:DescribeLoadBalancers", "elasticloadbalancing:DescribeTargetGroups", "elasticloadbalancing:RegisterInstancesWithLoadBalancer", "elasticloadbalancing:RegisterTargets", "iam:PassRole", "iam:CreateServiceLinkedRole" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey", "kms:GenerateDataKeyWithoutPlainText", "kms:DescribeKey" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "kms:RevokeGrant", "kms:CreateGrant", "kms:ListGrants" ], "Resource": "*", "Condition": { "Bool": { "kms:GrantIsForAWSResource": true } } } ] }

>> openshift-image-registry/installer-cloud-credentials
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging", "s3:GetBucketTagging", "s3:PutBucketPublicAccessBlock", "s3:GetBucketPublicAccessBlock", "s3:PutEncryptionConfiguration", "s3:GetEncryptionConfiguration", "s3:PutLifecycleConfiguration", "s3:GetLifecycleConfiguration", "s3:GetBucketLocation", "s3:ListBucket", "s3:HeadBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload" ], "Effect": "Allow", "Resource": "*" } ] }

>> openshift-cloud-credential-operator/cloud-credential-operator-s3
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:PutBucketTagging", "s3:CreateBucket", "s3:PutObjectAcl" ], "Resource": "*" } ] }

>> openshift-cluster-csi-drivers/ebs-cloud-credentials
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "ec2:DetachVolume", "ec2:AttachVolume", "ec2:ModifyVolume", "ec2:DeleteSnapshot", "ec2:DescribeInstances", "ec2:DeleteTags", "ec2:DescribeTags", "ec2:CreateTags", "ec2:DescribeVolumesModifications", "ec2:DescribeSnapshots", "ec2:CreateVolume", "ec2:DeleteVolume", "ec2:DescribeVolumes", "ec2:CreateSnapshot" ], "Resource": "*" } ] }
If you were to take these credentials and:

1. configure them on a local workstation
2. configure the aws CLI to use a govcloud endpoint (I've never done this, but maybe with --endpoint-url on the command below?)
3. use the aws CLI to list availability zones: aws ec2 describe-availability-zones

Does this work in us-gov-east-1, us-gov-west-1, or both?
My suspicion here is that there is absolutely nothing we can fix in CCO for this. Either:

(1) We are mistaken and credentials can't be shared between us-gov-east-1 and us-gov-west-1.
(2) Credentials can be used in both, but the AWS IAM simulation API is broken and tells you that you can't, even though you actually can.

Neither can be fixed by CCO. If it's (2), then we can use the new force mode functionality in the InstallConfig and CloudCredential config that is present in the latest 4.5 releases.

I don't think this qualifies as a 4.6 blocker unless more information becomes available, so I'm going to take it off the list for now, but we will keep trying to get to the bottom of what Yunfei is experiencing.
Devan, I configured the machine-api credential as my default aws profile (export AWS_PROFILE=xxx). Whether or not I specify the endpoint, the credentials work well in both the east and west regions:

>> aws --region us-gov-east-1 --endpoint-url https://ec2.us-gov-east-1.amazonaws.com ec2 describe-availability-zones
>> aws --region us-gov-east-1 ec2 describe-availability-zones
{ "AvailabilityZones": [ { "State": "available", "OptInStatus": "opt-in-not-required", "Messages": [], "RegionName": "us-gov-east-1", "ZoneName": "us-gov-east-1a", "ZoneId": "usge1-az1", "GroupName": "us-gov-east-1", "NetworkBorderGroup": "us-gov-east-1", "ZoneType": "availability-zone" }, { "State": "available", "OptInStatus": "opt-in-not-required", "Messages": [], "RegionName": "us-gov-east-1", "ZoneName": "us-gov-east-1b", "ZoneId": "usge1-az2", "GroupName": "us-gov-east-1", "NetworkBorderGroup": "us-gov-east-1", "ZoneType": "availability-zone" }, { "State": "available", "OptInStatus": "opt-in-not-required", "Messages": [], "RegionName": "us-gov-east-1", "ZoneName": "us-gov-east-1c", "ZoneId": "usge1-az3", "GroupName": "us-gov-east-1", "NetworkBorderGroup": "us-gov-east-1", "ZoneType": "availability-zone" } ] }

>> aws --region us-gov-west-1 --endpoint-url https://ec2.us-gov-west-1.amazonaws.com ec2 describe-availability-zones
>> aws --region us-gov-west-1 ec2 describe-availability-zones
{ "AvailabilityZones": [ { "State": "available", "OptInStatus": "opt-in-not-required", "Messages": [], "RegionName": "us-gov-west-1", "ZoneName": "us-gov-west-1a", "ZoneId": "usgw1-az1", "GroupName": "us-gov-west-1", "NetworkBorderGroup": "us-gov-west-1", "ZoneType": "availability-zone" }, { "State": "available", "OptInStatus": "opt-in-not-required", "Messages": [], "RegionName": "us-gov-west-1", "ZoneName": "us-gov-west-1b", "ZoneId": "usgw1-az2", "GroupName": "us-gov-west-1", "NetworkBorderGroup": "us-gov-west-1", "ZoneType": "availability-zone" }, { "State": "available", "OptInStatus": "opt-in-not-required", "Messages": [], "RegionName": "us-gov-west-1", "ZoneName": "us-gov-west-1c", "ZoneId": "usgw1-az3", "GroupName": "us-gov-west-1", "NetworkBorderGroup": "us-gov-west-1", "ZoneType": "availability-zone" } ] }

>> but if I specify an incorrect endpoint or region:
aws --region us-gov-east-1 --endpoint-url https://ec2.us-gov-west-1.amazonaws.com ec2 describe-availability-zones
An error occurred (AuthFailure) when calling the DescribeAvailabilityZones operation: AWS was not able to validate the provided access credentials

We configure CCO in manual mode when installing disconnected + private clusters, which is an important configuration for customers. It works in the us-gov-west-1 region but not in the east region. I think it would be better to fix this issue before 4.6 GA; otherwise, we need to document it in the release notes.
Yunfei, when you said you specified an incorrect endpoint, your command looks identical to the one that was working above:

aws --region us-gov-east-1 --endpoint-url https://ec2.us-gov-west-1.amazonaws.com ec2 describe-availability-zones

Did you mean that if you change the region or the endpoint in the command above to something invalid, you get the same error message you're seeing in this attempt to install? Could you walk through the precise steps you take to install a cluster into us-gov-east-1? I'm wondering if there could have been a typo in the region or endpoint you configured when installing the cluster.
Just noticed the commands you provided did differ slightly, first time I couldn't see it. Working: aws --region us-gov-east-1 --endpoint-url https://ec2.us-gov-east-1.amazonaws.com ec2 describe-availability-zones Failing: aws --region us-gov-east-1 --endpoint-url https://ec2.us-gov-west-1.amazonaws.com ec2 describe-availability-zones So mismatching the region and the endpoint URL seems to simulate the failure you're getting in the install? Is it possible this was what happened in the installconfig?
(In reply to Devan Goodwin from comment #9)
> Just noticed the commands you provided did differ slightly, first time I couldn't see it.
>
> Working: aws --region us-gov-east-1 --endpoint-url https://ec2.us-gov-east-1.amazonaws.com ec2 describe-availability-zones
> Failing: aws --region us-gov-east-1 --endpoint-url https://ec2.us-gov-west-1.amazonaws.com ec2 describe-availability-zones
>
> So mismatching the region and the endpoint URL seems to simulate the failure you're getting in the install?

Yes, mismatching the endpoint and region can cause the above error.

> Is it possible this was what happened in the installconfig?

I did not specify any endpoints in install-config; I just set the region to us-gov-east-1 or us-gov-west-1.
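The region/endpoint mismatch discussed above can be caught before any AWS call is made. Below is a minimal sketch, assuming EC2 endpoint URLs of the form `https://ec2.<region>.amazonaws.com` as used in this thread; the function names `endpoint_region` and `check_region_endpoint` are hypothetical helpers, not part of the aws CLI or the installer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the region from an EC2 endpoint URL of the
# form https://ec2.<region>.amazonaws.com (an assumption about URL shape).
endpoint_region() {
  local url="$1"
  local host="${url#*://}"   # strip the scheme
  host="${host%%/*}"         # strip any path
  host="${host#ec2.}"        # strip the service prefix
  printf '%s\n' "${host%.amazonaws.com}"
}

# Returns 0 when the requested region matches the endpoint's region,
# 1 otherwise. It makes no network calls.
check_region_endpoint() {
  local region="$1" url="$2"
  local ep_region
  ep_region="$(endpoint_region "$url")"
  if [ "$region" = "$ep_region" ]; then
    echo "ok: region $region matches endpoint $url"
  else
    echo "mismatch: region $region but endpoint is for $ep_region"
    return 1
  fi
}
```

For the failing command above, `check_region_endpoint us-gov-east-1 https://ec2.us-gov-west-1.amazonaws.com` reports a mismatch. This only checks explicitly supplied endpoints; it says nothing about how an SDK resolves a region to a default endpoint internally.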
OK, a possible clue: the installer, which vendors github.com/aws/aws-sdk-go v1.32.3, has some defaults for mapping a region to an endpoint: https://github.com/openshift/installer/blob/master/vendor/github.com/aws/aws-sdk-go/aws/endpoints/defaults.go#L7328

The cloud-credential-operator, which vendors github.com/aws/aws-sdk-go v1.30.5, does not have this section: https://github.com/openshift/cloud-credential-operator/blob/master/vendor/github.com/aws/aws-sdk-go/aws/endpoints/defaults.go
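One coarse way to compare the two vendored copies is to count how often a region string appears in each `defaults.go`. This is only a sketch: `count_region_refs` is a hypothetical helper, and a differing count merely hints that one SDK version carries endpoint entries the other lacks; it does not identify which section differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines in a file that contain the quoted
# region string, e.g. "us-gov-east-1". Prints 0 when there is no match.
count_region_refs() {
  local file="$1" region="$2"
  # grep -c exits non-zero on zero matches but still prints 0.
  grep -c -F -- "\"$region\"" "$file" || true
}
```

A possible usage, run once in each repository checkout (paths taken from the links above):

count_region_refs vendor/github.com/aws/aws-sdk-go/aws/endpoints/defaults.go us-gov-east-1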
Abhinav, I think this was prematurely sent to us. With the use of manual mode it might initially look like Yunfei did not give the installer a valid credential; however, I don't think this was the case. It looks like:

- Yunfei's credential is good and working in us-gov-west-1.
- A credential good for us-gov-west-1 should also work in us-gov-east-1.
- The failing code is the installer's AZ lookup, probably for generating MachineSets.

I don't know for sure if it's related, but Yunfei can reproduce the error message by running an aws describe AZs command and mismatching the region with the endpoint, i.e.:

aws --region us-gov-east-1 --endpoint-url https://ec2.us-gov-west-1.amazonaws.com ec2 describe-availability-zones

This may be a red herring, but is it possible the SDK is somehow mapping the given region us-gov-east-1 to the wrong endpoint? It does look like this mapping is done internally in the AWS SDK.

So it looks like there may be some kind of issue here, but at this point it's surfacing in the installer, and we have not yet made it to any CCO code. Documentation doesn't seem to be the issue here at this point, and I suspect it may have nothing to do with manual mode.
Here's how I created ignition configs for us-gov-east-1 to test the failure in creating machinesets.

1. Create an install config

```
$ yq m -CP -x aws-install-config.yaml elide-install-config.yaml
apiVersion: v1
baseDomain: devcluster.openshift.com
metadata:
  name: adahiya-2
platform:
  aws:
    region: us-gov-east-1
    amiID: ami-96c6f8f7
publish: Internal
pullSecret: ""
sshKey: ""
```

2. Create ignition configs

```
$ (rm -rf dev && mkdir -p dev && cp aws-install-config.yaml dev/install-config.yaml) && OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE= ./bin/openshift-install --dir dev create ignition-configs
INFO Consuming Install Config from target directory
INFO Credentials loaded from the "govcloud" profile in file "/home/adahiya/.aws/credentials"
INFO Ignition-Configs created in: dev and dev/auth
```

So there is no bug in the installer when using the default mode in us-gov-east-1.

The reason I moved it to the CCO team was that the reporter pointed to the docs in the CCO repo, and it looks like the docs do not make it clear that the credentialsMode manual is only for cluster operators, and that the installer still needs to be provided with appropriate credentials to communicate with the API, like any other type of workflow.

@yunjiang
- Can you provide reproducer steps?
- Also include the .openshift_install.log and the install-config.yaml that was used.
Created attachment 1717799 [details] comment 14 install log
Created attachment 1717800 [details] comment 14 VPC cloudformation template
Hello Brandi, Thanks for the update, LGTM.
Tagging with UpcomingSprint while investigation is either ongoing or pending. Will be considered for earlier release versions when diagnosed and resolved.
Needinfo answered in comment #22.
Closing since https://github.com/openshift/openshift-docs/pull/26300 has merged.
I’m reassigning to the docs team to address https://bugzilla.redhat.com/show_bug.cgi?id=1881262#c27.
After bug 1892129, bug 1921901, and bug 1903226 were fixed, I revisited this issue, went through the whole process, and fixed an automation issue. Now a disconnected and private cluster can be installed successfully in us-gov-east-1. Verified versions: 4.6.0-0.nightly-2021-02-04-203135 and 4.7.0-0.nightly-2021-02-05-005950