Bug 1862065

Summary: [aws-custom-region] error "listing hosted zones: SignatureDoesNotMatch" occurred when creating cluster in af-south-1 region
Product: OpenShift Container Platform
Reporter: Yunfei Jiang <yunjiang>
Component: Installer
Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer
QA Contact: Yunfei Jiang <yunjiang>
Status: CLOSED DUPLICATE
Severity: high
Priority: high
CC: adahiya, dhansen, hongli
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-08-12 21:15:37 UTC
Type: Bug
Attachments:
Description                             Flags
install log (IAM)                       none
install log without IAM and route53     none

Description Yunfei Jiang 2020-07-30 10:40:46 UTC
OCP 4.6 supports custom endpoints, so a cluster can be installed in a public region without native RHCOS support:
https://github.com/openshift/enhancements/blob/master/enhancements/installer/aws-custom-region-and-endpoints.md


Trying to install a cluster into af-south-1 fails with the following error:

time="2020-07-30T08:57:06Z" level=debug msg="resolved AWS service route53 (af-south-1) to \"https://route53.amazonaws.com\""
time="2020-07-30T08:57:06Z" level=fatal msg="failed to fetch Common Manifests: failed to fetch dependency of \"Common Manifests\": failed to generate asset \"DNS Config\": getting public zone for \"qe.devcluster.openshift.com\": listing hosted zones: SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'af-south-1'. \n\tstatus code: 403, request id: 352dc7d4-a5ae-4eb7-aa41-373c3c8f170c"

install config:
---
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
metadata:
  name: yunjiang-af2
platform:
  aws:
    region: af-south-1
    serviceEndpoints:
    - name: ec2
      url: https://ec2.af-south-1.amazonaws.com
    - name: elasticloadbalancing
      url: https://elasticloadbalancing.af-south-1.amazonaws.com
    - name: s3
      url: https://s3.af-south-1.amazonaws.com
    - name: iam
      url: https://iam.af-south-1.amazonaws.com
    - name: tagging
      url: https://tagging.af-south-1.amazonaws.com
    - name: route53
      url: https://route53.amazonaws.com
pullSecret: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: External
baseDomain: qe.devcluster.openshift.com
sshKey: HIDDEN


Version-Release number of the following components:
Cluster version is 4.6.0-0.nightly-2020-07-25-091217

How reproducible:
100%

Steps to Reproduce:
1. Create install config
2. Set up endpoints:
  aws:
    region: af-south-1
    serviceEndpoints:
    - name: ec2
      url: https://ec2.af-south-1.amazonaws.com
    - name: elasticloadbalancing
      url: https://elasticloadbalancing.af-south-1.amazonaws.com
    - name: s3
      url: https://s3.af-south-1.amazonaws.com
    - name: iam
      url: https://iam.af-south-1.amazonaws.com
    - name: tagging
      url: https://tagging.af-south-1.amazonaws.com
    - name: route53
      url: https://route53.amazonaws.com
3. Create the cluster

Actual results:
SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'af-south-1'. 

Expected results:
No errors; the cluster is created successfully.

Additional info:
The same build, 4.6.0-0.nightly-2020-07-25-091217, installs successfully in us-east-2.
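
The 403 in this description comes from AWS Signature Version 4: each signed request carries a credential scope of the form `<date>/<region>/<service>/aws4_request`, and the endpoint rejects a scope whose region it does not accept. The following is a minimal sketch of that idea, not installer or SDK code. The `GLOBAL_SIGNING_REGION` table and both function names are illustrative assumptions; the table encodes the assumption that, in the standard `aws` partition, the global IAM and Route 53 endpoints expect `us-east-1` in the scope regardless of the cluster region.

```python
# Illustrative only: not the installer's or the AWS SDK's actual code.
# Assumption: in the standard "aws" partition, the global IAM and Route 53
# endpoints expect requests to be signed for us-east-1.
GLOBAL_SIGNING_REGION = {"iam": "us-east-1", "route53": "us-east-1"}


def signing_region(service: str, cluster_region: str) -> str:
    """Return the region to place in the SigV4 credential scope."""
    return GLOBAL_SIGNING_REGION.get(service, cluster_region)


def credential_scope(date: str, service: str, cluster_region: str) -> str:
    """Build the SigV4 credential scope: <date>/<region>/<service>/aws4_request."""
    region = signing_region(service, cluster_region)
    return f"{date}/{region}/{service}/aws4_request"


if __name__ == "__main__":
    # Regional service: scoped to the cluster region.
    print(credential_scope("20200730", "ec2", "af-south-1"))
    # Global service: scoped to us-east-1 even for an af-south-1 cluster.
    print(credential_scope("20200730", "route53", "af-south-1"))
```

Under these assumptions, signing a route53 request with an `af-south-1` scope is what would produce "Credential should be scoped to a valid region, not 'af-south-1'".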

Comment 3 Yunfei Jiang 2020-08-03 15:56:24 UTC
verified on 4.6.0-0.nightly-2020-08-02-091622 - FAILED

The original route53 problem is resolved, but the install now hits an IAM problem (install log attached). Error log:

time="2020-08-03T15:00:49Z" level=debug msg="module.dns.aws_route53_record.api_internal: Creation complete after 57s [id=Z00683592YKJ1PSSGQYWP_api-int.yunjiang-af03bz.qe.devcluster.openshift.com_A]"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error msg="Error: Error creating IAM Role yunjiang-af03bz-fr9mc-bootstrap-role: SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'af-south-1'. "
time="2020-08-03T15:00:49Z" level=error msg="\tstatus code: 403, request id: 4213a2c0-6b69-4731-8483-b683c7cfd00c"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error msg="  on ../../../../../tmp/openshift-install-787325111/bootstrap/main.tf line 51, in resource \"aws_iam_role\" \"bootstrap\":"
time="2020-08-03T15:00:49Z" level=error msg="  51: resource \"aws_iam_role\" \"bootstrap\" {"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error msg="Error: Error creating IAM Role yunjiang-af03bz-fr9mc-worker-role: SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'af-south-1'. "
time="2020-08-03T15:00:49Z" level=error msg="\tstatus code: 403, request id: b9fbb43c-cbfa-45c1-80a4-ea2a09029197"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error msg="  on ../../../../../tmp/openshift-install-787325111/iam/main.tf line 13, in resource \"aws_iam_role\" \"worker_role\":"
time="2020-08-03T15:00:49Z" level=error msg="  13: resource \"aws_iam_role\" \"worker_role\" {"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error msg="Error: Error creating IAM Role yunjiang-af03bz-fr9mc-master-role: SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'af-south-1'. "
time="2020-08-03T15:00:49Z" level=error msg="\tstatus code: 403, request id: 345e615d-9b91-4154-865f-3718a567588c"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error msg="  on ../../../../../tmp/openshift-install-787325111/master/main.tf line 17, in resource \"aws_iam_role\" \"master_role\":"
time="2020-08-03T15:00:49Z" level=error msg="  17: resource \"aws_iam_role\" \"master_role\" {"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error msg="Error: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached."
time="2020-08-03T15:00:49Z" level=error msg="\tstatus code: 400, request id: f2f41aa5-d18a-44c9-8a82-b1339d578f08"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error msg="  on ../../../../../tmp/openshift-install-787325111/vpc/vpc-public.tf line 68, in resource \"aws_eip\" \"nat_eip\":"
time="2020-08-03T15:00:49Z" level=error msg="  68: resource \"aws_eip\" \"nat_eip\" {"
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=error
time="2020-08-03T15:00:49Z" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"

config:
...
platform:
  aws:
    region: af-south-1
    serviceEndpoints:
    - name: ec2
      url: https://ec2.af-south-1.amazonaws.com
    - name: elasticloadbalancing
      url: https://elasticloadbalancing.af-south-1.amazonaws.com
    - name: s3
      url: https://s3.af-south-1.amazonaws.com
    - name: iam
      url: https://iam.amazonaws.com
    - name: tagging
      url: https://tagging.af-south-1.amazonaws.com
    - name: route53
      url: https://route53.amazonaws.com

Comment 4 Yunfei Jiang 2020-08-03 15:57:16 UTC
Created attachment 1703722 [details]
install log (IAM)

Comment 5 Abhinav Dahiya 2020-08-03 18:16:54 UTC
Can you try without the IAM and route53 endpoints? For public regions these are already known and probably do not need the override.

```
platform:
  aws:
    region: af-south-1
    serviceEndpoints:
    - name: ec2
      url: https://ec2.af-south-1.amazonaws.com
    - name: elasticloadbalancing
      url: https://elasticloadbalancing.af-south-1.amazonaws.com
    - name: s3
      url: https://s3.af-south-1.amazonaws.com
    - name: tagging
      url: https://tagging.af-south-1.amazonaws.com
```
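
The suggestion above amounts to removing the overrides for the global services from the `serviceEndpoints` list. As a rough sketch only (the helper name `drop_global_overrides` and the `GLOBAL_SERVICES` set are hypothetical, not part of openshift-install), the filtering could look like this, using the endpoint list from the description:

```python
# Hypothetical helper, not part of openshift-install.
# Assumption: iam and route53 are the only global services in this config.
GLOBAL_SERVICES = {"iam", "route53"}


def drop_global_overrides(endpoints):
    """Return the endpoint list without entries for global services."""
    return [e for e in endpoints if e["name"] not in GLOBAL_SERVICES]


# serviceEndpoints as given in the install config from the description.
endpoints = [
    {"name": "ec2", "url": "https://ec2.af-south-1.amazonaws.com"},
    {"name": "elasticloadbalancing",
     "url": "https://elasticloadbalancing.af-south-1.amazonaws.com"},
    {"name": "s3", "url": "https://s3.af-south-1.amazonaws.com"},
    {"name": "iam", "url": "https://iam.af-south-1.amazonaws.com"},
    {"name": "tagging", "url": "https://tagging.af-south-1.amazonaws.com"},
    {"name": "route53", "url": "https://route53.amazonaws.com"},
]

if __name__ == "__main__":
    # Print the remaining entries in install-config style.
    for entry in drop_global_overrides(endpoints):
        print(f"- name: {entry['name']}")
        print(f"  url: {entry['url']}")
```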

Comment 6 Yunfei Jiang 2020-08-05 09:23:40 UTC
It works after removing the IAM and route53 endpoints.

The bootstrap process completed:

time="2020-08-05T08:11:43Z" level=info msg="API v4.6.0-202008031851.p0-dirty up"
time="2020-08-05T08:11:43Z" level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
time="2020-08-05T08:27:25Z" level=debug msg="Bootstrap status: complete"
time="2020-08-05T08:27:25Z" level=info msg="Destroying the bootstrap resources..."

However, the install failed because of an operator error:

time="2020-08-05T08:59:25Z" level=fatal msg="failed to initialize the cluster: Cluster operator console is reporting a failure: RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.yunjiang-05bug.qe.devcluster.openshift.com/health): Get \"https://console-openshift-console.apps.yunjiang-05bug.qe.devcluster.openshift.com/health\": dial tcp: lookup console-openshift-console.apps.yunjiang-05bug.qe.devcluster.openshift.com on 172.30.0.10:53: no such host"

This is probably a separate issue; the install log is attached.

Comment 7 Yunfei Jiang 2020-08-05 09:24:20 UTC
Created attachment 1710489 [details]
install log without IAM and route53

Comment 8 Yunfei Jiang 2020-08-05 10:46:54 UTC
It seems the ingress operator is affected by the service endpoints; please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1866299

Comment 9 Daneyon Hansen 2020-08-06 17:35:28 UTC
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1866299#c2

Comment 10 Hongan Li 2020-08-11 02:49:27 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1866299#c4
It works well after updating the tagging endpoint as below:
      - name: tagging
        url: https://tagging.us-east-1.amazonaws.com

Comment 11 Daneyon Hansen 2020-08-12 21:15:37 UTC

*** This bug has been marked as a duplicate of bug 1866299 ***

Comment 12 Yunfei Jiang 2020-08-18 09:25:43 UTC
The issue as described is still present:

1. According to the document [1], the ec2/elb/s3/iam/tagging/route53 endpoints can be provided by the user. If the user provides these endpoints correctly (even when they override the defaults), the cluster should install successfully.
2. According to comments [2][3], the following config can work:
<--snip-->
      serviceEndpoints:
      - name: ec2
        url: https://ec2.af-south-1.amazonaws.com
      - name: elasticloadbalancing
        url: https://elasticloadbalancing.af-south-1.amazonaws.com
      - name: s3
        url: https://s3.af-south-1.amazonaws.com
      - name: tagging
        url: https://tagging.us-east-1.amazonaws.com
<--snip-->

3. I tried to install using the following config (note that the tagging endpoint uses `us-east-1`, as comment [2] mentions):
<--snip-->
platform:
  aws:
    region: af-south-1
    serviceEndpoints:
    - name: ec2
      url: https://ec2.af-south-1.amazonaws.com
    - name: elasticloadbalancing
      url: https://elasticloadbalancing.af-south-1.amazonaws.com
    - name: s3
      url: https://s3.af-south-1.amazonaws.com
    - name: iam
      url: https://iam.amazonaws.com
    - name: tagging
      url: https://tagging.us-east-1.amazonaws.com
    - name: route53
      url: https://route53.amazonaws.com
<--snip-->

and got the following error:

level=error msg="Error: Error creating IAM Role yunjiang-18af4-6p56w-bootstrap-role: SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'af-south-1'. "

Is this the same issue that bug 1866299 describes? Will the 'SignatureDoesNotMatch' issue be fixed?

Need Daneyon and Abhinav to confirm, thanks.


[1] https://github.com/openshift/enhancements/blob/master/enhancements/installer/aws-custom-region-and-endpoints.md
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1866299#c2
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1866299#c4
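
The configs in this comment differ only in which region each endpoint URL names, which is easy to misread. The sketch below is one way to list the overrides that point outside the cluster region. The function names are hypothetical, and it assumes hostnames of the form `<service>[.<region>].amazonaws.com`. It only reports the difference; it does not decide whether an off-region endpoint is correct.

```python
from urllib.parse import urlparse


def endpoint_region(url):
    """Return (service, region) parsed from an AWS-style endpoint URL.

    region is None when the hostname carries no region label.
    Assumes hostnames of the form <service>[.<region>].amazonaws.com.
    """
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    if labels[-2:] != ["amazonaws", "com"] or len(labels) not in (3, 4):
        raise ValueError(f"unrecognized endpoint host: {host!r}")
    region = labels[1] if len(labels) == 4 else None
    return labels[0], region


def off_region_endpoints(endpoints, cluster_region):
    """Map each override whose URL region differs from cluster_region to that region."""
    result = {}
    for name, url in endpoints.items():
        _, region = endpoint_region(url)
        if region != cluster_region:
            result[name] = region
    return result


# serviceEndpoints from step 3 of this comment.
endpoints = {
    "ec2": "https://ec2.af-south-1.amazonaws.com",
    "elasticloadbalancing": "https://elasticloadbalancing.af-south-1.amazonaws.com",
    "s3": "https://s3.af-south-1.amazonaws.com",
    "iam": "https://iam.amazonaws.com",
    "tagging": "https://tagging.us-east-1.amazonaws.com",
    "route53": "https://route53.amazonaws.com",
}

if __name__ == "__main__":
    for name, region in off_region_endpoints(endpoints, "af-south-1").items():
        print(f"{name}: {region or 'no region in URL (global endpoint)'}")
```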

Comment 13 Red Hat Bugzilla 2023-09-14 06:04:33 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days