Bug 2064723

Summary: Installation fails in ap-southeast-3 region
Product: OpenShift Container Platform
Reporter: Kirk Bater <kbater>
Component: Cloud Compute
Cloud Compute sub component: Other Providers
Assignee: Joel Speed <jspeed>
QA Contact: sunzhaohua <zhsun>
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
CC: cblecker, dmoiseev, mstaeble
Version: 4.9
Keywords: ServiceDeliveryImpact
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2022-03-18 14:25:19 UTC
Attachments: AP-Southeast-3 Must Gather

Description Kirk Bater 2022-03-16 12:30:10 UTC
Created attachment 1866188 [details]
AP-Southeast-3 Must Gather

Version:

$ openshift-install version
v4.9.0

Platform:
AWS/OSD

Please specify:
* IPI

What happened?

When attempting to create a cluster in the new ap-southeast-3 region, the installer gets past the bootstrap phase but fails to create worker nodes.

A must-gather is attached; I forgot to capture the install logs specifically.

When I was digging into the cluster, I saw the following errors in the machine-api-operator machine-controller pod:

I0315 17:09:29.768070       1 controller.go:174] kirk-aps3-nqj8z-worker-ap-southeast-3a-lhfpv: reconciling Machine
I0315 17:09:29.768081       1 actuator.go:104] kirk-aps3-nqj8z-worker-ap-southeast-3a-lhfpv: actuator checking if machine exists
E0315 17:09:29.768650       1 controller.go:281] kirk-aps3-nqj8z-worker-ap-southeast-3a-lhfpv: failed to check if machine exists: kirk-aps3-nqj8z-worker-ap-southeast-3a-lhfpv: failed to create scope for machine: failed to create aws client: region "ap-southeast-3" not resolved: UnknownEndpointError: could not resolve endpoint
        partition: "all partitions", service: "ec2", region: "ap-southeast-3"
E0315 17:09:29.776716       1 controller.go:304] controller-runtime/manager/controller/machine_controller "msg"="Reconciler error" "error"="kirk-aps3-nqj8z-worker-ap-southeast-3a-lhfpv: failed to create scope for machine: failed to create aws client: region \"ap-southeast-3\" not resolved: UnknownEndpointError: could not resolve endpoint\n\tpartition: \"all partitions\", service: \"ec2\", region: \"ap-southeast-3\"" "name"="kirk-aps3-nqj8z-worker-ap-southeast-3a-lhfpv" "namespace"="openshift-machine-api"
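
The UnknownEndpointError above is raised before any AWS API call is made: the AWS SDK vendored into the machine controller resolves endpoints from an embedded region table, and a build that predates the Jakarta launch has no entry for ap-southeast-3, so strict resolution fails up front. The following is a minimal self-contained sketch of that lookup behavior (the region table and resolveEndpoint function are simplified stand-ins for illustration, not the actual SDK code):

```go
package main

import "fmt"

// knownRegions stands in for the SDK's embedded endpoint table.
// An SDK vendored before the ap-southeast-3 (Jakarta) launch has
// no entry for that region.
var knownRegions = map[string]bool{
	"us-east-1":      true,
	"eu-west-1":      true,
	"ap-southeast-1": true,
	"ap-southeast-2": true,
	// "ap-southeast-3" is absent in older SDK builds.
}

// resolveEndpoint mimics strict endpoint resolution: a region absent
// from the table yields an error instead of a guessed endpoint URL.
func resolveEndpoint(service, region string) (string, error) {
	if !knownRegions[region] {
		return "", fmt.Errorf("could not resolve endpoint, service: %q, region: %q",
			service, region)
	}
	return fmt.Sprintf("https://%s.%s.amazonaws.com", service, region), nil
}

func main() {
	// Mirrors the machine-controller failure for the new region.
	if _, err := resolveEndpoint("ec2", "ap-southeast-3"); err != nil {
		fmt.Println("error:", err)
	}
	// A region present in the table resolves normally.
	ep, _ := resolveEndpoint("ec2", "ap-southeast-2")
	fmt.Println("resolved:", ep)
}
```

The sketch only shows why an older binary fails before reaching EC2; the real fix is shipping a cluster image whose vendored SDK endpoint table includes the new region.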

What did you expect to happen?

Worker nodes to be spun up and the installation to finish.

How to reproduce it (as minimally and precisely as possible)?

Create a cluster in OCM with the following JSON, then wait for the install to fail:

$ cat << EOF > ocm.json
{
    "name": "[CLUSTER_NAME]",
    "ccs": {
        "enabled": true
    },
    "region": {
        "id": "ap-southeast-3"
    },
    "version": {
        "id": "openshift-v4.9.21",
        "channel_group": "stable"
    },
    "aws": {
        "account_id": "[ACCOUNT ID]",
        "access_key_id": "[ACCESS_KEY_ID]",
        "secret_access_key": "[SECRET_ACCESS_KEY]"
    }
}
EOF
$ ocm post /api/clusters_mgmt/v1/clusters --body ocm.json
...
wait for install to fail


Anything else we need to know?

Comment 1 Matthew Staebler 2022-03-16 13:04:51 UTC
See https://issues.redhat.com/browse/OCPCLOUD-1159.

Comment 2 dmoiseev 2022-03-18 14:25:19 UTC
The same bug already exists: https://bugzilla.redhat.com/show_bug.cgi?id=2065510

*** This bug has been marked as a duplicate of bug 2065510 ***