Hide Forgot
Created attachment 1839717 [details] Pull secret has been sanitized Version: $ openshift-install version ./openshift-install 4.9.4 built from commit 6e5b992ba719dd4ea2d0c2a8b08ecad45179e553 release image quay.io/openshift-release-dev/ocp-release@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef release architecture amd64 Platform: aws (likely equally valid for the other cloud platforms) Please specify: IPI What happened? Providing an invalid AWS region for `openshift-install`, e.g., us-xyz-1, causes the installer to hang for 60 minutes until finally timing out. What did you expect to happen? The installer should be able to detect an invalid region, fail fast, and warn the user accordingly How to reproduce it (as minimally and precisely as possible)? (refer to the attached install-config.yaml for what to include in the "test" directory) $ ./openshift-install create manifests --dir test Anything else we need to know? Output from above command with --log-level debug, with no indication that the command is hanging due to the region ``` DEBUG OpenShift Installer 4.9.4 DEBUG Built from commit 6e5b992ba719dd4ea2d0c2a8b08ecad45179e553 DEBUG Fetching Master Machines... DEBUG Loading Master Machines... DEBUG Loading Cluster ID... DEBUG Loading Install Config... DEBUG Loading SSH Key... DEBUG Using SSH Key loaded from state file DEBUG Loading Base Domain... DEBUG Loading Platform... DEBUG Using Platform loaded from state file DEBUG Using Base Domain loaded from state file DEBUG Loading Cluster Name... DEBUG Loading Base Domain... DEBUG Loading Platform... DEBUG Using Cluster Name loaded from state file DEBUG Loading Networking... DEBUG Loading Platform... DEBUG Using Networking loaded from state file DEBUG Loading Pull Secret... DEBUG Using Pull Secret loaded from state file DEBUG Loading Platform... DEBUG Loading Install Config from both state file and target directory DEBUG Using Install Config loaded from target directory DEBUG Loading Platform Credentials Check... DEBUG Loading Install Config... DEBUG Loading Install Config... DEBUG Loading Image... DEBUG Loading Install Config... DEBUG Loading Master Ignition Config... DEBUG Loading Install Config... DEBUG Loading Root CA... DEBUG Fetching Cluster ID... DEBUG Fetching Install Config... DEBUG Reusing previously-fetched Install Config DEBUG Generating Cluster ID... DEBUG Fetching Platform Credentials Check... DEBUG Fetching Install Config... DEBUG Reusing previously-fetched Install Config DEBUG Generating Platform Credentials Check... INFO Credentials loaded from the "default" profile in file "/Users/wgordon/.aws/credentials" DEBUG Fetching Install Config... DEBUG Reusing previously-fetched Install Config DEBUG Fetching Image... DEBUG Fetching Install Config... DEBUG Reusing previously-fetched Install Config DEBUG Generating Image... DEBUG Fetching Master Ignition Config... DEBUG Fetching Install Config... DEBUG Reusing previously-fetched Install Config DEBUG Fetching Root CA... DEBUG Generating Root CA... DEBUG Generating Master Ignition Config... DEBUG Generating Master Machines... ```
A possible resolution for this would be to fetch the regions from AWS in order to perform validation that the region chosen is actually an AWS region. The only value in that would be to avoid the 1-minute DNS resolution timeout. I suspect that we will never get around to making this fix.
(In reply to Matthew Staebler from comment #3) > A possible resolution for this would be to fetch the regions from AWS in > order to perform validation that the region chosen is actually an AWS > region. The only value in that would be to avoid the 1-minute DNS resolution > timeout. I suspect that we will never get around to making this fix. I misread the title. This is a 60 *minute* timeout? Could you please attach the logs from the installer? It seems like there may be retries happening that should not be.
The length of the timeout is due to the installer allowing for 25 retries before failing. Presumably there is some backoff with each retry. Changing the max retries to 3 allows the installer to fail in 1 second.
Created attachment 1840769 [details] install log
verified. PASS. OCP Version: 4.10.0-0.nightly-2021-12-25-025639 time="2021-12-27T01:33:25-05:00" level=debug msg="OpenShift Installer 4.10.0-0.nightly-2021-12-25-025639" time="2021-12-27T01:33:25-05:00" level=debug msg="Built from commit 37c09290190e5dd00446b603c32792c06d205b62" time="2021-12-27T01:33:25-05:00" level=debug msg="Fetching Metadata..." time="2021-12-27T01:33:25-05:00" level=debug msg="Loading Metadata..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Cluster ID..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Install Config..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading SSH Key..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Base Domain..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Platform..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Cluster Name..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Base Domain..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Platform..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Networking..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Platform..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Pull Secret..." time="2021-12-27T01:33:25-05:00" level=debug msg=" Loading Platform..." time="2021-12-27T01:33:25-05:00" level=info msg="Credentials loaded from the \"default\" profile in file \"/home/cloud-user/.aws/credentials\"" time="2021-12-27T01:33:25-05:00" level=fatal msg="failed to fetch Metadata: failed to load asset \"Install Config\": platform.aws.serviceEndpoints.region: Invalid value: \"us-xyzz-2\": dial tcp: lookup ec2.us-xyzz-2.amazonaws.com on 10.11.5.19:53: no such host"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056