Description of problem:

The domain name of the hosted zone given in `platform.aws.hostedZone` must be <cluster-name>.<baseDomain> or <baseDomain>; otherwise the installer generates invalid API endpoints, which causes the bootstrap process to fail.

e.g.
base domain: qe.devcluster.openshift.com
domain name of `platform.aws.hostedZone`: pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com

The records in the private hosted zone will look like this:
api.yunjiang-r53e.qe.devcluster.openshift.com.pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com.
api-int.yunjiang-r53e.qe.devcluster.openshift.com.pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com.

bootkube error:
Apr 26 02:10:58 ip-10-0-4-35 sudo[10173]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 26 02:10:58 ip-10-0-4-35 sudo[10198]: root : TTY=unknown ; PWD=/var/opt/openshift ; USER=root ; ENV=KUBECONFIG=/opt/openshift/auth/kubeconfig ; COMMAND=/bin/oc --request-timeout=5s get secrets --all-namespaces -o=custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPE:.type,ANNOTATIONS:.metadata.annotations
Apr 26 02:10:58 ip-10-0-4-35 sudo[10198]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 26 02:10:58 ip-10-0-4-35 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 02:10:58 ip-10-0-4-35 systemd[1]: bootkube.service: Failed with result 'exit-code'.
Apr 26 02:11:00 ip-10-0-4-35 bootkube.sh[2326]: Unable to connect to the server: dial tcp: lookup api-int.yunjiang-r53e.qe.devcluster.openshift.com on 10.0.0.2:53: no such host

How reproducible:
Always.

Steps to Reproduce:
1. Create a private hosted zone and associate it with the VPC.
   domain name: pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com
   hosted zone id: Z05164533R1ZNB61TQ6QA
2. Create and update install-config:

apiVersion: v1
baseDomain: qe.devcluster.openshift.com
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
metadata:
  name: yunjiang-r53e
platform:
  aws:
    region: us-east-2
    subnets:
    - subnet-0cb8fd22ec8849782
    - subnet-09f8d13a9f6b8d0c3
    - subnet-022b879d8b45bdfd9
    - subnet-0bb1e820db022652a
    hostedZone: Z05164533R1ZNB61TQ6QA
pullSecret: HIDDEN
sshKey: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: External

3. Create cluster

Actual results:
The bootstrap process fails with the errors above.

Expected results:
The installer runs a pre-check: if the domain name of `platform.aws.hostedZone` is not <baseDomain> or <cluster-name>.<baseDomain>, it reports a fatal error and exits at an early stage instead of failing during the bootstrap process.
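The pre-check requested under "Expected results" could look roughly like the sketch below. It is only an illustration of the rule stated in this report, not the installer's code: the function names `allowed_zone_names` and `check_hosted_zone` are hypothetical, and the zone's domain name is assumed to have been looked up from the hosted zone ID beforehand.

```python
def allowed_zone_names(cluster_name: str, base_domain: str) -> set:
    """Return the two zone names this report treats as valid.

    Hypothetical helper: <baseDomain> and <cluster-name>.<baseDomain>,
    normalized to lower case without a trailing dot.
    """
    base = base_domain.rstrip(".").lower()
    return {base, f"{cluster_name.lower()}.{base}"}


def check_hosted_zone(zone_name: str, cluster_name: str, base_domain: str) -> None:
    """Raise ValueError when the zone name is not one of the accepted forms.

    zone_name is assumed to be the domain name of the zone referenced by
    platform.aws.hostedZone, resolved before this check runs.
    """
    normalized = zone_name.rstrip(".").lower()
    allowed = allowed_zone_names(cluster_name, base_domain)
    if normalized not in allowed:
        raise ValueError(
            f"hosted zone domain {zone_name!r} must be one of {sorted(allowed)}"
        )


if __name__ == "__main__":
    # Values taken from the reproduction steps in this report.
    try:
        check_hosted_zone(
            "pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com",
            "yunjiang-r53e",
            "qe.devcluster.openshift.com",
        )
    except ValueError as err:
        print(f"FATAL {err}")
```

With the values from the reproduction steps, the check raises before any infrastructure would be created, which is the early exit this report asks for.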
[QA Summary]

[Version]
Using version "4.8.0-0.ci-2021-05-03-055425" because the latest nightly does not yet contain the related PR#4886:
~~~
$ ./openshift-install version
./openshift-install 4.8.0-0.ci-2021-05-03-055425
built from commit 04211fb553783eb7998bd3a63189b84f9b028052
release image registry.ci.openshift.org/ocp/release@sha256:aa74b16ccc044f1171d85ad679c0d122cab4248071cc15cfbb477e35c399682e
~~~

[Parameters]
~~~
baseDomain: qe.devcluster.openshift.com
...
metadata:
  name: pamoedo-bz1953803
...
platform:
  aws:
    region: eu-west-3
    subnets:
    - subnet-0d8bdfd52c0931fcd
    - subnet-043abed7d315e07fe
    - subnet-0345b9abcecafe617
    hostedZone: Z08537461DZ9H25LAAV19
publish: Internal
~~~

[Results]
As expected, the installer aborts at an early stage with the related "hostedZone" FATAL error:
~~~
DEBUG Generating Platform Provisioning Check...
FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Provisioning Check": aws.hostedZone: Invalid value: "Z08537461DZ9H25LAAV19": hosted zone domain "pre-created-pamoedom-bz1953803.qe.devcluster.openshift.com." is not a parent of the cluster domain "pamoedo-bz1953803.qe.devcluster.openshift.com."
~~~

Best Regards.
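The wording of the FATAL message suggests the check tests whether the hosted zone domain is a parent of the cluster domain, which is somewhat looser than the two forms listed in the original description. The following is a minimal sketch of such a test, not the code from the referenced PR. The helper names are hypothetical, and treating a zone equal to the cluster domain as acceptable is an assumption.

```python
from typing import Optional


def dotted(name: str) -> str:
    """Normalize a domain to lower case with exactly one trailing dot."""
    return name.rstrip(".").lower() + "."


def is_parent_or_same(zone_domain: str, cluster_domain: str) -> bool:
    """True if zone_domain equals cluster_domain or is one of its parents.

    Assumption: an exact match counts as valid, since the report lists
    <cluster-name>.<baseDomain> as an accepted zone name.
    """
    zone = dotted(zone_domain)
    cluster = dotted(cluster_domain)
    # Compare on a label boundary so "xqe.example.com." is not mistaken
    # for a child of "qe.example.com.".
    return cluster == zone or cluster.endswith("." + zone)


def hosted_zone_error(zone_id: str, zone_domain: str,
                      cluster_name: str, base_domain: str) -> Optional[str]:
    """Return a message shaped like the one in the QA log, or None if valid."""
    cluster_domain = dotted(f"{cluster_name}.{base_domain}")
    if is_parent_or_same(zone_domain, cluster_domain):
        return None
    return (
        f'aws.hostedZone: Invalid value: "{zone_id}": hosted zone domain '
        f'"{dotted(zone_domain)}" is not a parent of the cluster domain '
        f'"{cluster_domain}"'
    )


if __name__ == "__main__":
    # Values taken from the QA parameters and log above.
    print(hosted_zone_error(
        "Z08537461DZ9H25LAAV19",
        "pre-created-pamoedom-bz1953803.qe.devcluster.openshift.com.",
        "pamoedo-bz1953803",
        "qe.devcluster.openshift.com",
    ))
```

Under this reading, a zone named after the base domain or the full cluster domain passes, while the pre-created zone from the QA run is rejected.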
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438