Bug 1953803 - [AWS] Installer should do pre-check to ensure user-provided private hosted zone name is valid for OCP cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Matthew Staebler
QA Contact: Pedro Amoedo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-27 01:39 UTC by Yunfei Jiang
Modified: 2021-07-27 23:04 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Bug in new feature.
Clone Of:
Environment:
Last Closed: 2021-07-27 23:04:01 UTC
Target Upstream Version:
Embargoed:



Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift installer pull 4886 | 0 | None | open | Bug 1953803: aws: validate byo hostedzone is parent of cluster domain | 2021-04-28 22:10:54 UTC
Red Hat Product Errata RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:04:20 UTC

Description Yunfei Jiang 2021-04-27 01:39:00 UTC
Description of problem:

The domain name of the hosted zone referenced by `platform.aws.hostedZone` must be either <cluster-name>.<baseDomain> or <baseDomain>. Otherwise, the installer generates invalid API endpoint records, which causes the bootstrap process to fail.

For example:
cluster name: yunjiang-r53e
base domain: qe.devcluster.openshift.com
domain name of `platform.aws.hostedZone`: pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com

The records in the private hosted zone then look like this:
~~~
api.yunjiang-r53e.qe.devcluster.openshift.com.pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com.
api-int.yunjiang-r53e.qe.devcluster.openshift.com.pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com.
~~~
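The doubled names above are consistent with each record name being treated as relative to the hosted zone: because `api.<cluster-name>.<baseDomain>` does not end with the zone's domain, the zone's domain gets appended to it. The Go sketch below reproduces this example's names under that assumption. `expandRecordName` is a hypothetical helper that only mimics the observed result; it is not code from the installer or from AWS.

```go
package main

import (
	"fmt"
	"strings"
)

// expandRecordName (hypothetical) mimics the observed behaviour: if a record
// name is not already inside the zone (equal to it or ending in "."+zone),
// the zone's domain is appended. The result is returned as an absolute name
// with a trailing dot.
func expandRecordName(name, zone string) string {
	name = strings.TrimSuffix(name, ".")
	zone = strings.TrimSuffix(zone, ".")
	if name == zone || strings.HasSuffix(name, "."+zone) {
		return name + "."
	}
	return name + "." + zone + "."
}

func main() {
	clusterDomain := "yunjiang-r53e.qe.devcluster.openshift.com"
	zone := "pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com"
	for _, prefix := range []string{"api", "api-int"} {
		fmt.Println(expandRecordName(prefix+"."+clusterDomain, zone))
	}
}
```

Running it prints the two record names listed above. Since no record is named `api-int.yunjiang-r53e.qe.devcluster.openshift.com.` itself, the lookup in the bootkube log below returns "no such host".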

bootkube error:
~~~
Apr 26 02:10:58 ip-10-0-4-35 sudo[10173]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 26 02:10:58 ip-10-0-4-35 sudo[10198]:     root : TTY=unknown ; PWD=/var/opt/openshift ; USER=root ; ENV=KUBECONFIG=/opt/openshift/auth/kubeconfig ; COMMAND=/bin/oc --request-timeout=5s get secrets --all-namespaces -o=custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPE:.type,ANNOTATIONS:.metadata.annotations
Apr 26 02:10:58 ip-10-0-4-35 sudo[10198]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 26 02:10:58 ip-10-0-4-35 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 02:10:58 ip-10-0-4-35 systemd[1]: bootkube.service: Failed with result 'exit-code'.
Apr 26 02:11:00 ip-10-0-4-35 bootkube.sh[2326]: Unable to connect to the server: dial tcp: lookup api-int.yunjiang-r53e.qe.devcluster.openshift.com on 10.0.0.2:53: no such host
~~~


How reproducible:
Always.

Steps to Reproduce:
1. Create a private hosted zone and associate it with the VPC.
domain name: pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com
hosted zone id: Z05164533R1ZNB61TQ6QA

2. Create an install-config.yaml and update it as follows:

~~~
apiVersion: v1
baseDomain: qe.devcluster.openshift.com
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
metadata:
  name: yunjiang-r53e
platform:
  aws:
    region: us-east-2
    subnets:
    - subnet-0cb8fd22ec8849782
    - subnet-09f8d13a9f6b8d0c3
    - subnet-022b879d8b45bdfd9
    - subnet-0bb1e820db022652a
    hostedZone: Z05164533R1ZNB61TQ6QA
pullSecret: HIDDEN
sshKey: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: External
~~~

3. Create the cluster.

Actual results:
The bootstrap process fails with the errors shown above.

Expected results:
The installer performs a pre-check: if the domain name of the hosted zone referenced by `platform.aws.hostedZone` is neither <baseDomain> nor <cluster-name>.<baseDomain>, it reports a fatal error and exits early, instead of failing later in the bootstrap process.
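A minimal Go sketch of such a pre-check, assuming the rule named in the linked PR title ("validate byo hostedzone is parent of cluster domain"): the cluster domain is <cluster-name>.<baseDomain>, and the hosted zone's domain must equal it or be one of its ancestors at a label boundary. `normalize`, `isParentDomain`, and `validateHostedZone` are hypothetical names, the zone's domain is passed in directly rather than looked up from Route 53 by zone ID, and the error text only imitates the FATAL message shown in the QA comment below.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize lower-cases a domain and strips a trailing dot, so that
// "Example.com." and "example.com" compare equal.
func normalize(domain string) string {
	return strings.ToLower(strings.TrimSuffix(domain, "."))
}

// isParentDomain reports whether parent equals child or is an ancestor of
// child at a label boundary: "qe.example.com" is a parent of
// "c1.qe.example.com", but "e.example.com" is not.
func isParentDomain(parent, child string) bool {
	p, c := normalize(parent), normalize(child)
	return p == c || strings.HasSuffix(c, "."+p)
}

// validateHostedZone (hypothetical) rejects a user-provided hosted zone whose
// domain is not a parent of the cluster domain <cluster-name>.<baseDomain>.
func validateHostedZone(zoneID, zoneDomain, clusterName, baseDomain string) error {
	clusterDomain := normalize(clusterName + "." + baseDomain)
	if !isParentDomain(zoneDomain, clusterDomain) {
		return fmt.Errorf("aws.hostedZone: Invalid value: %q: hosted zone domain %q is not a parent of the cluster domain %q",
			zoneID, normalize(zoneDomain)+".", clusterDomain+".")
	}
	return nil
}

func main() {
	base := "qe.devcluster.openshift.com"
	// The zone from this report is a sibling of the cluster domain: rejected.
	fmt.Println(validateHostedZone("Z05164533R1ZNB61TQ6QA",
		"pre-created-yunjiang-r53e-4639299.qe.devcluster.openshift.com", "yunjiang-r53e", base))
	// Zones named <baseDomain> or <cluster-name>.<baseDomain> pass (prints <nil>).
	fmt.Println(validateHostedZone("Z1", base, "yunjiang-r53e", base))
	fmt.Println(validateHostedZone("Z2", "yunjiang-r53e."+base, "yunjiang-r53e", base))
}
```

Note that a "parent of the cluster domain" rule is slightly broader than the two names listed above: under this sketch a zone for a higher ancestor such as devcluster.openshift.com would also pass.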

Comment 2 Pedro Amoedo 2021-05-03 16:15:22 UTC
[QA Summary]

[Version]

Using version "4.8.0-0.ci-2021-05-03-055425", since the latest nightly doesn't yet contain the related PR #4886:

~~~
$ ./openshift-install version
./openshift-install 4.8.0-0.ci-2021-05-03-055425
built from commit 04211fb553783eb7998bd3a63189b84f9b028052
release image registry.ci.openshift.org/ocp/release@sha256:aa74b16ccc044f1171d85ad679c0d122cab4248071cc15cfbb477e35c399682e
~~~

[Parameters]

~~~
baseDomain: qe.devcluster.openshift.com
...
metadata:
  name: pamoedo-bz1953803
...
platform:
  aws:
    region: eu-west-3
    subnets:
    - subnet-0d8bdfd52c0931fcd
    - subnet-043abed7d315e07fe
    - subnet-0345b9abcecafe617
    hostedZone: Z08537461DZ9H25LAAV19
publish: Internal
~~~

[Results]

As expected, the installer aborts at an early stage with the related "hostedZone" FATAL error:

~~~
DEBUG   Generating Platform Provisioning Check...  
FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Provisioning Check": aws.hostedZone: Invalid value: "Z08537461DZ9H25LAAV19": hosted zone domain "pre-created-pamoedom-bz1953803.qe.devcluster.openshift.com." is not a parent of the cluster domain "pamoedo-bz1953803.qe.devcluster.openshift.com."
~~~

Best Regards.

Comment 5 errata-xmlrpc 2021-07-27 23:04:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

