On the bootstrap node I can resolve the API endpoint hostname and ping the endpoint, and I am able to reproduce the SSL certificate error via the CLI using curl:

[core@interop-hwpzm-bootstrap ~]$ ping api.interop.interop.oasis.css-qe.com
PING api.interop.interop.oasis.css-qe.com (172.24.0.5) 56(84) bytes of data.
64 bytes from api-int.interop.interop.oasis.css-qe.com (172.24.0.5): icmp_seq=1 ttl=64 time=0.092 ms
64 bytes from api-int.interop.interop.oasis.css-qe.com (172.24.0.5): icmp_seq=2 ttl=64 time=0.440 ms
64 bytes from api-int.interop.interop.oasis.css-qe.com (172.24.0.5): icmp_seq=3 ttl=64 time=0.060 ms
64 bytes from api-int.interop.interop.oasis.css-qe.com (172.24.0.5): icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from api-int.interop.interop.oasis.css-qe.com (172.24.0.5): icmp_seq=5 ttl=64 time=0.153 ms
64 bytes from api-int.interop.interop.oasis.css-qe.com (172.24.0.5): icmp_seq=6 ttl=64 time=0.054 ms
^C
--- api.interop.interop.oasis.css-qe.com ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 133ms
rtt min/avg/max/mdev = 0.054/0.150/0.440/0.133 ms
[core@interop-hwpzm-bootstrap ~]$ curl https://api.interop.interop.oasis.css-qe.com:6443
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Created attachment 1700064 [details] SSL error viewed via Firefox
I believe I have found the issue. In my 'install-config.yaml' I had the following domain config: baseDomain: interop.oasis.css-qe.com. Notice the trailing dot in the domain name. I am using Google Cloud DNS for this project, and I copied the hostname I am using from the Google web interface, and it included the trailing dot. This never jumped out at me, since the trailing dot is used in every DNS lookup behind the scenes, but is typically hidden from the user. Web browsers and DNS utilities like 'host' and 'dig' work with the dot just fine. At this point I think that 'openshift-install' should not throw an x509 error if the user provides a 'baseDomain' with a trailing dot, since in my view that is a valid DNS name.
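To illustrate the claim that a name with a trailing dot refers to the same DNS name, here is a minimal Python sketch. The helpers `strip_root_dot` and `same_dns_name` are hypothetical names made up for this example and are not part of openshift-install. They only treat a single trailing root dot and letter case as insignificant.

```python
def strip_root_dot(name: str) -> str:
    """Return the name without a single trailing root dot, if present."""
    return name[:-1] if name.endswith(".") else name


def same_dns_name(a: str, b: str) -> bool:
    """Compare two DNS names, ignoring case and one trailing root dot."""
    return strip_root_dot(a).lower() == strip_root_dot(b).lower()


if __name__ == "__main__":
    # The value copied from the DNS web interface versus the usual spelling.
    print(same_dns_name("interop.oasis.css-qe.com.", "interop.oasis.css-qe.com"))
```

Under these assumptions, the two spellings of the base domain compare as equal, which is the behaviour I would expect from tooling that consumes 'baseDomain'.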
Thanks for reporting. The installer could indeed ignore the trailing dot. Lowering the priority as it is not considered a functional issue.
This has already been patched in 4.4: https://github.com/openshift/installer/blob/db69e0456f2f7d6b937a8e88fc1ee6be32bf61fd/pkg/validate/validate.go#L61
This issue was observed with 4.6. The patch you have linked here is for 4.4. Not sure why this is "NOTABUG" for 4.6.
Ah, my bad. The linked code does not actually sanitize the input downstream in the installer.
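A rough Python sketch of the distinction between validating and sanitizing, for anyone following along. This is not the installer's code (which is written in Go); `validate_domain` and `sanitize_domain` are hypothetical names, and the label rule is a simplified assumption (lowercase letters, digits and hyphens only).

```python
import re

# Simplified label rule (assumption): lowercase alphanumerics and hyphens,
# 1 to 63 characters, not starting or ending with a hyphen.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def validate_domain(value: str, accept_trailing_dot: bool = True) -> bool:
    """Check the name only; the caller's value is left unchanged."""
    candidate = value
    if accept_trailing_dot and candidate.endswith("."):
        candidate = candidate[:-1]
    if not candidate:
        return False
    return all(_LABEL.fullmatch(label) for label in candidate.split("."))


def sanitize_domain(value: str) -> str:
    """Return the value that downstream code should actually use."""
    if not validate_domain(value):
        raise ValueError(f"invalid base domain: {value!r}")
    return value[:-1] if value.endswith(".") else value


if __name__ == "__main__":
    raw = "interop.oasis.css-qe.com."
    print(validate_domain(raw))   # accepted, but raw still ends with a dot
    print(sanitize_domain(raw))   # the form without the trailing dot
```

The point of the sketch: a validator that merely tolerates the trailing dot accepts the input, yet anything built from the raw value afterwards still carries the dot unless the value is also sanitized.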
Note to self, to be covered in test case ocp-24404 for gcp regression.
Checked with 4.6.0-0.nightly-2020-09-27-075304, and verified. The verification log is attached.
Created attachment 1717173 [details] verified log with basedomain have dot
The install-config.yaml that was used to verify:

---
apiVersion: v1
baseDomain: wjiang.shiftstack.io.
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 2
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: bmocp
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.0.0/18
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  openstack:
    apiVIP: 192.168.0.5
    cloud: shiftstack
    computeFlavor: m4.xlarge
    externalDNS: null
    externalNetwork: public
    ingressVIP: 192.168.0.7
    lbFloatingIP: 10.46.43.177
publish: External
pullSecret: 'xxxxx'
sshKey: |
  ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbI/Ls4UkisLh0bz/YHLdw4N8edQ0cQzN9U92DR1lgqA7/Ex0TM4UYmpqPzAaGziURiL4m1Z2s9w7HU9OsYU9c3LrUWuxXiGL7kUdnhZ0haV5AwZqQtoOF+nWToQ4rsrGNhJatH8Bh+hKOocf3LmsB8tAOuAh2WZbv7KHRFoCH/oFRNHHPR979/b2jrMJQJgMZOU5OzwM4/jNo0RfXNHQPAjdn1sJVKfsUKDCdrhwKasi/viRf/JM2f+A7BLVeIl4+92XJU21WTQp0OmzBm47vCi+k7MKNh3aEVpnGVhQOTEBdWjTT/3QGxoEDrOvzx7omPDNusXj5l84Pdeg6fmrx
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196