Description of problem:
See the following details.

Version-Release number of the following components:
# ./openshift-install version
./openshift-install v0.8.0-master-8-g713289e20bd6afccb06f2e4ff7ed89d2483fac9a

How reproducible:
Always

Steps to Reproduce:
1. Trigger an install with "qe-jialiu-" cluster name

Actual results:
Install failed with the following error.
INFO Waiting up to 30m0s for the Kubernetes API...
DEBUG Still waiting for the Kubernetes API: Get https://qe-jialiu--api.qe.devcluster.openshift.com:6443/version?timeout=32s: dial tcp 3.17.117.40:6443: i/o timeout
DEBUG Still waiting for the Kubernetes API: Get https://qe-jialiu--api.qe.devcluster.openshift.com:6443/version?timeout=32s: dial tcp 3.16.59.249:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource

Go to bootstrap node, get the following log:
# journalctl -b -f -u bootkube.service
-- Logs begin at Fri 2019-01-04 10:51:44 UTC. --
Jan 04 11:13:15 ip-10-0-6-86 bootkube.sh[4399]: https://qe-jialiu--etcd-1.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp 10.0.21.223:2379: getsockopt: connection refused
Jan 04 11:13:15 ip-10-0-6-86 bootkube.sh[4399]: https://qe-jialiu--etcd-2.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp 10.0.42.31:2379: getsockopt: connection refused
Jan 04 11:13:15 ip-10-0-6-86 bootkube.sh[4399]: https://qe-jialiu--etcd-0.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp 10.0.3.48:2379: getsockopt: connection refused
Jan 04 11:13:15 ip-10-0-6-86 bootkube.sh[4399]: Error: unhealthy cluster
Jan 04 11:13:15 ip-10-0-6-86 bootkube.sh[4399]: etcdctl failed. Retrying in 5 seconds...

Expected results:
Installation should be completed successfully.
If the etcd cluster cannot work with the "qe-jialiu--" prefix, the installer should warn the user and exit early.

Additional info:
After correcting the cluster name to "qe-jialiu", the installation completed successfully.
I was able to confirm this. I tried to use "crawford-" as my cluster name. On the master node, I see the following from the discovery container:

# crictl logs 8a3df72c9e097
I0109 18:52:09.698063 1 run.go:47] Version: 3.11.0-408-g09742d64-dirty
I0109 18:52:09.698592 1 run.go:57] ip addr is 192.168.126.11
E0109 18:52:09.698666 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
E0109 18:53:09.698965 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
E0109 18:54:09.698973 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
E0109 18:55:09.698975 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
E0109 18:56:09.698920 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
E0109 18:57:09.698970 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
E0109 18:57:09.699024 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
F0109 18:57:09.699056 1 main.go:30] Error executing etcd-setup-environment: could not find self: timed out waiting for the condition

In /var/lib/libvirt/dnsmasq/crawford-.conf I see the following entry:

srv-host=_etcd-server-ssl._tcp.crawford-.openshift.testing,crawford--etcd-0.openshift.testing,2380,0,10

I'm also able to use dig to fetch that record:

$ dig _etcd-server-ssl._tcp.crawford-.openshift.testing SRV +short
0 10 2380 crawford--etcd-0.openshift.testing.

It looks like the problem lies within registry.svc.ci.openshift.org/openshift/origin-v4.0:setup-etcd-environment (https://github.com/openshift/machine-config-operator/blob/09742d642e6846afcf1297ae6911e6bdfc88a48d/cmd/setup-etcd-environment/run.go).
Abhinav, did you get a chance to dig into this further? Last I remember, we traced the problem back to the Go standard library, but maybe a trailing hyphen simply isn't valid in a subdomain/hostname.
Fix in https://github.com/openshift/installer/pull/1255. The installer should not be allowing a cluster name that ends with a hyphen. The installer was validating this when the cluster name was entered in the CLI. But the installer was not validating this when an install-config.yaml was provided.
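For illustration, the kind of check described above can be sketched in Go as follows. This is not the code from the pull request: validateClusterName is a hypothetical helper, and the pattern is simply the DNS-1123 subdomain regex that the installer prints in its "reply was invalid" message, anchored at both ends. The point is that the same function would have to run both for the interactive prompt and for a name read from install-config.yaml.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain is the DNS-1123 subdomain pattern quoted in the
// installer's validation message, anchored so the whole name must match.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateClusterName is a hypothetical helper (not the installer's API).
// It rejects names that do not start and end with a lower-case
// alphanumeric character, such as "qe-jialiu-".
func validateClusterName(name string) error {
	if !dns1123Subdomain.MatchString(name) {
		return fmt.Errorf("invalid cluster name %q: not a DNS-1123 subdomain", name)
	}
	return nil
}

func main() {
	// Names taken from this report, checked the same way regardless of
	// whether they came from the prompt or from a config file.
	for _, name := range []string{"qe-jialiu", "qe-jialiu-", "crawford-"} {
		if err := validateClusterName(name); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: %q\n", name)
	}
}
```

With a check like this applied on every input path, a trailing-hyphen name fails immediately instead of surfacing later as etcd DNS lookup errors.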
Verified this bug with v4.0.0-0.173.0.0-dirty, and PASS.

# ./openshift-install version
./openshift-install v4.0.0-0.173.0.0-dirty
# ./openshift-install create cluster --dir demo
? Platform aws
? Region us-east-2
? Base Domain qe.devcluster.openshift.com
X Sorry, your reply was invalid: a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
? Cluster Name qe-jialiu
? Pull Secret [? for help] ********************************
WARNING Found override for OS Image. Please be warned, this is not advised
WARNING Found override for ReleaseImage. Please be warned, this is not advised
INFO Creating cluster...
According to comment 6, moving this bug to 'VERIFIED'.
And 0.13.0 is out with the fix [1]. [1]: https://github.com/openshift/installer/releases/tag/v0.13.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758