Bug 1663447
| Summary: | etcd cluster failed to start when cluster name ends with "-" | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Johnny Liu <jialiu> |
| Component: | Installer | Assignee: | Matthew Staebler <mstaeble> |
| Installer sub component: | openshift-installer | QA Contact: | Johnny Liu <jialiu> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | adahiya, mstaeble, wking |
| Version: | 4.1.0 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.1.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:41:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Johnny Liu
2019-01-04 11:26:50 UTC
I was able to confirm this. I tried to use "crawford-" as my cluster name. On the master node, I see the following from the discovery container:

    # crictl logs 8a3df72c9e097
    I0109 18:52:09.698063 1 run.go:47] Version: 3.11.0-408-g09742d64-dirty
    I0109 18:52:09.698592 1 run.go:57] ip addr is 192.168.126.11
    E0109 18:52:09.698666 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
    E0109 18:53:09.698965 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
    E0109 18:54:09.698973 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
    E0109 18:55:09.698975 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
    E0109 18:56:09.698920 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
    E0109 18:57:09.698970 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
    E0109 18:57:09.699024 1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.crawford-.openshift.testing: no such host
    F0109 18:57:09.699056 1 main.go:30] Error executing etcd-setup-environment: could not find self: timed out waiting for the condition

In /var/lib/libvirt/dnsmasq/crawford-.conf I see the following entry:

    srv-host=_etcd-server-ssl._tcp.crawford-.openshift.testing,crawford--etcd-0.openshift.testing,2380,0,10

I'm also able to use dig to fetch that record:

    $ dig _etcd-server-ssl._tcp.crawford-.openshift.testing SRV +short
    0 10 2380 crawford--etcd-0.openshift.testing.

It looks like the problem lies within registry.svc.ci.openshift.org/openshift/origin-v4.0:setup-etcd-environment (https://github.com/openshift/machine-config-operator/blob/09742d642e6846afcf1297ae6911e6bdfc88a48d/cmd/setup-etcd-environment/run.go).

Abhinav, did you get a chance to dig into this further?
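Since dig returns the SRV record while the Go-based discovery container reports "no such host", the query name itself is a reasonable suspect. The sketch below is one way to check a name for labels that begin or end with a hyphen, without doing any DNS lookup. The function name `hyphenEdgedLabels` is hypothetical and is not taken from setup-etcd-environment; it only illustrates the hostname rule that a label should start and end with a letter or digit.

```go
package main

import (
	"fmt"
	"strings"
)

// hyphenEdgedLabels is a hypothetical helper (not part of any OpenShift
// component). It splits a DNS name into labels and returns those that
// begin or end with '-'. Underscore-prefixed service labels such as
// "_tcp" are left alone.
func hyphenEdgedLabels(name string) []string {
	var bad []string
	// Drop an optional trailing root dot before splitting.
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if strings.HasPrefix(label, "-") || strings.HasSuffix(label, "-") {
			bad = append(bad, label)
		}
	}
	return bad
}

func main() {
	names := []string{
		"_etcd-server-ssl._tcp.crawford-.openshift.testing", // SRV query name from the log
		"crawford--etcd-0.openshift.testing",                // SRV target from dnsmasq
	}
	for _, name := range names {
		fmt.Printf("%s -> %q\n", name, hyphenEdgedLabels(name))
	}
}
```

With the two names from this report, only the SRV query name is flagged (for the label "crawford-"). The target "crawford--etcd-0.openshift.testing" has a double hyphen in the middle of a label, which this check accepts.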
Last I remember, we traced the problem back to the Go standard library but maybe a trailing hyphen isn't a valid subdomain/hostname.

Fix in https://github.com/openshift/installer/pull/1255.

The installer should not be allowing a cluster name that ends with a hyphen. The installer was validating this when the cluster name was entered in the CLI. But the installer was not validating this when an install-config.yaml was provided.

Verified this bug with v4.0.0-0.173.0.0-dirty, and PASS.

    # ./openshift-install version
    ./openshift-install v4.0.0-0.173.0.0-dirty
    # ./openshift-install create cluster --dir demo
    ? Platform aws
    ? Region us-east-2
    ? Base Domain qe.devcluster.openshift.com
    X Sorry, your reply was invalid: a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
    ? Cluster Name qe-jialiu
    ? Pull Secret [? for help] ********************
    WARNING Found override for OS Image. Please be warned, this is not advised
    WARNING Found override for ReleaseImage. Please be warned, this is not advised
    INFO Creating cluster...

And 0.13.0 is out with the fix [1].

[1]: https://github.com/openshift/installer/releases/tag/v0.13.0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
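For anyone who wants to pre-check a cluster name before putting it in install-config.yaml, here is a minimal sketch of the DNS-1123 subdomain rule quoted in the installer's error message. It assumes the regex printed in that message, anchored at both ends. The function name `validateClusterName` is hypothetical; this is not the installer's own validation code and it does not check name length.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain is the pattern quoted in the installer's error message,
// anchored with ^ and $ so that the whole name has to match.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateClusterName is a hypothetical helper, not the installer's code.
// It returns an error when name is not a DNS-1123 subdomain.
func validateClusterName(name string) error {
	if !dns1123Subdomain.MatchString(name) {
		return fmt.Errorf("invalid cluster name %q: must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character", name)
	}
	return nil
}

func main() {
	// Names taken from this bug report.
	for _, name := range []string{"qe-jialiu", "crawford-"} {
		if err := validateClusterName(name); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", name)
	}
}
```

The point of the fix described above is that the same check should run whether the name comes from the interactive prompt or from a supplied install-config.yaml, so a helper like this would be called on both paths.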