Description of problem:
Starting and enabling the node failed because the node has a 64-character hostname, which breaks the installer. According to https://bugzilla.redhat.com/show_bug.cgi?id=1211856#c7, it seems the installer will fail early if an instance hostname has more than 64 characters. However, atomic-openshift-node.service requires a hostname of no more than 63 characters.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.133-1.git.0.950bb48.el7

How reproducible:
100%

Steps to Reproduce:
1. Prepare instances with 64-character hostnames.
2. Install OCP.

Actual results:
# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [openshift_node : Start and enable node] **********************************
Wednesday 05 July 2017 06:05:01 +0000 (0:00:00.075) 0:13:13.322 ********
FAILED - RETRYING: TASK: openshift_node : Start and enable node (1 retries left).
fatal: [qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com]: FAILED! => {
    "attempts": 1,
    "changed": false,
    "failed": true
}

MSG:

Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.
...

Expected results:
Installation succeeds.

Additional info:
# journalctl -xe -u atomic-openshift-node
...
Jul 05 06:53:56 qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com atomic-openshift-node[35512]: E0705 06:53:56.800726 35512 kubelet_node_status.go:101] Unable to register node "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" with API server: Node "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" is invalid: metadata.labels: Invalid value: "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com": must be no more than 63 characters
...

# echo -n "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" | wc -c
64
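The journal error shows the API server rejecting the node because the hostname is also used as a label value (the error points at `metadata.labels`), and Kubernetes label values are limited to 63 characters. The Python sketch below re-implements that label-value rule for illustration only: it is not the Kubernetes source, and the function name `label_value_errors` is hypothetical. It reproduces the `wc -c` count and the length error for the hostname from this report.

```python
import re

# Illustrative re-implementation of the Kubernetes label-value rule (an
# assumption for this sketch, not the upstream code): at most 63 characters,
# and either empty or starting/ending with an alphanumeric character with
# only alphanumerics, '-', '_' or '.' in between.
LABEL_VALUE_MAX_LENGTH = 63
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def label_value_errors(value: str) -> list[str]:
    """Return the reasons why `value` would be rejected as a label value."""
    errors = []
    if len(value) > LABEL_VALUE_MAX_LENGTH:
        errors.append(f"must be no more than {LABEL_VALUE_MAX_LENGTH} characters")
    if not LABEL_VALUE_RE.fullmatch(value):
        errors.append("must consist of alphanumerics, '-', '_' or '.', "
                      "and must start and end with an alphanumeric character")
    return errors


if __name__ == "__main__":
    hostname = "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com"
    # Same count as `echo -n ... | wc -c` for an ASCII hostname.
    print(len(hostname))                  # → 64
    print(label_value_errors(hostname))   # → ['must be no more than 63 characters']
```

The hostname passes the character rule and fails only on length, which matches the single "must be no more than 63 characters" complaint in the journal.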
I'm wondering, is this actually a bug/limitation in OCP/Kube rather than the installer?
Let's handle this in the sanitize_inventory role.
While I agree it may be an issue in OpenShift, I have submitted the following PR to work around it on our end: https://github.com/openshift/openshift-ansible/pull/5100 I don't believe sanitize_inventory is the ideal place for this check, as the node name can potentially be populated from other names.
In QE's case, openshift_hostname was not set, but openshift_public_hostname was. When openshift_public_hostname was longer than 63 characters, the installer still failed in the "restart node" task; the installation succeeded when openshift_public_hostname was shorter than 64 characters. I suggest adding one more check like this:

- fail:
    msg: openshift_public_hostname must be 63 characters or less
  when: openshift_public_hostname is defined and openshift_public_hostname | length > 63

By the way, when openshift_public_hostname was set to more than 63 characters, the host had the same internal and public hostname.
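For readers less familiar with Ansible conditionals, the suggested `fail` task amounts to the check sketched below in Python. This is a hypothetical illustration, not the code from the linked PRs: the function name `hostname_length_failures` is invented here, and checking `openshift_hostname` alongside `openshift_public_hostname` is an assumption drawn from the discussion about other names feeding the node name.

```python
MAX_HOSTNAME_LENGTH = 63

# Assumption for this sketch: both inventory variables are checked. The task
# suggested above only checks openshift_public_hostname.
CHECKED_VARS = ("openshift_hostname", "openshift_public_hostname")


def hostname_length_failures(host_vars: dict) -> list[str]:
    """Return one failure message per checked variable that is defined and too long.

    Roughly equivalent to, for each variable:
      when: <var> is defined and <var> | length > 63
    """
    failures = []
    for name in CHECKED_VARS:
        # "is defined" -> key present; "| length > 63" -> string length check.
        if name in host_vars and len(str(host_vars[name])) > MAX_HOSTNAME_LENGTH:
            failures.append(f"{name} must be {MAX_HOSTNAME_LENGTH} characters or less")
    return failures


if __name__ == "__main__":
    # QE's scenario: only openshift_public_hostname is set, to a 64-character name.
    qe_host = {
        "openshift_public_hostname": "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com",
    }
    for msg in hostname_length_failures(qe_host):
        print(msg)  # → openshift_public_hostname must be 63 characters or less
```

Collecting all messages instead of stopping at the first keeps the sketch simple to test; in a playbook, each `fail` task stops the affected host on its own.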
New pull request created: https://github.com/openshift/openshift-ansible/pull/5353 This pull request addresses the openshift_public_hostname length issue, as suggested by QE.
merged
Verified with version openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7: the installer fails at the hostname check when the hostname has more than 63 characters.

# echo -n "qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com" | wc -c
64

# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [fail] ********************************************************************
Thursday 14 September 2017 10:12:44 +0000 (0:00:00.026) 0:00:25.239 ****
skipping: [qe-weshitest-master-etcd-nfs-1.centralus.cloudapp.azure.com] => {
    "changed": false,
    "skip_reason": "Conditional result was False",
    "skipped": true
}
fatal: [qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

openshift_public_hostname must be 63 characters or less
...
PLAY RECAP *********************************************************************
localhost : ok=13 changed=0 unreachable=0 failed=0
qe-weshitest-master-etcd-nfs-1.centralus.cloudapp.azure.com : ok=494 changed=186 unreachable=0 failed=0
qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com : ok=13 changed=2 unreachable=0 failed=1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188