Bug 1467790

Summary: "Start and enable node" task fails when the node has a 64-character hostname
Product: OpenShift Container Platform Reporter: Wenkai Shi <weshi>
Component: Installer Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA QA Contact: Wenkai Shi <weshi>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.0 CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7.noarch Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:00:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Wenkai Shi 2017-07-05 07:29:53 UTC
Description of problem:
The "Start and enable node" task fails when a node has a 64-character hostname, which breaks the installer. According to https://bugzilla.redhat.com/show_bug.cgi?id=1211856#c7, it seems the installer should fail early if an instance hostname has more than 64 characters. The atomic-openshift-node.service requires a name of no more than 63 characters.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.133-1.git.0.950bb48.el7

How reproducible:
100%

Steps to Reproduce:
1. Prepare instances with a 64-character hostname
2. Install OCP

Actual results:
# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [openshift_node : Start and enable node] **********************************
Wednesday 05 July 2017  06:05:01 +0000 (0:00:00.075)       0:13:13.322 ******** 
FAILED - RETRYING: TASK: openshift_node : Start and enable node (1 retries left).
fatal: [qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com]: FAILED! => {
    "attempts": 1, 
    "changed": false, 
    "failed": true
}

MSG:

Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.
...

Expected results:
Installation succeeds.

Additional info:
# journalctl -xe -u atomic-openshift-node
...
Jul 05 06:53:56 qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com atomic-openshift-node[35512]: E0705 06:53:56.800726   35512 kubelet_node_status.go:101] Unable to register node "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" with API server: Node "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" is invalid: metadata.labels: Invalid value: "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com": must be no more than 63 characters
...

# echo -n "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" | wc -c 
64
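
The length check above can be wrapped in a small pre-flight helper and run before starting the installer. This is only a sketch: the function name check_name_length is hypothetical and not part of openshift-ansible, and the 63-character limit is taken from the kubelet error quoted above.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of openshift-ansible).
# Returns 0 if the given name fits within the 63-character limit
# reported by the kubelet, 1 otherwise.
check_name_length() {
    name=$1
    max=63
    # ${#name} expands to the length of the string.
    if [ "${#name}" -gt "$max" ]; then
        echo "'$name' is ${#name} characters; must be no more than $max" >&2
        return 1
    fi
    return 0
}
```

Calling it with the hostname from this report, for example check_name_length "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com", returns a non-zero status and prints the length to stderr.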

Comment 1 Tim Bielawa 2017-07-10 15:02:21 UTC
I'm wondering, is this actually a bug/limitation in OCP/Kube rather than the installer?

Comment 2 Scott Dodson 2017-07-20 14:37:29 UTC
Let's handle this in the sanitize_inventory role.

Comment 3 Michael Gugino 2017-08-16 01:08:47 UTC
While I agree that this may be an issue in OpenShift, I have submitted the following PR to work around it on our end:

https://github.com/openshift/openshift-ansible/pull/5100

I don't believe sanitize_inventory is the ideal place for this check, as we can potentially populate other names as the nodename.

Comment 5 Wenkai Shi 2017-09-11 07:56:03 UTC
In QE's case, openshift_hostname was not set but openshift_public_hostname was. When openshift_public_hostname is longer than 63 characters, the installer still fails in the "restart node" task; the installer succeeds when openshift_public_hostname is shorter than 64 characters. I suggest adding one more check like this:

  - fail:
      msg: openshift_public_hostname must be 63 characters or less
    when: openshift_public_hostname is defined and openshift_public_hostname | length > 63

By the way, in the case where openshift_public_hostname was set to more than 63 characters, the host had the same internal and public hostname.
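
For anyone who wants to catch this before running the playbook at all, the same condition can be approximated outside the installer. The sketch below is an assumption-laden illustration, not installer code: scan_inventory is a hypothetical name, and it assumes an INI-style inventory where openshift_hostname and openshift_public_hostname appear as unquoted key=value pairs separated by whitespace.

```shell
#!/bin/sh
# Hypothetical pre-flight scan (not part of openshift-ansible).
# Reports openshift_hostname / openshift_public_hostname values longer
# than 63 characters in an INI-style inventory file.
# Assumes unquoted key=value pairs separated by whitespace.
scan_inventory() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^openshift_(public_)?hostname=/) {
                    eq = index($i, "=")
                    key = substr($i, 1, eq - 1)
                    value = substr($i, eq + 1)
                    if (length(value) > 63) {
                        printf "%s is %d characters: %s\n", key, length(value), value
                        bad = 1
                    }
                }
            }
        }
        # Exit non-zero if any value was too long.
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running scan_inventory hosts prints one line per offending value and returns a non-zero status if any are found. It does not cover hostnames that the installer derives from facts rather than from the inventory.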

Comment 6 Michael Gugino 2017-09-11 13:50:43 UTC
New pull request created:  https://github.com/openshift/openshift-ansible/pull/5353

This pull request addresses the issue of openshift_public_hostname length as suggested by QE.

Comment 7 Scott Dodson 2017-09-12 13:26:46 UTC
merged

Comment 8 Wenkai Shi 2017-09-14 15:14:56 UTC
Verified with version openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7: the installer now fails at the new check when the hostname has more than 63 characters.

# echo -n "qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com" | wc -c
64

# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [fail] ********************************************************************
Thursday 14 September 2017  10:12:44 +0000 (0:00:00.026)       0:00:25.239 **** 
skipping: [qe-weshitest-master-etcd-nfs-1.centralus.cloudapp.azure.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
fatal: [qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

openshift_public_hostname must be 63 characters or less
...

PLAY RECAP *********************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0   
qe-weshitest-master-etcd-nfs-1.centralus.cloudapp.azure.com : ok=494  changed=186  unreachable=0    failed=0   
qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com : ok=13   changed=2    unreachable=0    failed=1

Comment 11 errata-xmlrpc 2017-11-28 22:00:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188