Bug 1467790 - "Start and enable node" fails when the node has a 64-character hostname
Status: VERIFIED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Michael Gugino
QA Contact: Wenkai Shi
Reported: 2017-07-05 03:29 EDT by Wenkai Shi
Modified: 2017-10-05 13:47 EDT
Fixed In Version: openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7.noarch
Type: Bug
Description Wenkai Shi 2017-07-05 03:29:53 EDT
Description of problem:
The "Start and enable node" task fails when the node has a 64-character hostname, which breaks the installer. According to https://bugzilla.redhat.com/show_bug.cgi?id=1211856#c7, it seems the installer is supposed to abort early if an instance's hostname has more than 64 characters. However, atomic-openshift-node.service requires a name of no more than 63 characters.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.133-1.git.0.950bb48.el7

How reproducible:
100%

Steps to Reproduce:
1. Prepare instances with a 64-character hostname.
2. Install OCP.

Actual results:
# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [openshift_node : Start and enable node] **********************************
Wednesday 05 July 2017  06:05:01 +0000 (0:00:00.075)       0:13:13.322 ******** 
FAILED - RETRYING: TASK: openshift_node : Start and enable node (1 retries left).
fatal: [qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com]: FAILED! => {
    "attempts": 1, 
    "changed": false, 
    "failed": true
}

MSG:

Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.
...

Expected results:
Installation succeeds.

Additional info:
# journalctl -xe -u atomic-openshift-node
...
Jul 05 06:53:56 qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com atomic-openshift-node[35512]: E0705 06:53:56.800726   35512 kubelet_node_status.go:101] Unable to register node "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" with API server: Node "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" is invalid: metadata.labels: Invalid value: "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com": must be no more than 63 characters
...

# echo -n "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" | wc -c 
64
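
For illustration only, the same length test can be sketched in Python. The 63-character limit is taken from the kubelet error quoted above; the constant and function names are hypothetical and are not part of openshift-ansible or Kubernetes.

```python
# Limit quoted in the kubelet error above ("must be no more than 63 characters").
MAX_LABEL_VALUE_LENGTH = 63  # hypothetical constant name


def exceeds_label_limit(hostname: str) -> bool:
    """Return True if the hostname is too long to be used as a label value."""
    return len(hostname) > MAX_LABEL_VALUE_LENGTH


name = "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com"
print(len(name), exceeds_label_limit(name))  # prints "64 True"
```

For an ASCII hostname like this one, len() agrees with the byte count reported by wc -c.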
Comment 1 Tim Bielawa 2017-07-10 11:02:21 EDT
I'm wondering, is this actually a bug/limitation in OCP/Kube rather than the installer?
Comment 2 Scott Dodson 2017-07-20 10:37:29 EDT
Let's handle this in the sanitize_inventory role.
Comment 3 Michael Gugino 2017-08-15 21:08:47 EDT
While I agree it may be an issue in OpenShift, I have submitted the following PR to work around it on our end:

https://github.com/openshift/openshift-ansible/pull/5100

I don't believe sanitize_inventory is the ideal place for this check, as we can potentially populate other names as the nodename.
Comment 5 Wenkai Shi 2017-09-11 03:56:03 EDT
In QE's case, openshift_hostname was not set but openshift_public_hostname was. When openshift_public_hostname was longer than 63 characters, the installer still failed in the "restart node" task; the installer succeeded when openshift_public_hostname was shorter than 64 characters. I suggest adding one more check like this:

  - fail:
      msg: openshift_public_hostname must be 63 characters or less
    when: openshift_public_hostname is defined and openshift_public_hostname | length > 63

By the way, in the case where openshift_public_hostname was set to more than 63 characters, the host had the same internal and public hostname.
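
As a rough sketch of the logic behind the suggested check (not the code from either pull request), the same guard can be written in Python over a dictionary of host variables. The function name, the constants, and the choice to check both openshift_hostname and openshift_public_hostname are assumptions made for illustration; the error text mirrors the message suggested above.

```python
MAX_HOSTNAME_LENGTH = 63  # limit discussed in this bug

# Variables assumed to need the check (illustrative choice, not from the PRs).
CHECKED_VARS = ("openshift_hostname", "openshift_public_hostname")


def hostname_length_errors(hostvars: dict) -> list:
    """Return one error message per defined variable that exceeds the limit."""
    errors = []
    for var in CHECKED_VARS:
        value = hostvars.get(var)
        # Mirrors "is defined and ... | length > 63" in the suggested task.
        if value is not None and len(value) > MAX_HOSTNAME_LENGTH:
            errors.append(f"{var} must be {MAX_HOSTNAME_LENGTH} characters or less")
    return errors


hostvars = {
    "openshift_public_hostname":
        "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com",
}
print(hostname_length_errors(hostvars))
```

Collecting all messages rather than stopping at the first lets a caller report every offending variable at once; an undefined variable is skipped, just as in the "is defined" condition.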
Comment 6 Michael Gugino 2017-09-11 09:50:43 EDT
New pull request created:  https://github.com/openshift/openshift-ansible/pull/5353

This pull request addresses the issue of openshift_public_hostname length as suggested by QE.
Comment 7 Scott Dodson 2017-09-12 09:26:46 EDT
merged
Comment 8 Wenkai Shi 2017-09-14 11:14:56 EDT
Verified with version openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7: the installer now fails as expected when the hostname has more than 63 characters.

# echo -n "qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com" | wc -c
64

# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [fail] ********************************************************************
Thursday 14 September 2017  10:12:44 +0000 (0:00:00.026)       0:00:25.239 **** 
skipping: [qe-weshitest-master-etcd-nfs-1.centralus.cloudapp.azure.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
fatal: [qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

openshift_public_hostname must be 63 characters or less
...

PLAY RECAP *********************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0   
qe-weshitest-master-etcd-nfs-1.centralus.cloudapp.azure.com : ok=494  changed=186  unreachable=0    failed=0   
qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com : ok=13   changed=2    unreachable=0    failed=1
