1467790 – Start and enable node failed due to node has 64 characters hostname


Summary: Start and enable node failed due to node has 64 characters hostname

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Michael Gugino
QA Contact:	Wenkai Shi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-07-05 07:29 UTC by Wenkai Shi
Modified:	2017-11-28 22:00 UTC
CC List:	3 users
Fixed In Version:	openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7.noarch
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-11-28 22:00:15 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:3188	0	normal	SHIPPED_LIVE	Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update	2017-11-29 02:34:54 UTC

Description Wenkai Shi 2017-07-05 07:29:53 UTC

Description of problem:
The "Start and enable node" task fails when the node has a 64-character hostname, which breaks the installation. According to https://bugzilla.redhat.com/show_bug.cgi?id=1211856#c7, it seems the installer should fail early if the instance hostname is longer than 63 characters, because atomic-openshift-node.service requires a hostname of no more than 63 characters.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.133-1.git.0.950bb48.el7

How reproducible:
100%

Steps to Reproduce:
1. Prepare instances with 64-character hostnames.
2. Install OCP.

Actual results:
# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [openshift_node : Start and enable node] **********************************
Wednesday 05 July 2017  06:05:01 +0000 (0:00:00.075)       0:13:13.322 ******** 
FAILED - RETRYING: TASK: openshift_node : Start and enable node (1 retries left).
fatal: [qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com]: FAILED! => {
    "attempts": 1, 
    "changed": false, 
    "failed": true
}

MSG:

Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.
...

Expected results:
Installation succeeds.

Additional info:
# journalctl -xe -u atomic-openshift-node
...
Jul 05 06:53:56 qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com atomic-openshift-node[35512]: E0705 06:53:56.800726   35512 kubelet_node_status.go:101] Unable to register node "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" with API server: Node "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" is invalid: metadata.labels: Invalid value: "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com": must be no more than 63 characters
...

# echo -n "qe-weshi-master-registry-router-nfs-1.westus2.cloudapp.azure.com" | wc -c 
64
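Per the metadata.labels error above, the node's hostname ends up as a label value, and the API server caps label values at 63 characters, so a 64-character name is rejected at registration. A minimal pre-install check could look like the Bash sketch below. The helper name hostname_fits_label is hypothetical and is not part of openshift-ansible; it only checks length, not the other label-value character rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-ansible): succeeds only if the
# given hostname fits in a label value, i.e. is 63 characters or less.
hostname_fits_label() {
    local name="$1"
    local max=63
    # ${#name} counts characters without the trailing newline that a plain
    # `echo ... | wc -c` would add (hence `echo -n` in the report above).
    if (( ${#name} > max )); then
        echo "ERROR: '${name}' is ${#name} characters; it must be ${max} or fewer" >&2
        return 1
    fi
    return 0
}
```

For example, running `hostname_fits_label "$(hostname -f)" || exit 1` on each host before ansible-playbook would be expected to flag the 64-character name above. Note that the actual node name may come from inventory variables rather than the system hostname.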

Comment 1 Tim Bielawa 2017-07-10 15:02:21 UTC

I'm wondering, is this actually a bug/limitation in OCP/Kube rather than the installer?

Comment 2 Scott Dodson 2017-07-20 14:37:29 UTC

Let's handle this in the sanitize_inventory role.

Comment 3 Michael Gugino 2017-08-16 01:08:47 UTC

While I agree this may be an issue in OpenShift itself, I have submitted the following PR to work around it on our end:

https://github.com/openshift/openshift-ansible/pull/5100

I don't believe sanitize_inventory is the best place for this check, as the node name can potentially be populated from other names.

Comment 5 Wenkai Shi 2017-09-11 07:56:03 UTC

In QE's case, openshift_hostname was not set, but openshift_public_hostname was. When openshift_public_hostname is longer than 63 characters, the installer still fails in the "restart node" task; it succeeds when openshift_public_hostname is shorter than 64 characters. I suggest adding one more check like this:

  - fail:
      msg: openshift_public_hostname must be 63 characters or less
    when: openshift_public_hostname is defined and openshift_public_hostname | length > 63

By the way, when openshift_public_hostname is set to more than 63 characters, the host ends up with the same internal and public hostname.
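The fail task suggested above can also be approximated as a Bash pre-flight check run before the playbook. This is only a sketch. The function name check_inventory_hostnames is hypothetical. It assumes the inventory values have been exported as environment variables with the same names (openshift-ansible itself reads them from the inventory, not the environment), and it applies the same 63-character limit to openshift_hostname as well as openshift_public_hostname. Like the `is defined` test, it skips variables that are unset.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the QE-suggested Ansible guard
#   when: openshift_public_hostname is defined and openshift_public_hostname | length > 63
# and applies it to openshift_hostname as well. Reading these values from the
# environment is an assumption made for illustration.
check_inventory_hostnames() {
    local var value status=0
    for var in openshift_hostname openshift_public_hostname; do
        # Skip unset variables, like Ansible's "is defined" test.
        # [[ -v NAME ]] needs Bash 4.2 or later.
        [[ -v $var ]] || continue
        value="${!var}"   # indirect expansion: the value of the variable named by $var
        if (( ${#value} > 63 )); then
            echo "${var} must be 63 characters or less (got ${#value})" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `export openshift_public_hostname=...` followed by `check_inventory_hostnames || exit 1` stops before the playbook runs, instead of failing later in the node restart task.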

Comment 6 Michael Gugino 2017-09-11 13:50:43 UTC

New pull request created:  https://github.com/openshift/openshift-ansible/pull/5353

This pull request addresses the openshift_public_hostname length issue, as suggested by QE.

Comment 7 Scott Dodson 2017-09-12 13:26:46 UTC

merged

Comment 8 Wenkai Shi 2017-09-14 15:14:56 UTC

Verified with openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7: the installer now fails early when the hostname has more than 63 characters.

# echo -n "qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com" | wc -c
64

# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
...
TASK [fail] ********************************************************************
Thursday 14 September 2017  10:12:44 +0000 (0:00:00.026)       0:00:25.239 **** 
skipping: [qe-weshitest-master-etcd-nfs-1.centralus.cloudapp.azure.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
fatal: [qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

openshift_public_hostname must be 63 characters or less
...

PLAY RECAP *********************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0   
qe-weshitest-master-etcd-nfs-1.centralus.cloudapp.azure.com : ok=494  changed=186  unreachable=0    failed=0   
qe-weshitest-node-registry-router-1.centralus.cloudapp.azure.com : ok=13   changed=2    unreachable=0    failed=1

Comment 11 errata-xmlrpc 2017-11-28 22:00:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
