Bug 1304582 - Node or Master will not start when /etc/hosts has 127.0.0.1 equal to hostname
Summary: Node or Master will not start when /etc/hosts has 127.0.0.1 equal to hostname
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Dan Williams
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-04 02:50 UTC by Ryan Howe
Modified: 2019-10-10 11:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-12 16:27:49 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 2210321 | 0 | None | None | None | 2017-01-18 03:25:19 UTC
Red Hat Product Errata | RHSA-2016:1064 | 0 | normal | SHIPPED_LIVE | Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update | 2016-05-12 20:19:17 UTC

Description Ryan Howe 2016-02-04 02:50:19 UTC
Description of problem:

When the hostname is mapped to the localhost IP (127.0.0.1) in /etc/hosts, the master and node services will not start.


Version-Release number of selected component (if applicable):
3.1.1.6

How reproducible:
100%

Steps to Reproduce:
1. In /etc/hosts, map the hostname to 127.0.0.1:
127.0.0.1     master1.example.com master1

2. Confirm that 'dig master1.example.com' resolves to the correct IP.

3. Restart atomic-openshift-node.service or atomic-openshift-master.service.


Actual results:

 Node service fails to start

Expected results:
 
 Node service starts


Additional info:

Most likely due to this change:

   https://github.com/openshift/origin/commit/d28b90d3e15e0ceab3f9fae2e92052bee02e716d


Master Error: 
Feb 02 22:27:40 master.example.com atomic-openshift-master[10860]: F0202 22:27:40.864492   10860 run_components.go:334] SDN initialization failed: Failed to obtain IP address from node name:master.example.com
Feb 02 22:27:40 master.example.com systemd[1]: atomic-openshift-master.service: main process exited, code=exited, status=255/n/a
Feb 02 22:27:40 master.example.com systemd[1]: Unit atomic-openshift-master.service entered failed state.
Feb 02 22:27:40 master.example.com systemd[1]: atomic-openshift-master.service failed.
Feb 02 22:27:41 master.example.com systemd[1]: atomic-openshift-master.service holdoff time over, scheduling restart.
Feb 02 22:27:41 master.example.com systemd[1]: Starting Atomic OpenShift Master...
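
To see whether a host is in the state described in the steps to reproduce, the hosts file can be scanned for names that point at a loopback address. The following is only a sketch: the function name loopback_aliases and the sample file content are made up for illustration, and it reads the hosts file text directly rather than asking the system resolver.

```python
import ipaddress


def loopback_aliases(hosts_text):
    """Return the set of names that a hosts file maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # First field is not an IP address; skip the line.
        if addr.is_loopback:
            names.update(fields[1:])
    return names


# Hypothetical hosts file content that mirrors step 1 of the reproduction.
sample = """\
127.0.0.1   localhost localhost.localdomain
127.0.0.1     master1.example.com master1
"""

if "master1.example.com" in loopback_aliases(sample):
    print("hostname is aliased to loopback")
```

On a real host the input would be the content of /etc/hosts, and the name to look for would be the output of 'uname -n'.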

Comment 2 Ben Bennett 2016-02-05 15:59:14 UTC
I'm surprised that this would ever work.  Was this supported in the past?  If you pass NodeIP in the node config then it should work.

But what's the point of having the hostname set to 127.0.0.1?  The Linux networking stack is smart enough to handle traffic to the local IP in a faster manner, so there should be no need to set it to 127.0.0.1.

Comment 3 Dan Williams 2016-02-05 16:02:35 UTC
You have a few options:

1) do not alias the hostname to 127.0.0.1/localhost, but allow the node's hostname to resolve to the node's IP address through DNS

2) set NodeName in the configuration to either an IP address or to some name that resolves via DNS/hosts to something other than 127.0.0.1

3) set NodeIP in the configuration to the IP address that the node is accessible by

This is not really related to either of the changes mentioned. OpenShift, Kubernetes, and openshift-sdn simply need to know the actual IP address of the node, which is determined as follows:

1) if NodeIP is given, always use that

2) if NodeName is given and is a hostname, that is looked up via the glibc resolver and must resolve to something other than 127.0.0.1

3) if NodeName is given and is an IP address, use that

4) if no NodeName is given, and no NodeIP is given, take the machine name from 'uname -n' and look that up via the glibc resolver
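
The four rules above can be sketched in a few lines of Python. This is an illustration of the described order only, not the openshift-sdn code: the function name resolve_node_ip, its parameters, and the error text are assumptions, and the loopback check is applied to rule 4 as well because that is the failure this bug reports.

```python
import ipaddress
import socket


def resolve_node_ip(node_ip="", node_name="", lookup=socket.gethostbyname):
    """Pick the node's IP address following the order described above."""
    # Rule 1: an explicit NodeIP always wins.
    if node_ip:
        return node_ip

    # Rule 4: with no NodeName, fall back to the machine name
    # (socket.gethostname() returns what 'uname -n' prints on Linux).
    name = node_name or socket.gethostname()

    # Rule 3: a NodeName that is already an IP address is used as-is.
    try:
        ipaddress.ip_address(name)
        return name
    except ValueError:
        pass

    # Rules 2 and 4: resolve the name through the system resolver,
    # which consults /etc/hosts as well as DNS.
    addr = lookup(name)
    if ipaddress.ip_address(addr).is_loopback:
        # Assumed error text, not the message OpenShift prints.
        raise ValueError("%s resolves to loopback address %s" % (name, addr))
    return addr


# Usage with a stand-in resolver so the example does not touch the network.
def fake_lookup(name):
    return {"node1.example.com": "192.168.1.10"}.get(name, "127.0.0.1")


print(resolve_node_ip(node_name="node1.example.com", lookup=fake_lookup))
```

The resolver is passed in as a parameter so the loopback case can be exercised without editing /etc/hosts.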

Comment 4 Eric Rich 2016-02-05 18:30:27 UTC
(In reply to Dan Williams from comment #3)
> You have a few options:
> 
> 1) do not alias the hostname to 127.0.0.1/localhost, but allow the node's
> hostname to resolve to the node's IP address through DNS
> 
> 2) set NodeName in the configuration to either an IP address or to some name
> that resolves via DNS/hosts to something other than 127.0.0.1
> 
> 3) set NodeIP in the configuration to the IP address that the node is
> accessible by
> 
> This is not really related to either of the changes mentioned, it's simply
> that OpenShift/Kubernetes/openshift-sdn need to know the actual IP address
> of the node, and that is determined by:
> 
> 1) if NodeIP is given, always use that
> 
> 2) if NodeName is given and is a hostname, that is looked up via the glibc
> resolver and must resolve to something other than 127.0.0.1
> 
> 3) if NodeName is given and is an IP address, use that
> 

What would these options look like when passed to the installer?

> 4) if no NodeName is given, and no NodeIP is given, take the machine name
> from 'uname -n' and look that up via the glibc resolver

Comment 5 Dan Williams 2016-02-05 19:47:09 UTC
(In reply to Eric Rich from comment #4)
> What would these options look like when passed to the installer?

# Configure nodeIP in the node config
# This is needed in cases where node traffic is desired to go over an
# interface other than the default network interface.
#openshift_node_set_node_ip=True

For nodeName, I don't think you can override it with the Ansible installer; it will always be set to the system hostname.

But in the end, just don't alias the system hostname to 127.0.0.1.  I'd love to know why that was done...

Comment 6 Dan Williams 2016-02-05 19:47:46 UTC
Upstream pull request to grab NodeIP off the default gateway interface (matching kubelet behavior):

https://github.com/openshift/openshift-sdn/pull/256
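
As a rough idea of what finding the default gateway interface involves on Linux, the sketch below picks the interface of the default route from text in the /proc/net/route format. It is not taken from the pull request: the function name default_route_interface and the sample table are hypothetical, and turning the interface name into an IP address is left out.

```python
RTF_UP = 0x1       # route is usable
RTF_GATEWAY = 0x2  # destination is reached through a gateway


def default_route_interface(route_table):
    """Return the interface of the first default route, or None if absent."""
    for line in route_table.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, destination, mask = fields[0], fields[1], fields[7]
        flags = int(fields[3], 16)  # flags are printed in hexadecimal
        is_default = destination == "00000000" and mask == "00000000"
        if is_default and flags & RTF_UP and flags & RTF_GATEWAY:
            return iface
    return None


# Hypothetical table in /proc/net/route layout: one default route via eth0
# and one directly connected /24 network.
SAMPLE_ROUTES = """\
Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
eth0 00000000 0101A8C0 0003 0 0 100 00000000 0 0 0
eth0 0001A8C0 00000000 0001 0 0 100 00FFFFFF 0 0 0
"""

print(default_route_interface(SAMPLE_ROUTES))
```

On a real node the input would be the content of /proc/net/route.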

Comment 7 Eric Paris 2016-02-15 15:03:40 UTC
os-sdn merged

Comment 8 Meng Bo 2016-02-24 10:14:18 UTC
Checked on OSE build 3.1.1.905, issue has been fixed.

[root@ose-master ~]# ping ose-master.bmeng.local 
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.019 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.032 ms
^C
--- localhost ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.019/0.025/0.032/0.008 ms
[root@ose-master ~]# ping -c1 ose-master.bmeng.local 
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.018 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.018/0.018/0.018/0.000 ms
[root@ose-master ~]# systemctl restart atomic-openshift-master.service 
[root@ose-master ~]# systemctl status atomic-openshift-master.service 
● atomic-openshift-master.service - Atomic OpenShift Master
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-02-24 18:12:43 CST; 6s ago
     Docs: https://github.com/openshift/origin
 Main PID: 2502 (openshift)
   CGroup: /system.slice/atomic-openshift-master.service
           └─2502 /usr/bin/openshift start master --config=/etc/origin/master/master-config.yaml --loglevel=2

Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: ua5EspbpY/n75siWeQk/++e1CEYx9JbYW8qk8A7HtgMzyO2k09G1ZgrKJAouEZ4R
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: KLAWiG+T7rDj+AG6HVxxvb/QSF7/9XyV5aNxzCSOVV4UoxQvKB+PziBcykMZZBRh
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: MfZIJJsCgYEAuISl8QZmipueb7w6Bh+4yt8vohy+vcJv9Ydb4oNnfB/mcJmtrjPo
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: q0VOo06zNo1WQfw2YCHL8SdL0WjCzclQ7lcRrYid5hxRpgut4nBt9vCrnY5phMHi
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: Zvx53sPdHVpszBq5FcIPFEU5Ts9kX7GBWRzTapBVZr+z4rLFXglQEAo=
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: -----END RSA PRIVATE KEY-----
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: ValueFrom:<nil>} {Name:OPENSHIFT_MASTER Value:https://ose-master.bmeng.local:8443 Valu....io/ser
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: E0224 18:12:48.057521    2502 factory.go:340] Error scheduling default ipf-ha-1-lvt6r:...ny node
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: fit failure on node (ose-node1.bmeng.local): PodFitsPorts
Feb 24 18:12:48 ose-master.bmeng.local atomic-openshift-master[2502]: ; retrying
Hint: Some lines were ellipsized, use -l to show in full.



[root@ose-node1 ~]# ping -c 1 ose-node1.bmeng.local
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.026 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms
[root@ose-node1 ~]# systemctl restart atomic-openshift-node.service 
[root@ose-node1 ~]# systemctl status atomic-openshift-node.service 
● atomic-openshift-node.service - Atomic OpenShift Node
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-node.service; disabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Wed 2016-02-24 18:13:28 CST; 2s ago
     Docs: https://github.com/openshift/origin
 Main PID: 3248 (openshift)
   CGroup: /system.slice/atomic-openshift-node.service
           └─3248 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2

Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.299975    3248 proxier.go:477] Setting endpoints for "default/kubernetes:d...8.6:53]
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.299984    3248 proxier.go:477] Setting endpoints for "default/kubernetes:h...6:8443]
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.300001    3248 proxier.go:558] Not syncing iptables until Services and End... master
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.300236    3248 proxier.go:414] Adding new service "default/kubernetes:http...443/TCP
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.300302    3248 proxier.go:414] Adding new service "default/kubernetes:dns"...:53/UDP
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.300325    3248 proxier.go:414] Adding new service "default/kubernetes:dns-...:53/TCP
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.300342    3248 proxier.go:414] Adding new service "default/router:80-tcp" ...:80/TCP
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.300360    3248 proxier.go:414] Adding new service "default/router:443-tcp"...443/TCP
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.300376    3248 proxier.go:414] Adding new service "default/router:1936-tcp...936/TCP
Feb 24 18:13:28 ose-node1.bmeng.local atomic-openshift-node[3248]: I0224 18:13:28.300397    3248 proxier.go:414] Adding new service "u1p1/ha-service:" at 17...736/TCP
Hint: Some lines were ellipsized, use -l to show in full.

Comment 10 errata-xmlrpc 2016-05-12 16:27:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064

