Bug 1462428 - Can not start atomic-openshift-node if the system does not have a default route
Summary: Can not start atomic-openshift-node if the system does not have a default route
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assignee: Dan Williams
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks: 1489023 1489024
Reported: 2017-06-17 15:35 UTC by Alexis Solanas
Modified: 2017-09-06 15:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The OpenShift node proxy previously did not support using a specified IP address.
Consequence: Hosts with multiple network interface cards could fail to operate correctly.
Fix: The OpenShift node process already accepts a "--bind-address=<ip address>:<port>" command-line flag and a "bindAddress:" config file option for the multiple network interface card case. The proxy functionality has been fixed to respect these options.
Result: When --bind-address or bindAddress is used, the OpenShift node proxy should work correctly on hosts with multiple network interface cards.
Clone Of:
Clones: 1489023 1489024
Environment:
Last Closed: 2017-08-10 05:28:09 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Origin (Github) | 14815 | 0 | None | None | None | 2017-06-23 14:59:26 UTC
Red Hat Product Errata | RHEA-2017:1716 | 0 | normal | SHIPPED_LIVE | Red Hat OpenShift Container Platform 3.6 RPM Release Advisory | 2017-08-10 09:02:50 UTC

Description Alexis Solanas 2017-06-17 15:35:25 UTC
Description of problem:

 If a host has no default route, atomic-openshift-node will not start.


Version-Release number of selected component (if applicable):

 OpenShift 3.5.5.15


How reproducible:

 Always


Steps to Reproduce:

1. # systemctl stop atomic-openshift-node
2. # ip route del default
3. # systemctl start atomic-openshift-node

Actual results:

 The service fails to start with the following errors:

Jun 17 16:23:31 node1 journal: I0617 16:23:31.588253    8091 healthcheck.go:119] Initializing kube-proxy health checker
Jun 17 16:23:31 node1 journal: F0617 16:23:31.646927    8091 node.go:463] error: Could not initialize Kubernetes Proxy: failed to select a host interface: Unable to select an IP.
Jun 17 16:23:31 node1 journal: When running in a container, you must run the container in the host network namespace with --net=host and with --privileged
Jun 17 16:23:31 node1 atomic-openshift-node: F0617 16:23:31.646927    8091 node.go:463] error: Could not initialize Kubernetes Proxy: failed to select a host interface: Unable to select an IP.
Jun 17 16:23:31 node1 atomic-openshift-node: When running in a container, you must run the container in the host network namespace with --net=host and with --privileged


Expected results:

 atomic-openshift-node starts correctly 


Additional info:

 As a workaround, you can create a bogus route.
 Upstream issue: https://github.com/openshift/origin/issues/13798

Comment 1 Dan Williams 2017-06-19 20:55:30 UTC
This is a bug in OpenShift's usage of kube-proxy. We should port an equivalent of https://github.com/kubernetes/kubernetes/pull/46678 so that, if a bind-address is given, we pass it to the proxy instead of calling getNodeIP().

Comment 2 Dan Williams 2017-06-21 20:29:48 UTC
Origin PR to fix this issue: https://github.com/openshift/origin/pull/14815

Note that if the host has no default route, you *must* set the BindAddress option in the proxy configuration; otherwise the proxy has no idea what IP address to use.

Comment 3 Hongan Li 2017-06-30 10:21:44 UTC
Tested in OpenShift v3.6.126.4, but the node service still cannot start after deleting the default route.

logs:
Jun 30 06:09:57 ip-172-18-1-137.ec2.internal atomic-openshift-node[94471]: I0630 06:09:57.705118   94471 interface.go:259] No valid IP found
Jun 30 06:09:57 ip-172-18-1-137.ec2.internal atomic-openshift-node[94471]: F0630 06:09:57.705138   94471 node.go:450] error: Could not initialize Kubernetes Proxy. You must run this process as root to use the service proxy: failed to select a host interface: Unable to select an IP.
Jun 30 06:09:57 ip-172-18-1-137.ec2.internal systemd[1]: Failed to start Atomic OpenShift Node.

node-config.yaml
<-----snip----->
servingInfo:
  bindAddress: 0.0.0.0:10250
  certFile: server.crt
  clientCA: ca.crt
  keyFile: server.key
volumeDirectory: /var/lib/origin/openshift.local.volumes
proxyArguments:
  proxy-mode:
     - iptables
<-----snip----->

Comment 4 Dan Williams 2017-06-30 15:25:15 UTC
(In reply to hongli from comment #3)
>   bindAddress: 0.0.0.0:10250

You must provide a real IP address of a NIC on the box if you do not want openshift to use the IP address of the default NIC.  You cannot use 0.0.0.0.
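
The rule above (a concrete bind address wins, anything else falls back to default-route detection) can be sketched roughly as follows. This is only an illustration, not the code from the Origin or Kubernetes patches: chooseProxyIP, detectFunc, noDefaultRoute and errNoDefaultRoute are hypothetical names, and treating 0.0.0.0 as "not set" is an assumption based on this comment.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// errNoDefaultRoute stands in for the "Unable to select an IP" failure
// from the logs in this bug. The name is hypothetical.
var errNoDefaultRoute = errors.New("unable to select an IP: no default route")

// detectFunc is a hypothetical hook for default-route based IP detection.
type detectFunc func() (net.IP, error)

// noDefaultRoute simulates detection on a host without a default route.
func noDefaultRoute() (net.IP, error) { return nil, errNoDefaultRoute }

// chooseProxyIP returns the configured bind IP when it names a concrete
// address, and only falls back to detection when the address is empty or
// unspecified (0.0.0.0 or ::).
func chooseProxyIP(bindAddress string, detect detectFunc) (net.IP, error) {
	if bindAddress != "" {
		ip := net.ParseIP(bindAddress)
		if ip == nil {
			return nil, fmt.Errorf("bind address %q is not a plain IP address", bindAddress)
		}
		if !ip.IsUnspecified() {
			return ip, nil // a real address: no default route needed
		}
	}
	return detect()
}

func main() {
	for _, addr := range []string{"172.16.120.176", "0.0.0.0", ""} {
		ip, err := chooseProxyIP(addr, noDefaultRoute)
		fmt.Printf("bindAddress=%q -> ip=%v err=%v\n", addr, ip, err)
	}
}
```

With the simulated missing default route, only the first call succeeds; the other two end up in the detection path and return its error, which matches the behaviour described in this comment.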

Comment 5 Hongan Li 2017-07-03 09:19:47 UTC
(In reply to Dan Williams from comment #4)
> (In reply to hongli from comment #3)
> >   bindAddress: 0.0.0.0:10250
> 
> You must provide a real IP address of a NIC on the box if you do not want
> openshift to use the IP address of the default NIC.  You cannot use 0.0.0.0.

I tried all four IP addresses on the box but still cannot start the node service. Maybe I am missing something else?

# ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 172.16.120.176/24 brd 172.16.120.255 scope global dynamic eth0
    inet 172.17.0.1/16 scope global docker0
    inet 10.128.0.1/23 scope global tun0

# grep "bind" /etc/origin/node/node-config.yaml 
  bindAddress: 172.16.120.176:10250

Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[84900]: I0703 05:17:06.998585   84900 interface.go:259] No valid IP found
Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[84900]: F0703 05:17:06.998607   84900 node.go:430] error: Could not initialize Kubernetes Proxy. You must run this process as root to use the service proxy: failed to select a host interface: Unable to select an IP.

Comment 6 Dan Williams 2017-07-06 18:27:26 UTC
(In reply to hongli from comment #5)
> # grep "bind" /etc/origin/node/node-config.yaml 
>   bindAddress: 172.16.120.176:10250
> 
> Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> atomic-openshift-node[84900]: I0703 05:17:06.998585   84900
> interface.go:259] No valid IP found
> Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> atomic-openshift-node[84900]: F0703 05:17:06.998607   84900 node.go:430]
> error: Could not initialize Kubernetes Proxy. You must run this process as
> root to use the service proxy: failed to select a host interface: Unable to
> select an IP.

Oh, sorry for not clarifying.  The bindAddress must be a plain IP address without a port #.  Does it work if you just specify "172.16.120.176"?

Comment 7 Dan Williams 2017-07-06 18:28:20 UTC
apidocs for bindAddress on the proxy say:

        // bindAddress is the IP address for the proxy server to serve on (set to 0.0.0.0
        // for all interfaces)
        BindAddress string `json:"bindAddress"`

Comment 8 Hongan Li 2017-07-07 02:28:46 UTC
(In reply to Dan Williams from comment #6)
> (In reply to hongli from comment #5)
> > # grep "bind" /etc/origin/node/node-config.yaml 
> >   bindAddress: 172.16.120.176:10250
> > 
> > Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> > atomic-openshift-node[84900]: I0703 05:17:06.998585   84900
> > interface.go:259] No valid IP found
> > Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> > atomic-openshift-node[84900]: F0703 05:17:06.998607   84900 node.go:430]
> > error: Could not initialize Kubernetes Proxy. You must run this process as
> > root to use the service proxy: failed to select a host interface: Unable to
> > select an IP.
> 
> Oh, sorry for not clarifying.  The bindAddress must be a plain IP address
> without a port #.  Does it work if you just specify "172.16.120.176"?

It doesn't work if I specify just the IP without a port #:
Jul 06 22:25:30 host-8-175-59.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[6853]: Invalid NodeConfig /etc/origin/node/node-config.yaml
Jul 06 22:25:30 host-8-175-59.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[6853]: servingInfo.bindAddress: Invalid value: "172.16.120.176": must be a host:port

Comment 9 Dan Williams 2017-07-07 15:51:24 UTC
(In reply to hongli from comment #8)
> (In reply to Dan Williams from comment #6)
> > (In reply to hongli from comment #5)
> > > # grep "bind" /etc/origin/node/node-config.yaml 
> > >   bindAddress: 172.16.120.176:10250
> > > 
> > > Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> > > atomic-openshift-node[84900]: I0703 05:17:06.998585   84900
> > > interface.go:259] No valid IP found
> > > Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> > > atomic-openshift-node[84900]: F0703 05:17:06.998607   84900 node.go:430]
> > > error: Could not initialize Kubernetes Proxy. You must run this process as
> > > root to use the service proxy: failed to select a host interface: Unable to
> > > select an IP.
> > 
> > Oh, sorry for not clarifying.  The bindAddress must be a plain IP address
> > without a port #.  Does it work if you just specify "172.16.120.176"?
> 
> It doesn't work if just specify ip without port #:
> Jul 06 22:25:30 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> atomic-openshift-node[6853]: Invalid NodeConfig
> /etc/origin/node/node-config.yaml
> Jul 06 22:25:30 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> atomic-openshift-node[6853]: servingInfo.bindAddress: Invalid value:
> "172.16.120.176": must be a host:port

I was wrong; you are correct!

So the iptables proxy correctly handles the bindAddress after the patch in this bug.

The new problem is in the userspace proxy, which comes into play because the "unidling" feature is turned on by default. The userspace proxy does not correctly handle bindAddress and unconditionally calls ChooseHostInterface(), leading to the error. So another patch for the userspace proxy will be required.
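
Since servingInfo.bindAddress has to be host:port while the proxy wants a plain IP, the conversion between the two might look something like the sketch below. It uses only the standard net package; proxyIPFromServingAddress is a hypothetical helper for illustration and is not taken from either PR.

```go
package main

import (
	"fmt"
	"net"
)

// proxyIPFromServingAddress (hypothetical helper) takes a
// servingInfo.bindAddress value in host:port form and returns only the
// IP part, which is the form the proxy's BindAddress field expects.
func proxyIPFromServingAddress(hostPort string) (net.IP, error) {
	host, _, err := net.SplitHostPort(hostPort)
	if err != nil {
		return nil, fmt.Errorf("%q must be a host:port: %v", hostPort, err)
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return nil, fmt.Errorf("host %q is not an IP address", host)
	}
	return ip, nil
}

func main() {
	// The first value is the one from comment 5; the second is the
	// port-less form that node config validation rejected in comment 8.
	for _, addr := range []string{"172.16.120.176:10250", "172.16.120.176"} {
		ip, err := proxyIPFromServingAddress(addr)
		fmt.Printf("%q -> ip=%v err=%v\n", addr, ip, err)
	}
}
```

Whichever proxy mode is active would then need to consume the resulting IP rather than select a host interface itself.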

Comment 10 Dan Williams 2017-07-07 16:20:20 UTC
Upstream PR for userspace proxy bind address issue: https://github.com/kubernetes/kubernetes/pull/48613

Comment 11 Phil Cameron 2017-07-11 18:41:23 UTC
Should this discussion be added to the OCP documentation? It seems somewhat non-obvious and customers could encounter it.

Comment 12 Dan Williams 2017-07-12 19:54:15 UTC
Origin PR: https://github.com/openshift/origin/pull/15174

Comment 14 Hongan Li 2017-07-24 02:44:01 UTC
Verified in atomic-openshift-3.6.162-1.git.0.b4e5dc3.el7.x86_64; the issue has been fixed. The node service can be started without a default route.

Comment 16 errata-xmlrpc 2017-08-10 05:28:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

