Description of problem:
If the default route doesn't exist on a host, atomic-openshift-node will not start.

Version-Release number of selected component (if applicable):
OpenShift 3.5.5.15

How reproducible:
Always

Steps to Reproduce:
1. # systemctl stop atomic-openshift-node
2. # ip route del default
3. # systemctl start atomic-openshift-node

Actual results:
The service fails to start with the errors:

Jun 17 16:23:31 node1 journal: I0617 16:23:31.588253 8091 healthcheck.go:119] Initializing kube-proxy health checker
Jun 17 16:23:31 node1 journal: F0617 16:23:31.646927 8091 node.go:463] error: Could not initialize Kubernetes Proxy: failed to select a host interface: Unable to select an IP.
Jun 17 16:23:31 node1 journal: When running in a container, you must run the container in the host network namespace with --net=host and with --privileged
Jun 17 16:23:31 node1 atomic-openshift-node: F0617 16:23:31.646927 8091 node.go:463] error: Could not initialize Kubernetes Proxy: failed to select a host interface: Unable to select an IP.
Jun 17 16:23:31 node1 atomic-openshift-node: When running in a container, you must run the container in the host network namespace with --net=host and with --privileged

Expected results:
atomic-openshift-node starts correctly.

Additional info:
As a workaround, you can create a bogus default route.

Upstream issue: https://github.com/openshift/origin/issues/13798
This is a bug in OpenShift's usage of kube-proxy. We should port an equivalent of https://github.com/kubernetes/kubernetes/pull/46678, which passes the bind-address to the proxy, when one is given, instead of calling getNodeIP().
Origin PR to fix this issue: https://github.com/openshift/origin/pull/14815

Note that if the host has no default route, you *must* set the BindAddress option in the proxy configuration; otherwise the proxy has no way to know which IP address to use.
Verified in OpenShift v3.6.126.4, but the node service still cannot start after deleting the default route.

Logs:
Jun 30 06:09:57 ip-172-18-1-137.ec2.internal atomic-openshift-node[94471]: I0630 06:09:57.705118 94471 interface.go:259] No valid IP found
Jun 30 06:09:57 ip-172-18-1-137.ec2.internal atomic-openshift-node[94471]: F0630 06:09:57.705138 94471 node.go:450] error: Could not initialize Kubernetes Proxy. You must run this process as root to use the service proxy: failed to select a host interface: Unable to select an IP.
Jun 30 06:09:57 ip-172-18-1-137.ec2.internal systemd[1]: Failed to start Atomic OpenShift Node.

node-config.yaml:
<-----snip----->
servingInfo:
  bindAddress: 0.0.0.0:10250
  certFile: server.crt
  clientCA: ca.crt
  keyFile: server.key
volumeDirectory: /var/lib/origin/openshift.local.volumes
proxyArguments:
  proxy-mode:
    - iptables
<-----snip----->
(In reply to hongli from comment #3)
> bindAddress: 0.0.0.0:10250

You must provide the real IP address of a NIC on the box if you do not want OpenShift to use the IP address of the default NIC. You cannot use 0.0.0.0.
(In reply to Dan Williams from comment #4)
> You must provide a real IP address of a NIC on the box if you do not want
> openshift to use the IP address of the default NIC. You cannot use 0.0.0.0.

I tried all four IP addresses on the box, but the node service still cannot start. Maybe I'm missing something else?

# ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 172.16.120.176/24 brd 172.16.120.255 scope global dynamic eth0
    inet 172.17.0.1/16 scope global docker0
    inet 10.128.0.1/23 scope global tun0

# grep "bind" /etc/origin/node/node-config.yaml
  bindAddress: 172.16.120.176:10250

Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[84900]: I0703 05:17:06.998585 84900 interface.go:259] No valid IP found
Jul 03 05:17:06 host-8-175-59.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[84900]: F0703 05:17:06.998607 84900 node.go:430] error: Could not initialize Kubernetes Proxy. You must run this process as root to use the service proxy: failed to select a host interface: Unable to select an IP.
(In reply to hongli from comment #5)
> # grep "bind" /etc/origin/node/node-config.yaml
> bindAddress: 172.16.120.176:10250

Oh, sorry for not clarifying. The bindAddress must be a plain IP address without a port number. Does it work if you just specify "172.16.120.176"?
The apidocs for bindAddress on the proxy say:

    // bindAddress is the IP address for the proxy server to serve on (set to 0.0.0.0
    // for all interfaces)
    BindAddress string `json:"bindAddress"`
(In reply to Dan Williams from comment #6)
> Oh, sorry for not clarifying. The bindAddress must be a plain IP address
> without a port #. Does it work if you just specify "172.16.120.176"?

It doesn't work if I specify just the IP without the port number:

Jul 06 22:25:30 host-8-175-59.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[6853]: Invalid NodeConfig /etc/origin/node/node-config.yaml
Jul 06 22:25:30 host-8-175-59.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[6853]: servingInfo.bindAddress: Invalid value: "172.16.120.176": must be a host:port
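The validation error above shows that servingInfo.bindAddress must be host:port, while the proxy's own BindAddress (per the apidocs quoted above) is a plain IP. As we understand origin's 3.x node setup (an assumption, not confirmed in this thread), the proxy's bind address is taken from the host part of servingInfo.bindAddress, so "0.0.0.0:10250" hands the proxy 0.0.0.0 and auto-detection still runs. A sketch with a hypothetical proxyBindAddressFrom helper:

```go
package main

import (
	"fmt"
	"net"
)

// proxyBindAddressFrom (hypothetical) derives a plain-IP proxy bind address
// from a servingInfo.bindAddress-style "host:port" value. Assumption: this
// models how the node passes its serving address to the proxy.
func proxyBindAddressFrom(servingBindAddress string) (string, error) {
	host, _, err := net.SplitHostPort(servingBindAddress)
	if err != nil {
		// A bare IP with no port fails here, like the NodeConfig validation.
		return "", fmt.Errorf("%q: must be a host:port: %v", servingBindAddress, err)
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return "", fmt.Errorf("%q: host part is not an IP address", servingBindAddress)
	}
	return ip.String(), nil
}

func main() {
	for _, v := range []string{"172.16.120.176:10250", "0.0.0.0:10250", "172.16.120.176"} {
		ip, err := proxyBindAddressFrom(v)
		fmt.Printf("%-22s -> %q, err=%v\n", v, ip, err)
	}
}
```

Under this reading, "172.16.120.176:10250" is the right value to set; a result of 0.0.0.0 still leaves the proxy with no specific IP.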
(In reply to hongli from comment #8)
> It doesn't work if just specify ip without port #:
> Jul 06 22:25:30 host-8-175-59.host.centralci.eng.rdu2.redhat.com
> atomic-openshift-node[6853]: servingInfo.bindAddress: Invalid value:
> "172.16.120.176": must be a host:port

I was wrong; you are correct! So the iptables proxy correctly handles the bindAddress after the patch in this bug. The new problem is in the userspace proxy, which is in play because the "unidling" feature is turned on by default. The userspace proxy does not handle bindAddress correctly and unconditionally calls ChooseHostInterface(), which leads to the error. So another patch for the userspace proxy will be required.
Upstream PR for the userspace proxy bind address issue: https://github.com/kubernetes/kubernetes/pull/48613
Should this discussion be added to the OCP documentation? It seems somewhat non-obvious and customers could encounter it.
Origin PR: https://github.com/openshift/origin/pull/15174
Verified in atomic-openshift-3.6.162-1.git.0.b4e5dc3.el7.x86_64; the issue has been fixed. The node service can be started without a default route.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716