Bug 1433236 - [vSphere] Unable to restart atomic-openshift-node, node ip conflicts with cluster network
Summary: [vSphere] Unable to restart atomic-openshift-node, node ip conflicts with cluster network
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 3.7.0
Assignee: Dan Williams
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-17 07:14 UTC by Jianwei Hou
Modified: 2018-03-15 18:19 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: vSphere needed upstream Kubernetes changes to be included with OpenShift in order to work.
Consequence: vSphere had networking problems.
Fix: The periodic resync of Kubernetes into OpenShift included the change.
Result: vSphere worked correctly.
Clone Of:
Environment:
Last Closed: 2017-11-28 21:53:01 UTC
Target Upstream Version:


Attachments (Terms of Use)
The IP of tun0 is displayed first in the list of the VM's IP addresses. (74.17 KB, image/png)
2017-03-17 07:22 UTC, Jianwei Hou


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Jianwei Hou 2017-03-17 07:14:41 UTC
Description of problem:
Set up OCP on vSphere. Enable the cloud provider and restart the node service. The node service fails to start with the error "Mar 17 14:39:39 master docker: E0317 06:39:39.457526       1 eventqueue.go:109] event processing failed: error creating subnet for node master.c.vsphere.local, ip 10.128.0.1: Node IP 10.128.0.1 conflicts with cluster network 10.128.0.0/14"

Version-Release number of selected component (if applicable):
openshift v3.5.0.54
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Setup OCP on vSphere
2. Edit the master and node configs to enable the vSphere cloud provider (a config sketch follows these steps)
3. Restart master
4. Restart nodes
5. Remove the tun0 IP address and restart the node service; the node comes back:
ip addr del 10.128.0.1/23 dev tun0
systemctl restart atomic-openshift-node
oc get nodes
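
For reference, step 2 normally amounts to pointing both the node and master configs at the vSphere cloud provider and a vsphere.conf file. A minimal sketch is below; the key names follow the OCP 3.x documentation as best I recall, and every value is a placeholder rather than this environment's actual settings:

```
# /etc/origin/node/node-config.yaml (excerpt)
kubeletArguments:
  cloud-provider:
    - "vsphere"
  cloud-config:
    - "/etc/origin/cloudprovider/vsphere.conf"

# /etc/origin/master/master-config.yaml (excerpt)
kubernetesMasterConfig:
  apiServerArguments:
    cloud-provider:
      - "vsphere"
    cloud-config:
      - "/etc/origin/cloudprovider/vsphere.conf"
  controllerArguments:
    cloud-provider:
      - "vsphere"
    cloud-config:
      - "/etc/origin/cloudprovider/vsphere.conf"
```

```
; /etc/origin/cloudprovider/vsphere.conf (excerpt, placeholder values)
[Global]
user = "administrator@vsphere.local"
password = "changeme"
server = "vcenter.example.com"
port = "443"
insecure-flag = "1"
datacenter = "Datacenter"
datastore = "datastore1"
working-dir = "/Datacenter/vm"

[Network]
public-network = "VM Network"
```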

Actual results:
After step 4: Node service failed to restart

```
Mar 17 14:38:40 master.c.vsphere.local atomic-openshift-node[4111]: W0317 14:38:40.591060    4172 controller.go:71] Could not find an allocated subnet for node: master.c.vsphere.local, Waiting...
Mar 17 14:38:40 master.c.vsphere.local atomic-openshift-node[4111]: F0317 14:38:40.591100    4172 node.go:351] error: SDN node startup failed: Failed to get subnet for this host: master.c.vsphere.local, error: timed out waiting for the condition
Mar 17 14:38:41 master.c.vsphere.local systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Mar 17 14:38:41 master.c.vsphere.local atomic-openshift-node[4292]: Error response from daemon: No such container: atomic-openshift-node
Mar 17 14:38:41 master.c.vsphere.local systemd[1]: atomic-openshift-node.service: control process exited, code=exited status=1
Mar 17 14:38:41 master.c.vsphere.local systemd[1]: Unit atomic-openshift-node.service entered failed state.
Mar 17 14:38:41 master.c.vsphere.local systemd[1]: atomic-openshift-node.service failed.
```

Dug into the logs and found the root cause:

```
Mar 17 14:47:55 master docker: I0317 06:47:55.511855       1 subnets.go:187] Watch Updated event for Node "master.c.vsphere.local"
Mar 17 14:47:55 master docker: E0317 06:47:55.511886       1 eventqueue.go:109] event processing failed: error creating subnet for node master.c.vsphere.local, ip 10.128.0.1: Node IP 10.128.0.1 conflicts with cluster network 10.128.0.0/14
```

The IP address 10.128.0.1 was picked up as the node IP. In the vCenter console, 10.128.0.1 is also shown first in the list of this VM's IP addresses (see attachment).
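
The error above comes down to a CIDR containment test: the SDN master refuses to allocate a subnet for a node whose reported IP already sits inside the cluster (pod) network. A minimal, self-contained Go sketch of that check (a simplified illustration, not the actual openshift-sdn code) is:

```go
package main

import (
	"fmt"
	"net"
)

// checkNodeIP mirrors the sanity check behind the "conflicts with cluster
// network" error: a node IP inside the cluster (pod) CIDR is rejected.
func checkNodeIP(nodeIP, clusterCIDR string) error {
	ip := net.ParseIP(nodeIP)
	_, cluster, err := net.ParseCIDR(clusterCIDR)
	if err != nil {
		return err
	}
	if ip != nil && cluster.Contains(ip) {
		return fmt.Errorf("Node IP %s conflicts with cluster network %s", nodeIP, clusterCIDR)
	}
	return nil
}

func main() {
	// 10.128.0.1 is the tun0 address that the vSphere cloud provider reported
	// first; 10.128.0.0/14 is the cluster network from this report.
	fmt.Println(checkNodeIP("10.128.0.1", "10.128.0.0/14"))   // conflict -> error
	fmt.Println(checkNodeIP("10.66.146.29", "10.128.0.0/14")) // real host IP -> <nil>
}
```

With the vSphere cloud provider listing tun0's 10.128.0.1 first, that is the address the SDN ends up testing, hence the failure above.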

# oc get node master.c.vsphere.local -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2017-03-17T06:32:07Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: master.c.vsphere.local
  name: master.c.vsphere.local
  resourceVersion: "4832"
  selfLink: /api/v1/nodesmaster.c.vsphere.local
  uid: 716b5516-0adb-11e7-b483-0050569fb058
spec:
  externalID: /Datacenter/vm/master.c.vsphere.local
  providerID: vsphere:////Datacenter/vm/master.c.vsphere.local
status:
  addresses:
  - address: 10.128.0.1
    type: ExternalIP
  - address: fe80::6083:bfff:febf:db75
    type: ExternalIP
  - address: 10.66.146.29
    type: InternalIP
  - address: 2620:52:0:4292:250:56ff:fe9f:b058
    type: InternalIP
  - address: fe80::250:56ff:fe9f:b058
    type: InternalIP
  - address: 172.17.0.1
    type: ExternalIP
  - address: fe80::42:2eff:fe71:6bcd
    type: ExternalIP
  - address: master.c.vsphere.local
    type: Hostname
  allocatable:
    alpha.kubernetes.io/nvidia-gpu: "0"
    cpu: "1"
    memory: 3882448Ki
    pods: "10"
  capacity:
    alpha.kubernetes.io/nvidia-gpu: "0"
    cpu: "1"
    memory: 3882448Ki
    pods: "10"
  conditions:
  - lastHeartbeatTime: 2017-03-17T06:56:22Z
    lastTransitionTime: 2017-03-17T06:32:07Z
    message: kubelet has sufficient disk space available
    reason: KubeletHasSufficientDisk
    status: "False"
    type: OutOfDisk
  - lastHeartbeatTime: 2017-03-17T06:56:22Z
    lastTransitionTime: 2017-03-17T06:32:07Z
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: 2017-03-17T06:56:22Z
    lastTransitionTime: 2017-03-17T06:32:07Z
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: 2017-03-17T06:56:22Z
    lastTransitionTime: 2017-03-17T06:32:07Z
    message: container runtime is down,SDN pod network is not ready
    reason: KubeletNotReady
    status: "False"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/node@sha256:406f83489f01f29f57372ea26a6596a413ddd9342274e4a83791b5ddba0bf948
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/node:v3.5.0.54
    sizeBytes: 975716132
  - names:
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose@sha256:03606883abf6e414e1013e71ac5b515d4fb9cb9cc7844b560a8362321797936c
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose@sha256:e1bb6ad6264dbebe64c8ffdd9508dbf2c5779673de341f356c82ae0bea82e148
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose:latest
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose:v3.5.0.54
    sizeBytes: 754559384
  - names:
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/registry-console@sha256:e958ba7e7cc1f907168ff92f4c135c640a0322e757063babbd2d1c76a3d1cf7d
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/registry-console:3.5
    sizeBytes: 435169849
  - names:
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/openvswitch@sha256:fc8d5adbb04bb93ab09d9c4ac15714ec7f1087ee9df02fe49a2eb5f0eadc9c08
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/openvswitch:v3.5.0.54
    sizeBytes: 427571401
  - names:
    - registry.access.redhat.com/rhel7/etcd@sha256:439bb270e38f396fcb793a2a2502f364ab98d7de99035c49a6afadd3ff39923b
    - registry.access.redhat.com/rhel7/etcd:latest
    sizeBytes: 233135973
  - names:
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-pod@sha256:0161ab820306619d0792dde2d59a4168f8bfa22e7ecbd687fcff79e96229e553
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-pod:v3.5.0.54
    sizeBytes: 205071914
  nodeInfo:
    architecture: amd64
    bootID: 22b5e7dd-10e6-4dd9-b261-0f6ab0ee3308
    containerRuntimeVersion: docker://1.12.3
    kernelVersion: 3.10.0-514.el7.x86_64
    kubeProxyVersion: v1.5.2+43a9be4
    kubeletVersion: v1.5.2+43a9be4
    machineID: 6e3d9207f2b04129ac31273525254823
    operatingSystem: linux
    osImage: Red Hat Enterprise Linux Server 7.3 (Maipo)
    systemUUID: 421F123B-B0C6-0A52-9681-2E5FB35CE84D

After step 5:
# oc get nodes
NAME                     STATUS    AGE
master.c.vsphere.local   Ready     28m
node1.c.vsphere.local    Ready     28m

Expected results:
The node service should restart successfully after enabling the cloud provider.

Additional info:

Comment 1 Jianwei Hou 2017-03-17 07:16:38 UTC
Another solution is to add 'nodeIP: xxx' to node-config.yml so that the node is functional.
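
For anyone hitting this before a fix lands, the workaround looks roughly like the following in node-config.yml; the address must be the node's real host IP (10.66.146.29 in the node status above), not the tun0 address:

```
# node-config.yml (excerpt) - pin the node IP so the SDN does not pick up
# the tun0 address that the vSphere cloud provider reports first
nodeIP: 10.66.146.29
```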

Comment 2 Jianwei Hou 2017-03-17 07:22:20 UTC
Created attachment 1263960 [details]
The IP of tun0 is displayed first in the list of the VM's IP addresses.

Comment 3 Ben Bennett 2017-03-17 14:40:07 UTC
I'm not really sure what the bug is here.  You need to configure the node so that it has its own addresses.  There are several ways to do that, depending on how you did the install: the Ansible installer has an argument for it, 'openshift start master' takes the --network-cidr argument, etc.

Comment 4 Meng Bo 2017-03-20 02:28:53 UTC
@Ben

The problem here is that when we first set up the env, the SDN assigns a network to the node, say 10.128.0.0/23 here, and the tun0 IP is 10.128.0.1.

After that, if we restart the node service, SDN node startup fails because the tun0 IP assigned by the SDN falls inside the cluster CIDR.

I am not sure how this happened, but the node address list looks abnormal to me.

status:
  addresses:
  - address: 10.128.0.1
    type: ExternalIP
  - address: fe80::6083:bfff:febf:db75
    type: ExternalIP
  - address: 10.66.146.29
    type: InternalIP
  - address: 2620:52:0:4292:250:56ff:fe9f:b058
    type: InternalIP
  - address: fe80::250:56ff:fe9f:b058
    type: InternalIP
  - address: 172.17.0.1
    type: ExternalIP
  - address: fe80::42:2eff:fe71:6bcd
    type: ExternalIP
  - address: master.c.vsphere.local
    type: Hostname

Comment 5 Ben Bennett 2017-03-28 15:33:55 UTC
So... the node is not actually using 10.128.0.1?  It looks to me like that was allocated to the machine.

Can you get me the full output from 'ip a' please?

Comment 6 Jianwei Hou 2017-03-29 02:42:20 UTC
@bbennett here is the output
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:a0:61:dd brd ff:ff:ff:ff:ff:ff
    inet 10.66.146.232/22 brd 10.66.147.255 scope global dynamic ens192
       valid_lft 13746sec preferred_lft 13746sec
    inet6 2620:52:0:4292:250:56ff:fea0:61dd/64 scope global noprefixroute dynamic 
       valid_lft 2591891sec preferred_lft 604691sec
    inet6 fe80::250:56ff:fea0:61dd/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 02:42:4b:a0:c4:0b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 46:f3:8f:9b:55:71 brd ff:ff:ff:ff:ff:ff
5: br0: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN qlen 1000
    link/ether ea:1d:7b:cb:6a:47 brd ff:ff:ff:ff:ff:ff
32: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65470 qdisc noqueue master ovs-system state UNKNOWN qlen 1000
    link/ether f6:8e:05:be:52:c2 brd ff:ff:ff:ff:ff:ff
33: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 4e:67:ad:c3:cf:4e brd ff:ff:ff:ff:ff:ff
    inet 10.128.0.1/23 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::4c67:adff:fec3:cf4e/64 scope link 
       valid_lft forever preferred_lft forever

Comment 7 Dan Williams 2017-04-05 01:59:24 UTC
(In reply to Meng Bo from comment #4)
> @Ben
> 
> The problem here is, when we first time setup the env, the SDN will assign
> network to the node. Let's say 10.128.0.0/23 here, and the tun0 ip is
> 10.128.0.1
> 
> And after that if we restart the node service, the SDN node start up will
> fail due to the tun0 IP which assigned by SDN is included in the cluster
> CIDR.
> 
> I am not sure how did it happen, but the node address list looks abnormal to
> me.
> 
> status:
>   addresses:
>   - address: 10.128.0.1
>     type: ExternalIP
>   - address: fe80::6083:bfff:febf:db75
>     type: ExternalIP
>   - address: 10.66.146.29
>     type: InternalIP
>   - address: 2620:52:0:4292:250:56ff:fe9f:b058
>     type: InternalIP
>   - address: fe80::250:56ff:fe9f:b058
>     type: InternalIP
>   - address: 172.17.0.1
>     type: ExternalIP
>   - address: fe80::42:2eff:fe71:6bcd
>     type: ExternalIP
>   - address: master.c.vsphere.local
>     type: Hostname

The vSphere cloud provider code is completely wrong here, IMO.  It should not be returning all IP addresses of all Linux network interfaces on the system, since some of those can reach the outside and others cannot.  vSphere has no idea which ones are which; only the guest OS and its network setup does.

Furthermore, since nothing in Kubernetes/OpenShift currently supports IPv6, it's quite odd to be sending IPv6 addresses.

Even more, sending *link-local* IPv6 addresses is unlikely to do what people expect, even if Kubernetes supported IPv6.

The relevant code is in the vSphere cloud provider in kubernetes/pkg/cloudprovider/providers/vsphere/vsphere.go:

	var mvm mo.VirtualMachine
	err = getVirtualMachineManagedObjectReference(ctx, c, vm, "guest.net", &mvm)

	// retrieve VM's ip(s)
	for _, v := range mvm.Guest.Net {
		var addressType api.NodeAddressType
		if i.cfg.Network.PublicNetwork == v.Network {
			addressType = api.NodeExternalIP
		} else {
			addressType = api.NodeInternalIP
		}
		for _, ip := range v.IpAddress {
			api.AddToNodeAddresses(&addrs,
				api.NodeAddress{
					Type:    addressType,
					Address: ip,
				},
			)
		}
	}
	return addrs, nil

It looks like the vSphere API's "guest.net" object is an alias for the "GuestInfo.net" property described here:

http://pubs.vmware.com/vsphere-50/index.jsp#com.vmware.wssdk.apiref.doc_50/vim.vm.GuestInfo.html#field_detail

and that appears to list all NICs and IP addresses on the guest, regardless of whether they got their address from VMWare infrastructure or not.  The GCE and AWS cloud providers don't have this problem, because they know exactly what IP address the VM was provisioned with by the GCE/AWS infrastructure and report that, rather than scraping the machine for them.

Perhaps the vSphere cloud provider should be using "guest.ipaddress" instead, which is documented at that same link as "Primary IP address assigned to the guest operating system, if known"?
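
For illustration only, that suggestion might look roughly like the change below to the loop quoted above. This is an untested sketch, and the exact property path and govmomi field names may differ:

```go
	var mvm mo.VirtualMachine
	err = getVirtualMachineManagedObjectReference(ctx, c, vm, "guest.ipAddress", &mvm)
	if err != nil {
		return nil, err
	}

	// guest.ipAddress is documented as "Primary IP address assigned to the
	// guest operating system, if known" -- report only that address instead
	// of scraping every address on every guest NIC.
	if mvm.Guest != nil && mvm.Guest.IpAddress != "" {
		api.AddToNodeAddresses(&addrs,
			api.NodeAddress{
				Type:    api.NodeInternalIP,
				Address: mvm.Guest.IpAddress,
			},
		)
	}
	return addrs, nil
```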

Comment 8 Dan Williams 2017-04-05 03:50:28 UTC
Can you paste in your cloud-config file that you pass to openshift-node?

What was the value of the 'public-network' key (if any) and did that ever change over the course of your efforts here?

Comment 9 Dan Williams 2017-04-05 04:06:58 UTC
Filed an upstream issue for the vSphere stuff:

https://github.com/kubernetes/kubernetes/issues/44075

Beyond that, openshift-sdn should probably:

1) not pick IPv6 addresses from Node.Status.Addresses

2) do something more intelligent than picking the first IP address from Node.Status.Addresses; but this depends on cloud providers being consistent in what they call ExternalIP and InternalIP, and vSphere currently torpedoes that consistency.
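
For illustration, point 1 together with a conservative reading of point 2 could look something like the helper below. This is only a sketch of the suggested behavior, with minimal stand-in types so it runs on its own, not actual openshift-sdn code:

```go
package main

import (
	"fmt"
	"net"
)

// Minimal stand-in for the Kubernetes NodeAddress type, just so this sketch
// is self-contained; the real code would iterate Node.Status.Addresses.
type nodeAddress struct {
	Type    string // "InternalIP", "ExternalIP", "Hostname", ...
	Address string
}

// firstUsableNodeIP sketches points 1 and 2 above: prefer InternalIP over
// ExternalIP, and skip anything that is not a plain, non-link-local IPv4 address.
func firstUsableNodeIP(addrs []nodeAddress) (string, error) {
	for _, want := range []string{"InternalIP", "ExternalIP"} {
		for _, a := range addrs {
			if a.Type != want {
				continue
			}
			ip := net.ParseIP(a.Address)
			if ip == nil || ip.To4() == nil || ip.IsLinkLocalUnicast() {
				continue // unparsable, IPv6, or link-local: skip
			}
			return a.Address, nil
		}
	}
	return "", fmt.Errorf("no usable IPv4 address found")
}

func main() {
	// The address list reported for master.c.vsphere.local in this bug.
	addrs := []nodeAddress{
		{"ExternalIP", "10.128.0.1"},
		{"ExternalIP", "fe80::6083:bfff:febf:db75"},
		{"InternalIP", "10.66.146.29"},
		{"InternalIP", "2620:52:0:4292:250:56ff:fe9f:b058"},
		{"InternalIP", "fe80::250:56ff:fe9f:b058"},
		{"ExternalIP", "172.17.0.1"},
		{"ExternalIP", "fe80::42:2eff:fe71:6bcd"},
		{"Hostname", "master.c.vsphere.local"},
	}
	fmt.Println(firstUsableNodeIP(addrs)) // picks 10.66.146.29, not 10.128.0.1
}
```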

Comment 12 Ben Bennett 2017-04-19 14:00:25 UTC
This is a vSphere bug; Dan has opened the issue upstream.  Any further improvements we make are just minor things that don't fix the main problem.

Dropping the severity so we track the other potential changes.

Comment 13 Dan Williams 2017-06-09 20:33:32 UTC
Part of this is supposedly fixed upstream in kube-1.7 due to https://github.com/kubernetes/kubernetes/pull/43545.  There are apparently other fixes required as well, which VMWare is supposedly working on.  c7a22a588f112dd34ae8dce9b4014ba5510e5575 might be that second part.

Comment 14 Dan Williams 2017-06-09 20:43:35 UTC
We could potentially backport some of these fixes if PM thinks it's a high enough priority.  The 43545 pull is part of kube 1.7, while the "second part" I reference above will be part of kube 1.8.  So both would need backports for OSE 3.6.

Comment 15 Dan Williams 2017-07-24 16:45:11 UTC
OpenShift has now been rebased to Kube 1.7, which pulls in some upstream vSphere fixes.  If somebody can test the OSE 3.7 nightlies and see whether that fixes any of these issues, that would be great.

Comment 16 Jianwei Hou 2017-09-14 02:36:37 UTC
Verified this is fixed in
oc v3.7.0-0.125.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Now I have completely removed nodeIP from node-config.yml, removed the node, and enabled the vSphere cloud provider. After the atomic-openshift-node service is restarted, the node can be added to the cluster.

Comment 22 errata-xmlrpc 2017-11-28 21:53:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

