1197170 – openshift-sdn does not allow re-configuration of cluster subnet

Summary: openshift-sdn does not allow re-configuration of cluster subnet

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.0.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Dan Winship
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-02-27 16:10 UTC by Scott Dodson
Modified:	2016-02-26 13:18 UTC
CC List:	11 users
Fixed In Version:	openshift-sdn-0.4-3.git.0.954ef11.el7ose.x86_64.rpm
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-02-26 13:18:53 UTC
Target Upstream Version:
Embargoed:

Description Scott Dodson 2015-02-27 16:10:59 UTC

Customers may run into IP/DNS conflicts with the default openshift-sdn cluster subnet of 10.1.0.0/16, so we should allow this value to be configured.

github issue: https://github.com/openshift/openshift-sdn/issues/30

Comment 2 Scott Dodson 2015-03-10 20:30:06 UTC

Container subnets can now be set by adding these flags to the OPTIONS line in /etc/sysconfig/openshift-sdn-master:

  -container-network=10.1.0.0/16 -container-subnet-length=8

That will allocate one /24 subnet to each host (e.g. 10.1.1.0/24), each with 254 usable IPs; an 8-bit subnet length leaves room for up to 256 such subnets in the /16.
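
The arithmetic behind these flags can be checked with a short Python sketch. It assumes that -container-subnet-length is the number of host bits left in each per-host subnet, which matches the /24 result quoted above but is not confirmed elsewhere in this report. The variable names are illustrative only.

```python
import ipaddress

# Values taken from the flags above.
container_network = ipaddress.ip_network("10.1.0.0/16")
# Assumption: the flag counts the host bits in each per-host subnet.
container_subnet_length = 8

# Per-host prefix length: 32 address bits minus the host bits.
host_prefix = 32 - container_subnet_length

# Enumerate every per-host subnet that fits inside the container network.
subnets = list(container_network.subnets(new_prefix=host_prefix))

# Usable addresses exclude the network and broadcast addresses.
usable_ips = subnets[0].num_addresses - 2

print(len(subnets))  # number of /24 blocks inside the /16
print(subnets[1])    # second block, the example subnet quoted above
print(usable_ips)    # usable addresses per host subnet
```

Whether the allocator actually hands out every one of these blocks is not stated in this report; the sketch only shows how many blocks exist.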

Comment 3 Gaoyun Pei 2015-03-11 08:20:26 UTC

Tested this bug with openshift-sdn-0.4-3.git.0.954ef11.el7ose.x86_64.

This option only works as expected on a fresh installation.

With OPTIONS='-v=4 -container-network=10.2.0.0/16 -container-subnet-length=8' configured in /etc/sysconfig/openshift-sdn-master on a fresh installation, "Sub":"10.2.0.0/24" is created for the minion when openshift-sdn-master starts, and lbr0 on the minion gets the IP address 10.2.0.1 after openshift-sdn-node starts.
A pod created on the minion gets a 10.2.0.x address, which is the expected behavior.
[root@openshift-v3 beta2]# osc get pod
POD                 IP                  CONTAINER(S)        IMAGE(S)                    HOST                   LABELS                 STATUS
hello-openshift     10.2.0.2            hello-openshift     openshift/hello-openshift   master/192.168.0.183   name=hello-opens


I then updated OPTIONS to "-container-network=10.4.0.0/16 -container-subnet-length=12" and restarted openshift-sdn-master, but it still uses "Sub":"10.2.0.0/24":
...
Mar 11 15:33:14 openshift-v3 openshift-sdn: W0311 15:33:14.160534 04338 registry.go:242] Found existing network configuration, overwriting it.
Mar 11 15:33:14 openshift-v3 openshift-sdn: Provided subnet doesn't belong to network:  10.2.0.0/24
Mar 11 15:33:14 openshift-v3 openshift-sdn: I0311 15:33:14.162589 04338 registry.go:222] Unmarshalling response: {"Minion":"192.168.0.183","Sub":"10.2.0.0/24"}

After restarting openshift-sdn-node, it still uses the previous subnet configuration:
...
Mar 11 16:02:30 master systemd: Starting OpenShift Node...
Mar 11 16:02:30 master systemd: Started OpenShift Node.
Mar 11 16:02:30 master openshift-sdn: I0311 16:02:30.933926 05198 controller.go:193] Output of setup script:
Mar 11 16:02:30 master openshift-sdn: + subnet_gateway=10.2.0.1
Mar 11 16:02:30 master openshift-sdn: + subnet=10.2.0.0/24
Mar 11 16:02:30 master openshift-sdn: + container_network=10.4.0.0/16
Mar 11 16:02:30 master openshift-sdn: + subnet_mask_len=24
Mar 11 16:02:30 master openshift-sdn: + printf 'Container network is "%s"; local host has subnet "%s" and gateway "%s".\n' 10.4.0.0/16 10.2.0.0/24 10.2.0.1
Mar 11 16:02:30 master openshift-sdn: Container network is "10.4.0.0/16"; local host has subnet "10.2.0.0/24" and gateway "10.2.0.1".
...
Mar 11 16:02:30 master openshift-sdn: + ip route del 10.2.0.0/24 dev lbr0 proto kernel scope link src 10.2.0.1
Mar 11 16:02:30 master openshift-sdn: + ip route add 10.4.0.0/16 dev lbr0 proto kernel scope link src 10.2.0.1

A newly created pod was still allocated a 10.2.0.x address.

I also checked etcd and found that the stored subnet was unchanged after updating the container-network configuration.
[root@openshift-v3 etcd-v2.0.4-linux-amd64]# ./etcdctl get /registry/sdn/subnets/master
{"Minion":"192.168.0.183","Sub":"10.2.0.0/24"}


So I am moving this back to ASSIGNED. If I missed anything, please let me know.
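
The "Provided subnet doesn't belong to network" message in the log above can be reproduced outside openshift-sdn with a small containment check. This is only a sketch using Python's standard ipaddress module, not the check openshift-sdn itself performs; the helper name subnet_belongs is hypothetical.

```python
import ipaddress

def subnet_belongs(subnet: str, network: str) -> bool:
    """Return True if `subnet` lies entirely inside `network`."""
    return ipaddress.ip_network(subnet).subnet_of(ipaddress.ip_network(network))

# Subnet read back from etcd in this comment.
stored_subnet = "10.2.0.0/24"

# Original container network: the stored subnet fits.
print(subnet_belongs(stored_subnet, "10.2.0.0/16"))

# Reconfigured container network: the stored subnet no longer fits.
print(subnet_belongs(stored_subnet, "10.4.0.0/16"))
```

The second check fails because the per-minion subnet persisted in etcd was allocated from the old container network.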

Comment 4 Brenton Leanhardt 2015-03-13 15:40:31 UTC

Mrunal,

How hard would it be to add the functionality Gaoyun is describing?

Comment 5 Mrunal Patel 2015-03-13 17:08:49 UTC

Brenton,
This isn't really a bug but we can add a clear flag that will allow one to reset etcd. The user would then have to restart the openshift-sdn processes.

Comment 6 Brenton Leanhardt 2015-03-13 18:00:16 UTC

Yeah, that's exactly what we'd like to see.

Comment 7 Scott Dodson 2015-03-13 18:03:08 UTC

Mrunal, Rajat,

Does this not require the entire cluster to be restarted in order to ensure proper network connectivity after changing the subnet?

Comment 8 Rajat Chopra 2015-03-13 18:27:29 UTC

Yes, Scott.

Brenton and I had a chat, and this is what we are going to do:

1. If the daemon is started with a different subnet than the one it was previously started with (unless it's the first run), we report an error and quit. The error message also points to the correct method of changing subnets on a running cluster.

2. The correct method will be a combination of manual steps and possibly a helper script. The script will clear the previously stored config from etcd, and the manual steps will have the user restart the master daemon and all node daemons, then restart all pods.

The admin should expect that the pods will have their network broken while the steps to change the existing subnets are being carried out.
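
The startup check in step 1 could look roughly like the following Python sketch. It is an illustration of the plan described in this comment, not the code that shipped in openshift-sdn; the function name, the exception class, and the dictionary layout of the stored config are all hypothetical.

```python
from typing import Optional

class NetworkConfigMismatch(Exception):
    """Raised when the requested network differs from the stored one."""

def check_network_config(stored: Optional[dict], requested: dict) -> dict:
    """Return the config to use, or raise if it conflicts with the stored one."""
    # First run: nothing is stored yet, so accept the requested configuration.
    if stored is None:
        return requested
    # Later runs: refuse to start with a different network than the stored one.
    if stored != requested:
        raise NetworkConfigMismatch(
            f"stored network config {stored} differs from requested {requested}; "
            "clear the stored config, then restart the master, all nodes, "
            "and all pods to change subnets"
        )
    return stored

# Hypothetical config layout, using the values from comment 3.
old = {"network": "10.2.0.0/16", "subnet_length": 8}
new = {"network": "10.4.0.0/16", "subnet_length": 12}

print(check_network_config(None, old))  # first run is accepted
try:
    check_network_config(old, new)      # changed network is rejected
except NetworkConfigMismatch as err:
    print("refusing to start:", err)
```

Failing fast here avoids the state seen in comment 3, where the node kept its old subnet while routing was set up for the new container network.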

Comment 9 Scott Dodson 2015-03-13 18:42:41 UTC

Sounds good, thanks.

Comment 10 Scott Dodson 2015-06-30 17:59:36 UTC

Rajat,

Do we have docs on how to reconfigure the sdn subnets? I couldn't find any.

--
Scott

Comment 12 Dan Winship 2016-02-16 21:06:05 UTC

https://github.com/openshift/openshift-docs/pull/1602

Comment 13 Dan Winship 2016-02-26 13:18:53 UTC

The code side of this has been fixed for a while, and the docs side is now committed.
