Description:
A new pod created on a node after the clusterNetworkCIDR has been enlarged gets an incorrect IP that does not belong to the node's subnet.

Version-Release number of selected component (if applicable):
oc v1.1.3-267-g0842757
kubernetes v1.2.0-alpha.7-703-gbc4550d

How reproducible:
Always

Steps to Reproduce:
1. Set up a new multi-node env with the default clusterNetworkCIDR: 10.1.0.0/16

[root@openshift-106 ~]# oc get hostsubnet
NAME                               HOST                               HOST IP       SUBNET
openshift-106.lab.sjc.redhat.com   openshift-106.lab.sjc.redhat.com   10.14.6.106   10.1.2.0/24
openshift-113.lab.sjc.redhat.com   openshift-113.lab.sjc.redhat.com   10.14.6.113   10.1.1.0/24
openshift-125.lab.sjc.redhat.com   openshift-125.lab.sjc.redhat.com   10.14.6.125   10.1.0.0/24

2. Delete one node

[root@openshift-106 ~]# oc delete node openshift-113.lab.sjc.redhat.com
node "openshift-113.lab.sjc.redhat.com" deleted

3. Enlarge the clusterNetwork in master-config.yaml, e.g. change it to "clusterNetworkCIDR: 10.0.0.0/15"

4. Restart the master and node services on openshift-106.lab.sjc.redhat.com and openshift-125.lab.sjc.redhat.com

5. Restart the node service on openshift-113.lab.sjc.redhat.com to re-add the node

[root@openshift-106 ~]# oc get hostsubnet
NAME                               HOST                               HOST IP       SUBNET
openshift-106.lab.sjc.redhat.com   openshift-106.lab.sjc.redhat.com   10.14.6.106   10.1.2.0/24
openshift-113.lab.sjc.redhat.com   openshift-113.lab.sjc.redhat.com   10.14.6.113   10.0.0.0/24
openshift-125.lab.sjc.redhat.com   openshift-125.lab.sjc.redhat.com   10.14.6.125   10.1.0.0/24

6. Check the re-added node's network config

[root@openshift-113 ~]# ip a
14: lbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether c2:c2:40:47:b9:c9 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 scope global lbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::60bb:42ff:fe58:90b7/64 scope link
       valid_lft forever preferred_lft forever

7. 
Create a pod on the node (you can set --schedulable=false on the other nodes)

[root@openshift-106 ~]# oc get node
NAME                               STATUS                     AGE
openshift-106.lab.sjc.redhat.com   Ready,SchedulingDisabled   1h
openshift-113.lab.sjc.redhat.com   Ready                      13m
openshift-125.lab.sjc.redhat.com   Ready,SchedulingDisabled   1h
[root@openshift-106 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/pod-for-ping.json -n d1
pod "hello-pod" created

8. Check the pod IP

Actual results:
The pod IP is 10.1.1.2, which does not belong to the node's subnet (10.0.0.0/24).

[root@openshift-106 ~]# oc describe pod hello-pod -n d1
Name:          hello-pod
Namespace:     d1
Image(s):      bmeng/hello-openshift
Node:          openshift-113.lab.sjc.redhat.com/10.14.6.113
Start Time:    Thu, 25 Feb 2016 12:54:58 +0800
Labels:        name=hello-pod
Status:        Running
Reason:
Message:
IP:            10.1.1.2
Controllers:   <none>
Containers:
  hello-pod:
    Container ID:  docker://83e371a7c9b3f60e5edc370a8b2446650a9fc39b298c7b7f98a35d81bc3d7851
    Image:         bmeng/hello-openshift
    Image ID:      docker://4d2c86486f3ce5f8b3a7c88ebef880740898859e3f68d09e1e2a73c6c1923978
    Port:
    QoS Tier:
      cpu:     BestEffort
      memory:  BestEffort
    State:     Running
      Started: Thu, 25 Feb 2016 12:55:12 +0800
    Ready:     True
    Restart Count: 0
    Environment Variables:
Conditions:
  Type    Status
  Ready   True
Volumes:
  default-token-xqu8u:
    Type:       Secret (a secret that should populate this volume)
    SecretName: default-token-xqu8u
Events:
  FirstSeen  LastSeen  Count  From                                         SubobjectPath               Type    Reason     Message
  ---------  --------  -----  ----                                         -------------               ----    ------     -------
  10m        10m       1      {default-scheduler }                                                     Normal  Scheduled  Successfully assigned hello-pod to openshift-113.lab.sjc.redhat.com
  10m        10m       1      {kubelet openshift-113.lab.sjc.redhat.com}   spec.containers{hello-pod}  Normal  Pulling    pulling image "bmeng/hello-openshift"
  9m         9m        1      {kubelet openshift-113.lab.sjc.redhat.com}   spec.containers{hello-pod}  Normal  Pulled     Successfully pulled image
  9m         9m        1      {kubelet openshift-113.lab.sjc.redhat.com}   spec.containers{hello-pod}  Normal  Created    Created container with docker id 83e371a7c9b3
  9m         9m        1      {kubelet openshift-113.lab.sjc.redhat.com}   spec.containers{hello-pod}  Normal  Started    Started container with docker id 83e371a7c9b3

Expected results:
The newly created pod's IP should belong to the new node subnet (10.0.0.0/24).
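A quick way to confirm the mismatch between the pod IP and the node's assigned hostsubnet is a pure-bash IPv4 containment check (illustrative helper, not part of the original report; function names are mine):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# cidr_contains CIDR IP -> exit 0 if IP is inside CIDR, 1 otherwise.
# Compares the network bits of both addresses under the CIDR's mask.
cidr_contains() {
  local net=${1%/*} bits=${1#*/} ip=$2
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$ip") & mask )) -eq $(( $(ip2int "$net") & mask )) ]
}

# The report's case: pod IP 10.1.1.2 vs. the node's subnet 10.0.0.0/24
cidr_contains 10.0.0.0/24 10.1.1.2 && echo "IP in subnet" || echo "IP NOT in subnet"
```

Against the values in this report, the check reports that 10.1.1.2 is outside 10.0.0.0/24 (it is actually inside the node's pre-deletion subnet 10.1.1.0/24).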
This is not actually related to changing ClusterNetworkCIDR; it's just caused by reusing a previously-deleted node without fully cleaning it up first:

1. Create a cluster with a single node
2. oc delete node NAME
3. Restart atomic-openshift-node on that node
4. Create a pod --> the pod will have a bad IP

If you reboot the node rather than just restarting atomic-openshift-node, or if you restart docker after restarting atomic-openshift-node, then things will work. We should handle this case correctly, but it shouldn't block acceptance of the trello card for the ClusterNetworkCIDR feature.
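The workaround ordering described above can be sketched as a small shell function (service names are as used in this comment; the dry-run parameter is my addition so the sequence can be shown without touching real services):

```shell
# Sketch of the workaround: restart the node service first (it re-registers
# and gets a hostsubnet from the new CIDR), then restart docker so pod
# networking is rebuilt against the newly assigned subnet.
restart_node_networking() {
  local run=${1:-systemctl}   # pass "echo" for a dry run
  $run restart atomic-openshift-node
  $run restart docker
}
```

Running `restart_node_networking echo` prints the two commands in order instead of executing them, which is useful for checking the sequence before applying it on a node.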
I've migrated this to Trello (https://trello.com/c/lCKMyDfs) since this is not a customer bug so it's simpler to just track it there.
*** Bug 1383261 has been marked as a duplicate of this bug. ***
*** Bug 1468207 has been marked as a duplicate of this bug. ***