Bug 1311849 - Need to restart docker after changing lbr0 address
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Duplicates: 1468207
Depends On:
Blocks:
 
Reported: 2016-02-25 08:15 UTC by Yan Du
Modified: 2017-07-06 12:38 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-17 17:35:32 UTC
Target Upstream Version:
Embargoed:



Description Yan Du 2016-02-25 08:15:24 UTC
Description:
A new pod created on a node after enlarging the clusterNetworkCIDR gets an incorrect IP that does not belong to the node's subnet.

Version-Release number of selected component (if applicable):
oc v1.1.3-267-g0842757
kubernetes v1.2.0-alpha.7-703-gbc4550d

How reproducible:
Always

Steps:
1. Set up a new multi-node environment; the default clusterNetworkCIDR is 10.1.0.0/16
[root@openshift-106 ~]# oc get hostsubnet
NAME                               HOST                               HOST IP       SUBNET
openshift-106.lab.sjc.redhat.com   openshift-106.lab.sjc.redhat.com   10.14.6.106   10.1.2.0/24
openshift-113.lab.sjc.redhat.com   openshift-113.lab.sjc.redhat.com   10.14.6.113   10.1.1.0/24
openshift-125.lab.sjc.redhat.com   openshift-125.lab.sjc.redhat.com   10.14.6.125   10.1.0.0/24

2. Delete one node
[root@openshift-106 ~]# oc delete node openshift-113.lab.sjc.redhat.com
node "openshift-113.lab.sjc.redhat.com" deleted

3. Enlarge the clusterNetworkCIDR in master-config.yaml, e.g. change it to "clusterNetworkCIDR: 10.0.0.0/15"

4. Restart master service and node service on openshift-106.lab.sjc.redhat.com and openshift-125.lab.sjc.redhat.com

5. Restart the node service on openshift-113.lab.sjc.redhat.com to re-add the node
[root@openshift-106 ~]# oc get hostsubnet
NAME                               HOST                               HOST IP       SUBNET
openshift-106.lab.sjc.redhat.com   openshift-106.lab.sjc.redhat.com   10.14.6.106   10.1.2.0/24
openshift-113.lab.sjc.redhat.com   openshift-113.lab.sjc.redhat.com   10.14.6.113   10.0.0.0/24
openshift-125.lab.sjc.redhat.com   openshift-125.lab.sjc.redhat.com   10.14.6.125   10.1.0.0/24

6. Check the new node's network config
[root@openshift-113 ~]# ip a
14: lbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    link/ether c2:c2:40:47:b9:c9 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 scope global lbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::60bb:42ff:fe58:90b7/64 scope link 
       valid_lft forever preferred_lft forever

7. Create a pod on the node (you can set --schedulable=false on the other nodes to force scheduling there)
[root@openshift-106 ~]# oc get node
NAME                               STATUS                     AGE
openshift-106.lab.sjc.redhat.com   Ready,SchedulingDisabled   1h
openshift-113.lab.sjc.redhat.com   Ready                      13m
openshift-125.lab.sjc.redhat.com   Ready,SchedulingDisabled   1h
[root@openshift-106 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/pod-for-ping.json -n d1
pod "hello-pod" created

8. Check the pod IP
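
The mismatch checked in step 8 can be verified mechanically. A minimal sketch in plain shell (no external tools; the addresses are the ones reported in this bug):

```shell
# Sketch (illustrative): check whether an IP falls inside a CIDR using only
# shell arithmetic, to reproduce the mismatch seen in step 8.

ip2int() {
    # Convert a dotted-quad IPv4 address to a 32-bit integer.
    oldIFS=$IFS; IFS=.
    set -- $1
    IFS=$oldIFS
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

in_cidr() {
    # in_cidr IP NET/BITS -> exit status 0 iff IP is inside NET/BITS
    ip=$(ip2int "$1")
    net=$(ip2int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

POD_IP=10.1.1.2          # IP the pod actually received (see Actual results)
NEW_SUBNET=10.0.0.0/24   # subnet assigned to the re-added node
OLD_SUBNET=10.1.1.0/24   # subnet the node held before it was deleted

in_cidr "$POD_IP" "$NEW_SUBNET" || echo "pod IP is NOT in the node's new subnet"
in_cidr "$POD_IP" "$OLD_SUBNET" && echo "pod IP is still in the node's old subnet"
```

The same check also shows the enlargement itself is consistent: the new subnet 10.0.0.0/24 falls inside the enlarged 10.0.0.0/15 but outside the original 10.1.0.0/16, so the node legitimately received a subnet from the new range.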


Actual results:
The pod IP is 10.1.1.2, which does not belong to the node's subnet (10.0.0.0/24).

[root@openshift-106 ~]# oc describe pod hello-pod -n d1
Name:        hello-pod
Namespace:    d1
Image(s):    bmeng/hello-openshift
Node:        openshift-113.lab.sjc.redhat.com/10.14.6.113
Start Time:    Thu, 25 Feb 2016 12:54:58 +0800
Labels:        name=hello-pod
Status:        Running
Reason:        
Message:    
IP:        10.1.1.2
Controllers:    <none>
Containers:
  hello-pod:
    Container ID:    docker://83e371a7c9b3f60e5edc370a8b2446650a9fc39b298c7b7f98a35d81bc3d7851
    Image:        bmeng/hello-openshift
    Image ID:        docker://4d2c86486f3ce5f8b3a7c88ebef880740898859e3f68d09e1e2a73c6c1923978
    Port:        
    QoS Tier:
      cpu:        BestEffort
      memory:        BestEffort
    State:        Running
      Started:        Thu, 25 Feb 2016 12:55:12 +0800
    Ready:        True
    Restart Count:    0
    Environment Variables:
Conditions:
  Type        Status
  Ready     True 
Volumes:
  default-token-xqu8u:
    Type:    Secret (a secret that should populate this volume)
    SecretName:    default-token-xqu8u
Events:
  FirstSeen    LastSeen    Count    From                        SubobjectPath            Type        Reason        Message
  ---------    --------    -----    ----                        -------------            --------    ------        -------
  10m        10m        1    {default-scheduler }                                Normal        Scheduled    Successfully assigned hello-pod to openshift-113.lab.sjc.redhat.com
  10m        10m        1    {kubelet openshift-113.lab.sjc.redhat.com}    spec.containers{hello-pod}    Normal        Pulling        pulling image "bmeng/hello-openshift"
  9m        9m        1    {kubelet openshift-113.lab.sjc.redhat.com}    spec.containers{hello-pod}    Normal        Pulled        Successfully pulled image "bmeng/hello-openshift"
  9m        9m        1    {kubelet openshift-113.lab.sjc.redhat.com}    spec.containers{hello-pod}    Normal        Created        Created container with docker id 83e371a7c9b3
  9m        9m        1    {kubelet openshift-113.lab.sjc.redhat.com}    spec.containers{hello-pod}    Normal        Started        Started container with docker id 83e371a7c9b3


Expected results:
The newly created pod's IP should belong to the node's new subnet (10.0.0.0/24).

Comment 1 Dan Winship 2016-02-25 15:32:45 UTC
This is not actually related to changing ClusterNetworkCIDR, it's just caused by reusing a previously-deleted node without fully cleaning it up first:

  1. Create cluster with a single node
  2. oc delete node NAME
  3. restart atomic-openshift-node on that node
  4. Create a pod

  --> pod will have a bad IP

If you reboot the node rather than just restarting atomic-openshift-node, or if you restart docker after restarting atomic-openshift-node, then things will work.
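
The second workaround above amounts to the following sequence on the re-added node (a sketch only, using the service names as they appear in this report; ordering matters):

```shell
# Workaround sketch: after restarting the node service on a reused node,
# restart docker so it picks up the new lbr0/bridge address instead of
# keeping the stale range from before the node was deleted.
systemctl restart atomic-openshift-node
systemctl restart docker
```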

We should handle this case correctly, but it shouldn't block acceptance of the trello card for the ClusterNetworkCIDR feature.

Comment 2 Dan Winship 2016-05-17 17:35:32 UTC
I've migrated this to Trello (https://trello.com/c/lCKMyDfs); since this is not a customer bug, it's simpler to track it there.

Comment 3 Dan Winship 2016-10-12 17:58:40 UTC
*** Bug 1383261 has been marked as a duplicate of this bug. ***

Comment 4 Dan Winship 2017-07-06 12:38:59 UTC
*** Bug 1468207 has been marked as a duplicate of this bug. ***

