Bug 1257864 - [isolation] Pod can only reach the pods in same/default project on the same node
Summary: [isolation] Pod can only reach the pods in same/default project on the same node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Networking
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-08-28 09:34 UTC by Meng Bo
Modified: 2016-08-24 06:17 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-23 21:16:27 UTC
Target Upstream Version:
Embargoed:


Attachments
bz1257864 (143.86 KB, application/zip)
2015-09-15 11:07 UTC, Meng Bo
no flags

Description Meng Bo 2015-08-28 09:34:11 UTC
Description of problem:
Set up a multi-node environment with redhat/openshift-ovs-multitenant networking, create pods as the same user in the same project, and make sure the pods are scheduled onto different nodes.

From one of the pods, the pods on other nodes cannot be reached.

Version-Release number of selected component (if applicable):
openshift v1.0.5-75-gbc8b6c2
kubernetes v1.1.0-alpha.0-1605-g44c91b1

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment with the redhat/openshift-ovs-multitenant network plugin set on both the master and the nodes

2. Create multiple pods as the same user, in the same project, but placed on different nodes
$ oc create -f https://raw.githubusercontent.com/bmeng/mytestfiles/master/pod_bmenghelloopenshift.json
$ oc get po -o wide 
NAME          READY     STATUS    RESTARTS   AGE       NODE
hello-pod     1/1       Running   0          7m        node2.bmeng.local
hello-pod-3   1/1       Running   0          12m       master.bmeng.local

3. Get the IP of each pod
$ oc describe po hello-pod | grep IP
IP:                             10.1.2.2
$ oc describe po hello-pod-3 | grep IP
IP:                             10.1.0.2

4. From one of the pods, try to ping the other
$ oc rsh hello-pod
bash-4.3$ ping 10.1.0.2
PING 10.1.0.2 (10.1.0.2) 56(84) bytes of data.
From 10.1.2.2 icmp_seq=1 Destination Host Unreachable
From 10.1.2.2 icmp_seq=2 Destination Host Unreachable
From 10.1.2.2 icmp_seq=3 Destination Host Unreachable
From 10.1.2.2 icmp_seq=4 Destination Host Unreachable


Actual results:
Cannot reach pods that are in the same project but placed on different nodes.

Expected results:
Should be able to reach the pods in the same project.


Additional info:
Can reach all the nodes from the pods.
Can reach the pods in the default project on the same node.
Cannot reach the pods in a different project on the same node.

Comment 1 Meng Bo 2015-08-28 10:03:56 UTC
# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=4750.412s, table=0, n_packets=3971, n_bytes=552441, actions=learn(table=7,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=4750.400s, table=1, n_packets=1853, n_bytes=253757, actions=goto_table:3
 cookie=0x0, duration=4750.407s, table=1, n_packets=0, n_bytes=0, in_port=1 actions=goto_table:2
 cookie=0x0, duration=4750.402s, table=1, n_packets=18, n_bytes=1476, in_port=9 actions=goto_table:4
 cookie=0x0, duration=4750.405s, table=1, n_packets=2072, n_bytes=295996, in_port=2 actions=goto_table:4
 cookie=0x0, duration=4750.409s, table=1, n_packets=27, n_bytes=1134, arp actions=goto_table:7
 cookie=0x0, duration=4750.393s, table=2, n_packets=0, n_bytes=0, tun_id=0 actions=goto_table:4
 cookie=0x0, duration=4750.395s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.2.1 actions=output:2
 cookie=0x0, duration=4750.391s, table=2, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:5
 cookie=0x0, duration=4750.398s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:7
 cookie=0x4, duration=2637.753s, table=3, n_packets=1818, n_bytes=250725, priority=100,ip,in_port=4,nw_src=10.1.2.3 actions=load:0->NXM_NX_REG0[],goto_table:4
 cookie=0x3, duration=3597.983s, table=3, n_packets=16, n_bytes=1514, priority=100,ip,in_port=3,nw_src=10.1.2.2 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x5, duration=2424.982s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=5,nw_src=10.1.2.4 actions=load:0xb->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=4750.388s, table=4, n_packets=588, n_bytes=90093, priority=200,ip,nw_dst=10.1.2.1 actions=output:2
 cookie=0x0, duration=4734.432s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=4750.384s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:6
 cookie=0x0, duration=4750.382s, table=4, n_packets=1239, n_bytes=161460, priority=0,ip actions=output:2
 cookie=0x0, duration=4750.386s, table=4, n_packets=2071, n_bytes=296034, priority=150,ip,nw_dst=10.1.2.0/24 actions=goto_table:5
 cookie=0x4, duration=2637.748s, table=5, n_packets=2, n_bytes=196, priority=150,ip,nw_dst=10.1.2.3 actions=output:4
 cookie=0x0, duration=4750.379s, table=5, n_packets=2066, n_bytes=295544, priority=200,ip,reg0=0 actions=goto_table:7
 cookie=0x3, duration=3597.979s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xa,nw_dst=10.1.2.2 actions=output:3
 cookie=0x5, duration=2424.980s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xb,nw_dst=10.1.2.4 actions=output:5
 cookie=0x0, duration=2638.029s, table=7, n_packets=2056, n_bytes=293938, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:02:03 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:4
 cookie=0x0, duration=3404.996s, table=7, n_packets=4, n_bytes=168, hard_timeout=900, priority=200,dl_dst=1a:37:8b:9e:96:c1 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2
 cookie=0xa42812e, duration=155.391s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.3.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.129.46->tun_dst,output:1
 cookie=0xa42812d, duration=155.399s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.129.45->tun_dst,output:1
 cookie=0xa4280e7, duration=155.411s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.231->tun_dst,output:1
 cookie=0x0, duration=4750.377s, table=7, n_packets=13, n_bytes=546, priority=0,arp actions=FLOOD
 cookie=0xa42812e, duration=155.387s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.3.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.129.46->tun_dst,output:1
 cookie=0xa42812d, duration=155.395s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.129.45->tun_dst,output:1
 cookie=0xa4280e7, duration=155.405s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.231->tun_dst,output:1

Comment 2 Dan Winship 2015-09-09 18:08:16 UTC
What does dump-flows show on the other node?

Comment 3 Meng Bo 2015-09-10 08:42:07 UTC
Here are the dumps from all 3 nodes.

[root@node1 ~]# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=438.386s, table=0, n_packets=65, n_bytes=4694, actions=learn(table=7,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=438.373s, table=1, n_packets=28, n_bytes=2240, actions=goto_table:3
 cookie=0x0, duration=438.380s, table=1, n_packets=0, n_bytes=0, in_port=1 actions=goto_table:2
 cookie=0x0, duration=438.375s, table=1, n_packets=15, n_bytes=1218, in_port=9 actions=goto_table:4
 cookie=0x0, duration=438.377s, table=1, n_packets=8, n_bytes=648, in_port=2 actions=goto_table:4
 cookie=0x0, duration=438.383s, table=1, n_packets=14, n_bytes=588, arp actions=goto_table:7
 cookie=0x0, duration=438.366s, table=2, n_packets=0, n_bytes=0, tun_id=0 actions=goto_table:4
 cookie=0x0, duration=438.368s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.0.1 actions=output:2
 cookie=0x0, duration=438.364s, table=2, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:5
 cookie=0x0, duration=438.371s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:7
 cookie=0x6, duration=98.117s, table=3, n_packets=1, n_bytes=98, priority=100,ip,in_port=6,nw_src=10.1.0.5 actions=load:0xe->NXM_NX_REG0[],goto_table:4
 cookie=0x5, duration=158.062s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=5,nw_src=10.1.0.4 actions=load:0xd->NXM_NX_REG0[],goto_table:4
 cookie=0x3, duration=309.049s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=3,nw_src=10.1.0.2 actions=load:0xd->NXM_NX_REG0[],goto_table:4
 cookie=0x4, duration=267.077s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=4,nw_src=10.1.0.3 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=438.362s, table=4, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.0.1 actions=output:2
 cookie=0x0, duration=434.169s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=438.357s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:6
 cookie=0x0, duration=438.355s, table=4, n_packets=0, n_bytes=0, priority=0,ip actions=output:2
 cookie=0x0, duration=438.359s, table=4, n_packets=1, n_bytes=98, priority=150,ip,nw_dst=10.1.0.0/24 actions=goto_table:5
 cookie=0x0, duration=438.352s, table=5, n_packets=0, n_bytes=0, priority=200,ip,reg0=0 actions=goto_table:7
 cookie=0x5, duration=158.060s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd,nw_dst=10.1.0.4 actions=output:5
 cookie=0x4, duration=267.073s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xa,nw_dst=10.1.0.3 actions=output:4
 cookie=0x6, duration=98.114s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xe,nw_dst=10.1.0.5 actions=output:6
 cookie=0x3, duration=309.047s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd,nw_dst=10.1.0.2 actions=output:3
 cookie=0x0, duration=308.906s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:02 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:3
 cookie=0x0, duration=438.050s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=b2:16:ac:0d:37:5e actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2
 cookie=0x0, duration=158.298s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:04 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:5
 cookie=0x0, duration=438.053s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=96:f0:97:09:cf:3c actions=load:0->NXM_NX_TUN_IPV4_DST[],output:9
 cookie=0x0, duration=97.849s, table=7, n_packets=1, n_bytes=42, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:05 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:6
 cookie=0x0, duration=266.610s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:03 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:4
 cookie=0xa42803e, duration=193.104s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.62->tun_dst,output:1
 cookie=0xa428078, duration=193.110s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.120->tun_dst,output:1
 cookie=0x0, duration=438.350s, table=7, n_packets=13, n_bytes=546, priority=0,arp actions=FLOOD
 cookie=0xa428078, duration=193.107s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.120->tun_dst,output:1
 cookie=0xa42803e, duration=193.101s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.62->tun_dst,output:1

# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=481.317s, table=0, n_packets=57, n_bytes=4490, actions=learn(table=7,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=481.300s, table=1, n_packets=30, n_bytes=2468, actions=goto_table:3
 cookie=0x0, duration=481.310s, table=1, n_packets=0, n_bytes=0, in_port=1 actions=goto_table:2
 cookie=0x0, duration=481.304s, table=1, n_packets=15, n_bytes=1206, in_port=9 actions=goto_table:4
 cookie=0x0, duration=481.308s, table=1, n_packets=8, n_bytes=648, in_port=2 actions=goto_table:4
 cookie=0x0, duration=481.312s, table=1, n_packets=4, n_bytes=168, arp actions=goto_table:7
 cookie=0x0, duration=481.294s, table=2, n_packets=0, n_bytes=0, tun_id=0 actions=goto_table:4
 cookie=0x0, duration=481.296s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.2.1 actions=output:2
 cookie=0x0, duration=481.291s, table=2, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:5
 cookie=0x0, duration=481.298s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:7
 cookie=0x6, duration=189.221s, table=3, n_packets=2, n_bytes=196, priority=100,ip,in_port=6,nw_src=10.1.2.5 actions=load:0xe->NXM_NX_REG0[],goto_table:4
 cookie=0x4, duration=340.532s, table=3, n_packets=2, n_bytes=196, priority=100,ip,in_port=4,nw_src=10.1.2.3 actions=load:0xe->NXM_NX_REG0[],goto_table:4
 cookie=0x3, duration=357.328s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=3,nw_src=10.1.2.2 actions=load:0xd->NXM_NX_REG0[],goto_table:4
 cookie=0x5, duration=201.446s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=5,nw_src=10.1.2.4 actions=load:0xe->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=481.288s, table=4, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.2.1 actions=output:2
 cookie=0x0, duration=479.973s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=481.283s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:6
 cookie=0x0, duration=481.281s, table=4, n_packets=0, n_bytes=0, priority=0,ip actions=output:2
 cookie=0x0, duration=481.286s, table=4, n_packets=4, n_bytes=392, priority=150,ip,nw_dst=10.1.2.0/24 actions=goto_table:5
 cookie=0x0, duration=481.279s, table=5, n_packets=0, n_bytes=0, priority=200,ip,reg0=0 actions=goto_table:7
 cookie=0x3, duration=357.326s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd,nw_dst=10.1.2.2 actions=output:3
 cookie=0x6, duration=189.218s, table=5, n_packets=2, n_bytes=196, priority=100,ip,reg0=0xe,nw_dst=10.1.2.5 actions=output:6
 cookie=0x5, duration=201.443s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xe,nw_dst=10.1.2.4 actions=output:5
 cookie=0x4, duration=340.528s, table=5, n_packets=2, n_bytes=196, priority=100,ip,reg0=0xe,nw_dst=10.1.2.3 actions=output:4
 cookie=0x0, duration=340.249s, table=7, n_packets=1, n_bytes=42, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:02:03 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:4
 cookie=0x0, duration=189.130s, table=7, n_packets=2, n_bytes=84, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:02:05 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:6
 cookie=0x0, duration=201.434s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:02:04 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:5
 cookie=0x0, duration=356.971s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:02:02 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:3
 cookie=0x0, duration=481.242s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=9e:b4:a1:f9:d0:dd actions=load:0->NXM_NX_TUN_IPV4_DST[],output:9
 cookie=0x0, duration=481.239s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=1e:72:ac:8e:da:69 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2
 cookie=0xa42803e, duration=238.940s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.62->tun_dst,output:1
 cookie=0xa4281c5, duration=238.948s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.129.197->tun_dst,output:1
 cookie=0x0, duration=481.277s, table=7, n_packets=1, n_bytes=42, priority=0,arp actions=FLOOD
 cookie=0xa42803e, duration=238.930s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.62->tun_dst,output:1
 cookie=0xa4281c5, duration=238.944s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.129.197->tun_dst,output:1

# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=510.049s, table=0, n_packets=202, n_bytes=141135, actions=learn(table=7,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=510.035s, table=1, n_packets=92, n_bytes=28647, actions=goto_table:3
 cookie=0x0, duration=510.042s, table=1, n_packets=0, n_bytes=0, in_port=1 actions=goto_table:2
 cookie=0x0, duration=510.037s, table=1, n_packets=25, n_bytes=2106, in_port=9 actions=goto_table:4
 cookie=0x0, duration=510.039s, table=1, n_packets=81, n_bytes=110214, in_port=2 actions=goto_table:4
 cookie=0x0, duration=510.044s, table=1, n_packets=4, n_bytes=168, arp actions=goto_table:7
 cookie=0x0, duration=510.011s, table=2, n_packets=0, n_bytes=0, tun_id=0 actions=goto_table:4
 cookie=0x0, duration=510.013s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.1.1 actions=output:2
 cookie=0x0, duration=510.008s, table=2, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:5
 cookie=0x0, duration=510.032s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:7
 cookie=0x5, duration=233.689s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=5,nw_src=10.1.1.4 actions=load:0xd->NXM_NX_REG0[],goto_table:4
 cookie=0x3, duration=360.400s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=3,nw_src=10.1.1.2 actions=load:0xe->NXM_NX_REG0[],goto_table:4
 cookie=0x6, duration=217.424s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=6,nw_src=10.1.1.5 actions=load:0xe->NXM_NX_REG0[],goto_table:4
 cookie=0x7, duration=166.162s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=7,nw_src=10.1.1.6 actions=load:0xe->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=510.006s, table=4, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.1.1 actions=output:2
 cookie=0x0, duration=504.773s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=510.001s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:6
 cookie=0x0, duration=509.999s, table=4, n_packets=60, n_bytes=26115, priority=0,ip actions=output:2
 cookie=0x0, duration=510.003s, table=4, n_packets=73, n_bytes=109566, priority=150,ip,nw_dst=10.1.1.0/24 actions=goto_table:5
 cookie=0x0, duration=509.996s, table=5, n_packets=73, n_bytes=109566, priority=200,ip,reg0=0 actions=goto_table:7
 cookie=0x5, duration=233.687s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd,nw_dst=10.1.1.4 actions=output:5
 cookie=0x7, duration=166.160s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xe,nw_dst=10.1.1.6 actions=output:7
 cookie=0x6, duration=217.421s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xe,nw_dst=10.1.1.5 actions=output:6
 cookie=0x3, duration=360.396s, table=5, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xe,nw_dst=10.1.1.2 actions=output:3
 cookie=0x0, duration=166.327s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:06 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:7
 cookie=0x0, duration=360.998s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:02 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:3
 cookie=0x0, duration=509.806s, table=7, n_packets=1, n_bytes=42, hard_timeout=900, priority=200,dl_dst=32:f3:29:82:5a:4b actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2
 cookie=0x0, duration=233.862s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:04 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:5
 cookie=0x0, duration=346.625s, table=7, n_packets=75, n_bytes=109650, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:03 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:4
 cookie=0x0, duration=217.586s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:05 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:6
 cookie=0x0, duration=509.840s, table=7, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=8e:45:f5:21:77:9b actions=load:0->NXM_NX_TUN_IPV4_DST[],output:9
 cookie=0xa428078, duration=22.640s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.120->tun_dst,output:1
 cookie=0xa4281c5, duration=22.648s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.129.197->tun_dst,output:1
 cookie=0x0, duration=509.994s, table=7, n_packets=1, n_bytes=42, priority=0,arp actions=FLOOD
 cookie=0xa428078, duration=22.635s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.120->tun_dst,output:1
 cookie=0xa4281c5, duration=22.644s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.129.197->tun_dst,output:1

Comment 4 Meng Bo 2015-09-10 08:46:28 UTC
I can also see many errors like below on each node.


[root@node1 ~]# journalctl -lf -u openshift-node
-- Logs begin at Thu 2015-09-10 16:11:52 CST. --
Sep 10 16:40:57 node1.bmeng.local openshift[2208]: E0910 16:40:57.548306    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'hello-pod-6' - exit status 1
Sep 10 16:40:57 node1.bmeng.local openshift[2208]: E0910 16:40:57.554118    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'hello-pod-4' - exit status 1
Sep 10 16:41:07 node1.bmeng.local openshift[2208]: E0910 16:41:07.527865    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'docker-registry-1-ux8aq' - exit status 1
Sep 10 16:41:07 node1.bmeng.local openshift[2208]: E0910 16:41:07.528196    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'hello-pod-4' - exit status 1
Sep 10 16:41:07 node1.bmeng.local openshift[2208]: E0910 16:41:07.528472    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'hello-pod-6' - exit status 1
Sep 10 16:41:07 node1.bmeng.local openshift[2208]: E0910 16:41:07.528788    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'hello-pod-1' - exit status 1
Sep 10 16:41:07 node1.bmeng.local openshift[2208]: E0910 16:41:07.576796    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'hello-pod-4' - exit status 1
Sep 10 16:41:07 node1.bmeng.local openshift[2208]: E0910 16:41:07.578263    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'docker-registry-1-ux8aq' - exit status 1
Sep 10 16:41:07 node1.bmeng.local openshift[2208]: E0910 16:41:07.585161    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'hello-pod-1' - exit status 1
Sep 10 16:41:07 node1.bmeng.local openshift[2208]: E0910 16:41:07.586083    2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed on the status hook for pod 'hello-pod-6' - exit status 1

Comment 5 Dan Winship 2015-09-10 16:44:51 UTC
(In reply to Meng Bo from comment #4)
> I can also see many errors like below on each node.
> 
> 
> [root@node1 ~]# journalctl -lf -u openshift-node
> -- Logs begin at Thu 2015-09-10 16:11:52 CST. --
> Sep 10 16:40:57 node1.bmeng.local openshift[2208]: E0910 16:40:57.548306   
> 2208 manager.go:313] NetworkPlugin redhat/openshift-ovs-multitenant failed
> on the status hook for pod 'hello-pod-6' - exit status 1

Those are harmless and can be ignored (and it's fixed on master).
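When reading node logs for real errors, these known-harmless status-hook messages can be filtered out first. A minimal sketch, assuming the message text matches the log lines quoted in comment #4; "hide_status_hook_errors" is a hypothetical helper name, not part of OpenShift:

```shell
# Hypothetical helper: drop the harmless "failed on the status hook" errors
# (as quoted in comment #4) from log text read on stdin; all other lines pass through.
hide_status_hook_errors() {
  grep -v 'failed on the status hook for pod'
}
```

Usage, for example:

  journalctl -u openshift-node | hide_status_hook_errors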

Comment 6 Dan Winship 2015-09-10 17:41:39 UTC
> hello-pod-3   1/1       Running   0          12m       master.bmeng.local

Oh, is your openshift master also a node? If so, does the problem go away if you don't do that? (i.e., have a dedicated master and use separate machines for nodes)



Other thoughts:

Are the node IP addresses in the dump-flow output as expected? In particular, two of the nodes are on 10.66.128.x, and the third is on 10.66.129.x. Do the nodes have multiple IP addresses, and if so, is it mixing up "eth0" and "eth1" here?
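One way to answer this from the dumps is to list which pod subnet each table 7 rule tunnels to which remote node address (the "set_field:<IP>->tun_dst" action), and compare that against the real node IPs. A minimal sketch, assuming the flow-line format shown in the "ovs-ofctl dump-flows br0 -O OpenFlow13" output above; "remote_subnets" is a hypothetical helper name:

```shell
# Hypothetical helper: read dump-flows output on stdin and print
# "<pod subnet> <remote node IP>" for each table 7 rule of the form
# priority=100,ip,nw_dst=<subnet> actions=...,set_field:<IP>->tun_dst,...
# Learned dl_dst rules and table 8 ARP rules are not matched.
remote_subnets() {
  sed -n 's|.*table=7,.*priority=100,ip,nw_dst=\([0-9./]*\) actions=.*set_field:\([0-9.]*\)->tun_dst.*|\1 \2|p'
}
```

Usage: "ovs-ofctl dump-flows br0 -O OpenFlow13 | remote_subnets". On the dump in comment #1 this would list 10.1.3.0/24, 10.1.1.0/24 and 10.1.0.0/24 against 10.66.129.46, 10.66.129.45 and 10.66.128.231 respectively.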

Is the node routing set up correctly with respect to 10.1.x.x vs 10.66.x.x? (In particular, there should not be any route to 10.0.0.0/8.)
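The routing check can be scripted the same way. A minimal sketch, assuming the output of "ip route" is piped in; "find_10_8_routes" is a hypothetical helper name. It only looks for a route to exactly 10.0.0.0/8, not for other overlapping prefixes:

```shell
# Hypothetical helper: print any route for exactly 10.0.0.0/8 found in
# "ip route" output read on stdin. Like grep, it exits 0 only if one was found.
find_10_8_routes() {
  grep -E '^10\.0\.0\.0/8([[:space:]]|$)'
}
```

Usage: "ip route | find_10_8_routes && echo 'unexpected 10.0.0.0/8 route'".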

On hello-pod, after trying to ping 10.1.0.2, what does "arp -a" show? Also, what is the output of running this as root:

  ovs-appctl ofproto/trace br0 'in_port=3,arp,arp_spa=10.1.1.2,arp_tpa=10.1.0.2'

and on hello-pod-3, what is the output of

  ovs-appctl ofproto/trace br0 'in_port=1,arp,arp_spa=10.1.1.2,arp_tpa=10.1.0.2,tun_id=0xd'

Comment 7 Meng Bo 2015-09-11 11:09:08 UTC
@Dan

I have tried several times, and in some of those attempts the master was not acting as a node.
The result is the same.


I have 1 master and 3 nodes set up; their IPs are:
10.66.128.214   master.bmeng.local 
10.66.129.197   node1.bmeng.local
10.66.128.120   node2.bmeng.local
10.66.128.62    node3.bmeng.local

Each node has only one IP, provided by eth0. The lbr0 configuration is as follows:
9: lbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    link/ether 0e:ba:95:24:8b:db brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.1/24 scope global lbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::f4f7:6dff:fe2b:28a9/64 scope link 
       valid_lft forever preferred_lft forever



There are two pods:
NAME          READY     STATUS    RESTARTS   AGE       NODE
hello-2-pod   1/1       Running   0          17s       node1.bmeng.local
hello-pod     1/1       Running   0          23s       node2.bmeng.local


Their IPs are:
hello-2-pod    10.1.0.3
hello-pod      10.1.1.3


After pinging 10.1.0.3 from hello-pod:
bash-4.3$ arp -a 
gateway (10.1.1.1) at b2:05:8c:29:19:bc [ether] on eth0
? (10.1.0.3) at <incomplete> on eth0



[root@node2 ~]# ovs-appctl ofproto/trace br0 'in_port=3,arp,arp_spa=10.1.1.3,arp_tpa=10.1.0.3'
Bridge: br0
Flow: arp,metadata=0,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,arp_spa=10.1.1.3,arp_tpa=10.1.0.3,arp_sha=00:00:00:00:00:00,arp_tha=00:00:00:00:00:00
Rule: table=0 cookie=0 
OpenFlow actions=learn(table=7,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1

        Resubmitted flow: unchanged
        Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0
        Resubmitted  odp: drop
        Resubmitted megaflow: recirc_id=0,skb_priority=0,arp,tun_src=0.0.0.0,in_port=3,dl_src=00:00:00:00:00:00
        Rule: table=1 cookie=0 arp
        OpenFlow actions=goto_table:7

                Resubmitted flow: unchanged
                Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0
                Resubmitted  odp: drop
                Resubmitted megaflow: recirc_id=0,skb_priority=0,arp,tun_src=0.0.0.0,in_port=3,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00
                Rule: table=7 cookie=0 priority=0,arp
                OpenFlow actions=FLOOD
                Not tunneling to our own address

Final flow: arp,metadata=0,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,arp_spa=10.1.1.3,arp_tpa=10.1.0.3,arp_sha=00:00:00:00:00:00,arp_tha=00:00:00:00:00:00
Megaflow: recirc_id=0,skb_priority=0,arp,tun_src=0.0.0.0,in_port=3,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00
Datapath actions: 5,1,4,3

[root@node1 ~]#  ovs-appctl ofproto/trace br0 'in_port=3,arp,arp_spa=10.1.0.2,arp_tpa=10.1.2.2,tun_id=0xd'
Bridge: br0
Flow: arp,tun_id=0xd,metadata=0,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,arp_spa=10.1.0.2,arp_tpa=10.1.2.2,arp_sha=00:00:00:00:00:00,arp_tha=00:00:00:00:00:00
Rule: table=0 cookie=0 
OpenFlow actions=learn(table=7,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1

        Resubmitted flow: unchanged
        Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0
        Resubmitted  odp: drop
        Resubmitted megaflow: recirc_id=0,skb_priority=0,arp,tun_src=0.0.0.0,in_port=3,dl_src=00:00:00:00:00:00
        Rule: table=1 cookie=0 arp
        OpenFlow actions=goto_table:7

                Resubmitted flow: unchanged
                Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0
                Resubmitted  odp: drop
                Resubmitted megaflow: recirc_id=0,skb_priority=0,arp,tun_src=0.0.0.0,in_port=3,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00
                Rule: table=7 cookie=0 priority=0,arp
                OpenFlow actions=FLOOD
                Not tunneling to our own address

Final flow: arp,tun_id=0xd,metadata=0,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,arp_spa=10.1.0.2,arp_tpa=10.1.2.2,arp_sha=00:00:00:00:00:00,arp_tha=00:00:00:00:00:00
Megaflow: recirc_id=0,skb_priority=0,arp,tun_src=0.0.0.0,in_port=3,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00
Datapath actions: 5,1,4,3

Comment 8 Dan Winship 2015-09-11 14:39:40 UTC
Argh. OK, I need all of the output to come from a single configuration; in this case the IPs are different when you ran ofproto/trace from what they were when you ran dump-flows, so they're not really comparable. Also, I realize I should have suggested a different packet to trace.

So, I need: (where ${IP1} is the IP address of the first container, and ${IP2} is the IP address of the second container)

  # In the first container:
  ip a
  ping ${IP2}
  arp -a

  # On the first container's node, not in the container
  ip a
  ip r
  arp -a
  ovs-ofctl dump-flows br0 -O OpenFlow13

  c1data=$(ovs-ofctl dump-flows br0 -O OpenFlow13 | grep "nw_dst=${IP1}")
  TUN_ID=$(echo $c1data | sed -e 's/.*reg0=\([^,]*\).*/\1/')
  IN_PORT=$(echo $c1data | sed -e 's/.*output://')

  ovs-appctl ofproto/trace br0 "in_port=${IN_PORT},ip,nw_src=${IP1},nw_dst=${IP2}"
  ovs-appctl ofproto/trace br0 "in_port=1,tun_id=${TUN_ID},ip,nw_src=${IP2},nw_dst=${IP1}"

  # In the second container:
  ip a
  ping ${IP1}
  arp -a

  # On the second container's node, not in the container
  ip a
  ip r
  arp -a
  ovs-ofctl dump-flows br0 -O OpenFlow13

  c2data=$(ovs-ofctl dump-flows br0 -O OpenFlow13 | grep "nw_dst=${IP2}")
  TUN_ID=$(echo $c2data | sed -e 's/.*reg0=\([^,]*\).*/\1/')
  IN_PORT=$(echo $c2data | sed -e 's/.*output://')

  ovs-appctl ofproto/trace br0 "in_port=${IN_PORT},ip,nw_src=${IP2},nw_dst=${IP1}"
  ovs-appctl ofproto/trace br0 "in_port=1,tun_id=${TUN_ID},ip,nw_src=${IP1},nw_dst=${IP2}"
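The extraction step above greps for "nw_dst=${IP1}" as a substring, so for example 10.1.0.2 would also match a rule for 10.1.0.20. A minimal sketch of a stricter variant, assuming the table 5 rule format shown in the dumps above ("...,reg0=<VNID>,nw_dst=<IP> actions=output:<port>"); "pod_port_and_vnid" is a hypothetical helper name. It prints "<TUN_ID> <IN_PORT>" for that exact IP. Pods whose table 5 rule has no reg0 match (the VNID 0 rule for 10.1.2.3 in comment #1, for instance) produce no output.

```shell
# Hypothetical helper: pod_port_and_vnid <pod IP>  < dump-flows output
# Prints "<TUN_ID> <IN_PORT>" from the table 5 rule whose nw_dst is exactly <pod IP>.
pod_port_and_vnid() {
  # Escape the dots so they match literally rather than "any character".
  ip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  # The space before "actions=" anchors the end of the IP, so 10.1.0.2
  # does not match a rule for 10.1.0.20.
  sed -n "s|.*table=5,.*reg0=\([^,]*\),nw_dst=${ip_re} actions=output:\([0-9]*\).*|\1 \2|p"
}
```

Usage, in place of the grep/sed lines above:

  c1=$(ovs-ofctl dump-flows br0 -O OpenFlow13 | pod_port_and_vnid "${IP1}")
  TUN_ID=${c1% *}
  IN_PORT=${c1#* }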

Comment 9 Dan Winship 2015-09-11 16:46:33 UTC
also, the output of "journalctl -t openshift" on the master and one of the nodes

Comment 11 Meng Bo 2015-09-15 11:07:18 UTC
Created attachment 1073591
bz1257864

Hi Dan,
I have added all the output you asked for in comment #8 and comment #9.

Thanks.

Comment 13 Dan Winship 2015-09-15 20:43:49 UTC
(In reply to Meng Bo from comment #12)
> And for openshift-sdn, I just git clone the openshift-sdn repo and checkout
> the multitenant branch, then install the scripts via make.

UGH! Sorry, the multitenant branch was merged two months ago and should have been deleted. And stuff has changed since then which makes origin master incompatible with that code. You need to be using openshift-sdn git master.

Comment 14 Meng Bo 2015-09-16 08:43:10 UTC
Ahh..

It works after installing openshift-sdn from the master branch...

The pods in the same project can reach each other now...

This has cost me at least two weeks...

I got the method from the vagrant setup script several weeks ago and did not notice that the script had since been updated...

Can you please move the bug to ON_QA so that I can close it?

BTW, I found the vagrant will install the openshift-sdn from Godeps/_workspace/src/openshift/openshift-sdn directly for now, and it was using the openshift-sdn which git cloned from github, What is the difference between this two methods? and which one is preferred for our testing? 

Thanks.

Comment 15 Dan Winship 2015-09-16 13:12:35 UTC
(In reply to Meng Bo from comment #14)
> BTW, I found the vagrant will install the openshift-sdn from
> Godeps/_workspace/src/openshift/openshift-sdn directly for now, and it was
> using the openshift-sdn which git cloned from github, What is the difference
> between this two methods? and which one is preferred for our testing? 

So, origin always compiles in the openshift-sdn code from Godeps/, so by using the scripts from there as well, it guarantees that you always use scripts that are in sync with the code. So that's the preferred way of testing I guess. You don't need to manually install openshift-sdn at all.

Comment 16 Meng Bo 2015-09-17 02:13:05 UTC
@Dan,

Thanks.

Closing this bug.

