Created attachment 1338222
debug_info

Description of problem:
Set egressIPs on a node to make it an egress node, and set the same egressIPs on a netnamespace. Create pods in that namespace and try to access the network outside the cluster. The pod on the egress node can reach the outside with the egressIP as its source IP, but the pod on the other node loses outside network connectivity.

Version-Release number of selected component (if applicable):
v3.7.0-0.150.0

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node env with the multitenant plugin
2. Patch the node's hostsubnet to add the egressIP
# oc patch hostsubnet ose-node2.bmeng.local -p '{"egressIPs":["10.66.140.100"]}'
3. Patch the netnamespace to use the egressIP
# oc patch netnamespace u1p1 -p '{"egressIPs": ["10.66.140.100"]}'
4. Create pods in the namespace and make sure pods land on each node
5. Try to access the network outside the cluster from all the pods

Actual results:
The pod on the egress node can access the outside with the egressIP as its source IP. The pod on the other node loses outside network connectivity.

Expected results:
All the pods should be able to access the outside using the egressIP as the source IP.

Additional info:
Full node info is in the attachment.
I tried some debugging steps; it seems the packet is dropped as soon as it reaches the egress node.

[root@ose-node1 ~]# ovs-appctl ofproto/trace br0 "in_port=3,ip,nw_dst=10.66.141.175,nw_src=10.129.0.30,ct_state(trk)"
Flow: ct_state=trk,ip,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.30,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip, priority 100
    goto_table:20
20. ip,in_port=3,nw_src=10.129.0.30, priority 100
    load:0xd09380->NXM_NX_REG0[]
    goto_table:21
21. priority 0
    goto_table:30
30. ip, priority 0
    goto_table:100
100. ip,reg0=0xd09380, priority 100
    move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
     -> NXM_NX_TUN_ID[0..31] is now 0xd09380
    set_field:10.66.140.41->tun_dst
    output:1
     -> output to kernel tunnel

Final flow: ct_state=trk,ip,reg0=0xd09380,tun_src=0.0.0.0,tun_dst=10.66.140.41,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.30,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: recirc_id=0,ip,tun_id=0/0xffffffff,tun_dst=0.0.0.0,in_port=3,nw_src=10.129.0.30,nw_dst=10.0.0.0/9,nw_ecn=0,nw_frag=no
Datapath actions: set(tunnel(tun_id=0xd09380,dst=10.66.140.41,ttl=64,tp_dst=4789,flags(df|key))),1

[root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=1,ip,tun_dst=10.66.140.41,nw_src=10.129.0.30,nw_dst=10.66.141.175,tun_id=0xd09380,ct_state(trk)"
Flow: ct_state=trk,ip,tun_src=0.0.0.0,tun_dst=10.66.140.41,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.30,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. in_port=1, priority 150
    drop

Final flow: unchanged
Megaflow: recirc_id=0,ip,tun_id=0xd09380,tun_src=0.0.0.0,tun_dst=10.66.140.41,tun_tos=0,tun_flags=-df-csum-key,in_port=1,nw_dst=10.0.0.0/9,nw_frag=no
Datapath actions: drop
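When comparing several traces, it helps to pull the VXLAN parameters out of the "Datapath actions:" line. The helper below is a hypothetical sketch (`parse_tunnel_action` is not part of OVS); it assumes the `set(tunnel(tun_id=...,dst=...,...,tp_dst=...))` layout shown in the traces in this bug and returns None for actions such as `drop`.

```python
import re
from typing import Optional, Tuple

# Matches the tunnel() action in the "Datapath actions:" line printed by
# `ovs-appctl ofproto/trace`, e.g.
#   set(tunnel(tun_id=0xd09380,dst=10.66.140.41,ttl=64,tp_dst=4789,flags(df|key))),1
_TUNNEL_RE = re.compile(
    r"tunnel\(tun_id=(0x[0-9a-fA-F]+),dst=([0-9.]+),[^)]*?tp_dst=(\d+)"
)


def parse_tunnel_action(actions: str) -> Optional[Tuple[int, str, int]]:
    """Return (tun_id, remote_node_ip, vxlan_udp_port), or None if the
    datapath actions contain no tunnel() action (for example "drop")."""
    match = _TUNNEL_RE.search(actions)
    if match is None:
        return None
    # In these traces tun_id is copied from reg0 in table 100.
    tun_id = int(match.group(1), 16)
    return tun_id, match.group(2), int(match.group(3))


if __name__ == "__main__":
    line = ("Datapath actions: set(tunnel(tun_id=0xd09380,dst=10.66.140.41,"
            "ttl=64,tp_dst=4789,flags(df|key))),1")
    tun_id, dst, port = parse_tunnel_action(line)
    print(f"tun_id {tun_id:#x} -> {dst}:{port}")  # tun_id 0xd09380 -> 10.66.140.41:4789
```

Applied to the ose-node1 trace above, it shows the pod's reg0 value 0xd09380 travelling as the VXLAN key to the egress node 10.66.140.41 on UDP port 4789.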
Pod on the egress node can access the outside:
[root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=3,ip,nw_dst=10.66.141.175,nw_src=10.128.0.27,ct_state(trk)"
Flow: ct_state=trk,ip,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.128.0.27,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip, priority 100
    goto_table:20
20. ip,in_port=3,nw_src=10.128.0.27, priority 100
    load:0xd09380->NXM_NX_REG0[]
    goto_table:21
21. priority 0
    goto_table:30
30. ip, priority 0
    goto_table:100
100. ip,reg0=0xd09380, priority 100
    set_field:0xa428c64->pkt_mark
    output:2

Final flow: pkt_mark=0xa428c64,ct_state=trk,ip,reg0=0xd09380,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.128.0.27,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: pkt_mark=0,recirc_id=0,ip,in_port=3,nw_src=10.128.0.27,nw_dst=10.0.0.0/9,nw_frag=no
Datapath actions: set(skb_mark(0xa428c64)),2
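In table 100 the egress-node pod's packet gets `set_field:0xa428c64->pkt_mark`. That value equals the egress IP 10.66.140.100 packed byte by byte (0x0a, 0x42, 0x8c, 0x64), so the mark appears to encode the egress IP; this thread does not say how the node uses the mark afterwards, so treat that reading as an assumption. A quick check with Python's standard `ipaddress` module (the helper names are hypothetical):

```python
import ipaddress


def ip_to_mark(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into one 32-bit integer (network byte order)."""
    return int(ipaddress.IPv4Address(ip))


def mark_to_ip(mark: int) -> str:
    """Inverse of ip_to_mark: render a 32-bit mark as a dotted-quad address."""
    return str(ipaddress.IPv4Address(mark))


if __name__ == "__main__":
    print(hex(ip_to_mark("10.66.140.100")))  # 0xa428c64
    print(mark_to_ip(0xa428c64))             # 10.66.140.100
```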
> [root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=1,ip,tun_dst=10.66.140.41,nw_src=10.129.0.30,nw_dst=10.66.141.175,tun_id=0xd09380,ct_state(trk)"

FWIW, you need to specify tun_src=NODE1-IP as well, or the OVS rules will consider it a spoofed VXLAN packet. But it would still fail here, and I can see why from the OVS flows.
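To apply that advice consistently, a hypothetical helper can assemble the trace command with tun_src always present. It only builds the argument list (which could then be handed to `subprocess.run` on a node); the field names are the ones used in the traces in this bug, and the sample addresses come from the later retest, where ose-node2 (10.66.140.15) tunnels to the egress node ose-node1 (10.66.140.199).

```python
from typing import List


def vxlan_trace_argv(tun_src: str, tun_dst: str, tun_id: int,
                     nw_src: str, nw_dst: str,
                     bridge: str = "br0", in_port: int = 1) -> List[str]:
    """Build an `ovs-appctl ofproto/trace` argument list that simulates a
    VXLAN packet arriving from another node. tun_src (the sending node's IP)
    is always included so the trace is not treated as a spoofed packet."""
    flow = ",".join([
        f"in_port={in_port}",  # VXLAN port on br0 in these traces
        "ip",
        f"tun_src={tun_src}",
        f"tun_dst={tun_dst}",
        f"tun_id={tun_id:#x}",  # hex, e.g. 0xae6535
        f"nw_src={nw_src}",
        f"nw_dst={nw_dst}",
        "ct_state(trk)",
    ])
    return ["ovs-appctl", "ofproto/trace", bridge, flow]


if __name__ == "__main__":
    argv = vxlan_trace_argv("10.66.140.15", "10.66.140.199", 0xae6535,
                            "10.129.0.2", "10.66.141.175")
    print(" ".join(argv[:3]), f'"{argv[3]}"')
```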
https://github.com/openshift/origin/pull/16866
Tested on OCP v3.7.0-0.174.0.
The pod on the other node still does not work. From the ovs-appctl output, the packet can be sent out through tun0 on the egress node, but it seems the SDN cannot handle the reply from the remote server.

On the other node:
[root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=3,ip,nw_dst=10.66.141.175,nw_src=10.129.0.2,ct_state(trk)"
Flow: ct_state=trk,ip,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.2,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip, priority 100
    goto_table:20
20. ip,in_port=3,nw_src=10.129.0.2, priority 100
    load:0xae6535->NXM_NX_REG0[]
    goto_table:21
21. priority 0
    goto_table:30
30. ip, priority 0
    goto_table:100
100. ip,reg0=0xae6535, priority 100
    move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
     -> NXM_NX_TUN_ID[0..31] is now 0xae6535
    set_field:10.66.140.199->tun_dst
    output:1
     -> output to kernel tunnel

Final flow: ct_state=trk,ip,reg0=0xae6535,tun_src=0.0.0.0,tun_dst=10.66.140.199,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.2,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: recirc_id=0,ip,tun_id=0/0xffffffff,tun_dst=0.0.0.0,in_port=3,nw_src=10.129.0.2,nw_dst=10.0.0.0/9,nw_ecn=0,nw_frag=no
Datapath actions: set(tunnel(tun_id=0xae6535,dst=10.66.140.199,ttl=64,tp_dst=4789,flags(df|key))),1

On the egress node:
[root@ose-node1 ~]# ovs-appctl ofproto/trace br0 "in_port=1,ip,tun_src=10.66.140.15,tun_dst=10.66.140.199,tun_id=0xae6535,nw_dst=10.66.141.175,nw_src=10.129.0.2,ct_state(trk)"
Flow: ct_state=trk,ip,tun_src=10.66.140.15,tun_dst=10.66.140.199,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.2,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip,in_port=1,nw_src=10.128.0.0/14, priority 200
    move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[]
     -> NXM_NX_REG0[] is now 0xae6535
    goto_table:10
10. tun_src=10.66.140.15, priority 100
    goto_table:30
30. ip, priority 0
    goto_table:100
100. ip,reg0=0xae6535, priority 100
    set_field:0xa428c64->pkt_mark
    output:2

Final flow: pkt_mark=0xa428c64,ct_state=trk,ip,reg0=0xae6535,tun_src=10.66.140.15,tun_dst=10.66.140.199,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.2,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: pkt_mark=0,recirc_id=0,ip,tun_id=0xae6535,tun_src=10.66.140.15,tun_dst=10.66.140.199,tun_tos=0,tun_flags=-df-csum-key,in_port=1,nw_src=10.128.0.0/14,nw_dst=10.0.0.0/9,nw_frag=no
Datapath actions: set(skb_mark(0xa428c64)),2

Tcpdump on the egress node while pinging the outside from the pod on the other node:
[root@ose-node1 ~]# tcpdump -i vxlan_sys_4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan_sys_4789, link-type EN10MB (Ethernet), capture size 262144 bytes
14:46:11.577848 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 27, length 64
14:46:12.577980 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 28, length 64
14:46:13.578071 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 29, length 64
14:46:14.578330 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 30, length 64
14:46:15.578413 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 31, length 64
14:46:16.578575 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 32, length 64
14:46:17.578729 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 33, length 64

[root@ose-node1 ~]# tcpdump -i tun0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tun0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:51:05.629892 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 321, length 64
14:51:06.630101 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 322, length 64
14:51:07.630250 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 323, length 64
14:51:08.630456 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 324, length 64
14:51:09.630562 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 325, length 64
14:51:10.630758 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 326, length 64
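Both captures on ose-node1 contain only echo requests from 10.129.0.2; no echo replies appear. For longer captures, a small hypothetical helper can list the requests that never got a reply, assuming tcpdump's default one-line ICMP summary format shown above:

```python
import re
from typing import Iterable, List, Tuple

# Matches tcpdump's one-line ICMP summary, e.g.
#   "... ICMP echo request, id 5120, seq 27, length 64"
_ICMP_RE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def unanswered_pings(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return sorted (id, seq) pairs seen as echo requests with no matching reply."""
    requests, replies = set(), set()
    for line in lines:
        match = _ICMP_RE.search(line)
        if match is None:
            continue  # banner lines such as "listening on tun0, ..."
        key = (int(match.group(2)), int(match.group(3)))
        (requests if match.group(1) == "request" else replies).add(key)
    return sorted(requests - replies)
```

Fed the tun0 lines above, it returns every sequence number from 321 to 326 for id 5120.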
Reply on the egress node:
[root@ose-node1 ~]# ovs-appctl ofproto/trace br0 "in_port=2,ip,ct_state(trk),nw_dst=10.129.0.2,nw_src=10.66.141.175,tun_id=0xae6535"
Flow: ct_state=trk,ip,tun_id=0xae6535,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.66.141.175,nw_dst=10.129.0.2,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip,in_port=2, priority 200
    goto_table:30
30. ip,nw_dst=10.128.0.0/14, priority 100
    goto_table:90
90. ip,nw_dst=10.129.0.0/23, priority 100
    move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
     -> NXM_NX_TUN_ID[0..31] is now 0
    set_field:10.66.140.15->tun_dst
    output:1
     -> output to kernel tunnel

Final flow: ct_state=trk,ip,tun_src=0.0.0.0,tun_dst=10.66.140.15,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.66.141.175,nw_dst=10.129.0.2,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: recirc_id=0,ip,tun_id=0xae6535/0xffffffff,tun_dst=0.0.0.0,in_port=2,nw_dst=10.129.0.0/23,nw_ecn=0,nw_frag=no
Datapath actions: set(tunnel(tun_id=0x0,dst=10.66.140.15,ttl=64,tp_dst=4789,flags(df|key))),1

Reply on the other node:
[root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=1,ip,nw_dst=10.129.0.2,nw_src=10.66.141.175,tun_src=10.66.140.199,tun_dst=10.66.140.15,tun_id=0xae6535,ct_state(trk)"
Flow: ct_state=trk,ip,tun_src=10.66.140.199,tun_dst=10.66.140.15,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.66.141.175,nw_dst=10.129.0.2,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. in_port=1, priority 150
    drop

Final flow: unchanged
Megaflow: recirc_id=0,ip,tun_id=0xae6535,tun_src=10.66.140.199,tun_dst=10.66.140.15,tun_tos=0,tun_flags=-df-csum-key,in_port=1,nw_src=10.0.0.0/9,nw_frag=no
Datapath actions: drop

I am not sure whether the packet fields in my simulation are correct, but the reply is dropped as soon as it reaches the non-egress node.
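Read together, the traces suggest why the simulated reply dies on ose-node2: a packet arriving over VXLAN passes table 0 through the priority-200 rule seen on ose-node1 (`ip,in_port=1,nw_src=10.128.0.0/14`) only when its source is inside the cluster network, and otherwise falls through to the priority-150 drop. The reply still carries the external source 10.66.141.175 (and, per the egress-node trace, leaves with tun_id 0). The sketch below models just those two entries, assuming both nodes carry the same table-0 rules; it ignores table 10's tun_src check and every other table, so it is an illustration, not the SDN's real flow table.

```python
import ipaddress

# Cluster network taken from the priority-200 table-0 match in the traces.
CLUSTER_NETWORK = ipaddress.ip_network("10.128.0.0/14")


def table0_from_vxlan(nw_src: str) -> str:
    """Simplified model of table 0 for packets entering br0 on the VXLAN
    port (in_port=1). Only the two entries seen in the traces are modeled."""
    if ipaddress.ip_address(nw_src) in CLUSTER_NETWORK:
        return "goto_table:10"  # ip,in_port=1,nw_src=10.128.0.0/14, priority 200
    return "drop"               # in_port=1, priority 150


if __name__ == "__main__":
    print(table0_from_vxlan("10.129.0.2"))     # goto_table:10 (pod -> egress node)
    print(table0_from_vxlan("10.66.141.175"))  # drop (external server's reply)
```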
@danw, hope you did not miss this. This is still blocking some of the testing.
Yeah, sorry, didn't realize the bug hadn't been updated at all; there's something going wrong with the packets in the kernel and we're still debugging
https://github.com/openshift/origin/pull/17099
Tested on build v3.7.0-0.194.0. The issue has been fixed; all the pods can access the outside through the egressIP.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188