Bug 1501876 - [hwivBoNF] The pod on the other node loses the outside connection when egressIP is enabled
Summary: [hwivBoNF] The pod on the other node loses the outside connection when egressIP is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-13 11:56 UTC by Meng Bo
Modified: 2017-11-28 22:17 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-28 22:17:21 UTC
Target Upstream Version:
Embargoed:


Attachments
debug_info (14.90 KB, text/plain)
2017-10-13 11:56 UTC, Meng Bo


Links
System ID Private Priority Status Summary Last Updated
Origin (Github) 17099 0 None None None 2017-11-01 12:38:21 UTC
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Meng Bo 2017-10-13 11:56:33 UTC
Created attachment 1338222 [details]
debug_info

Description of problem:
Set egressIPs on the node that acts as the egress node, and set the same egressIPs on the netnamespace. Create pods in the namespace, then try to access the network outside the cluster.
The pod on the egress node can access the outside with the egressIP as the source IP.
The pod on the other node loses its outside network connection.


Version-Release number of selected component (if applicable):
v3.7.0-0.150.0

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment with the multitenant plugin
2. Patch the node to add the egressIP
# oc patch hostsubnet ose-node2.bmeng.local -p '{"egressIPs":["10.66.140.100"]}'
3. Patch the netnamespace to use the egressIP
# oc patch netnamespace u1p1 -p '{"egressIPs": ["10.66.140.100"]}'
4. Create pods in the namespace and make sure pods land on each node
5. Try to access the network outside the cluster from all the pods (a verification sketch follows)
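
A quick way to verify step 5 (a sketch; <pod-node1> and <pod-node2> are placeholder names for the pods from step 4, and the external host 10.66.141.175 is assumed to be outside the cluster):

# oc get hostsubnet ose-node2.bmeng.local -o yaml | grep -A1 egressIPs
# oc get netnamespace u1p1 -o yaml | grep -A1 egressIPs
# oc exec <pod-node1> -n u1p1 -- ping -c 3 10.66.141.175
# oc exec <pod-node2> -n u1p1 -- ping -c 3 10.66.141.175

A tcpdump on the external host should show the requests arriving with the egressIP 10.66.140.100 as the source.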

Actual results:
The pod on the egress node can access the outside with the egressIP as the source IP.
The pod on the other node loses its outside network connection.


Expected results:
All the pods should be able to access the outside using the egressIP as the source IP.

Additional info:
Full node info is attached.

After some debugging, it appears the packet is dropped as soon as it reaches the egress node.

[root@ose-node1 ~]# ovs-appctl ofproto/trace br0 "in_port=3,ip,nw_dst=10.66.141.175,nw_src=10.129.0.30,ct_state(trk)"
Flow: ct_state=trk,ip,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.30,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip, priority 100
    goto_table:20
20. ip,in_port=3,nw_src=10.129.0.30, priority 100
    load:0xd09380->NXM_NX_REG0[]
    goto_table:21
21. priority 0
    goto_table:30
30. ip, priority 0
    goto_table:100
100. ip,reg0=0xd09380, priority 100
    move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
     -> NXM_NX_TUN_ID[0..31] is now 0xd09380
    set_field:10.66.140.41->tun_dst
    output:1
     -> output to kernel tunnel

Final flow: ct_state=trk,ip,reg0=0xd09380,tun_src=0.0.0.0,tun_dst=10.66.140.41,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.30,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: recirc_id=0,ip,tun_id=0/0xffffffff,tun_dst=0.0.0.0,in_port=3,nw_src=10.129.0.30,nw_dst=10.0.0.0/9,nw_ecn=0,nw_frag=no
Datapath actions: set(tunnel(tun_id=0xd09380,dst=10.66.140.41,ttl=64,tp_dst=4789,flags(df|key))),1



[root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=1,ip,tun_dst=10.66.140.41,nw_src=10.129.0.30,nw_dst=10.66.141.175,tun_id=0xd09380,ct_state(trk)"
Flow: ct_state=trk,ip,tun_src=0.0.0.0,tun_dst=10.66.140.41,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.30,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. in_port=1, priority 150
    drop

Final flow: unchanged
Megaflow: recirc_id=0,ip,tun_id=0xd09380,tun_src=0.0.0.0,tun_dst=10.66.140.41,tun_tos=0,tun_flags=-df-csum-key,in_port=1,nw_dst=10.0.0.0/9,nw_frag=no
Datapath actions: drop

Comment 1 Meng Bo 2017-10-13 11:58:18 UTC
The pod on the egress node can access the outside:

[root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=3,ip,nw_dst=10.66.141.175,nw_src=10.128.0.27,ct_state(trk)" 
Flow: ct_state=trk,ip,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.128.0.27,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip, priority 100
    goto_table:20
20. ip,in_port=3,nw_src=10.128.0.27, priority 100
    load:0xd09380->NXM_NX_REG0[]
    goto_table:21
21. priority 0
    goto_table:30
30. ip, priority 0
    goto_table:100
100. ip,reg0=0xd09380, priority 100
    set_field:0xa428c64->pkt_mark
    output:2

Final flow: pkt_mark=0xa428c64,ct_state=trk,ip,reg0=0xd09380,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.128.0.27,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: pkt_mark=0,recirc_id=0,ip,in_port=3,nw_src=10.128.0.27,nw_dst=10.0.0.0/9,nw_frag=no
Datapath actions: set(skb_mark(0xa428c64)),2

Comment 2 Dan Winship 2017-10-13 14:18:51 UTC
> [root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=1,ip,tun_dst=10.66.140.41,nw_src=10.129.0.30,nw_dst=10.66.141.175,tun_id=0xd09380,ct_state(trk)"

FWIW you need to specify tun_src=NODE1-IP as well, or the OVS rules will consider it to be a spoofed vxlan packet. But it would still fail here and I can see why from the OVS flows.
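
For clarity, a corrected invocation would look like the following sketch, keeping NODE1-IP as a placeholder for node1's host IP (all other fields taken from the trace above):

# ovs-appctl ofproto/trace br0 "in_port=1,ip,tun_src=NODE1-IP,tun_dst=10.66.140.41,nw_src=10.129.0.30,nw_dst=10.66.141.175,tun_id=0xd09380,ct_state(trk)"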

Comment 3 Dan Winship 2017-10-13 16:58:15 UTC
https://github.com/openshift/origin/pull/16866

Comment 4 Meng Bo 2017-10-24 06:52:33 UTC
Tested on OCP v3.7.0-0.174.0

The pod on the other node still does not work. From the ovs-appctl output, the packet can be sent out through tun0 on the egress node, but it seems the SDN cannot handle the reply from the remote server.


On the other node:
[root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=3,ip,nw_dst=10.66.141.175,nw_src=10.129.0.2,ct_state(trk)"
Flow: ct_state=trk,ip,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.2,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip, priority 100
    goto_table:20
20. ip,in_port=3,nw_src=10.129.0.2, priority 100
    load:0xae6535->NXM_NX_REG0[]
    goto_table:21
21. priority 0
    goto_table:30
30. ip, priority 0
    goto_table:100
100. ip,reg0=0xae6535, priority 100
    move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
     -> NXM_NX_TUN_ID[0..31] is now 0xae6535
    set_field:10.66.140.199->tun_dst
    output:1
     -> output to kernel tunnel

Final flow: ct_state=trk,ip,reg0=0xae6535,tun_src=0.0.0.0,tun_dst=10.66.140.199,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=3,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.2,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: recirc_id=0,ip,tun_id=0/0xffffffff,tun_dst=0.0.0.0,in_port=3,nw_src=10.129.0.2,nw_dst=10.0.0.0/9,nw_ecn=0,nw_frag=no
Datapath actions: set(tunnel(tun_id=0xae6535,dst=10.66.140.199,ttl=64,tp_dst=4789,flags(df|key))),1


On the egress node:
[root@ose-node1 ~]# ovs-appctl ofproto/trace br0 "in_port=1,ip,tun_src=10.66.140.15,tun_dst=10.66.140.199,tun_id=0xae6535,nw_dst=10.66.141.175,nw_src=10.129.0.2,ct_state(trk)"
Flow: ct_state=trk,ip,tun_src=10.66.140.15,tun_dst=10.66.140.199,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.2,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip,in_port=1,nw_src=10.128.0.0/14, priority 200
    move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[]
     -> NXM_NX_REG0[] is now 0xae6535
    goto_table:10
10. tun_src=10.66.140.15, priority 100
    goto_table:30
30. ip, priority 0
    goto_table:100
100. ip,reg0=0xae6535, priority 100
    set_field:0xa428c64->pkt_mark
    output:2

Final flow: pkt_mark=0xa428c64,ct_state=trk,ip,reg0=0xae6535,tun_src=10.66.140.15,tun_dst=10.66.140.199,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.129.0.2,nw_dst=10.66.141.175,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: pkt_mark=0,recirc_id=0,ip,tun_id=0xae6535,tun_src=10.66.140.15,tun_dst=10.66.140.199,tun_tos=0,tun_flags=-df-csum-key,in_port=1,nw_src=10.128.0.0/14,nw_dst=10.0.0.0/9,nw_frag=no
Datapath actions: set(skb_mark(0xa428c64)),2



Tcpdump on the egress node while pinging outside from the pod on the other node:

[root@ose-node1 ~]# tcpdump -i vxlan_sys_4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan_sys_4789, link-type EN10MB (Ethernet), capture size 262144 bytes
14:46:11.577848 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 27, length 64
14:46:12.577980 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 28, length 64
14:46:13.578071 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 29, length 64
14:46:14.578330 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 30, length 64
14:46:15.578413 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 31, length 64
14:46:16.578575 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 32, length 64
14:46:17.578729 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 33, length 64


[root@ose-node1 ~]# tcpdump -i tun0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tun0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:51:05.629892 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 321, length 64
14:51:06.630101 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 322, length 64
14:51:07.630250 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 323, length 64
14:51:08.630456 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 324, length 64
14:51:09.630562 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 325, length 64
14:51:10.630758 IP 10.129.0.2 > dhcp-141-175.nay.redhat.com: ICMP echo request, id 5120, seq 326, length 64
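
Only echo requests appear in both captures; a narrower capture (standard tcpdump filter syntax) would make the absence of echo replies explicit:

# tcpdump -i tun0 -n icmp and host 10.66.141.175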

Comment 5 Meng Bo 2017-10-24 07:00:20 UTC
Reply on the egress node:
[root@ose-node1 ~]# ovs-appctl ofproto/trace br0 "in_port=2,ip,ct_state(trk),nw_dst=10.129.0.2,nw_src=10.66.141.175,tun_id=0xae6535"
Flow: ct_state=trk,ip,tun_id=0xae6535,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.66.141.175,nw_dst=10.129.0.2,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip,in_port=2, priority 200
    goto_table:30
30. ip,nw_dst=10.128.0.0/14, priority 100
    goto_table:90
90. ip,nw_dst=10.129.0.0/23, priority 100
    move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
     -> NXM_NX_TUN_ID[0..31] is now 0
    set_field:10.66.140.15->tun_dst
    output:1
     -> output to kernel tunnel

Final flow: ct_state=trk,ip,tun_src=0.0.0.0,tun_dst=10.66.140.15,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.66.141.175,nw_dst=10.129.0.2,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: recirc_id=0,ip,tun_id=0xae6535/0xffffffff,tun_dst=0.0.0.0,in_port=2,nw_dst=10.129.0.0/23,nw_ecn=0,nw_frag=no
Datapath actions: set(tunnel(tun_id=0x0,dst=10.66.140.15,ttl=64,tp_dst=4789,flags(df|key))),1


Reply on the other node:
[root@ose-node2 ~]# ovs-appctl ofproto/trace br0 "in_port=1,ip,nw_dst=10.129.0.2,nw_src=10.66.141.175,tun_src=10.66.140.199,tun_dst=10.66.140.15,tun_id=0xae6535,ct_state(trk)"
Flow: ct_state=trk,ip,tun_src=10.66.140.199,tun_dst=10.66.140.15,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_flags=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.66.141.175,nw_dst=10.129.0.2,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. in_port=1, priority 150
    drop

Final flow: unchanged
Megaflow: recirc_id=0,ip,tun_id=0xae6535,tun_src=10.66.140.199,tun_dst=10.66.140.15,tun_tos=0,tun_flags=-df-csum-key,in_port=1,nw_src=10.0.0.0/9,nw_frag=no
Datapath actions: drop



I am not sure if the packet info is correct in my simulation, but the packet is dropped directly when the reply reaches the non-egress node.

Comment 6 Meng Bo 2017-10-30 03:13:09 UTC
@danw, hope you did not miss this.
This is still blocking some of the testing.

Comment 7 Dan Winship 2017-10-30 13:19:43 UTC
Yeah, sorry, I didn't realize the bug hadn't been updated at all; there's something going wrong with the packets in the kernel and we're still debugging.

Comment 8 Dan Winship 2017-10-31 02:33:40 UTC
https://github.com/openshift/origin/pull/17099

Comment 10 Meng Bo 2017-11-06 10:07:42 UTC
Test on build v3.7.0-0.194.0

The issue has been fixed. All the pods can access the outside through the egressIP.
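
For the record, a minimal re-check (pod names are placeholders; same external host as in the earlier traces):

# oc exec <pod-on-egress-node> -n u1p1 -- ping -c 3 10.66.141.175
# oc exec <pod-on-other-node> -n u1p1 -- ping -c 3 10.66.141.175

Both succeed, and the external host sees the egressIP 10.66.140.100 as the source.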

Comment 13 errata-xmlrpc 2017-11-28 22:17:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

