Description of problem:

When packets that are part of an existing conntrack session are marked invalid, they fall through the OpenFlow pipeline and are still sent to the destination. Consider the following:

    client ----> k8s service (OVN load balancer) ----> server

The client opens a TCP connection with the k8s service (the OVN load balancer VIP). OVN DNATs the packet and sends it to the backend server. The server responds and the connection is established. The server then sends a packet with an invalid sequence number. The packet is sent to CT as it enters OVN; CT marks the packet as invalid and sends it to table 43. However, table 43 only has flows matching on positive CT states (not +inv). Therefore the packet falls through to the default priority-0 flows at the end of table 43 and ends up getting sent out of the pipeline back to the client. The client then receives a packet from the server that was never un-DNATed, and responds with a TCP RST.

Table 43 should have a high-priority flow that matches on +inv+trk and drops the packet. Its current highest-priority flows all match -inv:

 cookie=0x4f0293c3, duration=22115.967s, table=43, n_packets=0, n_bytes=0, idle_age=22115, priority=65534,ct_state=-new+est-rel-inv+trk,ct_label=0x2/0x2,metadata=0x5 actions=load:0x1->NXM_NX_XXREG0[98],resubmit(,44)
 cookie=0xad5674dc, duration=22115.966s, table=43, n_packets=0, n_bytes=0, idle_age=22115, priority=65534,ct_state=-new+est-rel-inv+trk,ct_label=0x2/0x2,metadata=0x3 actions=load:0x1->NXM_NX_XXREG0[98],resubmit(,44)
 cookie=0xfa6f76a2, duration=22115.965s, table=43, n_packets=788, n_bytes=108679, idle_age=1730, priority=65534,ct_state=-new+est-rel-inv+trk,ct_label=0x2/0x2,metadata=0x4 actions=load:0x1->NXM_NX_XXREG0[98],resubmit(,44)

ofproto/trace showing the invalid packet making it to the client (port 31):

Flow: tcp,in_port=30,vlan_tci=0x0000,dl_src=0a:58:0a:f4:01:1b,dl_dst=0a:58:0a:f4:01:1c,nw_src=10.244.1.27,nw_dst=10.244.1.28,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=9000,tp_dst=3131,tcp_flags=0

bridge("br-int")
----------------
 0. in_port=30, priority 100, cookie 0xd6e5226c
    set_field:0x1f->reg13
    set_field:0x8->reg11
    set_field:0x3->reg12
    set_field:0x4->metadata
    set_field:0x4->reg14
    resubmit(,8)
 8. reg14=0x4,metadata=0x4,dl_src=0a:58:0a:f4:01:1b, priority 50, cookie 0x89bba098
    resubmit(,9)
 9. ip,reg14=0x4,metadata=0x4,dl_src=0a:58:0a:f4:01:1b,nw_src=10.244.1.27, priority 90, cookie 0x68f65df9
    resubmit(,10)
10. metadata=0x4, priority 0, cookie 0xbd3212b
    resubmit(,11)
11. metadata=0x4, priority 0, cookie 0x1105bf77
    resubmit(,12)
12. metadata=0x4, priority 0, cookie 0x98ecb9aa
    resubmit(,13)
13. metadata=0x4, priority 0, cookie 0xf7f53d2b
    resubmit(,14)
14. metadata=0x4, priority 0, cookie 0xe038fa3f
    resubmit(,15)
15. metadata=0x4, priority 0, cookie 0x99688f3a
    resubmit(,16)
16. metadata=0x4, priority 0, cookie 0xd3a49ca9
    resubmit(,17)
17. metadata=0x4, priority 0, cookie 0xd1edadb
    resubmit(,18)
18. metadata=0x4, priority 0, cookie 0x37209040
    resubmit(,19)
19. metadata=0x4, priority 0, cookie 0x62776d1b
    resubmit(,20)
20. metadata=0x4, priority 0, cookie 0xc2cc5b08
    resubmit(,21)
21. metadata=0x4, priority 0, cookie 0xf759e368
    resubmit(,22)
22. metadata=0x4, priority 0, cookie 0x8b4faee3
    resubmit(,23)
23. metadata=0x4, priority 0, cookie 0xa73b5704
    resubmit(,24)
24. metadata=0x4, priority 0, cookie 0xab02af23
    resubmit(,25)
25. metadata=0x4, priority 0, cookie 0xb8b6401d
    resubmit(,26)
26. metadata=0x4, priority 0, cookie 0x2dc0adf6
    resubmit(,27)
27. metadata=0x4,dl_dst=0a:58:0a:f4:01:1c, priority 50, cookie 0x58d49bbf
    set_field:0x5->reg15
    resubmit(,32)
32. priority 0
    resubmit(,33)
33. reg15=0x5,metadata=0x4, priority 100
    set_field:0x20->reg13
    set_field:0x8->reg11
    set_field:0x3->reg12
    resubmit(,34)
34. priority 0
    set_field:0->reg0
    set_field:0->reg1
    set_field:0->reg2
    set_field:0->reg3
    set_field:0->reg4
    set_field:0->reg5
    set_field:0->reg6
    set_field:0->reg7
    set_field:0->reg8
    set_field:0->reg9
    resubmit(,40)
40. ip,metadata=0x4, priority 100, cookie 0x2ec3d7b9
    set_field:0x1000000000000000000000000/0x1000000000000000000000000->xxreg0
    resubmit(,41)
41. metadata=0x4, priority 0, cookie 0xe46d6be1
    resubmit(,42)
42. ip,reg0=0x1/0x1,metadata=0x4, priority 100, cookie 0xcdf67991
    ct(table=43,zone=NXM_NX_REG13[0..15])
    drop
     -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 43.
     -> Sets the packet to an untracked state, and clears all the conntrack fields.

Final flow: tcp,reg0=0x1,reg11=0x8,reg12=0x3,reg13=0x20,reg14=0x4,reg15=0x5,metadata=0x4,in_port=30,vlan_tci=0x0000,dl_src=0a:58:0a:f4:01:1b,dl_dst=0a:58:0a:f4:01:1c,nw_src=10.244.1.27,nw_dst=10.244.1.28,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=9000,tp_dst=3131,tcp_flags=0
Megaflow: recirc_id=0,ct_state=-new-est-rel-inv-trk,ct_label=0/0x2,eth,ip,in_port=30,vlan_tci=0x0000/0x1000,dl_src=0a:58:0a:f4:01:1b,dl_dst=0a:58:0a:f4:01:1c,nw_src=10.244.1.27,nw_dst=10.244.1.28/30,nw_frag=no
Datapath actions: ct(zone=32),recirc(0x104)

===============================================================================
recirc(0x104) - resume conntrack with ct_state=inv|trk
===============================================================================

Flow: recirc_id=0x104,ct_state=inv|trk,ct_zone=32,eth,tcp,reg0=0x1,reg11=0x8,reg12=0x3,reg13=0x20,reg14=0x4,reg15=0x5,metadata=0x4,in_port=30,vlan_tci=0x0000,dl_src=0a:58:0a:f4:01:1b,dl_dst=0a:58:0a:f4:01:1c,nw_src=10.244.1.27,nw_dst=10.244.1.28,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=9000,tp_dst=3131,tcp_flags=0

bridge("br-int")
----------------
    thaw
        Resuming from table 43
43. metadata=0x4, priority 0, cookie 0x9e861491
    resubmit(,44)
44. metadata=0x4, priority 0, cookie 0xcd7b2b8d
    resubmit(,45)
45. metadata=0x4, priority 0, cookie 0x26b52d0e
    resubmit(,46)
46. metadata=0x4, priority 0, cookie 0x184ffe2f
    resubmit(,47)
47. metadata=0x4, priority 0, cookie 0xac3acd4
    resubmit(,48)
48. ip,reg15=0x5,metadata=0x4,dl_dst=0a:58:0a:f4:01:1c,nw_dst=10.244.1.28, priority 90, cookie 0x719d0c53
    resubmit(,49)
49. reg15=0x5,metadata=0x4,dl_dst=0a:58:0a:f4:01:1c, priority 50, cookie 0xd19055ca
    resubmit(,64)
64. priority 0
    resubmit(,65)
65. reg15=0x5,metadata=0x4, priority 100, cookie 0xe269684c
    output:31

Final flow: unchanged
Megaflow: recirc_id=0x104,ct_state=-new-est-rel+inv+trk,ct_label=0/0x2,eth,ip,in_port=30,dl_src=0a:58:0a:f4:01:1b,dl_dst=0a:58:0a:f4:01:1c,nw_src=10.244.1.16/28,nw_dst=10.244.1.28,nw_frag=no
Datapath actions: 10
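As a quick sanity check (not part of the original report; the br-int bridge name is taken from the trace above), one can confirm that no flow in table 43 handles invalid traffic:

# count flows in table 43 that match on +inv; 0 means invalid packets
# fall through to the default priority-0 flows
ovs-ofctl dump-flows br-int table=43 | grep -c '+inv'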
Created attachment 1711943 [details] OVN DBs
Numan and I discussed this: we cannot simply add a flow that matches +inv and drops. All return traffic is sent to CT, but only some of it is actually part of a previous session; the rest gets marked inv even though it is legitimate, and it should be sent on its way, not dropped. Put more simply, we need a way to differentiate packets that are invalid within an existing session from packets that are marked invalid only because CT has no current session for them. To do this we can additionally match on CT_MARK and CT_LABEL, which will be zero if there was no previous session for the traffic.
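For concreteness, the naive flow being ruled out here would look roughly like this (a hypothetical ovs-ofctl sketch, not an actual OVN flow; as described above, it would also drop legitimate return traffic that CT cannot associate with any session):

# too aggressive: drops every packet conntrack marks invalid, including
# return traffic that is only "invalid" because CT has no entry for it
ovs-ofctl add-flow br-int "table=43,priority=65535,ct_state=+trk+inv,actions=drop"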
Submitted a patch to fix this issue by sending all packets to conntrack if a load balancer is associated with the logical switch:

https://patchwork.ozlabs.org/project/ovn/patch/20200907124320.830247-1-numans@ovn.org/

This is not good in terms of performance when the logical switch has no ACLs with allow-related configured, but the behavior needs to be accurate.
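For reference, a stateful allow-related ACL of the kind mentioned above is what already forces a switch's traffic through conntrack; a hypothetical example (switch name taken from the reproducer below):

# an allow-related ACL makes the logical switch send matching traffic
# through conntrack so that reply packets are tracked
ovn-nbctl acl-add ls2 from-lport 1001 "ip4" allow-related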
Set up the LB with the following scripts.

server:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.23.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.23.25
systemctl restart ovn-controller
ip netns add server0
ip link add veth0_s0 netns server0 type veth peer name veth0_s0_p
ip netns exec server0 ip link set lo up
ip netns exec server0 ip link set veth0_s0 up
ip netns exec server0 ip link set veth0_s0 address 00:00:00:01:01:02
ip netns exec server0 ip addr add 192.168.1.1/24 dev veth0_s0
ip netns exec server0 ip -6 addr add 2001::1/64 dev veth0_s0
ip netns exec server0 ip route add default via 192.168.1.254 dev veth0_s0
ip netns exec server0 ip -6 route add default via 2001::a dev veth0_s0
ovs-vsctl add-port br-int veth0_s0_p
ip link set veth0_s0_p up
ovs-vsctl set interface veth0_s0_p external_ids:iface-id=ls1p1
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
#ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 2001::1 192.168.1.1"
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.1.1 2001::1"
ovn-nbctl lsp-add ls1 ls1p2
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:02:02 192.168.1.2 2001::2"
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1
ovn-nbctl lrp-add lr1 lr1-ls2 00:00:00:00:00:02 192.168.2.254/24 2002::a/64
ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 ls2-lr1
ovn-nbctl lsp-set-addresses ls2-lr1 "00:00:00:00:00:02 192.168.2.254 2002::a"
ovn-nbctl lsp-set-type ls2-lr1 router
ovn-nbctl lsp-set-options ls2-lr1 router-port=lr1-ls2
ovn-nbctl lsp-add ls2 ls2p1
ovn-nbctl lsp-set-addresses ls2p1 "00:00:00:02:01:02 192.168.2.1 2002::1"
ovn-nbctl lsp-add ls1 ls1p3
ovn-nbctl lsp-set-addresses ls1p3 "00:00:00:01:03:02 192.168.1.3 2001::3"
ip netns add server2
ip link add veth0_s2 netns server2 type veth peer name veth0_s2_p
ip netns exec server2 ip link set lo up
ip netns exec server2 ip link set veth0_s2 up
ip netns exec server2 ip link set veth0_s2 address 00:00:00:01:03:02
ip netns exec server2 ip addr add 192.168.1.3/24 dev veth0_s2
ip netns exec server2 ip -6 addr add 2001::3/64 dev veth0_s2
ip netns exec server2 ip route add default via 192.168.1.254 dev veth0_s2
ip netns exec server2 ip -6 route add default via 2001::a dev veth0_s2
ovs-vsctl add-port br-int veth0_s2_p
ip link set veth0_s2_p up
ovs-vsctl set interface veth0_s2_p external_ids:iface-id=ls1p3
ovn-nbctl lb-add lb0 192.168.1.100 192.168.1.1,192.168.1.2
ovn-nbctl ls-lb-add ls2 lb0

client:

#!/bin/bash
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:1.1.23.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.23.26
systemctl start ovn-controller
ip netns add server1
ip link add veth0_s1 netns server1 type veth peer name veth0_s1_p
ip netns exec server1 ip link set lo up
ip netns exec server1 ip link set veth0_s1 up
ip netns exec server1 ip link set veth0_s1 address 00:00:00:01:02:02
ip netns exec server1 ip addr add 192.168.1.2/24 dev veth0_s1
ip netns exec server1 ip -6 addr add 2001::2/64 dev veth0_s1
ip netns exec server1 ip route add default via 192.168.1.254 dev veth0_s1
ip netns exec server1 ip -6 route add default via 2001::a dev veth0_s1
ovs-vsctl add-port br-int veth0_s1_p
ip link set veth0_s1_p up
ovs-vsctl set interface veth0_s1_p external_ids:iface-id=ls1p2
ip netns add client0
ip link add veth0_c0 netns client0 type veth peer name veth0_c0_p
ip netns exec client0 ip link set lo up
ip netns exec client0 ip link set veth0_c0 up
ip netns exec client0 ip link set veth0_c0 address 00:00:00:02:01:02
ip netns exec client0 ip addr add 192.168.2.1/24 dev veth0_c0
ip netns exec client0 ip -6 addr add 2002::1/64 dev veth0_c0
ip netns exec client0 ip route add default via 192.168.2.254 dev veth0_c0
ip netns exec client0 ip -6 route add default via 2002::a dev veth0_c0
ovs-vsctl add-port br-int veth0_c0_p
ip link set veth0_c0_p up
ovs-vsctl set interface veth0_c0_p external_ids:iface-id=ls2p1

Reproduced on ovn20.06.2-11:

[root@wsfd-advnetlab19 bz1870359]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.13-2.13.0-51.el7fdp.x86_64
ovn2.13-host-20.06.2-11.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-central-20.06.2-11.el7fdp.x86_64
ovn2.13-20.06.2-11.el7fdp.x86_64

Ping the LB VIP in the background on the client:

[root@wsfd-advnetlab19 bz1870359]# ip netns exec client0 ping -q 192.168.1.100 &

Ping an LB backend in the background on the client:

[root@wsfd-advnetlab19 bz1870359]# ip netns exec client0 ping -q 192.168.1.1 &

Check conntrack:

[root@wsfd-advnetlab19 bz1870359]# ovs-appctl -t ovs-vswitchd dpctl/dump-conntrack | grep 192.168.1.1
icmp,orig=(src=192.168.2.1,dst=192.168.1.100,id=23730,type=8,code=0),reply=(src=192.168.1.2,dst=192.168.2.1,id=23730,type=0,code=0),zone=2,labels=0x2   <=== only dst=192.168.1.100 is conntracked; dst=192.168.1.1 is not conntracked

Verified on ovn20.09.0-2:

[root@wsfd-advnetlab19 ovn20.09.0-2]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.13-2.13.0-51.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-20.09.0-2.el7fdp.x86_64
ovn2.13-host-20.09.0-2.el7fdp.x86_64
ovn2.13-central-20.09.0-2.el7fdp.x86_64

[root@wsfd-advnetlab19 ovn20.09.0-2]# ovs-appctl -t ovs-vswitchd dpctl/dump-conntrack | grep 192.168.1.1
icmp,orig=(src=192.168.2.1,dst=192.168.1.100,id=23730,type=8,code=0),reply=(src=192.168.1.2,dst=192.168.2.1,id=23730,type=0,code=0),zone=2,labels=0x2
icmp,orig=(src=192.168.2.1,dst=192.168.1.1,id=23731,type=8,code=0),reply=(src=192.168.1.1,dst=192.168.2.1,id=23731,type=0,code=0),zone=2   <==== dst=192.168.1.1 is also conntracked
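When repeating the check on a different version, stale entries from an earlier run can be removed first (an optional step, not part of the original procedure):

# clear all conntrack entries known to ovs-vswitchd before re-testing
ovs-appctl -t ovs-vswitchd dpctl/flush-conntrack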
Verified on the rhel8 version:

[root@wsfd-advnetlab19 bz1870359]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
openvswitch2.13-2.13.0-61.el8fdp.x86_64
ovn2.13-host-20.09.0-2.el8fdp.x86_64
ovn2.13-20.09.0-2.el8fdp.x86_64
ovn2.13-central-20.09.0-2.el8fdp.x86_64

[root@wsfd-advnetlab19 bz1870359]# ovs-appctl -t ovs-vswitchd dpctl/dump-conntrack | grep 192.168.1.1
icmp,orig=(src=192.168.2.1,dst=192.168.1.100,id=24651,type=8,code=0),reply=(src=192.168.1.2,dst=192.168.2.1,id=24651,type=0,code=0),zone=1,labels=0x2
icmp,orig=(src=192.168.2.1,dst=192.168.1.1,id=24656,type=8,code=0),reply=(src=192.168.1.1,dst=192.168.2.1,id=24656,type=0,code=0),zone=1
icmp,orig=(src=192.168.2.1,dst=192.168.1.1,id=24648,type=8,code=0),reply=(src=192.168.1.1,dst=192.168.2.1,id=24648,type=0,code=0),zone=1
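In these dumps the load-balanced session can be isolated by grepping for the 0x2 label directly (an optional convenience, not part of the original verification):

# show only conntrack entries committed with the LB label
ovs-appctl -t ovs-vswitchd dpctl/dump-conntrack | grep 'labels=0x2'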
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4356