Version-Release number of selected component (if applicable): 21.09.0-13.el8fdp

Description of problem:
When adding OVN load balancers to a Logical Router, if a LB with the `skip_snat="true"` option and a LB without that option are added in various orders, the correct OVS flows are sometimes not generated.

Example Logical_Router:
```
_uuid               : 363adb55-8d84-496a-bffb-ff153cfeaa73
copp                : []
enabled             : []
external_ids        : {physical_ip="10.0.165.95", physical_ips="10.0.165.95"}
load_balancer       : [0785a8c0-ee31-4e27-be53-41aae17b4877, 1abae76e-e7be-442a-ae9e-8497d368e444, 1dde3366-8739-432b-aba0-2c80f2773ffb, 546bbf09-fae1-4c0b-9e74-d2a912390bf4, 81befdbd-3415-42ba-85f7-2aaae911bc7f, 8307ea8d-140c-4488-9e53-99ffbe9d6071, 867e0306-5b3b-4234-8948-3606035f3fdd, ab5c9b5e-196e-4f7d-8f8a-54c77491888d, b2f14eab-3616-4863-b8c5-45cf65222d4e, b6c9f15c-21d1-4dd2-b93f-223cf1423741, c206104d-323b-402d-8093-4d8267329cf3, c3bde290-8bd9-491e-96ad-bd63ce36db59]
name                : GR_ip-10-0-165-95.us-east-2.compute.internal
nat                 : [4a5d34a2-0e02-49d9-aa83-007173453d97]
options             : {always_learn_from_arp_request="false", chassis="64f5e462-ac14-45cd-a693-bfe4adc7ac4e", dynamic_neigh_routers="true", lb_force_snat_ip=router_ip, snat-ct-zone="0"}
policies            : []
ports               : [09280e87-4eeb-43f2-a330-29c4fb27cd01, d4ab0d41-0ea8-4c7e-8dd3-13b1f772850d]
static_routes       : [621d563f-8d60-4a4f-8b53-b7cd3a3cbc94, ddcfe429-6107-4104-b137-eaf36cabcd49]
```

Example LBs:
```
_uuid               : 867e0306-5b3b-4234-8948-3606035f3fdd
external_ids        : {TCP_lb_gateway_router=GR_ip-10-0-165-95.us-east-2.compute.internal_local}
health_check        : []
ip_port_mappings    : {}
name                : ""
options             : {skip_snat="true"}
protocol            : tcp
selection_fields    : []
vips                : {"10.0.165.95:30393"="", "10.0.165.95:32494"=""}
```
```
_uuid               : 0785a8c0-ee31-4e27-be53-41aae17b4877
external_ids        : {TCP_lb_gateway_router=GR_ip-10-0-165-95.us-east-2.compute.internal}
health_check        : []
ip_port_mappings    : {}
name                : ""
options             : {}
protocol            : tcp
selection_fields    : []
vips                : {"10.0.165.95:32612"="10.128.10.51:8080"}
```

If 867e0306-5b3b-4234-8948-3606035f3fdd gets added to the Logical Router before 0785a8c0-ee31-4e27-be53-41aae17b4877, then the lflow-to-OVS mapping for established traffic looks like:
```
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)

table=14 ct_state=+est+trk,ct_label=0x2/0x2,tcp,reg0=0xa00a55f,metadata=0xa actions=set_field:0x200/0x200->reg10,resubmit(,15)
```
and traffic cannot flow to 10.0.165.95:32612 on 0785a8c0-ee31-4e27-be53-41aae17b4877, because the first packet is SNATed while ensuing packets in the connection are not.

If 0785a8c0-ee31-4e27-be53-41aae17b4877 is added to the Logical Router before 867e0306-5b3b-4234-8948-3606035f3fdd (or 867e0306-5b3b-4234-8948-3606035f3fdd is removed and added again), then the same lflow-to-OVS mapping for established traffic looks like:
```
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)

table=14 ct_state=+est+trk,ct_label=0x2/0x2,tcp,reg0=0xa00a55f,metadata=0xa actions=set_field:0x8/0x8->reg10,resubmit(,15)

table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)
```
and traffic to 10.0.165.95:32612 can flow correctly.

We expected the same OVS physical rules to be generated by ovn-controller regardless of the order in which we add LBs to the Logical Router.
The est flows for the VIP need to also match on the L4 port, since they are "per VIP", to distinguish between LBs with force_snat and LBs with skip_snat, I think.
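The overlap can also be spotted mechanically. The snippet below is a hypothetical helper (illustration only, not an OVN tool) that takes a `dump-flows`-style listing and flags entries sharing the same table, priority, and match while differing only in their action; the sample input is the ambiguous pair of est lflows from the report above.

```shell
# Hypothetical helper: flag logical flows that share (table, priority, match)
# but differ only in action.  The sample input is the ambiguous est pair from
# this report; in a live setup it would come from `ovn-sbctl dump-flows`.
dump='table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)'

# Drop the action part, then count identical (table, priority, match) tuples:
# any count > 1 means behaviour for packets hitting that match is undefined.
overlaps=$(printf '%s\n' "$dump" | sed 's/, action=.*//' | sort | uniq -c | awk '$1 > 1')
printf '%s\n' "$overlaps"
```

On the sample above this reports one tuple with a count of 2, i.e. the two est flows that collide.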
Yes, the established flow should specify the port in order to differentiate between the two load balancers. When a packet matches two flows with the same match criteria and priority in an OpenFlow table, the behaviour is undefined.

A simple reproducer follows:
```
# Create the first logical switch with one port
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"

# Create the second logical switch with one port
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"

# Create a logical router and attach both logical switches
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
ovn-nbctl lsp-add sw0 lrp0-attachment
ovn-nbctl lsp-set-type lrp0-attachment router
ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
ovn-nbctl lsp-add sw1 lrp1-attachment
ovn-nbctl lsp-set-type lrp1-attachment router
ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1

ovs-vsctl add-port br-int p1 -- \
    set Interface p1 external_ids:iface-id=sw0-port1
ovs-vsctl add-port br-int p2 -- \
    set Interface p2 external_ids:iface-id=sw1-port1

ovn-nbctl set Logical_Router lr0 options:chassis=hv1
ovn-nbctl set Logical_Router lr0 options:lb_force_snat_ip=router_ip

ovn-nbctl lb-add lb0 11.0.0.200:1234 192.168.0.2:8080
ovn-nbctl lb-add lb1 11.0.0.200:4321 192.168.0.2:8088
ovn-nbctl set Load_Balancer lb0 options:skip_snat=true
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl lr-lb-add lr0 lb1
```

The Logical Flows should look like this:
```
$ ovn-sbctl dump-flows lr0 | grep lr_in_dnat
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321 && ct_label.natted == 1), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
table=6 (lr_in_dnat ), priority=0   , match=(1), action=(next;)
```

I have a patch; I just need to update tests, etc., and I will post it.
https://patchwork.ozlabs.org/project/ovn/list/?series=258850
Just a small update on the reproducer above. The final implementation required loading the pre-NAT destination port into a register and then checking that register instead of "tcp.dst" in the example above, because the defrag table DNATs the flow before it reaches the DNAT table. The reproducer above just displays/confirms the logical flows; however, it could be extended by QA to properly test the change.
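As a sketch of such a QA extension: on a fixed build, every ct.est lflow for the VIP should match on the saved pre-NAT port register rather than on `tcp.dst`. The check below is hedged — it assumes the fix stores the pre-NAT TCP destination port in `reg9[16..31]`, and it runs against a canned sample; in a live setup `flows` would instead come from `ovn-sbctl dump-flows lr0`.

```shell
# Sketch of a QA check (assumption: the fix saves the pre-NAT TCP destination
# port in reg9[16..31]).  Live usage would be:  flows=$(ovn-sbctl dump-flows lr0)
flows='table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 1234 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 4321 && ct_label.natted == 1), action=(flags.force_snat_for_lb = 1; next;)'

# Every ct.est lflow must also carry the per-port match on the register.
est=$(printf '%s\n' "$flows" | grep -c 'ct\.est')
per_port=$(printf '%s\n' "$flows" | grep -c 'ct\.est.*reg9\[16\.\.31\]')
if [ "$est" -gt 0 ] && [ "$est" -eq "$per_port" ]; then
    echo "PASS: all est flows match on the pre-NAT port"
else
    echo "FAIL: est flows without per-port match"
fi
```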
Tested with the following script:
```
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.40.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.40.25
systemctl restart ovn-controller

# Create the first logical switch with one port
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
ovs-vsctl add-port br-int sw0-port1 -- set interface sw0-port1 type=internal external_ids:iface-id=sw0-port1
ip netns add sw0-port1
ip link set sw0-port1 netns sw0-port1
ip netns exec sw0-port1 ip link set sw0-port1 address 50:54:00:00:00:01
ip netns exec sw0-port1 ip link set sw0-port1 up
ip netns exec sw0-port1 ip addr add 192.168.0.2/24 dev sw0-port1
ip netns exec sw0-port1 ip route add default via 192.168.0.1

# Create the second logical switch with one port
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
ovs-vsctl add-port br-int sw1-port1 -- set interface sw1-port1 type=internal external_ids:iface-id=sw1-port1
ip netns add sw1-port1
ip link set sw1-port1 netns sw1-port1
ip netns exec sw1-port1 ip link set sw1-port1 address 50:54:00:00:00:03
ip netns exec sw1-port1 ip link set sw1-port1 up
ip netns exec sw1-port1 ip addr add 11.0.0.2/24 dev sw1-port1
ip netns exec sw1-port1 ip route add default via 11.0.0.1

# Create a logical router and attach both logical switches
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
ovn-nbctl lsp-add sw0 lrp0-attachment
ovn-nbctl lsp-set-type lrp0-attachment router
ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
ovn-nbctl lsp-add sw1 lrp1-attachment
ovn-nbctl lsp-set-type lrp1-attachment router
ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1

ovn-nbctl set Logical_Router lr0 options:chassis=hv1
ovn-nbctl set Logical_Router lr0 options:lb_force_snat_ip=router_ip

ovn-nbctl lb-add lb0 11.0.0.200:1234 192.168.0.2:8080
ovn-nbctl lb-add lb1 11.0.0.200:4321 192.168.0.2:8088
ovn-nbctl set Load_Balancer lb0 options:skip_snat=true
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl lr-lb-add lr0 lb1

ovn-sbctl dump-flows lr0 | grep lr_in_dnat
ovn-nbctl --wait=hv sync
ovs-ofctl dump-flows br-int | grep 11.0.0.200

ip netns exec sw0-port1 nc -k -l 8080 &
ip netns exec sw0-port1 nc -k -l 8088 &
sleep 1
ip netns exec sw1-port1 nc 11.0.0.200 1234 <<< h
ip netns exec sw1-port1 nc 11.0.0.200 1234 <<< h
ip netns exec sw1-port1 nc 11.0.0.200 4321 <<< h
jobs -p | xargs kill
```

Result on ovn21.09-21.09.0-13:
```
[root@dell-per740-12 bz1995326]# rpm -qa | grep -E "openvswitch2.16|ovn21.09"
ovn21.09-central-21.09.0-13.el8fdp.x86_64
ovn21.09-host-21.09.0-13.el8fdp.x86_64
python3-openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-test-2.16.0-16.el8fdp.noarch
ovn21.09-21.09.0-13.el8fdp.x86_64

+ ovn-sbctl dump-flows lr0
+ grep lr_in_dnat
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
table=6 (lr_in_dnat ), priority=0   , match=(1), action=(next;)
+ ovn-nbctl --wait=hv sync
+ ovs-ofctl dump-flows br-int
+ grep 11.0.0.200
cookie=0x93b1a82c, duration=0.017s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x2,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0x2e0f22e8, duration=0.015s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x1,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0xbda5672b, duration=0.018s, table=13, n_packets=0, n_bytes=0, idle_age=0, priority=110,tcp,metadata=0x3,nw_dst=11.0.0.200 actions=load:0xb0000c8->NXM_NX_XXREG0[96..127],ct(table=14,zone=NXM_NX_REG11[0..15],nat)
cookie=0x73184e1f, duration=0.018s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg10=0/0x2,metadata=0x1,arp_tpa=11.0.0.200,arp_op=1 actions=load:0x8000->NXM_NX_REG15[],resubmit(,37)
cookie=0xb34e802d, duration=0.018s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=80,arp,reg10=0/0x2,metadata=0x2,arp_tpa=11.0.0.200,arp_op=1 actions=clone(load:0x2->NXM_NX_REG15[],resubmit(,37)),load:0x8005->NXM_NX_REG15[],resubmit(,37)
+ ip netns exec sw0-port1 nc -k -l 8080
+ sleep 1
+ ip netns exec sw0-port1 nc -k -l 8088
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 4321
Ncat: Connection reset by peer.   <=== failed
```

Result on ovn-2021-21.09.0-12:
```
[root@dell-per740-12 bz1995326]# rpm -qa | grep -E "openvswitch2.16|ovn-2021"
ovn-2021-central-21.09.0-12.el8fdp.x86_64
ovn-2021-host-21.09.0-12.el8fdp.x86_64
python3-openvswitch2.16-2.16.0-16.el8fdp.x86_64
ovn-2021-21.09.0-12.el8fdp.x86_64
openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-test-2.16.0-16.el8fdp.noarch

+ ovn-sbctl dump-flows lr0
+ grep lr_in_dnat
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 1234 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 4321 && ct_label.natted == 1), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
table=6 (lr_in_dnat ), priority=0   , match=(1), action=(next;)
+ ovn-nbctl --wait=hv sync
+ ovs-ofctl dump-flows br-int
+ grep 11.0.0.200
cookie=0x462f8d03, duration=0.016s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x1,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0x4fd1bbfa, duration=0.013s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x2,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0xf8645487, duration=0.016s, table=13, n_packets=0, n_bytes=0, idle_age=0, priority=110,tcp,metadata=0x3,nw_dst=11.0.0.200 actions=load:0xb0000c8->NXM_NX_XXREG0[96..127],move:NXM_OF_TCP_DST[]->OXM_OF_PKT_REG4[16..31],ct(table=14,zone=NXM_NX_REG11[0..15],nat)
cookie=0x29ebc057, duration=0.014s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=80,arp,reg10=0/0x2,metadata=0x2,arp_tpa=11.0.0.200,arp_op=1 actions=clone(load:0x2->NXM_NX_REG15[],resubmit(,37)),load:0x8005->NXM_NX_REG15[],resubmit(,37)
cookie=0xe6f9062c, duration=0.014s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg10=0/0x2,metadata=0x1,arp_tpa=11.0.0.200,arp_op=1 actions=load:0x8000->NXM_NX_REG15[],resubmit(,37)
+ ip netns exec sw0-port1 nc -k -l 8080
+ sleep 1
+ ip netns exec sw0-port1 nc -k -l 8088
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 4321
h
```
set VERIFIED per comment 5
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:5059