Bug 1995326
| Summary: | Loadbalancer `skip_snat="true"` Option causes OVN-Controller race when adding LB to Logical_Router | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Andrew Stoycos <astoycos> |
| Component: | OVN | Assignee: | OVN Team <ovnteam> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | RHEL 8.0 | CC: | ctrautma, jiji, jishi, kfida, mmichels, trozet |
| Target Milestone: | --- | | |
| Target Release: | FDP 21.I | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-12-09 15:37:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
> The est flows for the VIP need to also match on the port, since they are "per VIP", to distinguish between LBs with force_snat and LBs with skip_snat, I think.

Yes, the established flow should match on the port in order to differentiate between the two load balancers. When a packet matches two flows with the same match criteria and priority in an OpenFlow table, the behaviour is undefined. A simple reproducer follows:
```
# Create the first logical switch with one port
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
# Create the second logical switch with one port
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
# Create a logical router and attach both logical switches
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
ovn-nbctl lsp-add sw0 lrp0-attachment
ovn-nbctl lsp-set-type lrp0-attachment router
ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
ovn-nbctl lsp-add sw1 lrp1-attachment
ovn-nbctl lsp-set-type lrp1-attachment router
ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1
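# Bind the logical switch ports to local OVS interfaces on br-int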
ovs-vsctl add-port br-int p1 -- \
set Interface p1 external_ids:iface-id=sw0-port1
ovs-vsctl add-port br-int p2 -- \
set Interface p2 external_ids:iface-id=sw1-port1
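# Make lr0 a gateway router pinned to chassis hv1 and force SNAT of
# load-balanced traffic to the router IP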
ovn-nbctl set Logical_Router lr0 options:chassis=hv1
ovn-nbctl set Logical_Router lr0 options:lb_force_snat_ip=router_ip
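# Two TCP load balancers on the same VIP: lb0 (port 1234) gets skip_snat=true,
# lb1 (port 4321) relies on lb_force_snat_ip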
ovn-nbctl lb-add lb0 11.0.0.200:1234 192.168.0.2:8080
ovn-nbctl lb-add lb1 11.0.0.200:4321 192.168.0.2:8088
ovn-nbctl set Load_Balancer lb0 options:skip_snat=true
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl lr-lb-add lr0 lb1
```
Logical Flows should look like this:
```
$ ovn-sbctl dump-flows lr0 | grep lr_in_dnat
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321 && ct_label.natted == 1), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;)
```
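For contrast, without the per-port match the two ct.est flows differ only in their action, which is exactly the ambiguous same-match/same-priority situation described above. This is what the unfixed ovn21.09-21.09.0-13 build produces in the QA results further below:

```
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)
```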
I have a patch; I just need to update the tests, etc., and then I will post it.
Just a small update on the reproducer above: the final implementation required loading the pre-NAT destination port into a register and then matching on that register instead of on "tcp.dst" as shown above, because the defrag table DNATs the packet before it reaches the DNAT table. The reproducer above only displays/confirms the logical flows; however, it could be extended by QA to properly test the change.

Tested with the following script:
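# Bring up OVS and the OVN central services, then register this host as chassis hv1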
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.40.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.40.25
systemctl restart ovn-controller
# Create the first logical switch with one port
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
ovs-vsctl add-port br-int sw0-port1 -- set interface sw0-port1 type=internal external_ids:iface-id=sw0-port1
ip netns add sw0-port1
ip link set sw0-port1 netns sw0-port1
ip netns exec sw0-port1 ip link set sw0-port1 address 50:54:00:00:00:01
ip netns exec sw0-port1 ip link set sw0-port1 up
ip netns exec sw0-port1 ip addr add 192.168.0.2/24 dev sw0-port1
ip netns exec sw0-port1 ip route add default via 192.168.0.1
# Create the second logical switch with one port
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
ovs-vsctl add-port br-int sw1-port1 -- set interface sw1-port1 type=internal external_ids:iface-id=sw1-port1
ip netns add sw1-port1
ip link set sw1-port1 netns sw1-port1
ip netns exec sw1-port1 ip link set sw1-port1 address 50:54:00:00:00:03
ip netns exec sw1-port1 ip link set sw1-port1 up
ip netns exec sw1-port1 ip addr add 11.0.0.2/24 dev sw1-port1
ip netns exec sw1-port1 ip route add default via 11.0.0.1
# Create a logical router and attach both logical switches
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
ovn-nbctl lsp-add sw0 lrp0-attachment
ovn-nbctl lsp-set-type lrp0-attachment router
ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
ovn-nbctl lsp-add sw1 lrp1-attachment
ovn-nbctl lsp-set-type lrp1-attachment router
ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1
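# Same router options and load balancers as in the reproducer above:
# lb0 (port 1234) gets skip_snat=true, lb1 (port 4321) relies on lb_force_snat_ip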
ovn-nbctl set Logical_Router lr0 options:chassis=hv1
ovn-nbctl set Logical_Router lr0 options:lb_force_snat_ip=router_ip
ovn-nbctl lb-add lb0 11.0.0.200:1234 192.168.0.2:8080
ovn-nbctl lb-add lb1 11.0.0.200:4321 192.168.0.2:8088
ovn-nbctl set Load_Balancer lb0 options:skip_snat=true
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl lr-lb-add lr0 lb1
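# Dump the resulting logical flows and the OpenFlow flows for the VIP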
ovn-sbctl dump-flows lr0 | grep lr_in_dnat
ovn-nbctl --wait=hv sync
ovs-ofctl dump-flows br-int | grep 11.0.0.200
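# Start backends behind the VIP in sw0-port1, then connect to both VIP ports
# from sw1-port1; with the fix all three connections should succeed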
ip netns exec sw0-port1 nc -k -l 8080 &
ip netns exec sw0-port1 nc -k -l 8088 &
sleep 1
ip netns exec sw1-port1 nc 11.0.0.200 1234 <<< h
ip netns exec sw1-port1 nc 11.0.0.200 1234 <<< h
ip netns exec sw1-port1 nc 11.0.0.200 4321 <<< h
jobs -p | xargs kill
Result on ovn21.09-21.09.0-13:
[root@dell-per740-12 bz1995326]# rpm -qa | grep -E "openvswitch2.16|ovn21.09"
ovn21.09-central-21.09.0-13.el8fdp.x86_64
ovn21.09-host-21.09.0-13.el8fdp.x86_64
python3-openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-test-2.16.0-16.el8fdp.noarch
ovn21.09-21.09.0-13.el8fdp.x86_64
+ ovn-sbctl dump-flows lr0
+ grep lr_in_dnat
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;)
+ ovn-nbctl --wait=hv sync
+ ovs-ofctl dump-flows br-int
+ grep 11.0.0.200
cookie=0x93b1a82c, duration=0.017s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x2,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0x2e0f22e8, duration=0.015s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x1,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0xbda5672b, duration=0.018s, table=13, n_packets=0, n_bytes=0, idle_age=0, priority=110,tcp,metadata=0x3,nw_dst=11.0.0.200 actions=load:0xb0000c8->NXM_NX_XXREG0[96..127],ct(table=14,zone=NXM_NX_REG11[0..15],nat)
cookie=0x73184e1f, duration=0.018s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg10=0/0x2,metadata=0x1,arp_tpa=11.0.0.200,arp_op=1 actions=load:0x8000->NXM_NX_REG15[],resubmit(,37)
cookie=0xb34e802d, duration=0.018s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=80,arp,reg10=0/0x2,metadata=0x2,arp_tpa=11.0.0.200,arp_op=1 actions=clone(load:0x2->NXM_NX_REG15[],resubmit(,37)),load:0x8005->NXM_NX_REG15[],resubmit(,37)
+ ip netns exec sw0-port1 nc -k -l 8080
+ sleep 1
+ ip netns exec sw0-port1 nc -k -l 8088
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 4321
Ncat: Connection reset by peer.
<=== failed
Result on ovn-2021-21.09.0-12:
[root@dell-per740-12 bz1995326]# rpm -qa | grep -E "openvswitch2.16|ovn-2021"
ovn-2021-central-21.09.0-12.el8fdp.x86_64
ovn-2021-host-21.09.0-12.el8fdp.x86_64
python3-openvswitch2.16-2.16.0-16.el8fdp.x86_64
ovn-2021-21.09.0-12.el8fdp.x86_64
openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-test-2.16.0-16.el8fdp.noarch
+ ovn-sbctl dump-flows lr0
+ grep lr_in_dnat
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 1234 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 4321 && ct_label.natted == 1), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;)
+ ovn-nbctl --wait=hv sync
+ ovs-ofctl dump-flows br-int
+ grep 11.0.0.200
cookie=0x462f8d03, duration=0.016s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x1,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0x4fd1bbfa, duration=0.013s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x2,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0xf8645487, duration=0.016s, table=13, n_packets=0, n_bytes=0, idle_age=0, priority=110,tcp,metadata=0x3,nw_dst=11.0.0.200 actions=load:0xb0000c8->NXM_NX_XXREG0[96..127],move:NXM_OF_TCP_DST[]->OXM_OF_PKT_REG4[16..31],ct(table=14,zone=NXM_NX_REG11[0..15],nat)
cookie=0x29ebc057, duration=0.014s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=80,arp,reg10=0/0x2,metadata=0x2,arp_tpa=11.0.0.200,arp_op=1 actions=clone(load:0x2->NXM_NX_REG15[],resubmit(,37)),load:0x8005->NXM_NX_REG15[],resubmit(,37)
cookie=0xe6f9062c, duration=0.014s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg10=0/0x2,metadata=0x1,arp_tpa=11.0.0.200,arp_op=1 actions=load:0x8000->NXM_NX_REG15[],resubmit(,37)
+ ip netns exec sw0-port1 nc -k -l 8080
+ sleep 1
+ ip netns exec sw0-port1 nc -k -l 8088
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 4321
h
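As a cross-check on the register match above: the pre-NAT destination port that the ct.new/ct.est flows compare against (reg9[16..31]) is saved in the earlier defrag stage. The grep on lr_in_dnat does not show that flow, but judging from the table=13 OpenFlow entry above (move:NXM_OF_TCP_DST[]->OXM_OF_PKT_REG4[16..31]), the corresponding lr_in_defrag logical flow should look roughly like this (a sketch inferred from that output, not verbatim ovn-sbctl output):

```
$ ovn-sbctl dump-flows lr0 | grep lr_in_defrag
table=5 (lr_in_defrag ), priority=110 , match=(ip && ip4.dst == 11.0.0.200 && tcp), action=(reg0 = 11.0.0.200; reg9[16..31] = tcp.dst; ct_dnat;)
```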
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5059
Version-Release number of selected component (if applicable): 21.09.0-13.el8fdp

Description of problem:

When adding OVN load balancers to a Logical Router, if an LB with the `skip_snat="true"` option and an LB without that option are added in various orders, the correct OVS flows are sometimes not generated.

Example Logical_Router:

```
_uuid               : 363adb55-8d84-496a-bffb-ff153cfeaa73
copp                : []
enabled             : []
external_ids        : {physical_ip="10.0.165.95", physical_ips="10.0.165.95"}
load_balancer       : [0785a8c0-ee31-4e27-be53-41aae17b4877, 1abae76e-e7be-442a-ae9e-8497d368e444, 1dde3366-8739-432b-aba0-2c80f2773ffb, 546bbf09-fae1-4c0b-9e74-d2a912390bf4, 81befdbd-3415-42ba-85f7-2aaae911bc7f, 8307ea8d-140c-4488-9e53-99ffbe9d6071, 867e0306-5b3b-4234-8948-3606035f3fdd, ab5c9b5e-196e-4f7d-8f8a-54c77491888d, b2f14eab-3616-4863-b8c5-45cf65222d4e, b6c9f15c-21d1-4dd2-b93f-223cf1423741, c206104d-323b-402d-8093-4d8267329cf3, c3bde290-8bd9-491e-96ad-bd63ce36db59]
name                : GR_ip-10-0-165-95.us-east-2.compute.internal
nat                 : [4a5d34a2-0e02-49d9-aa83-007173453d97]
options             : {always_learn_from_arp_request="false", chassis="64f5e462-ac14-45cd-a693-bfe4adc7ac4e", dynamic_neigh_routers="true", lb_force_snat_ip=router_ip, snat-ct-zone="0"}
policies            : []
ports               : [09280e87-4eeb-43f2-a330-29c4fb27cd01, d4ab0d41-0ea8-4c7e-8dd3-13b1f772850d]
static_routes       : [621d563f-8d60-4a4f-8b53-b7cd3a3cbc94, ddcfe429-6107-4104-b137-eaf36cabcd49]
```

Example LBs:

```
_uuid               : 867e0306-5b3b-4234-8948-3606035f3fdd
external_ids        : {TCP_lb_gateway_router=GR_ip-10-0-165-95.us-east-2.compute.internal_local}
health_check        : []
ip_port_mappings    : {}
name                : ""
options             : {skip_snat="true"}
protocol            : tcp
selection_fields    : []
vips                : {"10.0.165.95:30393"="", "10.0.165.95:32494"=""}
```

```
_uuid               : 0785a8c0-ee31-4e27-be53-41aae17b4877
external_ids        : {TCP_lb_gateway_router=GR_ip-10-0-165-95.us-east-2.compute.internal}
health_check        : []
ip_port_mappings    : {}
name                : ""
options             : {}
protocol            : tcp
selection_fields    : []
vips                : {"10.0.165.95:32612"="10.128.10.51:8080"}
```

If 867e0306-5b3b-4234-8948-3606035f3fdd gets added to the Logical Router before 0785a8c0-ee31-4e27-be53-41aae17b4877, then the lflow-to-OVS mapping for established traffic looks like this:

```
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)

table=14 ct_state=+est+trk,ct_label=0x2/0x2,tcp,reg0=0xa00a55f,metadata=0xa actions=set_field:0x200/0x200->reg10,resubmit(,15)
```

and traffic cannot flow to 10.0.165.95:32612 on 0785a8c0-ee31-4e27-be53-41aae17b4877, because the first packet is SNATed while subsequent packets in the connection are not.

If 0785a8c0-ee31-4e27-be53-41aae17b4877 is added to the Logical Router before 867e0306-5b3b-4234-8948-3606035f3fdd (or 867e0306-5b3b-4234-8948-3606035f3fdd is removed and added again), then the same lflow-to-OVS mapping for established traffic looks like this:

```
table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)

table=14 ct_state=+est+trk,ct_label=0x2/0x2,tcp,reg0=0xa00a55f,metadata=0xa actions=set_field:0x8/0x8->reg10,resubmit(,15)

table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 10.0.165.95 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)
```

and traffic to 10.0.165.95:32612 can flow correctly.

We expected the same OVS physical rules to be generated by OVN-Controller regardless of the order in which the LBs are added to the Logical Router.
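For completeness, the order dependence described here can be exercised directly with the lb0/lb1/lr0 objects from the reproducer scripts above. This is only a sketch under that assumption (the names come from the reproducer, not from the cluster that produced the UUIDs in this description):

```
# Starting from the reproducer topology, detach both LBs first.
ovn-nbctl lr-lb-del lr0 lb0
ovn-nbctl lr-lb-del lr0 lb1

# Attach the skip_snat LB (lb0) before the plain LB (lb1) and dump the
# established-traffic flows in OpenFlow table 14 (lr_in_dnat).
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl lr-lb-add lr0 lb1
ovn-nbctl --wait=hv sync
ovs-ofctl dump-flows br-int table=14 | grep ct_state=+est

# Re-add them in the opposite order and dump again.  On an unfixed build the
# two dumps disagree (only one of the force/skip est flows makes it into
# OpenFlow, and which one depends on the order); a fixed build produces the
# same per-port flows either way.
ovn-nbctl lr-lb-del lr0 lb0
ovn-nbctl lr-lb-del lr0 lb1
ovn-nbctl lr-lb-add lr0 lb1
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl --wait=hv sync
ovs-ofctl dump-flows br-int table=14 | grep ct_state=+est
```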