Bug 1995326

| Summary: | Load balancer `skip_snat="true"` option causes OVN-Controller race when adding LB to Logical_Router | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Andrew Stoycos <astoycos> |
| Component: | OVN | Assignee: | OVN Team <ovnteam> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | high | Priority: | unspecified |
| Version: | RHEL 8.0 | CC: | ctrautma, jiji, jishi, kfida, mmichels, trozet |
| Target Milestone: | --- | Target Release: | FDP 21.I |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2021-12-09 15:37:27 UTC | Type: | Bug |
Description
Andrew Stoycos
2021-08-18 20:06:51 UTC
> the est flows for the vip need to also match on port since they are "per vip" to distinguish between lbs with force or with skip i think

Yes, the established flow should specify the port in order to differentiate between the two load balancers. When a packet matches two flows with the same match criteria and priority in an OpenFlow table, the behaviour is undefined. A simple reproducer follows:

```
# Create the first logical switch with one port
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"

# Create the second logical switch with one port
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"

# Create a logical router and attach both logical switches
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
ovn-nbctl lsp-add sw0 lrp0-attachment
ovn-nbctl lsp-set-type lrp0-attachment router
ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0

ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
ovn-nbctl lsp-add sw1 lrp1-attachment
ovn-nbctl lsp-set-type lrp1-attachment router
ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1

ovs-vsctl add-port br-int p1 -- \
    set Interface p1 external_ids:iface-id=sw0-port1
ovs-vsctl add-port br-int p2 -- \
    set Interface p2 external_ids:iface-id=sw1-port1

ovn-nbctl set Logical_Router lr0 options:chassis=hv1
ovn-nbctl set Logical_Router lr0 options:lb_force_snat_ip=router_ip

ovn-nbctl lb-add lb0 11.0.0.200:1234 192.168.0.2:8080
ovn-nbctl lb-add lb1 11.0.0.200:4321 192.168.0.2:8088
ovn-nbctl set Load_Balancer lb0 options:skip_snat=true
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl lr-lb-add lr0 lb1
```

The logical flows should look like this:

```
$ ovn-sbctl dump-flows lr0 | grep lr_in_dnat
  table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321 && ct_label.natted == 1), action=(flags.force_snat_for_lb = 1; next;)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
  table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;)
```

I have a patch; I just need to update the tests, and then I will post it.

A small update on the reproducer above: the final implementation required loading the pre-NAT destination port into a register and then checking that register instead of `tcp.dst` as shown in the example above. This is because the defrag table DNATs the flow before it reaches the DNAT table. The reproducer above only displays/confirms the logical flows, but it could be extended by QA to properly test the change.
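For reference, a quick way to see both halves of that register handoff (a sketch based on the verified dumps below, not part of the original report) is to dump the defrag and DNAT stages side by side:

```
# lr_in_defrag should save the original (pre-NAT) destination port into
# reg9[16..31] before ct_dnat rewrites it; the ct.est flows in lr_in_dnat
# should then match on reg9[16..31] rather than on tcp.dst.
ovn-sbctl dump-flows lr0 | grep lr_in_defrag
ovn-sbctl dump-flows lr0 | grep lr_in_dnat | grep ct.est
```

On an affected build the second command prints two flows with identical match criteria; on a fixed build their matches include `reg9[16..31] == 1234` and `reg9[16..31] == 4321`, as the verified output below shows.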
Tested with the following script:

```
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 \
    external_ids:ovn-remote=tcp:20.0.40.25:6642 \
    external_ids:ovn-encap-type=geneve \
    external_ids:ovn-encap-ip=20.0.40.25
systemctl restart ovn-controller

# Create the first logical switch with one port
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
ovs-vsctl add-port br-int sw0-port1 -- set interface sw0-port1 \
    type=internal external_ids:iface-id=sw0-port1
ip netns add sw0-port1
ip link set sw0-port1 netns sw0-port1
ip netns exec sw0-port1 ip link set sw0-port1 address 50:54:00:00:00:01
ip netns exec sw0-port1 ip link set sw0-port1 up
ip netns exec sw0-port1 ip addr add 192.168.0.2/24 dev sw0-port1
ip netns exec sw0-port1 ip route add default via 192.168.0.1

# Create the second logical switch with one port
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
ovs-vsctl add-port br-int sw1-port1 -- set interface sw1-port1 \
    type=internal external_ids:iface-id=sw1-port1
ip netns add sw1-port1
ip link set sw1-port1 netns sw1-port1
ip netns exec sw1-port1 ip link set sw1-port1 address 50:54:00:00:00:03
ip netns exec sw1-port1 ip link set sw1-port1 up
ip netns exec sw1-port1 ip addr add 11.0.0.2/24 dev sw1-port1
ip netns exec sw1-port1 ip route add default via 11.0.0.1

# Create a logical router and attach both logical switches
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
ovn-nbctl lsp-add sw0 lrp0-attachment
ovn-nbctl lsp-set-type lrp0-attachment router
ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
ovn-nbctl lsp-add sw1 lrp1-attachment
ovn-nbctl lsp-set-type lrp1-attachment router
ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1

ovn-nbctl set Logical_Router lr0 options:chassis=hv1
ovn-nbctl set Logical_Router lr0 options:lb_force_snat_ip=router_ip
ovn-nbctl lb-add lb0 11.0.0.200:1234 192.168.0.2:8080
ovn-nbctl lb-add lb1 11.0.0.200:4321 192.168.0.2:8088
ovn-nbctl set Load_Balancer lb0 options:skip_snat=true
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl lr-lb-add lr0 lb1

ovn-sbctl dump-flows lr0 | grep lr_in_dnat
ovn-nbctl --wait=hv sync
ovs-ofctl dump-flows br-int | grep 11.0.0.200

ip netns exec sw0-port1 nc -k -l 8080 &
ip netns exec sw0-port1 nc -k -l 8088 &
sleep 1
ip netns exec sw1-port1 nc 11.0.0.200 1234 <<< h
ip netns exec sw1-port1 nc 11.0.0.200 1234 <<< h
ip netns exec sw1-port1 nc 11.0.0.200 4321 <<< h
jobs -p | xargs kill
```
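As a supplement to eyeballing the dumps in the results below, the ambiguity can be detected mechanically by stripping the actions from the lr_in_dnat flows and looking for repeated match/priority text (a post-processing sketch, not part of the original test):

```
# Prints nothing on a fixed build; on an affected build it prints the
# shared match text of the two ct.est flows, which differ only in which
# SNAT flag their actions set.
ovn-sbctl dump-flows lr0 | grep lr_in_dnat | sed 's/action=.*//' | sort | uniq -d
```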
Result on ovn21.09-21.09.0-13 (failed):

```
[root@dell-per740-12 bz1995326]# rpm -qa | grep -E "openvswitch2.16|ovn21.09"
ovn21.09-central-21.09.0-13.el8fdp.x86_64
ovn21.09-host-21.09.0-13.el8fdp.x86_64
python3-openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-test-2.16.0-16.el8fdp.noarch
ovn21.09-21.09.0-13.el8fdp.x86_64

+ ovn-sbctl dump-flows lr0
+ grep lr_in_dnat
  table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp), action=(flags.force_snat_for_lb = 1; next;)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp), action=(flags.skip_snat_for_lb = 1; next;)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && tcp.dst == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
  table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;)
+ ovn-nbctl --wait=hv sync
+ ovs-ofctl dump-flows br-int
+ grep 11.0.0.200
cookie=0x93b1a82c, duration=0.017s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x2,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0x2e0f22e8, duration=0.015s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x1,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0xbda5672b, duration=0.018s, table=13, n_packets=0, n_bytes=0, idle_age=0, priority=110,tcp,metadata=0x3,nw_dst=11.0.0.200 actions=load:0xb0000c8->NXM_NX_XXREG0[96..127],ct(table=14,zone=NXM_NX_REG11[0..15],nat)
cookie=0x73184e1f, duration=0.018s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg10=0/0x2,metadata=0x1,arp_tpa=11.0.0.200,arp_op=1 actions=load:0x8000->NXM_NX_REG15[],resubmit(,37)
cookie=0xb34e802d, duration=0.018s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=80,arp,reg10=0/0x2,metadata=0x2,arp_tpa=11.0.0.200,arp_op=1 actions=clone(load:0x2->NXM_NX_REG15[],resubmit(,37)),load:0x8005->NXM_NX_REG15[],resubmit(,37)
+ ip netns exec sw0-port1 nc -k -l 8080
+ sleep 1
+ ip netns exec sw0-port1 nc -k -l 8088
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 4321
Ncat: Connection reset by peer.    <=== failed
```
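Why the port-4321 connection is reset (an inference from the dump above, not stated explicitly in the report): both ct.est flows carry the identical match, so established packets for one of the two load balancers can be handled with the wrong SNAT flag, and the connection breaks. The two ambiguous flows can be isolated with:

```
# On this affected build, both printed flows match
#   ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1 && tcp
# at priority 120 and differ only in setting skip_snat_for_lb versus
# force_snat_for_lb.
ovn-sbctl dump-flows lr0 | grep lr_in_dnat | grep ct.est
```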
Result on ovn-2021-21.09.0-12 (passed):

```
[root@dell-per740-12 bz1995326]# rpm -qa | grep -E "openvswitch2.16|ovn-2021"
ovn-2021-central-21.09.0-12.el8fdp.x86_64
ovn-2021-host-21.09.0-12.el8fdp.x86_64
python3-openvswitch2.16-2.16.0-16.el8fdp.x86_64
ovn-2021-21.09.0-12.el8fdp.x86_64
openvswitch2.16-2.16.0-16.el8fdp.x86_64
openvswitch2.16-test-2.16.0-16.el8fdp.noarch

+ ovn-sbctl dump-flows lr0
+ grep lr_in_dnat
  table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 1234 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 4321 && ct_label.natted == 1), action=(flags.force_snat_for_lb = 1; next;)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 1234), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8080);)
  table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp && reg9[16..31] == 4321), action=(flags.force_snat_for_lb = 1; ct_lb(backends=192.168.0.2:8088);)
  table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;)
+ ovn-nbctl --wait=hv sync
+ ovs-ofctl dump-flows br-int
+ grep 11.0.0.200
cookie=0x462f8d03, duration=0.016s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x1,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0x4fd1bbfa, duration=0.013s, table=11, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg14=0x2,metadata=0x3,arp_tpa=11.0.0.200,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_XXREG0[64..111]->NXM_OF_ETH_SRC[],load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_NX_XXREG0[64..111]->NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],push:NXM_OF_ARP_TPA[],pop:NXM_OF_ARP_SPA[],pop:NXM_OF_ARP_TPA[],move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
cookie=0xf8645487, duration=0.016s, table=13, n_packets=0, n_bytes=0, idle_age=0, priority=110,tcp,metadata=0x3,nw_dst=11.0.0.200 actions=load:0xb0000c8->NXM_NX_XXREG0[96..127],move:NXM_OF_TCP_DST[]->OXM_OF_PKT_REG4[16..31],ct(table=14,zone=NXM_NX_REG11[0..15],nat)
cookie=0x29ebc057, duration=0.014s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=80,arp,reg10=0/0x2,metadata=0x2,arp_tpa=11.0.0.200,arp_op=1 actions=clone(load:0x2->NXM_NX_REG15[],resubmit(,37)),load:0x8005->NXM_NX_REG15[],resubmit(,37)
cookie=0xe6f9062c, duration=0.014s, table=30, n_packets=0, n_bytes=0, idle_age=0, priority=90,arp,reg10=0/0x2,metadata=0x1,arp_tpa=11.0.0.200,arp_op=1 actions=load:0x8000->NXM_NX_REG15[],resubmit(,37)
+ ip netns exec sw0-port1 nc -k -l 8080
+ sleep 1
+ ip netns exec sw0-port1 nc -k -l 8088
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 1234
h
+ ip netns exec sw1-port1 nc 11.0.0.200 4321
h
```
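The fix is also visible in the OpenFlow dump from the passing run: the defrag flow in table 13 gained `move:NXM_OF_TCP_DST[]->OXM_OF_PKT_REG4[16..31]`, copying the original destination port into the register pair that backs the logical `reg9[16..31]` field before the `ct(...,nat)` recirculation. A one-line check for it (a sketch, not part of the original test script):

```
# Should print the defrag flow that saves the pre-NAT destination port
# on a fixed build, and nothing on an affected one.
ovs-ofctl dump-flows br-int | grep NXM_OF_TCP_DST
```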
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5059