Bug 1881826
Summary: | RFE: support ECMP for logical router policies | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Alexander Constantinescu <aconstan> |
Component: | ovn2.13 | Assignee: | Numan Siddique <nusiddiq> |
Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | RHEL 8.0 | CC: | ctrautma, dcbw, jishi, nusiddiq, ralongi |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2021-02-03 21:55:09 UTC | Type: | Bug |
Description
Alexander Constantinescu
2020-09-23 07:28:35 UTC
I'm working on this feature. A few observations:

1. It is possible to add ECMP support to policies. I'm planning to add a new column, nexthops, to the Logical_Router_Policy table, where the user can set multiple hops (unlike static_routes, where you need to add 2 static routes). Does this look fine?
   Eg. ovn-nbctl lr-policy-add lr0 900 "ip4.src == 10.0.0.5" reroute 172.168.0.201,172.168.0.202

2. It is not possible to support ecmp-symmetric-reply. We could support that in static routes because the user specifies the prefix and nexthop, and we can easily match on the prefix in the reverse direction. But in the case of a policy, the user gives the match string. This match can be anything, and OVN doesn't look into the match. Is this fine?

Hi Numan

> Does this look fine ?
> Eg. ovn-nbctl lr-policy-add lr0 900 "ip4.src == 10.0.0.5" reroute 172.168.0.201,172.168.0.202

I think it does.

> 2. It is not possible to support ecmp-symmetric-reply. We could support that in static routes because user specifies prefix and nexthop and we can easily match for prefix in the reverse direction. But in the case of policy, user gives the match string. This match can be anything and OVN doesn't look into the match.

So, as mentioned during our call a couple of minutes ago, the use case is the following (using your example above): 10.0.0.5 (client), 8.8.8.8 (server) + 172.168.0.201, 172.168.0.202 (2 nexthops). Let's suppose in this case that 172.168.0.202 is down and can't serve traffic. 10.0.0.5 initiates a connection to 8.8.8.8 (server), the outgoing packets will be routed through 172.168.0.201 and we want to make sure that the reply goes through 172.168.0.201 when coming back. So with ecmp we want to increase the probability that 172.168.0.201 is __sometimes__ used as nexthop. Right now we can end up in a situation where it is never used and traffic continues to be pushed out to 172.168.0.202 even though it can't serve traffic.
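The per-connection next-hop question discussed here can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not OVN's actual selection logic: the function name pick_nexthop, the cksum-based hash, and the port numbers are all hypothetical. It only shows that hashing the flow tuple keeps one direction of a connection on one next hop, while the reverse tuple is a different key and is not tied to the same choice.

```shell
#!/bin/sh
# Toy model only: this is NOT how OVN hashes packets.
# The next hops are the two addresses from the example in this thread.
NEXTHOPS="172.168.0.201 172.168.0.202"

# pick_nexthop SRC_IP SRC_PORT DST_IP DST_PORT PROTO  (hypothetical helper)
# Hashes the flow tuple with cksum and maps the result onto NEXTHOPS.
pick_nexthop() {
    key="$1:$2>$3:$4/$5"
    sum=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
    # Split the list into positional parameters, then index into it.
    set -- $NEXTHOPS
    shift $((sum % $#))
    echo "$1"
}

# Same forward tuple twice: the same next hop is printed both times.
pick_nexthop 10.0.0.5 40000 8.8.8.8 443 tcp
pick_nexthop 10.0.0.5 40000 8.8.8.8 443 tcp

# The reverse tuple is a different hash key, so the choice made for it is
# independent of the forward direction.
pick_nexthop 8.8.8.8 443 10.0.0.5 40000 tcp
```

In this model nothing links the reply direction to the forward choice, which is the gap that a symmetric-reply mechanism would have to close.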
The point with ecmp-symmetric-reply is that if 10.0.0.5 initiates a connection to 8.8.8.8 and goes through 172.168.0.201, we don't want any packet in that connection to come back through 172.168.0.202. If this use case is already covered by OVN with ecmp only, then I am fine with not supporting ecmp-symmetric-reply.

I hope that is clearer.

/Alexander

I have a WIP patch ready - https://github.com/numansiddique/ovn/commit/5eb8ab36163c63229cb80ce992b345a21f1746c9

Before submitting the patch upstream, it would be great if this can be tested with the ovn-k8s kind setup. The scratch build for fedora 32 with this patch can be found here -
https://download.copr.fedorainfracloud.org/results/numans/ovn_test/fedora-32-x86_64/01811309-ovn/
https://copr.fedorainfracloud.org/coprs/numans/ovn_test/build/1811309/

@Alex - I think after testing with ovn-k8s we can confirm if ecmp-symmetric-reply works as expected or not.

Thanks

Fix available in upstream and in d/s version ovn2.13-20.09.0-23

tested with following script:

# foo -- R1 -- join - R2 -- alice -- |
#        |       |                     server
# bar ----       - R3 --- bob ----   |
#
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.50.26:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.50.26
systemctl restart ovn-controller

ovn-nbctl lr-add R1
ovn-nbctl lr-add R2
ovn-nbctl lr-add R3

ovn-nbctl set logical_router R1 options:chassis=hv1
ovn-nbctl set logical_router R2 options:chassis=hv1
ovn-nbctl set logical_router R3 options:chassis=hv1

ovn-nbctl ls-add foo
ovn-nbctl ls-add bar
ovn-nbctl ls-add alice
ovn-nbctl ls-add bob
ovn-nbctl ls-add join

ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 2001::1/64
ovn-nbctl lsp-add foo rp-foo -- set logical_switch_port rp-foo \
    type=router options:router-port=foo addresses=\"00:00:01:01:02:03\"

ovn-nbctl lrp-add R1 bar 00:00:01:01:02:04 192.168.2.1/24 2002::1/64
ovn-nbctl lsp-add bar rp-bar -- set Logical_Switch_Port rp-bar \
    type=router options:router-port=bar addresses=\"00:00:01:01:02:04\"

ovn-nbctl lrp-add R2 alice 00:00:02:01:02:03 172.16.1.1/24 3001::1/64
ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
    type=router options:router-port=alice addresses=\"00:00:02:01:02:03\"

ovn-nbctl lrp-add R3 bob 00:00:03:01:02:03 172.17.1.1/24 3002::1/64
ovn-nbctl lsp-add bob rp-bob -- set Logical_Switch_Port rp-bob \
    type=router options:router-port=bob addresses=\"00:00:03:01:02:03\"

ovn-nbctl lrp-add R1 R1_join 00:00:04:01:02:03 20.0.0.1/24 4000::1/64
ovn-nbctl lsp-add join r1-join -- set Logical_Switch_Port r1-join \
    type=router options:router-port=R1_join addresses='"00:00:04:01:02:03"'

ovn-nbctl lrp-add R2 R2_join 00:00:04:01:02:04 20.0.0.2/24 4000::2/64
ovn-nbctl lsp-add join r2-join -- set Logical_Switch_Port r2-join \
    type=router options:router-port=R2_join addresses='"00:00:04:01:02:04"'

ovn-nbctl lrp-add R3 R3_join 00:00:04:01:02:05 20.0.0.3/24 4000::3/64
ovn-nbctl lsp-add join r3-join -- set Logical_Switch_Port r3-join \
    type=router options:router-port=R3_join addresses='"00:00:04:01:02:05"'

ovn-nbctl lr-route-add R2 192.168.0.0/16 20.0.0.1
ovn-nbctl lr-route-add R3 192.168.0.0/16 20.0.0.1
ovn-nbctl lr-route-add R2 2001::/64 4000::1
ovn-nbctl lr-route-add R2 2002::/64 4000::1
ovn-nbctl lr-route-add R3 2001::/64 4000::1
ovn-nbctl lr-route-add R3 2002::/64 4000::1
ovn-nbctl lr-route-add R2 1.1.1.0/24 172.16.1.3
ovn-nbctl lr-route-add R3 1.1.1.0/24 172.17.1.4
ovn-nbctl lr-route-add R2 1111::/64 3001::3
ovn-nbctl lr-route-add R3 1111::/64 3002::4

ip netns add foo1
ovs-vsctl add-port br-int foo1 -- set interface foo1 type=internal
ip link set foo1 netns foo1
ip netns exec foo1 ip link set foo1 address f0:00:00:01:02:03
ip netns exec foo1 ip link set foo1 up
ip netns exec foo1 ip addr add 192.168.1.2/24 dev foo1
ip netns exec foo1 ip -6 addr add 2001::2/64 dev foo1
ip netns exec foo1 ip route add default via 192.168.1.1 dev foo1
ip netns exec foo1 ip -6 route add default via 2001::1 dev foo1
ovs-vsctl set interface foo1 external_ids:iface-id=foo1
ovn-nbctl lsp-add foo foo1 -- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2 2001::2"

ip netns add bar1
ip link add bar1 netns bar1 type veth peer name bar1_br
ip netns exec bar1 ip link set bar1 address f0:00:00:01:02:05
ip netns exec bar1 ip link set bar1 up
ip netns exec bar1 ip addr add 192.168.2.2/24 dev bar1
ip netns exec bar1 ip -6 addr add 2002::2/64 dev bar1
ip netns exec bar1 ip route add default via 192.168.2.1 dev bar1
ip netns exec bar1 ip -6 route add default via 2002::1 dev bar1
ip link set bar1_br up
ovs-vsctl add-port br-int bar1_br
ovs-vsctl set interface bar1_br external_ids:iface-id=bar1
ovn-nbctl lsp-add bar bar1 -- lsp-set-addresses bar1 "f0:00:00:01:02:05 192.168.2.2 2002::2"

ovs-vsctl add-br br_alice
ovs-vsctl add-br br_bob
ovs-vsctl set open . external-ids:ovn-bridge-mappings=net_alice:br_alice,net_bob:br_bob

ovn-nbctl lsp-add alice ln_alice
ovn-nbctl lsp-set-type ln_alice localnet
ovn-nbctl lsp-set-addresses ln_alice unknown
ovn-nbctl lsp-set-options ln_alice network_name=net_alice

ip netns add alice1
ovs-vsctl add-port br_alice alice1 -- set interface alice1 type=internal
ip link set alice1 netns alice1
ip netns exec alice1 ip link set alice1 address f0:00:00:01:02:04
ip netns exec alice1 ip link set alice1 up
ip netns exec alice1 ip addr add 172.16.1.3/24 dev alice1
ip netns exec alice1 ip -6 addr add 3001::3/64 dev alice1
ip netns exec alice1 ip route add default via 172.16.1.1 dev alice1
ip netns exec alice1 ip -6 route add default via 3001::1 dev alice1

ovn-nbctl lsp-add bob ln_bob
ovn-nbctl lsp-set-type ln_bob localnet
ovn-nbctl lsp-set-addresses ln_bob unknown
ovn-nbctl lsp-set-options ln_bob network_name=net_bob

ip netns add bob1
ip link add bob1 netns bob1 type veth peer name bob1_br
ip netns exec bob1 ip link set bob1 address f0:00:00:01:02:06
ip netns exec bob1 ip link set bob1 up
ip netns exec bob1 ip addr add 172.17.1.4/24 dev bob1
ip netns exec bob1 ip -6 addr add 3002::4/64 dev bob1
ip netns exec bob1 ip route add default via 172.17.1.1 dev bob1
ip netns exec bob1 ip -6 route add default via 3002::1 dev bob1
ip link set bob1_br up
ovs-vsctl add-port br_bob bob1_br

ip link add br_test type bridge
ip link set br_test up
ip link add a1 netns alice1 type veth peer name a1_br
ip link add b1 netns bob1 type veth peer name b1_br
ip link set a1_br master br_test
ip link set b1_br master br_test
ip link set a1_br up
ip link set b1_br up
ip netns exec alice1 ip link set a1 up
ip netns exec bob1 ip link set b1 up
ip netns exec alice1 ip addr add 1.1.1.1/24 dev a1
ip netns exec alice1 ip -6 addr add 1111::1/64 dev a1
ip netns exec bob1 ip addr add 1.1.1.2/24 dev b1
ip netns exec bob1 ip -6 addr add 1111::2/64 dev b1
ip netns exec alice1 sysctl -w net.ipv4.conf.all.forwarding=1
ip netns exec bob1 sysctl -w net.ipv4.conf.all.forwarding=1
ip netns exec alice1 sysctl -w net.ipv6.conf.all.forwarding=1
ip netns exec bob1 sysctl -w net.ipv6.conf.all.forwarding=1

ip netns add server
ip link add s1 netns server type veth peer name s1_br
ip link set s1_br master br_test
ip link set s1_br up
ip netns exec server ip link set s1 up
ip netns exec server ip addr add 1.1.1.10/24 dev s1
ip netns exec server ip route add default via 1.1.1.1 dev s1
ip netns exec server ip -6 addr add 1111::10/64 dev s1
ip netns exec server ip -6 route add default via 1111::1 dev s1
ip netns exec server sysctl -w net.ipv4.conf.all.rp_filter=0
ip netns exec server sysctl -w net.ipv4.conf.default.rp_filter=0

ovn-nbctl lr-route-add R1 0.0.0.0/0 20.0.0.1
ovn-nbctl lr-route-add R1 ::/0 4000::1
ovn-nbctl lr-policy-add R1 1000 'ip4 && ip4.dst == 1.1.1.10' reroute 20.0.0.2,20.0.0.3
ovn-nbctl lr-policy-add R1 1000 'ip6 && ip6.dst == 1111::10' reroute 4000::2,4000::3

result on 20.12.0-1:

[root@wsfd-advnetlab18 bz1881826]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-central-20.12.0-1.el8fdp.x86_64
openvswitch2.13-2.13.0-77.el8fdp.x86_64
ovn2.13-host-20.12.0-1.el8fdp.x86_64
python3-openvswitch2.13-2.13.0-77.el8fdp.x86_64
ovn2.13-20.12.0-1.el8fdp.x86_64

[root@wsfd-advnetlab18 bz1881826]# ip netns exec foo1 ping 1.1.1.10 -c 3
PING 1.1.1.10 (1.1.1.10) 56(84) bytes of data.
64 bytes from 1.1.1.10: icmp_seq=1 ttl=61 time=2.62 ms
64 bytes from 1.1.1.10: icmp_seq=2 ttl=61 time=0.116 ms
64 bytes from 1.1.1.10: icmp_seq=3 ttl=61 time=0.103 ms

--- 1.1.1.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 30ms
rtt min/avg/max/mdev = 0.103/0.947/2.624/1.185 ms

[root@wsfd-advnetlab18 bz1881826]# ip netns exec bar1 ping 1.1.1.10 -c 3
PING 1.1.1.10 (1.1.1.10) 56(84) bytes of data.
64 bytes from 1.1.1.10: icmp_seq=1 ttl=61 time=2.72 ms
64 bytes from 1.1.1.10: icmp_seq=2 ttl=61 time=0.130 ms
64 bytes from 1.1.1.10: icmp_seq=3 ttl=61 time=0.108 ms

--- 1.1.1.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 61ms
rtt min/avg/max/mdev = 0.108/0.985/2.717/1.224 ms

[root@wsfd-advnetlab18 bz1881826]# ip netns exec foo1 ping6 1111::10 -c 3
PING 1111::10(1111::10) 56 data bytes
64 bytes from 1111::10: icmp_seq=1 ttl=61 time=2.33 ms
64 bytes from 1111::10: icmp_seq=2 ttl=61 time=0.135 ms
64 bytes from 1111::10: icmp_seq=3 ttl=61 time=0.119 ms

--- 1111::10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 45ms
rtt min/avg/max/mdev = 0.119/0.862/2.333/1.040 ms

[root@wsfd-advnetlab18 bz1881826]# ip netns exec bar1 ping6 1111::10 -c 3
PING 1111::10(1111::10) 56 data bytes
64 bytes from 1111::10: icmp_seq=1 ttl=61 time=3.37 ms
64 bytes from 1111::10: icmp_seq=2 ttl=61 time=2.06 ms
64 bytes from 1111::10: icmp_seq=3 ttl=61 time=0.146 ms

--- 1111::10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 5ms
rtt min/avg/max/mdev = 0.146/1.857/3.367/1.322 ms

[root@wsfd-advnetlab18 bz1881826]# ip netns exec server nc -l 10011 -k &
[root@wsfd-advnetlab18 bz1881826]# for i in {1..10}; do ip netns exec foo1 nc 1.1.1.10 10011 <<< h; done

[root@wsfd-advnetlab18 ~]# ip netns exec bob1 tcpdump -i any -nnle src host 192.168.1.2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
02:43:56.640091 In 00:00:03:01:02:03 ethertype IPv4 (0x0800), length 76: 192.168.1.2.58444 > 1.1.1.10.10011: Flags [S], seq 2845916388, win 29200, options [mss 1460,sackOK,TS val 3874618623 ecr 0,nop,wscale 7], length 0
02:43:56.640127 Out 5a:c6:11:43:39:e6 ethertype IPv4 (0x0800), length 76: 192.168.1.2.58444 > 1.1.1.10.10011: Flags [S], seq 2845916388, win 29200, options [mss 1460,sackOK,TS val 3874618623 ecr 0,nop,wscale 7], length 0
<=== packet through R3 to bob1

[root@wsfd-advnetlab18 ~]# ip netns exec alice1 tcpdump -i any -nnle src host 192.168.1.2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
02:43:59.281978 In 00:00:02:01:02:03 ethertype IPv4 (0x0800), length 76: 192.168.1.2.58446 > 1.1.1.10.10011: Flags [S], seq 2965973518, win 29200, options [mss 1460,sackOK,TS val 3874621268 ecr 0,nop,wscale 7], length 0
02:43:59.282013 Out fa:b7:41:3b:1a:e9 ethertype IPv4 (0x0800), length 76: 192.168.1.2.58446 > 1.1.1.10.10011: Flags [S], seq 2965973518, win 29200, options [mss 1460,sackOK,TS val 3874621268 ecr 0,nop,wscale 7], length 0
<=== packet through R2 to alice1

[root@wsfd-advnetlab18 bz1881826]# for i in {1..10}; do ip netns exec foo1 nc 1111::10 10011 <<< h; done

[root@wsfd-advnetlab18 ~]# ip netns exec alice1 tcpdump -i any -nnle src host 2001::2 -c 10
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
02:48:26.698940 In 00:00:02:01:02:03 ethertype IPv6 (0x86dd), length 96: 2001::2.52668 > 1111::10.10011: Flags [S], seq 3495914229, win 28800, options [mss 1440,sackOK,TS val 1076377846 ecr 0,nop,wscale 7], length 0
02:48:26.698972 Out fa:b7:41:3b:1a:e9 ethertype IPv6 (0x86dd), length 96: 2001::2.52668 > 1111::10.10011: Flags [S], seq 3495914229, win 28800, options [mss 1440,sackOK,TS val 1076377846 ecr 0,nop,wscale 7], length 0

[root@wsfd-advnetlab18 ~]# ip netns exec bob1 tcpdump -i any -nnle src host 2001::2 -c 10
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
02:48:26.604092 In 00:00:03:01:02:03 ethertype IPv6 (0x86dd), length 96: 2001::2.52664 > 1111::10.10011: Flags [S], seq 1766838939, win 28800, options [mss 1440,sackOK,TS val 1076377751 ecr 0,nop,wscale 7], length 0
02:48:26.604125 Out 5a:c6:11:43:39:e6 ethertype IPv6 (0x86dd), length 96: 2001::2.52664 > 1111::10.10011: Flags [S], seq 1766838939, win 28800, options [mss 1440,sackOK,TS val 1076377751 ecr 0,nop,wscale 7], length 0

it doesn't work as ecmp-symmetric-reply:

[root@wsfd-advnetlab18 bz1881826]# ip netns exec foo1 nc -l 10120 -k &
[root@wsfd-advnetlab18 bz1881826]# for i in {1..10}; do ip netns exec server nc 192.168.1.2 10120 <<< h; done

[root@wsfd-advnetlab18 ~]# ip netns exec bob1 tcpdump -i any -nnle src host 192.168.1.2 -c 10
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
02:49:37.718879 In 00:00:03:01:02:03 ethertype IPv4 (0x0800), length 76: 192.168.1.2.10120 > 1.1.1.10.37856: Flags [S.], seq 1970308250, ack 115583270, win 28960, options [mss 1460,sackOK,TS val 3874959678 ecr 4272757522,nop,wscale 7], length 0
02:49:37.718911 Out 5a:c6:11:43:39:e6 ethertype IPv4 (0x0800), length 76: 192.168.1.2.10120 > 1.1.1.10.37856: Flags [S.], seq 1970308250, ack 115583270, win 28960, options [mss 1460,sackOK,TS val 3874959678 ecr 4272757522,nop,wscale 7], length 0
<=== still get packets on bob1 which is sent through R3

also Verified on rhel7 version:

[root@wsfd-advnetlab19 bz1881826]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-70.el7fdp.x86_64
ovn2.13-20.12.0-1.el7fdp.x86_64
ovn2.13-host-20.12.0-1.el7fdp.x86_64
python3-openvswitch2.13-2.13.0-70.el7fdp.x86_64
ovn2.13-central-20.12.0-1.el7fdp.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0407
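The tcpdump checks in the verification comment above were read by eye. A small filter along the following lines could tally them instead. This is only a sketch: the helper name count_inbound_syns is hypothetical, and it assumes the `tcpdump -i any -nnle` text format shown in the captures, where each forwarded packet appears once as In and once as Out.

```shell
#!/bin/sh
# count_inbound_syns (hypothetical helper): read tcpdump text on stdin and
# count inbound SYN packets, i.e. new connections arriving at this namespace.
# 'Flags [S],' matches plain SYNs only, not SYN-ACKs ('Flags [S.],').
count_inbound_syns() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep ' In ' | grep -c -F 'Flags [S],' || true
}

# Sample input: the two lines captured on bob1 in the verification above.
# prints 1
count_inbound_syns <<'EOF'
02:43:56.640091 In 00:00:03:01:02:03 ethertype IPv4 (0x0800), length 76: 192.168.1.2.58444 > 1.1.1.10.10011: Flags [S], seq 2845916388, win 29200, options [mss 1460,sackOK,TS val 3874618623 ecr 0,nop,wscale 7], length 0
02:43:56.640127 Out 5a:c6:11:43:39:e6 ethertype IPv4 (0x0800), length 76: 192.168.1.2.58444 > 1.1.1.10.10011: Flags [S], seq 2845916388, win 29200, options [mss 1460,sackOK,TS val 3874618623 ecr 0,nop,wscale 7], length 0
EOF
```

Piping the capture from alice1 and from bob1 through the same filter would give a per-next-hop connection count for the ten nc runs. A non-zero count on both sides suggests that both next hops are in use.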