In the test performed in https://bugzilla.redhat.com/show_bug.cgi?id=1859924 each of 2000 ACLs produces 8 lflows for each of 100 logical switches (--> logical datapaths), i.e. 1.6M lflows in total. This takes ~800MB of disk space. But these lflows are identical for each logical datapath, i.e. the 'logical_datapath' column is the only difference between the DB rows.

If we could somehow store only one lflow for all of them, this would reduce the memory consumption of the SB DB by almost 100 times, i.e. 800 MB --> 8 MB (I'm speculating here, but the real impact should not be much different, IIUC).

Possible solutions:

1. Create a new Datapath_Group table whose rows are sets of logical datapaths, and have a reference to a datapath group from the logical flow.

2. Have a special mark in the ACL saying that this ACL should be applied to all datapaths at once. --> This approach was suggested by Han, but it might not cover cases where the ACL is assigned to a Port Group that doesn't cover all logical datapaths, and we would have to duplicate logical flows anyway.

Solution #1 sounds more flexible, but it's not very clear how to implement it efficiently inside ovn-northd. Variants are:

a. It might make sense to just combine all the ACL-related logical flows for one Port Group; this would cover the ovn-k8s use case if they start using Port Groups for reject ACLs.

b. Another option is to apply this scheme not only to ACL flows, but to all logical flows. We could actually do this blindly inside ovn_lflow_add_at(): we just need to check if we already have exactly the same flow but for a different logical datapath. However, not constructing the duplicates at all might save a lot of processing time.

---

There were some discussions that ovn-k8s might avoid creating reject ACLs for all logical switches by setting up gateway ports, but it should still make sense to combine other types of flows. For example, the Sb DB from the above test had 318 flows that look exactly like this:

"match":"1","pipeline":"ingress","logical_datapath":["uuid","<some-uuid>"],\
"external_ids":["map",[["source","ovn-northd.c:5560"],["stage-name","ls_in_lb"]]],\
"table_id":9,"actions":"next;"

There are 2.5K flows that have 'match:1' for different logical switch stages: 8 groups with 318 flows each that differ only by logical datapath. And there are 9 more groups with the same 'match:1' for different logical router stages, with ~100-200 lflows each, 1.2K lflows in total.

There are 100 flows like this:

"match":"((ip4.src == 10.129.8.10 && tcp.src == 8080)) && ip4.dst == 172.30.194.88",\
"pipeline":"ingress","logical_datapath":["uuid","<some-uuid>"],"priority":1,\
"external_ids":["map",[["source","ovn-northd.c:5646"],["stage-hint","98416181"],\
["stage-name","ls_in_pre_hairpin"]]],"table_id":11,"actions":"reg0[6] = 1; ct_snat;"}

32K of such flows in total, 100 for each ip4.src/dst pair.

There are 100 flows like this:

"match":"reg9[3] == 1 || reg9[2] == 1","pipeline":"ingress","logical_datapath":\
["uuid","<some-uuid>"],"priority":100,"external_ids":["map",[["source","ovn-northd.c:8009"],\
["stage-name","lr_in_learn_neighbor"]]],"table_id":2,"actions":"next;"}

And so on.
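As a rough way to quantify this kind of duplication in an existing Sb DB, flows can be grouped by everything except the datapath column, e.g. with something like the following (a sketch only; it relies on the generic --format/--columns/--no-headings output options of ovn-sbctl and the Logical_Flow columns shown in the dumps above):

  # Count groups of logical flows that are identical in everything except
  # the logical datapath they are attached to; large counts = heavy duplication.
  ovn-sbctl --format=csv --no-headings \
            --columns=pipeline,table_id,priority,match,actions \
            list logical_flow | sort | uniq -c | sort -rn | head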
This week I tried to implement a PoC patch for this, i.e. to have a separate Logical Datapath Group table in the Southbound DB and a reference to it from the logical flow.

The changes in northd are mostly straightforward and don't require much work (I started with one datapath group per lflow to simplify things for the PoC, i.e. to avoid having several lflows referencing the same datapath group). But ovn-controller turned out to be too hard to modify.

The main issues with the ovn-controller implementation:

1. ovn-controller works directly on the database all the time; it doesn't have any abstraction layer that could hide database schema modifications. Almost all parts of ovn-controller would have to be modified instead.

2. Unfortunately, almost all of the lflow processing code depends on the fact that each lflow corresponds to exactly one logical datapath. Even the match expression parsing code needs to know for which logical datapath the expression is being parsed. Because of this, half of the lflow processing code would have to be rewritten in order to implement the change.

So, because of the current ovn-controller code design, it's very hard to implement the feature. Dumitru said that he will take a look at how to remove the dependency on the logical datapath from the expression parsing code. I'll wait for that before continuing this work.

Current unfinished PoC patches are available here:
https://github.com/igsilya/ovn/tree/logical-datapath-group
(In reply to Ilya Maximets from comment #1)
[...]
>
> 2. Unfortunately, almost all of the lflow processing code depends on the
>    fact that each lflow corresponds to exactly one logical datapath. Even
>    the match expression parsing code needs to know for which logical
>    datapath the expression is being parsed. Because of this, half of the
>    lflow processing code would have to be rewritten in order to implement
>    the change.

As discussed offline, even though it's not ideal that match expression parsing depends on the datapath, I think we still need to call expr_to_matches() for each logical datapath when parsing the match expression of a flow, because it filters out the ports that are not part of the datapath (e.g., if a port group includes ports from different logical switches):

https://github.com/ovn-org/ovn/blob/e14b52a9f6ff455ccc0c8e6616ab474baa14643f/controller/lflow.c#L907

>
> So, because of the current ovn-controller code design, it's very hard to
> implement the feature. Dumitru said that he will take a look at how to
> remove the dependency on the logical datapath from the expression parsing
> code. I'll wait for that before continuing this work.
>

I started looking at options, but this might turn out to be quite a large change. As discussed offline, it's probably worth unblocking the PoC and just calling consider_logical_flow() for every logical datapath referred to by the logical flow. This means that ovn-controller will be as inefficient as it is today (parsing the same flow multiple times for multiple datapaths), but it should show whether the optimization is worth it to alleviate the load on the Southbound DB. Once bug 1897201 is fixed, we can take this a step further and optimize ovn-controller so that it parses a logical flow match exactly once.

> Current unfinished PoC patches are available here:
> https://github.com/igsilya/ovn/tree/logical-datapath-group
Thanks, Dumitru, for looking at the issue and summarizing our discussion here.

I took the approach of calling consider_logical_flow() for every logical datapath referred to by the logical flow. This seems to work functionally. Patches sent upstream for review:
https://patchwork.ozlabs.org/project/ovn/list/?series=214426

The fix reduces the database size and the number of logical flows in the DBs from BZ 1859924 by up to ~5.5x.

A few unit tests are failing; I need to look at them. Also, I haven't run any performance tests yet, and that is a required next step because the processing steps in both northd and ovn-controller were modified.
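As a side note, a rough before/after comparison on a test DB can be done with something like this (a sketch only; the DB file path depends on the packaging, and the flow count is similar in spirit to the `wc -l` checks used in the verification later in this bug, just counting records instead of output lines):

  # On-disk size of the Southbound DB (path varies, e.g. /var/lib/ovn/ or
  # /var/lib/openvswitch/ depending on the package).
  ls -lh /var/lib/ovn/ovnsb_db.db

  # Number of Logical_Flow records; each record printed by 'list' starts
  # with a '_uuid' line.
  ovn-sbctl list logical_flow | grep -c '^_uuid'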
v2 of the patch set sent upstream:
https://patchwork.ozlabs.org/project/ovn/list/?series=218369&state=*

Together with Dumitru we fixed or worked around all the problems, so this version is fully functional and can be accepted as is. Waiting for review.

However, while debugging, we discovered many problems with the current implementation of ovn-controller that prevented us from implementing a better solution for handling datapath groups. Basically, everything said in comment #1 is true, or even worse. Since we were unable to implement incremental processing in ovn-controller that would only perform the necessary flow updates (the code re-considers a logical flow for all datapaths in a group instead of re-considering it only for the new/deleted datapath), this feature may have corner cases with poor performance on the ovn-controller side. Presumably, such a case could be a setup with many logical datapaths served by one ovn-controller. IIUC, this is not the case for ovn-k8s, but it may be for OpenStack, so enabling this feature should be evaluated for the specific use case / reference architecture.

The feature can be enabled by updating the NB_Global options:

  ovn-nbctl set NB_Global . options:use_logical_dp_groups=true

ovn-controller needs a big rework (e.g. an abstraction layer between the database and the application logic, among other things) to become flexible and extensible.
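For completeness, a minimal enable/check/disable sequence could look like the following (a sketch; the enable and check commands are the same ones used in the verification further down in this bug, and setting the option to "false" is assumed here to turn the feature back off):

  # Enable datapath groups and wait until hypervisors have caught up.
  ovn-nbctl --wait=hv set NB_Global . options:use_logical_dp_groups=true

  # Sanity check: datapath groups should appear and the number of
  # logical flows in the Southbound DB should drop.
  ovn-sbctl list logical_dp_group
  ovn-sbctl list logical_flow | wc -l

  # Turn the feature back off if a deployment hits the slow corner cases.
  ovn-nbctl set NB_Global . options:use_logical_dp_groups=false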
v3 of the series is: https://patchwork.ozlabs.org/project/ovn/list/?series=218683&archive=both&state=*
Current status update:

- v3 of the patch set merged upstream:
  https://patchwork.ozlabs.org/project/ovn/list/?series=218683&archive=both&state=*

- One follow-up fix for multicast flows also merged:
  https://patchwork.ozlabs.org/project/ovn/patch/20201207115106.533958-1-i.maximets@ovn.org/

- One more fix, for a performance regression in the case where datapath groups are disabled, sent to the mailing list and reviewed. Waiting for acceptance:
  https://patchwork.ozlabs.org/project/ovn/patch/20201211180115.3829253-1-i.maximets@ovn.org/
Everything is merged upstream and should be available in the ovn2.13-20.12.0-1 package.
Build topo with the following script:

server:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.163.101:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.163.101
systemctl restart ovn-controller

ip netns add server0
ip link add veth0_s0 netns server0 type veth peer name veth0_s0_p
ip netns exec server0 ip link set lo up
ip netns exec server0 ip link set veth0_s0 up
ip netns exec server0 ip link set veth0_s0 address 00:00:00:01:01:02
ip netns exec server0 ip addr add 192.168.1.1/24 dev veth0_s0
ip netns exec server0 ip -6 addr add 2001::1/64 dev veth0_s0
ip netns exec server0 ip route add default via 192.168.1.254 dev veth0_s0
ip netns exec server0 ip -6 route add default via 2001::a dev veth0_s0
ovs-vsctl add-port br-int veth0_s0_p
ip link set veth0_s0_p up
ovs-vsctl set interface veth0_s0_p external_ids:iface-id=ls1p1

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
#ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 2001::1 192.168.1.1"
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.1.1 2001::1"
ovn-nbctl lsp-add ls1 ls1p2
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:02:02 192.168.1.2 2001::2"

ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1

ovn-nbctl lrp-add lr1 lr1-ls2 00:00:00:00:00:02 192.168.2.254/24 2002::a/64
ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 ls2-lr1
ovn-nbctl lsp-set-addresses ls2-lr1 "00:00:00:00:00:02 192.168.2.254 2002::a"
ovn-nbctl lsp-set-type ls2-lr1 router
ovn-nbctl lsp-set-options ls2-lr1 router-port=lr1-ls2
ovn-nbctl lsp-add ls2 ls2p1
ovn-nbctl lsp-set-addresses ls2p1 "00:00:00:02:01:02 192.168.2.1 2002::1"

ovn-nbctl lsp-add ls1 ls1p3
ovn-nbctl lsp-set-addresses ls1p3 "00:00:00:01:03:02 192.168.1.3 2001::3"

ovn-nbctl lb-add lb0 192.168.1.100 192.168.1.1,192.168.1.2
ovn-nbctl ls-lb-add ls2 lb0

client:

#!/bin/bash
systemctl start openvswitch
ovs-vsctl set open . \
    external_ids:system-id=hv0 external_ids:ovn-remote=tcp:20.0.163.101:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.163.13
systemctl start ovn-controller

ip netns add server1
ip link add veth0_s1 netns server1 type veth peer name veth0_s1_p
ip netns exec server1 ip link set lo up
ip netns exec server1 ip link set veth0_s1 up
ip netns exec server1 ip link set veth0_s1 address 00:00:00:01:02:02
ip netns exec server1 ip addr add 192.168.1.2/24 dev veth0_s1
ip netns exec server1 ip -6 addr add 2001::2/64 dev veth0_s1
ip netns exec server1 ip route add default via 192.168.1.254 dev veth0_s1
ip netns exec server1 ip -6 route add default via 2001::a dev veth0_s1
ovs-vsctl add-port br-int veth0_s1_p
ip link set veth0_s1_p up
ovs-vsctl set interface veth0_s1_p external_ids:iface-id=ls1p2

ip netns add client0
ip link add veth0_c0 netns client0 type veth peer name veth0_c0_p
ip netns exec client0 ip link set lo up
ip netns exec client0 ip link set veth0_c0 up
ip netns exec client0 ip link set veth0_c0 address 00:00:00:02:01:02
ip netns exec client0 ip addr add 192.168.2.1/24 dev veth0_c0
ip netns exec client0 ip -6 addr add 2002::1/64 dev veth0_c0
ip netns exec client0 ip route add default via 192.168.2.254 dev veth0_c0
ip netns exec client0 ip -6 route add default via 2002::a dev veth0_c0
ovs-vsctl add-port br-int veth0_c0_p
ip link set veth0_c0_p up
ovs-vsctl set interface veth0_c0_p external_ids:iface-id=ls2p1

result on 20.12.0-1:

[root@wsfd-advnetlab16 bz1877002]# ovn-nbctl list nb_global
_uuid               : 6843ccad-a484-4449-808a-9c111a2cf235
connections         : [3e7e5245-8b15-4c82-a23e-191e926cf419]
external_ids        : {}
hv_cfg              : 0
hv_cfg_timestamp    : 0
ipsec               : false
name                : ""
nb_cfg              : 0
nb_cfg_timestamp    : 0
options             : {mac_prefix="2e:9e:70", max_tunid="16711680", northd_internal_version="20.12.0-20.12.0-51.0", svc_monitor_mac="42:e7:5e:38:5a:16"}
sb_cfg              : 0
sb_cfg_timestamp    : 0
ssl                 : []

[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list logical_dp_group    <=== no logical_dp_group

[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list logical_flow | wc -l
3343

[root@wsfd-advnetlab16 bz1877002]# ovn-nbctl --wait=hv set NB_Global . options:use_logical_dp_groups=true

[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list logical_flow | wc -l
2661    <==== less flows

[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list logical_dp_group
_uuid               : 944c3cf1-0b44-4d70-8c1b-c9ce0659db0c
datapaths           : [61dc573f-5e71-42ea-882f-96f3f4a29d7f, e472006a-aed2-4d82-9028-9bfacbac383f]

[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list datapath_binding
_uuid               : ac83c9d2-8e94-4b45-96a6-7254b61ae556
external_ids        : {logical-router="c790eba4-c177-49e6-a142-943df5e7afa1", name=lr1}
load_balancers      : []
tunnel_key          : 2

_uuid               : e472006a-aed2-4d82-9028-9bfacbac383f
external_ids        : {logical-switch="9ad7ebac-ad6d-4e81-9c40-1e45c712d660", name=ls1}
load_balancers      : []
tunnel_key          : 1

_uuid               : 61dc573f-5e71-42ea-882f-96f3f4a29d7f
external_ids        : {logical-switch="1e9421e7-b10e-4c74-aa76-4d40b31dbf6e", name=ls2}
load_balancers      : [2bf87375-7e32-43ec-bfad-85b8f58e40ed]
tunnel_key          : 3

<==== logical_dp_group contains dp for ls1 and ls2

[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl find logical_flow logical_dp_group=944c3cf1-0b44-4d70-8c1b-c9ce0659db0c | wc -l
681    <=== flows contain the logical_dp_group

[root@wsfd-advnetlab19 bz1877002]# ip netns exec client0 ping 192.168.1.3 -c 1
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=63 time=6.42 ms

--- 192.168.1.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 6.425/6.425/6.425/0.000 ms

[root@wsfd-advnetlab19 bz1877002]# ip netns exec client0 ping 192.168.1.2 -c 1
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=1.60 ms

--- 192.168.1.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.603/1.603/1.603/0.000 ms

[root@wsfd-advnetlab19 bz1877002]# ip netns exec client0 ping 192.168.1.1 -c 1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=63 time=1.69 ms

--- 192.168.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.698/1.698/1.698/0.000 ms

[root@wsfd-advnetlab19 bz1877002]# ip netns exec client0 ping 192.168.1.100 -c 1
PING 192.168.1.100 (192.168.1.100) 56(84) bytes of data.
64 bytes from 192.168.1.100: icmp_seq=1 ttl=63 time=1.37 ms

--- 192.168.1.100 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.378/1.378/1.378/0.000 ms

<== ping works well

[root@wsfd-advnetlab16 bz1877002]# rpm -qa | grep -E "ovn2.13|openvswitch2.13"
openvswitch2.13-2.13.0-71.el7fdp.x86_64
ovn2.13-host-20.12.0-1.el7fdp.x86_64
ovn2.13-central-20.12.0-1.el7fdp.x86_64
ovn2.13-20.12.0-1.el7fdp.x86_64
set VERIFIED per comment 8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0836