Bug 1877002

Summary: [OVN SCALE] Combine Logical Flows inside Southbound DB.
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Ilya Maximets <i.maximets>
Component: OVN
Assignee: Ilya Maximets <i.maximets>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: high
Docs Contact:
Priority: high
Version: FDP 20.E
CC: ctrautma, dblack, dcbw, dceara, jlema, mark.d.gray, nusiddiq, rsevilla
Target Milestone: ---
Keywords: TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: aos-scalability-47
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1943314 (view as bug list)
Environment:
Last Closed: 2021-03-15 14:36:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1859924, 1903265, 1943314    

Description Ilya Maximets 2020-09-08 17:01:16 UTC
In the test performed in https://bugzilla.redhat.com/show_bug.cgi?id=1859924,
each of 2000 ACLs produces 8 lflows for each of 100 logical switches
(--> logical datapaths), i.e. 1.6M lflows in total.  This takes ~800 MB of
disk space.

But these lflows are identical for each logical datapath, i.e. the
'logical_datapath' column is the only difference between the DB rows.  If we
could somehow store only one lflow for all of them, this would reduce the
memory consumption of the SB DB by almost 100 times, i.e. 800 MB --> 8 MB
(I'm speculating here, but the real impact should not be much different, IIUC).
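
As a back-of-the-envelope check of those numbers (assuming a roughly uniform
row size, which is my own estimate, not something I measured):

  echo $((2000 * 8 * 100))            # 1600000 lflows in total
  echo $((800 * 1000000 / 1600000))   # ~500 bytes per Logical_Flow row
  echo $((1600000 / 100))             # 16000 unique lflows if every group of
                                      # 100 identical rows collapses into one,
                                      # i.e. roughly 8 MB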

Possible solutions:

1. Create a new Datapath_Group table whose rows are sets of logical datapaths,
   and have a reference to a datapath group from the logical flow.

2. Have a special mark in the ACL that this ACL should be applied to all
   datapaths at once.
   --> This approach was suggested by Han, but it might not cover cases where
   the ACL is assigned to a Port Group that doesn't span all logical datapaths,
   so we would have to duplicate logical flows anyway.
   
Solution #1 sounds more flexible, but it's not very clear how to efficiently
implement it inside ovn-northd.  Variants are:

a. It might make sense to just combine all the ACL-related logical flows for
   one Port Group; this would cover the ovn-k8s use case if they start using
   Port Groups for reject ACLs.
   
b. Another option is to apply this schema not only to ACL flows but to all
   logical flows.  We could actually do this blindly inside ovn_lflow_add_at():
   we just need to check whether we already have exactly the same flow but for
   a different logical datapath.  However, not constructing the duplicates at
   all might save a lot of processing time.

---

There were some discussions that ovn-k8s might avoid creating reject ACLs
for all logical switches by setting up gateway ports, but it should still
make sense to combine other types of flows.  For example, the Sb DB from the
above test had 318 flows that look exactly like this:

"match":"1","pipeline":"ingress","logical_datapath":["uuid","<some-uuid>"],\
"external_ids":["map",[["source","ovn-northd.c:5560"],["stage-name","ls_in_lb"]]],\
"table_id":9,"actions":"next;"

There are 2.5K flows that have 'match:1' for different logical switch stages.
That's 8 groups of 318 flows each that differ only by logical datapath.

And 9 more groups with the same 'match:1' for different logical router stages,
with ~100-200 lflows in each; 1.2K lflows in total.

There are 100 flows like this:

"match":"((ip4.src == 10.129.8.10 && tcp.src == 8080)) && ip4.dst == 172.30.194.88",\
"pipeline":"ingress","logical_datapath":["uuid","<some-uuid>"],"priority":1,\
"external_ids":["map",[["source","ovn-northd.c:5646"],["stage-hint","98416181"],\
["stage-name","ls_in_pre_hairpin"]]],"table_id":11,"actions":"reg0[6] = 1; ct_snat;"}

32K such flows in total, 100 for each ip4.src/dst pair.

There are 100 flows like this:

"match":"reg9[3] == 1 || reg9[2] == 1","pipeline":"ingress","logical_datapath":\
["uuid","<some-uuid>"],"priority":100,"external_ids":["map",[["source","ovn-northd.c:8009"],\
["stage-name","lr_in_learn_neighbor"]]],"table_id":2,"actions":"next;"}

And so on.
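
A quick way to count such duplicates directly from the Sb DB (a rough sketch;
assumes ovn-sbctl can reach the DB and that the CSV output keeps one line per
row):

  ovn-sbctl --no-headings --format=csv \
      --columns=pipeline,table_id,priority,match,actions list Logical_Flow \
      | sort | uniq -c | sort -rn | head

The count in the first column is the number of rows that differ only by
logical_datapath (and external_ids).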

Comment 1 Ilya Maximets 2020-11-12 14:37:52 UTC
This week I tried to implement a PoC patch for this, i.e. to have a separate
Logical Datapath Group table in the Southbound DB and a reference to it from
the logical flow.  Changes in northd are mostly straightforward and don't
require much work (I started with 1 datapath group per lflow to simplify things
for the PoC, i.e. to not have several lflows referencing the same datapath
group).  But ovn-controller turned out to be too hard to modify.

A few main issues with the ovn-controller implementation:

1. ovn-controller works directly on the database all the time; it doesn't have
   any abstraction layer that could hide database schema modifications, so
   almost all parts of ovn-controller would have to be modified instead.

2. Unfortunately, almost all of the lflow processing code depends on the fact
   that each lflow corresponds to exactly one logical datapath.  Even the match
   expression parsing code needs to know for which logical datapath the
   expression is parsed.  Because of this, half of the lflow processing code
   would have to be rewritten in order to implement the change.

So, because of the current ovn-controller code design, it's very hard to
implement the feature.  Dumitru said that he will take a look at how to remove
the dependency on the logical datapath from the expression parsing code.  I'll
wait for that before continuing this work.

Current unfinished PoC patches are available here:
https://github.com/igsilya/ovn/tree/logical-datapath-group

Comment 2 Dumitru Ceara 2020-11-13 17:12:22 UTC
(In reply to Ilya Maximets from comment #1)
[...]
> 
> 2. Unfortunately, almost all of the lflow processing code depends on the fact
>    that each lflow corresponds to exactly one logical datapath.  Even the match
>    expression parsing code needs to know for which logical datapath the
>    expression is parsed.  Because of this, half of the lflow processing code
>    would have to be rewritten in order to implement the change.

As discussed offline, even though it's not ideal that match expression parsing
depends on the datapath, I think we still need to call expr_to_matches() for each
logical datapath when parsing the match expression for a flow because it
filters out the ports that are not part of the datapath (e.g., if a port
group includes ports from different logical switches):

https://github.com/ovn-org/ovn/blob/e14b52a9f6ff455ccc0c8e6616ab474baa14643f/controller/lflow.c#L907
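
For illustration, a port group that spans two different logical switches hits
exactly this case (hypothetical names; sw0-p1 and sw1-p1 are ports on two
different switches):

  ovn-nbctl pg-add pg0 sw0-p1 sw1-p1
  ovn-nbctl acl-add pg0 to-lport 1001 'outport == @pg0 && ip4' drop

The resulting lflows are conceptually the same for both switches, but the set
of ports that @pg0 expands to is different per datapath.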

> 
> So, because of the current ovn-controller code design, it's very hard to
> implement the feature.  Dumitru said that he will take a look at how to remove
> the dependency on the logical datapath from the expression parsing code.  I'll
> wait for that before continuing this work.
> 

I started looking at options, but this might turn out to be quite a large change.
As discussed offline, it's probably worth unblocking the PoC and just calling
consider_logical_flow() for every logical datapath referred to by the logical_flow.

This means that ovn-controller will be as inefficient as it is today (parsing
the same flow multiple times for multiple datapaths), but it should show whether
the optimization is worthwhile for alleviating the load on the Southbound DB.

Once bug 1897201 is fixed we can take this a step further and optimize
ovn-controller so that it parses a logical flow match exactly once.

> Current unfinished PoC patches available here:
> https://github.com/igsilya/ovn/tree/logical-datapath-group

Comment 3 Ilya Maximets 2020-11-16 11:56:33 UTC
Thanks, Dumitru, for looking into the issue and summarizing our discussion here.
I took the approach of calling consider_logical_flow() for every logical
datapath referred to by the logical_flow.  This seems to work functionally.

Patches sent upstream for review:
  https://patchwork.ozlabs.org/project/ovn/list/?series=214426

The fix reduces the size of the database and the number of logical flows in
the DBs from BZ 1859924 by up to ~5.5x.
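
For anyone reproducing the comparison, roughly (a sketch; the on-disk DB path
depends on packaging and may differ):

  ovn-sbctl --bare --columns=_uuid list Logical_Flow | grep -c .   # number of lflows
  du -h /var/lib/ovn/ovnsb_db.db                                   # SB DB size on disk

run once with datapath groups enabled and once without.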

A few unit tests are failing; I need to look at them.
Also, I haven't run any performance tests yet.  That is a required next step
because processing in both northd and ovn-controller was modified.

Comment 4 Ilya Maximets 2020-12-03 20:28:35 UTC
v2 of the patch set sent upstream:
  https://patchwork.ozlabs.org/project/ovn/list/?series=218369&state=*

Together with Dumitru, we fixed or worked around all the problems, so this
version is fully functional and can be accepted as is.  Waiting for review.

However, while debugging, we discovered many problems with the current
implementation of ovn-controller that prevented us from implementing a better
solution for handling datapath groups.  Basically, everything said in
comment #1 is true, or even worse.  Since we were unable to implement incremental
processing in ovn-controller that would perform only the necessary flow
updates (the code re-considers a logical flow for all datapaths in a group instead
of re-considering it only for the new/deleted datapath), this feature may have
corner cases with poor performance on the ovn-controller side.  Presumably, such
a case could be one where many logical datapaths are served by one ovn-controller.
IIUC, this is not the case for ovn-k8s, but it may be for OpenStack, so enabling
this feature should be evaluated for the specific use case/reference architecture.
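
One rough way to gauge this for a given deployment is to look at how many
datapaths the groups actually span (a sketch; table/column names as they appear
in the SB DB with the feature enabled):

  ovn-sbctl --bare --columns=datapaths list Logical_DP_Group \
      | awk 'NF {print NF}' | sort -n | uniq -c

This prints the distribution of group sizes; chassis that host ports from many
of the datapaths in a large group are the potential worst case.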

The feature can be enabled by updating the NB_Global options:
  ovn-nbctl set NB_Global . options:use_logical_dp_groups=true

ovn-controller needs a big rework, e.g. an abstraction layer between the database
and the application logic, among many other things, to be flexible and extensible.

Comment 5 Dan Williams 2020-12-04 21:15:05 UTC
v3 of the series is:

https://patchwork.ozlabs.org/project/ovn/list/?series=218683&archive=both&state=*

Comment 6 Ilya Maximets 2020-12-11 21:05:24 UTC
Current status update:

- v3 of the patch-set merged upstream:
  https://patchwork.ozlabs.org/project/ovn/list/?series=218683&archive=both&state=*

- One follow-up fix for multicast flows also merged:
  https://patchwork.ozlabs.org/project/ovn/patch/20201207115106.533958-1-i.maximets@ovn.org/

- One more fix, for a performance regression in the case where datapath groups
  are disabled, was sent to the mailing list and reviewed.  Waiting for acceptance:
  https://patchwork.ozlabs.org/project/ovn/patch/20201211180115.3829253-1-i.maximets@ovn.org/

Comment 7 Ilya Maximets 2021-01-04 14:10:39 UTC
Everything is merged upstream and should be available in the ovn2.13-20.12.0-1 package.

Comment 8 Jianlin Shi 2021-01-07 03:40:05 UTC
Built the topology with the following scripts:

server:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641                                                                    
ovn-sbctl set-connection ptcp:6642                                                                    
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.163.101:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.163.101                                    
systemctl restart ovn-controller
ip netns add server0
ip link add veth0_s0 netns server0 type veth peer name veth0_s0_p
ip netns exec server0 ip link set lo up
ip netns exec server0 ip link set veth0_s0 up
ip netns exec server0 ip link set veth0_s0 address 00:00:00:01:01:02
ip netns exec server0 ip addr add 192.168.1.1/24 dev veth0_s0
ip netns exec server0 ip -6 addr add 2001::1/64 dev veth0_s0                                          
ip netns exec server0 ip route add default via 192.168.1.254 dev veth0_s0
ip netns exec server0 ip -6 route add default via 2001::a dev veth0_s0
ovs-vsctl add-port br-int veth0_s0_p                                                                  
ip link set veth0_s0_p up
ovs-vsctl set interface veth0_s0_p external_ids:iface-id=ls1p1                                        

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
#ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 2001::1 192.168.1.1"
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.1.1 2001::1"
ovn-nbctl lsp-add ls1 ls1p2
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:02:02 192.168.1.2 2001::2"                             
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64                           
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1                                                 

ovn-nbctl lrp-add lr1 lr1-ls2 00:00:00:00:00:02 192.168.2.254/24 2002::a/64

ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 ls2-lr1
ovn-nbctl lsp-set-addresses ls2-lr1 "00:00:00:00:00:02 192.168.2.254 2002::a"
ovn-nbctl lsp-set-type ls2-lr1 router
ovn-nbctl lsp-set-options ls2-lr1 router-port=lr1-ls2
ovn-nbctl lsp-add ls2 ls2p1
ovn-nbctl lsp-set-addresses ls2p1 "00:00:00:02:01:02 192.168.2.1 2002::1"

ovn-nbctl lsp-add ls1 ls1p3
ovn-nbctl lsp-set-addresses ls1p3 "00:00:00:01:03:02 192.168.1.3 2001::3"
ovn-nbctl lb-add lb0 192.168.1.100 192.168.1.1,192.168.1.2
ovn-nbctl ls-lb-add ls2 lb0

client:

#!/bin/bash

systemctl start openvswitch                                                                           
ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:20.0.163.101:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.163.13

systemctl start ovn-controller

ip netns add server1
ip link add veth0_s1 netns server1 type veth peer name veth0_s1_p                                     
ip netns exec server1 ip link set lo up
ip netns exec server1 ip link set veth0_s1 up                                                         
ip netns exec server1 ip link set veth0_s1 address 00:00:00:01:02:02
ip netns exec server1 ip addr add 192.168.1.2/24 dev veth0_s1                                         
ip netns exec server1 ip -6 addr add 2001::2/64 dev veth0_s1
ip netns exec server1 ip route add default via 192.168.1.254 dev veth0_s1
ip netns exec server1 ip -6 route add default via 2001::a dev veth0_s1

ovs-vsctl add-port br-int veth0_s1_p
ip link set veth0_s1_p up
ovs-vsctl set interface veth0_s1_p external_ids:iface-id=ls1p2                                        

ip netns add client0                                                                                  
ip link add veth0_c0 netns client0 type veth peer name veth0_c0_p
ip netns exec client0 ip link set lo up
ip netns exec client0 ip link set veth0_c0 up                                                         
ip netns exec client0 ip link set veth0_c0 address 00:00:00:02:01:02                                  
ip netns exec client0 ip addr add 192.168.2.1/24 dev veth0_c0                                         
ip netns exec client0 ip -6 addr add 2002::1/64 dev veth0_c0
ip netns exec client0 ip route add default via 192.168.2.254 dev veth0_c0
ip netns exec client0 ip -6 route add default via 2002::a dev veth0_c0

ovs-vsctl add-port br-int veth0_c0_p
ip link set veth0_c0_p up                                                                             
ovs-vsctl set interface veth0_c0_p external_ids:iface-id=ls2p1


result on 20.12.0-1:


[root@wsfd-advnetlab16 bz1877002]# ovn-nbctl list nb_global                                           
_uuid               : 6843ccad-a484-4449-808a-9c111a2cf235                                            
connections         : [3e7e5245-8b15-4c82-a23e-191e926cf419]                                          
external_ids        : {}                                                                              
hv_cfg              : 0                                                                               
hv_cfg_timestamp    : 0                                                                               
ipsec               : false                                                                           
name                : ""                                                                              
nb_cfg              : 0
nb_cfg_timestamp    : 0                                                                               
options             : {mac_prefix="2e:9e:70", max_tunid="16711680", northd_internal_version="20.12.0-20.12.0-51.0", svc_monitor_mac="42:e7:5e:38:5a:16"}
sb_cfg              : 0                                                                               
sb_cfg_timestamp    : 0
ssl                 : []

[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list logical_dp_group
<=== no logical_dp_group
[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list logical_flow | wc -l
3343

[root@wsfd-advnetlab16 bz1877002]# ovn-nbctl --wait=hv set NB_Global . options:use_logical_dp_groups=true
[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list logical_flow | wc -l
2661
<==== fewer flows
[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list logical_dp_group
_uuid               : 944c3cf1-0b44-4d70-8c1b-c9ce0659db0c
datapaths           : [61dc573f-5e71-42ea-882f-96f3f4a29d7f, e472006a-aed2-4d82-9028-9bfacbac383f]
[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl list datapath_binding
_uuid               : ac83c9d2-8e94-4b45-96a6-7254b61ae556
external_ids        : {logical-router="c790eba4-c177-49e6-a142-943df5e7afa1", name=lr1}
load_balancers      : []
tunnel_key          : 2

_uuid               : e472006a-aed2-4d82-9028-9bfacbac383f
external_ids        : {logical-switch="9ad7ebac-ad6d-4e81-9c40-1e45c712d660", name=ls1}
load_balancers      : []
tunnel_key          : 1

_uuid               : 61dc573f-5e71-42ea-882f-96f3f4a29d7f
external_ids        : {logical-switch="1e9421e7-b10e-4c74-aa76-4d40b31dbf6e", name=ls2}
load_balancers      : [2bf87375-7e32-43ec-bfad-85b8f58e40ed] 
tunnel_key          : 3

<==== logical_dp_group contains dp for ls1 and ls2

[root@wsfd-advnetlab16 bz1877002]# ovn-sbctl find logical_flow logical_dp_group=944c3cf1-0b44-4d70-8c1b-c9ce0659db0c | wc -l
681

<=== flows contain the logical_dp_group

[root@wsfd-advnetlab19 bz1877002]# ip netns exec client0 ping 192.168.1.3 -c 1
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.                                                  
64 bytes from 192.168.1.3: icmp_seq=1 ttl=63 time=6.42 ms                                             

--- 192.168.1.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 6.425/6.425/6.425/0.000 ms
[root@wsfd-advnetlab19 bz1877002]# ip netns exec client0 ping 192.168.1.2 -c 1
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.                                                  
64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=1.60 ms                                             

--- 192.168.1.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.603/1.603/1.603/0.000 ms
[root@wsfd-advnetlab19 bz1877002]# ip netns exec client0 ping 192.168.1.1 -c 1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.                                                  
64 bytes from 192.168.1.1: icmp_seq=1 ttl=63 time=1.69 ms                                             

--- 192.168.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.698/1.698/1.698/0.000 ms
[root@wsfd-advnetlab19 bz1877002]# ip netns exec client0 ping 192.168.1.100 -c 1
PING 192.168.1.100 (192.168.1.100) 56(84) bytes of data.                                              
64 bytes from 192.168.1.100: icmp_seq=1 ttl=63 time=1.37 ms                                           

--- 192.168.1.100 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms                                           
rtt min/avg/max/mdev = 1.378/1.378/1.378/0.000 ms

<== ping works well

[root@wsfd-advnetlab16 bz1877002]# rpm -qa | grep -E "ovn2.13|openvswitch2.13"                        
openvswitch2.13-2.13.0-71.el7fdp.x86_64                                                               
ovn2.13-host-20.12.0-1.el7fdp.x86_64
ovn2.13-central-20.12.0-1.el7fdp.x86_64
ovn2.13-20.12.0-1.el7fdp.x86_64

Comment 11 Jianlin Shi 2021-02-18 00:56:53 UTC
set VERIFIED per comment 8

Comment 13 errata-xmlrpc 2021-03-15 14:36:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0836