Created attachment 1767290 [details]
OVN NB database.

Description of problem:

In ovn-kubernetes (or similar) deployments, ACLs used for implementing network policies are applied to port groups that include all ports of the namespace. This translates to ACLs being applied independently to all logical switches that have ports included in the port group.

To differentiate between the logical datapaths on which the ACL is applied, ovn-controller generates one flow per datapath, appending an additional "metadata=<datapath-tunnel-key>" match to the match expression parsed from the ACL's logical flow match. This duplication of OF rules (once for each logical switch) creates an OF rule explosion in ovn-controller/ovs-vswitchd.

For example, with the attached OVN NB database extracted from a scale test run, and with the following interfaces bound to a single node OVN deployment:

lports=(lp_17.1.0.9 lp_17.1.0.10 lp_17.1.0.11 lp_17.1.0.12 lp_17.1.0.13 lp_17.1.0.14 lp_17.1.0.15 lp_17.1.0.16 lp_17.1.0.17 lp_17.1.0.18)
for lp in "${lports[@]}"; do
    ovs-vsctl add-port br-int $lp \
        -- set interface $lp type=internal \
        -- set interface $lp external_ids:iface-id=$lp
done

To avoid SB/OVS disconnects, also increase the probe timeouts:

ovn-sbctl set connection . inactivity_probe=180000
ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000

We notice in the ovn-controller log:

2021-03-26T22:36:48.436Z|24385|timeval|WARN|Unreasonably long 47246ms poll interval (45010ms user, 1997ms system)
...
2021-03-26T22:51:55.727Z|24839|memory|INFO|peak resident set size grew 53% in last 21677.7 seconds, from 3855720 kB to 5881796 kB
2021-03-26T22:51:55.727Z|24840|memory|INFO|lflow-cache-entries-cache-conj-id:16 lflow-cache-entries-cache-matches:164344 lflow-cache-size-KB:785612

Focusing on the OF rules generated by ovn-controller from ACLs:

# grep conj /tmp/OF-rules | grep -e 'conjunction(18,' -e 'conj_id=18' | grep "17.143.0.5" | head -10
cookie=0x0, duration=181.726s, table=45, n_packets=0, n_bytes=0, idle_age=181, priority=2010,ip,reg0=0x80/0x80,metadata=0x3ea,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=181.420s, table=45, n_packets=0, n_bytes=0, idle_age=181, priority=2010,ip,reg0=0x80/0x80,metadata=0x430,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=180.893s, table=45, n_packets=0, n_bytes=0, idle_age=180, priority=2010,ip,reg0=0x80/0x80,metadata=0x297,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=180.777s, table=45, n_packets=0, n_bytes=0, idle_age=180, priority=2010,ip,reg0=0x80/0x80,metadata=0x3cf,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=180.519s, table=45, n_packets=0, n_bytes=0, idle_age=180, priority=2010,ip,reg0=0x80/0x80,metadata=0x43f,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=180.349s, table=45, n_packets=0, n_bytes=0, idle_age=180, priority=2010,ip,reg0=0x80/0x80,metadata=0x3fd,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=179.866s, table=45, n_packets=0, n_bytes=0, idle_age=179, priority=2010,ip,reg0=0x80/0x80,metadata=0x49c,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=179.698s, table=45, n_packets=0, n_bytes=0, idle_age=179, priority=2010,ip,reg0=0x80/0x80,metadata=0x3be,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=179.637s, table=45, n_packets=0, n_bytes=0, idle_age=179, priority=2010,ip,reg0=0x80/0x80,metadata=0x488,nw_dst=17.143.0.5 actions=conjunction(18,1/2)
cookie=0x0, duration=179.556s, table=45, n_packets=0, n_bytes=0, idle_age=179, priority=2010,ip,reg0=0x80/0x80,metadata=0x40d,nw_dst=17.143.0.5 actions=conjunction(18,1/2)

The only difference between the above flow matches is the metadata value (the logical datapath tunnel key).

# grep conj /tmp/OF-rules | grep -e 'conjunction(18,' -e 'conj_id=18' | grep "17.143.0.5" | wc -l
200

This flow is repeated 200 times, as the port group includes ports from 200 logical switches. And because the ACLs are all very similar (differing only in port groups and address sets), this happens for every ACL. The total number of conjunctive match OF rules is:

# grep -c conj /tmp/OF-rules
8004048

On this specific setup, if the metadata match were included in the conjunctive match, the number of OF rules would decrease by a factor of ~200.

The same issue was also reported upstream:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-March/381082.html
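To make the scale math concrete, here is a back-of-the-envelope sketch (Python) of why folding the per-datapath metadata match into the conjunction collapses the flow count. The dimension sizes below are hypothetical; only the 200 datapaths come from this report, and the sketch models the idea rather than ovn-controller's actual flow-generation code:

def conj_flows(n_datapaths, n_dim1, n_dim2, metadata_in_conjunction):
    # n_dim1/n_dim2: flows per conjunction dimension, e.g. the number
    # of addresses in an address set and of ports in a port group.
    if not metadata_in_conjunction:
        # Current behavior: every dimension flow is emitted once per
        # datapath, each carrying its own metadata=<tunnel-key> match.
        return n_datapaths * (n_dim1 + n_dim2)
    # With the datapath folded into the conjunction, the dimension
    # flows are shared across datapaths; only a small per-datapath
    # component (e.g. the conj_id flow) remains.
    return (n_dim1 + n_dim2) + n_datapaths

before = conj_flows(200, 2000, 10, metadata_in_conjunction=False)
after = conj_flows(200, 2000, 10, metadata_in_conjunction=True)
print(before, after, round(before / after))  # 402000 2210 182

As the dimension flows grow to dominate, the reduction approaches the datapath count, i.e. the ~x200 observed on this setup.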
Upstream fix: https://github.com/ovn-org/ovn/commit/0cfeba6b55e3b8cc93b7a077f53cb78a9678905c
I think 1-4 in http://patchwork.ozlabs.org/project/ovn/list/?series=241037&state=%2A&archive=both are prereqs of that commit, right?
(In reply to Dan Williams from comment #2)
> I think 1-4 in
> http://patchwork.ozlabs.org/project/ovn/list/?series=241037&state=%2A&archive=both
> are prereqs of that commit, right?

Correct, the corresponding upstream commits are:

https://github.com/ovn-org/ovn/commit/db41da34323c80692a6556a7c5aea3360e7877d2
https://github.com/ovn-org/ovn/commit/de3ca51a886493e6d0b2cd2bc85354e3808d7cbf
https://github.com/ovn-org/ovn/commit/e2393241e977a8fce931e29cf03ecc182f610f57
https://github.com/ovn-org/ovn/commit/6a14469280585bd87f54c35ee572c64c9db134f6
https://github.com/ovn-org/ovn/commit/0cfeba6b55e3b8cc93b7a077f53cb78a9678905c
The following patch also needs to be accepted upstream and then backported along with the aforementioned ones:

http://patchwork.ozlabs.org/project/ovn/patch/20210602070731.3736171-1-hzhou@ovn.org/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2969