+++ This bug was initially created as a clone of Bug #1795697 +++

Hi all,

Based on some problems that we've detected at scale, I've been analyzing how logical flows are distributed on a system which makes heavy use of Floating IPs (dnat_and_snat NAT entries) and DVR.

[root@central ~]# ovn-nbctl list NAT | grep dnat_and_snat -c
985

With 985 Floating IPs (and ~1.2K ACLs), I can see that 680K logical flows are generated. This creates terrible stress everywhere (ovsdb-server, ovn-northd, ovn-controller), especially upon reconnection of ovn-controllers to the SB database, which have to read and process ~0.7 million logical flows:

[root@central ~]# time ovn-sbctl list logical_flow > logical_flows.txt

real    1m17.465s
user    0m41.916s
sys     0m1.996s

[root@central ~]# grep _uuid logical_flows.txt -c
680276

The problem is even worse when a lot of clients are simultaneously reading the dump from the SB DB server (this could certainly be alleviated by using RAFT, but we're not there yet), even triggering OOM kills on ovsdb-server/ovn-northd and severely delaying the control plane from becoming operational again.

I have investigated the generated lflows and their distribution per stage a little bit, finding that 62.2% are in the lr_out_egr_loop stage and 31.1% in the lr_in_ip_routing stage:

[root@central ~]# head -n 10 logical_flows_distribution_sorted.txt
lr_out_egr_loop: 423414 62.24%
lr_in_ip_routing: 212199 31.19%
lr_in_ip_input: 10831 1.59%
ls_out_acl: 4831 0.71%
ls_in_port_sec_ip: 3471 0.51%
ls_in_l2_lkup: 2360 0.34%
....

Tackling the lflows in lr_out_egr_loop first, I can see that there are mainly two lflow types:

1)
external_ids        : {source="ovn-northd.c:8807", stage-name=lr_out_egr_loop}
logical_datapath    : 261206d2-72c5-4e79-ae5c-669e6ee4e71a
match               : "ip4.src == 10.142.140.39 && ip4.dst == 10.142.140.112"
pipeline            : egress
priority            : 200
table_id            : 2
hash                : 0

2)
actions             : "inport = outport; outport = \"\"; flags = 0; flags.loopback = 1; reg9[1] = 1; next(pipeline=ingress, table=0); "
external_ids        : {source="ovn-northd.c:8799", stage-name=lr_out_egr_loop}
logical_datapath    : 161206d2-72c5-4e79-ae5c-669e6ee4e71a
match               : "is_chassis_resident(\"42f64a6c-a52d-4712-8c56-876e8fb30c03\") && ip4.src == 10.142.140.39 && ip4.dst == 10.142.141.19"
pipeline            : egress
priority            : 300

Looks like these lflows are added by this commit:
https://github.com/ovn-org/ovn/commit/551e3d989557bd2249d5bbe0978b44b775c5e619

Each Floating IP contributes ~1.2K lflows in this stage (and of course this grows as the number of FIPs grows):

[root@central ~]# grep 10.142.140.39 lr_out_egr_loop.txt | grep match -c
1233

Similarly, for the lr_in_ip_routing stage, we find the same pattern:

1)
actions             : "outport = \"lrp-d2d745f5-91f0-4626-81c0-715c63d35716\"; eth.src = fa:16:3e:22:02:29; eth.dst = fa:16:5e:6f:36:e4; reg0 = ip4.dst; reg1 = 10.142.143.147; reg9[2] = 1; reg9[0] = 0; next;"
external_ids        : {source="ovn-northd.c:6782", stage-name=lr_in_ip_routing}
logical_datapath    : 161206d2-72c5-4e79-ae5c-669e6ee4e71a
match               : "inport == \"lrp-09f7eba5-54b7-48f4-9820-80423b65c608\" && ip4.src == 10.1.0.170 && ip4.dst == 10.142.140.39"
pipeline            : ingress
priority            : 400

Looks like these last flows are added by this commit:
https://github.com/ovn-org/ovn/commit/8244c6b6bd8802a018e4ec3d3665510ebb16a9c7

Each FIP contributes 599 lflows in this stage:

[root@central ~]# grep -c 10.142.140.39 lr_in_ip_routing.txt
599
[root@central ~]# grep -c 10.142.140.185 lr_in_ip_routing.txt
599
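For reference, the per-stage distribution and the per-stage files used above can be derived from the dump with something like the following sketch. It assumes the record layout shown above: one blank-line-separated record per flow, with stage-name as the last key inside external_ids, and file names matching the ones used in this comment.

# Count flows per stage and compute percentages against the total.
total=$(grep -c _uuid logical_flows.txt)
grep -o 'stage-name=[a-z0-9_]*' logical_flows.txt | cut -d= -f2 \
    | sort | uniq -c | sort -rn \
    | awk -v t="$total" '{printf "%s: %d %.2f%%\n", $2, $1, 100 * $1 / t}'

# Extract one stage into its own file; awk paragraph mode (RS=) keeps
# whole records together. The trailing "}" anchors the exact stage name.
awk -v RS= -v ORS='\n\n' '/stage-name=lr_out_egr_loop}/'  logical_flows.txt > lr_out_egr_loop.txt
awk -v RS= -v ORS='\n\n' '/stage-name=lr_in_ip_routing}/' logical_flows.txt > lr_in_ip_routing.txt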
In order to figure out the relationship between the number of FIPs and the lflows, I removed a few of them, and the percentage of lflows in both stages remained constant:

[root@central ~]# ovn-nbctl find NAT type=dnat_and_snat | grep -c _uuid
833

[root@central ~]# grep _uuid logical_flows_2.txt -c
611640

lr_out_egr_loop: 379740 62.08%
lr_in_ip_routing: 190295 31.11%

We need to find a way to reduce the number of flows in both stages and/or offload part of the calculation to ovn-controller when the logical port assigned to a FIP is bound to its chassis, as this is creating big scale issues.

Daniel

--- Additional comment from Daniel Alvarez Sanchez on 2020-02-03 15:38:55 UTC ---

Patch upstream under review: https://patchwork.ozlabs.org/patch/1232232/
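As an aside, the simultaneous-reader pressure described at the top of this report (many clients dumping the SB database at once) can be approximated against a test deployment with something along these lines. This is a sketch only: the server address is a placeholder and the client count is arbitrary.

# Spawn 20 concurrent full dumps of the Logical_Flow table; each client
# forces ovsdb-server to serialize the whole ~680K-row table for it.
SB=tcp:127.0.0.1:6642   # placeholder; point at the real SB server
for i in $(seq 1 20); do
    ovsdb-client dump "$SB" OVN_Southbound Logical_Flow > /dev/null &
done
wait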
Reproduced on ovn2.12.0-26 with the following steps:

#!/bin/bash
systemctl restart openvswitch
systemctl restart ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external-ids:system-id=hv0 \
    external-ids:ovn-remote=tcp:20.0.30.26:6642 \
    external-ids:ovn-encap-type=geneve \
    external-ids:ovn-encap-ip=20.0.30.26
systemctl restart ovn-controller

Copy the ovnnb_db.db from bz1788906 to /var/lib/ovn/, then restart ovn-northd:

systemctl restart ovn-northd

Wait a few minutes:

[root@dell-per740-12 bz1798173]# ovn-nbctl list NAT | grep dnat_and_snat -c
985
[root@dell-per740-12 bz1798173]# time ovn-sbctl list logical_flow > logical_flows.txt

real    0m38.315s
user    0m21.424s
sys     0m2.744s

[root@dell-per740-12 bz1798173]# grep _uuid logical_flows.txt -c
680324

[root@dell-per740-12 bz1798173]# rpm -qa | grep -E "ovn|openvswitch"
ovn2.12-central-2.12.0-26.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch2.12-2.12.0-21.el7fdp.x86_64
ovn2.12-2.12.0-26.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-18.noarch
ovn2.12-host-2.12.0-26.el7fdp.x86_64

Verified on ovn2.12.0-27:

[root@dell-per740-12 bz1798173]# ovn-nbctl list NAT | grep dnat_and_snat -c
985
[root@dell-per740-12 bz1798173]# time ovn-sbctl list logical_flow > logical_flows.txt

real    0m2.530s
user    0m1.617s
sys     0m0.205s

[root@dell-per740-12 bz1798173]# grep _uuid logical_flows.txt -c
44899

[root@dell-per740-12 bz1798173]# rpm -qa | grep -E "ovn|openvswitch"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.12-host-2.12.0-27.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch2.12-2.12.0-21.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-18.noarch
ovn2.12-2.12.0-27.el7fdp.x86_64
ovn2.12-central-2.12.0-27.el7fdp.x86_64
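For completeness: if the ovnnb_db.db from bz1788906 is not at hand, a hypothetical NB database of a similar shape could be built by creating distributed dnat_and_snat entries in a loop. This is a sketch only: the router name "lr0", the address ranges, the MAC prefix, and the "vm-port-$i" port names are made up, and those logical switch ports would have to exist (and be bound to chassis) first for the FIPs to behave as distributed.

#!/bin/bash
# Create 985 distributed FIPs (dnat_and_snat with logical_port + external_mac).
for i in $(seq 1 985); do
    hi=$((i / 250)); lo=$((i % 250 + 1))
    ovn-nbctl lr-nat-add lr0 dnat_and_snat \
        "10.142.$((140 + hi)).$lo" "10.1.$hi.$lo" \
        "vm-port-$i" "$(printf 'f0:00:00:00:%02x:%02x' "$hi" "$lo")"
done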
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0752