Description of problem:
I ran the rally scenario "Boot VM and ping its floating IP" for 1000 VMs (100 networks with 10 VMs each) on OpenStack OSP16 with both the ML2/OVS and OVN drivers. OVN takes longer than ML2/OVS.

95th percentile time for ML2/OVS: 509 seconds for 10 VMs
95th percentile time for OVN: 633 seconds for 10 VMs

ML2/OVS rally results page:
http://rdu-storage01.scalelab.redhat.com/anilvenkata/20200327-105623/rally/simple-plugins/all-rally-run-0.html#/BrowbeatPlugin.create_network_nova_boot_ping

OVN rally results page:
http://rdu-storage01.scalelab.redhat.com/anilvenkata/20200327-105619/rally/simple-plugins/all-rally-run-0.html#/BrowbeatPlugin.create_network_nova_boot_ping

Version-Release number of selected component (if applicable):

How reproducible:
Deploy two OSP16 deployments, one with ML2/OVS and one with OVN, and run the rally scenario below:
https://github.com/cloud-bulldozer/browbeat/blob/master/rally/rally-plugins/netcreate-boot/netcreate_nova_boot_fip_ping.py

The OVN setup is a 3-controller, 3-compute OSP16 HA deployment (no DVR) with puddle version RHOS_TRUNK-16.0-RHEL-8-20200226.n.1
We are trying to confirm our suspicions around this issue:
https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html

Due to the test design, where 300 routers are connected to the same public logical switch, we're observing a flow explosion in the lr_in_arp_resolve stage. This is not a regression but something that we haven't tested before. Running the same test (booting 3K VMs) with fewer routers should produce very different results (much better performance), but we'll address the issue described in the email thread.
After investigation it looks like the main culprit at this point is the processing of Port Groups with lots of ports. Numan's working on the incremental processing bits around Port Groups.
(In reply to Daniel Alvarez Sanchez from comment #5)
> After investigation it looks like the main culprit at this point is the
> processing of Port Groups with lots of ports. Numan's working on the
> incremental processing bits around Port Groups.

Sorry, Dumitru (assignee) is working on it.
Tested with the following script:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.111.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.111.25
systemctl restart ovn-controller

ovn-nbctl pg-add pg
ovn-nbctl --type=port-group acl-add pg from-lport 1001 "inport == @pg && ip" drop
ovn-nbctl --type=port-group acl-add pg to-lport 1001 "outport == @pg && ip" drop
ovn-nbctl acl-list pg

for switch_id in {1..100}
do
    ovn-nbctl ls-add ls$switch_id
    for port in {1..5}
    do
        ovn-nbctl lsp-add ls$switch_id lsp${switch_id}_$port
        ovs-vsctl add-port br-int p${switch_id}_$port -- set interface p${switch_id}_$port type=internal
        ovs-vsctl set interface p${switch_id}_$port external_ids:iface-id=lsp${switch_id}_$port
    done
done

echo > /tmp/ls_ports
date
for switch_id in {1..100}
do
    for port in {1..5}
    do
        echo lsp${switch_id}_$port >> /tmp/ls_ports
    done
    pg_ports=`cat /tmp/ls_ports |xargs`
    ovn-nbctl --wait=hv pg-set-ports pg $pg_ports
done

set -x
flow_num_before=0
sleep 1
while :
do
    flow_num=$(ovs-ofctl dump-flows br-int | wc -l)
    [ $flow_num -eq $flow_num_before ] && break
    flow_num_before=$flow_num
    sleep 1
done
date

Result on ovn2.13.0-37:

[root@hp-dl380pg8-12 bz1818128]# bash try.sh
from-lport 1001 (inport == @pg && ip) drop
to-lport 1001 (outport == @pg && ip) drop
Mon Jul 13 03:59:16 EDT 2020    <=== start time
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 03:59:52 EDT 2020    <=== end time

<=== it takes about 36 seconds

[root@hp-dl380pg8-12 bz1818128]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-2.13.0-37.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-central-2.13.0-37.el8fdp.x86_64
openvswitch2.13-2.13.0-41.el8fdb.x86_64
ovn2.13-host-2.13.0-37.el8fdp.x86_64

Result on ovn2.13.0-39:

[root@hp-dl380pg8-12 bz1818128]# bash try.sh
from-lport 1001 (inport == @pg && ip) drop
to-lport 1001 (outport == @pg && ip) drop
Mon Jul 13 04:02:45 EDT 2020    <=== start time
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 04:02:59 EDT 2020    <=== end time

<=== it takes about 14 seconds, a large improvement compared to 37

[root@hp-dl380pg8-12 bz1818128]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-2.13.0-39.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-host-2.13.0-39.el8fdp.x86_64
openvswitch2.13-2.13.0-41.el8fdb.x86_64
ovn2.13-central-2.13.0-39.el8fdp.x86_64
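
The polling loop at the end of try.sh could be factored into a reusable helper. What follows is only a sketch under assumptions: the function name wait_until_stable and its arguments are hypothetical and not part of the original script. It polls any command until the number of lines it prints is the same on two consecutive polls, which is the same convergence check the script applies to the flow count.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of try.sh): poll a command until the number
# of lines it prints is unchanged between two consecutive polls, then print
# that line count.
# Usage: wait_until_stable INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_stable() {
    local interval=$1
    shift
    local before=-1 now
    while :; do
        now=$("$@" | wc -l)
        now=$((now))            # strip any padding some wc builds emit
        if [ "$now" -eq "$before" ]; then
            echo "$now"
            return 0
        fi
        before=$now
        sleep "$interval"
    done
}

# Intended use on a hypervisor (commented out because it needs a running OVS):
#   start=$SECONDS
#   flows=$(wait_until_stable 1 ovs-ofctl dump-flows br-int)
#   echo "flows=$flows elapsed=$((SECONDS - start))s"
```

Starting the previous count at -1 rather than 0 avoids declaring convergence immediately when the command prints nothing on the first poll. Like the original loop, the helper has one poll interval of resolution, so measured times are approximate.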
Port group record on 39:

[root@hp-dl380pg8-12 bz1818128]# ovn-sbctl list port_group
_uuid : c6a805f1-58f2-4540-83e5-e3643af96bc3
name  : "2_pg_test"
ports : [lsp2]

_uuid : ea5830ea-0787-484e-b649-fbd9c9011b7b
name  : "1_pg_test"
ports : [lsp1]

On 37:

[root@hp-dl380pg8-12 bz1818128]# ovn-sbctl list port_group
_uuid : ac6305f9-1514-42af-9c28-67f57038b0a5
name  : pg_test
ports : [lsp1, lsp2]
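
The two listings suggest that -39 stores one Southbound port_group record per logical switch, with a numeric prefix on the name, where -37 stored a single record. The sketch below only mimics that naming and grouping. It assumes the prefix is a per-switch datapath key; the function split_port_group and the key values 1 and 2 are hypothetical, chosen to reproduce the names shown above.

```shell
#!/usr/bin/env bash
# Illustrative only: mimic splitting one port group into per-datapath records.
# Input lines on stdin: "<datapath_key> <port_name>".
# The key values are assumptions picked to match the listing above.
split_port_group() {
    local pg_name=$1
    awk -v pg="$pg_name" '
        {
            # Append the port to the list kept for its datapath key.
            if ($1 in ports) ports[$1] = ports[$1] ", " $2
            else             ports[$1] = $2
        }
        END {
            for (k in ports) printf "%s_%s [%s]\n", k, pg, ports[k]
        }
    ' | sort
}

# lsp1 on the switch with key 1, lsp2 on the switch with key 2.
printf '1 lsp1\n2 lsp2\n' | split_port_group pg_test
```

With this input the sketch prints `1_pg_test [lsp1]` and `2_pg_test [lsp2]`, the same names and port lists as the -39 listing.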
Also tested on the RHEL 7 version:

[root@dell-per740-12 bz818128]# bash try.sh
from-lport 1001 (inport == @pg && ip) drop
to-lport 1001 (outport == @pg && ip) drop
Mon Jul 13 04:14:33 EDT 2020
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 04:15:02 EDT 2020

<=== about 29 seconds

[root@dell-per740-12 bz818128]# ls
cleanup.sh no ovn2.13.0-39 setup.sh test.sh try.sh
[root@dell-per740-12 bz818128]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.13-2.13.0-37.el7fdp.x86_64
openvswitch2.13-2.13.0-10.el7fdp.x86_64
ovn2.13-central-2.13.0-37.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-host-2.13.0-37.el7fdp.x86_64

On 39:

[root@dell-per740-12 bz818128]# bash try.sh
from-lport 1001 (inport == @pg && ip) drop
to-lport 1001 (outport == @pg && ip) drop
Mon Jul 13 04:19:21 EDT 2020
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 04:19:32 EDT 2020

<=== about 11 seconds

[root@dell-per740-12 bz818128]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.13-central-2.13.0-39.el7fdp.x86_64
openvswitch2.13-2.13.0-10.el7fdp.x86_64
ovn2.13-2.13.0-39.el7fdp.x86_64
ovn2.13-host-2.13.0-39.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
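
The elapsed times can be checked directly from the two `date` lines of each run. The helper below is a minimal sketch; the function names to_seconds and elapsed are hypothetical, and it assumes both timestamps fall on the same day. Going by the timestamps printed in the RHEL 7 runs, the first run took 29 seconds and the second 11 seconds.

```shell
#!/usr/bin/env bash
# Convert HH:MM:SS to seconds since midnight. The 10# prefix forces base 10
# so that fields such as "08" or "09" are not rejected as invalid octal.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Elapsed seconds between two same-day timestamps: elapsed START END
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 04:14:33 04:15:02   # RHEL 7 run on ovn2.13.0-37, prints 29
elapsed 04:19:21 04:19:32   # RHEL 7 run on ovn2.13.0-39, prints 11
```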
This issue has conditional approval for the OSP16.1 Z1 release; it must be shipped or in the CDN prior to July 29th. If not, we will move it to TM=Z2.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3150