Bug 1818128
| Summary: | [OVN SCALE] [ovn-controller] Adding flows for bringing up the VM takes more time | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | anil venkata <vkommadi> |
| Component: | OVN | Assignee: | Dumitru Ceara <dceara> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | medium | Priority: | medium |
| Version: | RHEL 8.0 | CC: | ctrautma, dalvarez, mjozefcz, mmichels, nusiddiq, rkhan, spower |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2020-07-27 05:11:50 UTC | Type: | Bug |
Description
anil venkata
2020-03-27 19:37:17 UTC
We are trying to confirm our suspicions around this issue:
https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html

Due to the test design, where 300 routers are connected to the same public logical switch, we're observing a flow explosion in the lr_in_arp_resolve stage. This is not a regression but something that we haven't tested before. Running the same test (booting 3K VMs) with fewer routers should give very different results (much better performance), but we'll address the issue described in the email thread.

After investigation it looks like the main culprit at this point is the processing of Port Groups with lots of ports. Numan's working on the incremental processing bits around Port Groups.

(In reply to Daniel Alvarez Sanchez from comment #5)
> After investigation it looks like the main culprit at this point is the
> processing of Port Groups with lots of ports. Numan's working on the
> incremental processing bits around Port Groups.

Sorry, Dumitru (assignee) is working on it.

Tested with the following script:

    # Bring up OVS and OVN on a single node acting as both central and hypervisor.
    systemctl start openvswitch
    systemctl start ovn-northd
    ovn-nbctl set-connection ptcp:6641
    ovn-sbctl set-connection ptcp:6642
    ovs-vsctl set open . external_ids:system-id=hv1 \
        external_ids:ovn-remote=tcp:20.0.111.25:6642 \
        external_ids:ovn-encap-type=geneve \
        external_ids:ovn-encap-ip=20.0.111.25
    systemctl restart ovn-controller

    # One port group with a drop ACL in each direction.
    ovn-nbctl pg-add pg
    ovn-nbctl --type=port-group acl-add pg from-lport 1001 "inport == @pg && ip" drop
    ovn-nbctl --type=port-group acl-add pg to-lport 1001 "outport == @pg && ip" drop
    ovn-nbctl acl-list pg

    # 100 logical switches with 5 ports each, all bound to internal ports on br-int.
    for switch_id in {1..100}
    do
        ovn-nbctl ls-add ls$switch_id
        for port in {1..5}
        do
            ovn-nbctl lsp-add ls$switch_id lsp${switch_id}_$port
            ovs-vsctl add-port br-int p${switch_id}_$port -- set interface p${switch_id}_$port type=internal
            ovs-vsctl set interface p${switch_id}_$port external_ids:iface-id=lsp${switch_id}_$port
        done
    done

    # Grow the port group by 5 ports per iteration; the timed section starts here.
    : > /tmp/ls_ports
    date
    for switch_id in {1..100}
    do
        for port in {1..5}
        do
            echo lsp${switch_id}_$port >> /tmp/ls_ports
        done
        pg_ports=$(xargs < /tmp/ls_ports)
        ovn-nbctl --wait=hv pg-set-ports pg $pg_ports
    done

    # Poll until the flow count on br-int is the same on two consecutive checks.
    set -x
    flow_num_before=0
    sleep 1
    while :
    do
        flow_num=$(ovs-ofctl dump-flows br-int | wc -l)
        [ "$flow_num" -eq "$flow_num_before" ] && break
        flow_num_before=$flow_num
        sleep 1
    done
    date

Result on ovn2.13.0-37:

    [root@hp-dl380pg8-12 bz1818128]# bash try.sh
    from-lport 1001 (inport == @pg && ip) drop
    to-lport 1001 (outport == @pg && ip) drop
    Mon Jul 13 03:59:16 EDT 2020    <=== start time
    + flow_num_before=0
    + sleep 1
    + :
    ++ ovs-ofctl dump-flows br-int
    ++ wc -l
    + flow_num=11524
    + '[' 11524 -eq 0 ']'
    + flow_num_before=11524
    + sleep 1
    + :
    ++ ovs-ofctl dump-flows br-int
    ++ wc -l
    + flow_num=11524
    + '[' 11524 -eq 11524 ']'
    + break
    + date
    Mon Jul 13 03:59:52 EDT 2020    <=== end time

    <=== it takes about 36 seconds

    [root@hp-dl380pg8-12 bz1818128]# rpm -qa | grep -E "openvswitch|ovn"
    ovn2.13-2.13.0-37.el8fdp.x86_64
    openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
    ovn2.13-central-2.13.0-37.el8fdp.x86_64
    openvswitch2.13-2.13.0-41.el8fdb.x86_64
    ovn2.13-host-2.13.0-37.el8fdp.x86_64

Result on ovn2.13.0-39:

    [root@hp-dl380pg8-12 bz1818128]# bash try.sh
    from-lport 1001 (inport == @pg && ip) drop
    to-lport 1001 (outport == @pg && ip) drop
    Mon Jul 13 04:02:45 EDT 2020    <=== start time
    + flow_num_before=0
    + sleep 1
    + :
    ++ ovs-ofctl dump-flows br-int
    ++ wc -l
    + flow_num=11524
    + '[' 11524 -eq 0 ']'
    + flow_num_before=11524
    + sleep 1
    + :
    ++ ovs-ofctl dump-flows br-int
    ++ wc -l
    + flow_num=11524
    + '[' 11524 -eq 11524 ']'
    + break
    + date
    Mon Jul 13 04:02:59 EDT 2020    <=== end time

    <=== it takes about 14 seconds, a big improvement compared to 37

    [root@hp-dl380pg8-12 bz1818128]# rpm -qa | grep -E "openvswitch|ovn"
    ovn2.13-2.13.0-39.el8fdp.x86_64
    openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
    ovn2.13-host-2.13.0-39.el8fdp.x86_64
    openvswitch2.13-2.13.0-41.el8fdb.x86_64
    ovn2.13-central-2.13.0-39.el8fdp.x86_64

Port group record on 39:

    [root@hp-dl380pg8-12 bz1818128]# ovn-sbctl list port_group
    _uuid : c6a805f1-58f2-4540-83e5-e3643af96bc3
    name : "2_pg_test"
    ports : [lsp2]

    _uuid : ea5830ea-0787-484e-b649-fbd9c9011b7b
    name : "1_pg_test"
    ports : [lsp1]

On 37:

    [root@hp-dl380pg8-12 bz1818128]# ovn-sbctl list port_group
    _uuid : ac6305f9-1514-42af-9c28-67f57038b0a5
    name : pg_test
    ports : [lsp1, lsp2]

Also tested on the rhel7 version. On 37:

    [root@dell-per740-12 bz818128]# bash try.sh
    from-lport 1001 (inport == @pg && ip) drop
    to-lport 1001 (outport == @pg && ip) drop
    Mon Jul 13 04:14:33 EDT 2020
    + flow_num_before=0
    + sleep 1
    + :
    ++ ovs-ofctl dump-flows br-int
    ++ wc -l
    + flow_num=11524
    + '[' 11524 -eq 0 ']'
    + flow_num_before=11524
    + sleep 1
    + :
    ++ ovs-ofctl dump-flows br-int
    ++ wc -l
    + flow_num=11524
    + '[' 11524 -eq 11524 ']'
    + break
    + date
    Mon Jul 13 04:15:02 EDT 2020

    <=== about 29 seconds

    [root@dell-per740-12 bz818128]# ls
    cleanup.sh no ovn2.13.0-39 setup.sh test.sh try.sh
    [root@dell-per740-12 bz818128]# rpm -qa | grep -E "openvswitch|ovn"
    kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
    ovn2.13-2.13.0-37.el7fdp.x86_64
    openvswitch2.13-2.13.0-10.el7fdp.x86_64
    ovn2.13-central-2.13.0-37.el7fdp.x86_64
    openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
    ovn2.13-host-2.13.0-37.el7fdp.x86_64

On 39:

    [root@dell-per740-12 bz818128]# bash try.sh
    from-lport 1001 (inport == @pg && ip) drop
    to-lport 1001 (outport == @pg && ip) drop
    Mon Jul 13 04:19:21 EDT 2020
    + flow_num_before=0
    + sleep 1
    + :
    ++ ovs-ofctl dump-flows br-int
    ++ wc -l
    + flow_num=11524
    + '[' 11524 -eq 0 ']'
    + flow_num_before=11524
    + sleep 1
    + :
    ++ ovs-ofctl dump-flows br-int
    ++ wc -l
    + flow_num=11524
    + '[' 11524 -eq 11524 ']'
    + break
    + date
    Mon Jul 13 04:19:32 EDT 2020

    <=== about 11 seconds

    [root@dell-per740-12 bz818128]# rpm -qa | grep -E "openvswitch|ovn"
    kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
    ovn2.13-central-2.13.0-39.el7fdp.x86_64
    openvswitch2.13-2.13.0-10.el7fdp.x86_64
    ovn2.13-2.13.0-39.el7fdp.x86_64
    ovn2.13-host-2.13.0-39.el7fdp.x86_64
    openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch

This issue has conditional approval for the OSP16.1 Z1 release; it must be shipped or in the CDN before July 29th. If not, we will move it to TM=Z2.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:3150
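For anyone re-running the verification, the flow-count polling loop at the end of the test script above could be factored into a small helper. The following is only a sketch under stated assumptions: `wait_until_stable`, `count_flows`, the `INTERVAL` variable, and the `run` argument are hypothetical names that do not appear in try.sh. The only external command it uses is the `ovs-ofctl dump-flows br-int` call from the original script. Like the original loop, it treats two identical consecutive readings as "settled", which is a heuristic and not a guarantee that ovn-controller has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of try.sh): run a command repeatedly until
# its output is identical on two consecutive polls, then print that output.
wait_until_stable() {
    local interval=${INTERVAL:-1}   # seconds between polls (assumed default: 1)
    local prev="" cur
    while :; do
        cur=$("$@")
        [ "$cur" = "$prev" ] && break
        prev=$cur
        sleep "$interval"
    done
    printf '%s\n' "$cur"
}

# Counts the lines printed by ovs-ofctl for br-int, as the test script does.
count_flows() {
    ovs-ofctl dump-flows br-int | wc -l
}

# Touch the live bridge only when invoked as: bash <this file> run
if [ "${1:-}" = "run" ]; then
    SECONDS=0                       # bash built-in elapsed-time counter
    flows=$(wait_until_stable count_flows)
    echo "flows=$flows settled after ${SECONDS}s"
fi
```

Note that this times only the settling phase. In try.sh the first `date` is printed before the `pg-set-ports` loop, so the durations reported in this bug also include the time spent in that loop; to reproduce those numbers, reset `SECONDS` before the loop instead.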