Bug 1818128
| Summary: | [OVN SCALE] [ovn-controller] Adding flows for bringing up the VM takes more time | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | anil venkata <vkommadi> |
| Component: | OVN | Assignee: | Dumitru Ceara <dceara> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | RHEL 8.0 | CC: | ctrautma, dalvarez, mjozefcz, mmichels, nusiddiq, rkhan, spower |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-27 05:11:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
anil venkata
2020-03-27 19:37:17 UTC
We are trying to confirm our suspicions around this issue: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html

Due to the test design, where 300 routers are connected to the same public logical switch, we're observing a flow explosion in the lr_in_arp_resolve stage. This is not a regression, just something we hadn't tested before. Running the same test (booting 3K VMs) with fewer routers should yield very different results (much better performance), but we'll address the issue described in the email thread.

After investigation it looks like the main culprit at this point is the processing of Port Groups with lots of ports. Numan's working on the incremental processing bits around Port Groups.

(In reply to Daniel Alvarez Sanchez from comment #5)
> After investigation it looks like the main culprit at this point is the
> processing of Port Groups with lots of ports. Numan's working on the
> incremental processing bits around Port Groups.

Sorry, Dumitru (assignee) is working on it.

Tested with the following script:
# Start OVS and the OVN central daemons, expose the NB/SB databases over
# TCP, and register this chassis with the Southbound database.
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.111.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.111.25
systemctl restart ovn-controller

# Create one port group with a from-lport and a to-lport ACL on it.
ovn-nbctl pg-add pg
ovn-nbctl --type=port-group acl-add pg from-lport 1001 "inport == @pg && ip" drop
ovn-nbctl --type=port-group acl-add pg to-lport 1001 "outport == @pg && ip" drop
ovn-nbctl acl-list pg

# 100 logical switches with 5 ports each (500 ports total), each bound
# to a local internal interface so ovn-controller claims them.
for switch_id in {1..100}
do
    ovn-nbctl ls-add ls$switch_id
    for port in {1..5}
    do
        ovn-nbctl lsp-add ls$switch_id lsp${switch_id}_$port
        ovs-vsctl add-port br-int p${switch_id}_$port -- set interface p${switch_id}_$port type=internal
        ovs-vsctl set interface p${switch_id}_$port external_ids:iface-id=lsp${switch_id}_$port
    done
done

# Grow the port group 5 ports at a time, waiting for the hypervisor to
# catch up after each update.
echo > /tmp/ls_ports
date
for switch_id in {1..100}
do
    for port in {1..5}
    do
        echo lsp${switch_id}_$port >> /tmp/ls_ports
    done
    pg_ports=$(xargs < /tmp/ls_ports)
    ovn-nbctl --wait=hv pg-set-ports pg $pg_ports
done

# Poll once a second until the OpenFlow table on br-int stops changing.
set -x
flow_num_before=0
sleep 1
while :
do
    flow_num=$(ovs-ofctl dump-flows br-int | wc -l)
    [ $flow_num -eq $flow_num_before ] && break
    flow_num_before=$flow_num
    sleep 1
done
date
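The quiesce-detection loop at the end of the script can be factored into a reusable helper. A minimal sketch (the function name `wait_until_stable` is ours, not part of the bug report):

```shell
# Sketch (not from the bug report): the polling loop from try.sh as a
# reusable helper. $1 is a command string that prints a single number;
# poll once a second until two consecutive reads match, then report how
# many polls it took before the value settled.
wait_until_stable() {
    local count_cmd=$1 prev=-1 cur polls=0
    while :; do
        cur=$(eval "$count_cmd")
        [ "$cur" -eq "$prev" ] && break
        prev=$cur
        polls=$((polls + 1))
        sleep 1
    done
    echo "$polls"
}

# Usage against the same metric as the script above:
# wait_until_stable 'ovs-ofctl dump-flows br-int | wc -l'
```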
result on ovn2.13.0-37:
[root@hp-dl380pg8-12 bz1818128]# bash try.sh
from-lport 1001 (inport == @pg && ip) drop
to-lport 1001 (outport == @pg && ip) drop
Mon Jul 13 03:59:16 EDT 2020
<=== start time
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 03:59:52 EDT 2020
<=== end time
<=== it takes about 36 seconds
[root@hp-dl380pg8-12 bz1818128]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-2.13.0-37.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-central-2.13.0-37.el8fdp.x86_64
openvswitch2.13-2.13.0-41.el8fdb.x86_64
ovn2.13-host-2.13.0-37.el8fdp.x86_64
result on ovn2.13.0-39:
[root@hp-dl380pg8-12 bz1818128]# bash try.sh
from-lport 1001 (inport == @pg && ip) drop
to-lport 1001 (outport == @pg && ip) drop
Mon Jul 13 04:02:45 EDT 2020
<=== start time
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 04:02:59 EDT 2020
<=== end time
<=== it takes about 14 seconds, a big improvement compared to -37
[root@hp-dl380pg8-12 bz1818128]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-2.13.0-39.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-host-2.13.0-39.el8fdp.x86_64
openvswitch2.13-2.13.0-41.el8fdb.x86_64
ovn2.13-central-2.13.0-39.el8fdp.x86_64
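The `ovn-sbctl list port_group` output in the next comment shows that the -39 build splits one Northbound port group into per-datapath Southbound rows (`1_pg_test`, `2_pg_test`). Assuming that `<number>_<name>` naming convention holds, a quick way to count SB rows per NB group (the helper name is ours):

```shell
# Sketch, assuming the "<datapath-key>_<nb-name>" naming seen in the -39
# Southbound database below: strip the numeric prefix from each SB
# Port_Group name and count how many SB rows back each NB group.
count_sb_rows_per_nb_group() {
    sed -E 's/^"?[0-9]+_//; s/"$//' | sort | uniq -c
}

# Feed it the SB port group names, e.g.:
# ovn-sbctl --bare --columns=name list port_group | count_sb_rows_per_nb_group
```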
port group record on 39:

[root@hp-dl380pg8-12 bz1818128]# ovn-sbctl list port_group
_uuid : c6a805f1-58f2-4540-83e5-e3643af96bc3
name : "2_pg_test"
ports : [lsp2]

_uuid : ea5830ea-0787-484e-b649-fbd9c9011b7b
name : "1_pg_test"
ports : [lsp1]

on 37:

[root@hp-dl380pg8-12 bz1818128]# ovn-sbctl list port_group
_uuid : ac6305f9-1514-42af-9c28-67f57038b0a5
name : pg_test
ports : [lsp1, lsp2]

also tested on rhel7 version:

[root@dell-per740-12 bz818128]# bash try.sh
from-lport 1001 (inport == @pg && ip) drop
to-lport 1001 (outport == @pg && ip) drop
Mon Jul 13 04:14:33 EDT 2020
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 04:15:02 EDT 2020
<=== about 29 seconds
[root@dell-per740-12 bz818128]# ls
cleanup.sh no ovn2.13.0-39 setup.sh test.sh try.sh
[root@dell-per740-12 bz818128]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.13-2.13.0-37.el7fdp.x86_64
openvswitch2.13-2.13.0-10.el7fdp.x86_64
ovn2.13-central-2.13.0-37.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-host-2.13.0-37.el7fdp.x86_64

on 39:

[root@dell-per740-12 bz818128]# bash try.sh
from-lport 1001 (inport == @pg && ip) drop
to-lport 1001 (outport == @pg && ip) drop
Mon Jul 13 04:19:21 EDT 2020
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 04:19:32 EDT 2020
<=== about 11 seconds
[root@dell-per740-12 bz818128]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.13-central-2.13.0-39.el7fdp.x86_64
openvswitch2.13-2.13.0-10.el7fdp.x86_64
ovn2.13-2.13.0-39.el7fdp.x86_64
ovn2.13-host-2.13.0-39.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch

This issue has conditional approval for the OSP 16.1 Z1 release; it must be shipped/in CDN prior to July 29th. If not, we will move to TM=Z2.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3150