Bug 1818128 - [OVN SCALE] [ovn-controller] Adding flows for bringing up the VM takes more time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Dumitru Ceara
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-27 19:37 UTC by anil venkata
Modified: 2020-07-27 05:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-27 05:11:50 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2020:3150 | 0 | None | None | None | 2020-07-27 05:11:53 UTC

Description anil venkata 2020-03-27 19:37:17 UTC
Description of problem:

I ran the Rally scenario "Boot VM and ping its floating IP" for 1000 VMs (100 networks, each with 10 VMs) on OpenStack OSP16 with both the ML2/OVS and OVN drivers. OVN takes longer than ML2/OVS.


95th percentile time for ML2/OVS: 509 seconds for 10 VMs
95th percentile time for OVN:     633 seconds for 10 VMs
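
As a rough worked check of the gap between the two drivers, the two 95th percentile figures above can be compared directly. This is only a sketch: the helper name `pct_slower` is made up for illustration, and only the two timings come from the report.

```shell
# pct_slower BASE OTHER: print how much slower OTHER is than BASE, in percent.
# (Hypothetical helper; not part of Rally or Browbeat.)
pct_slower() {
    LC_ALL=C awk -v base="$1" -v other="$2" \
        'BEGIN { printf "%.1f\n", (other - base) / base * 100 }'
}

# 95th percentile timings from the report: ML2/OVS = 509 s, OVN = 633 s.
pct_slower 509 633   # prints 24.4
```

So, on these numbers, OVN is roughly 24% slower than ML2/OVS at the 95th percentile.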

ML2/OVS Rally results page:
http://rdu-storage01.scalelab.redhat.com/anilvenkata/20200327-105623/rally/simple-plugins/all-rally-run-0.html#/BrowbeatPlugin.create_network_nova_boot_ping

OVN Rally results page:
http://rdu-storage01.scalelab.redhat.com/anilvenkata/20200327-105619/rally/simple-plugins/all-rally-run-0.html#/BrowbeatPlugin.create_network_nova_boot_ping


Version-Release number of selected component (if applicable):


How reproducible:
Deploy two OSP16 environments, one with ML2/OVS and one with OVN, and run the Rally scenario below:
https://github.com/cloud-bulldozer/browbeat/blob/master/rally/rally-plugins/netcreate-boot/netcreate_nova_boot_fip_ping.py

It is a 3-controller, 3-compute OSP16 HA OVN setup (no DVR) with puddle version RHOS_TRUNK-16.0-RHEL-8-20200226.n.1

Comment 4 Daniel Alvarez Sanchez 2020-05-28 12:08:18 UTC
We are trying to confirm our suspicions around this issue:

https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html

Due to the test design, where 300 routers are connected to the same public logical switch, we're observing a flow explosion in the lr_in_arp_resolve stage.
This is not a regression, but a scenario we hadn't tested before.

Running the same test (booting 3K VMs) with fewer routers should yield very different results (much better performance), but we'll address the issue described in the email thread.
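
To get a feel for why the router count matters, here is a back-of-the-envelope sketch. It rests on an assumption that this bug does not measure: that each router needs one ARP-resolve logical flow per peer router port on the shared switch, so the total grows roughly as N * (N - 1). The function name is hypothetical.

```shell
# Assumption (not measured in this bug): one lr_in_arp_resolve logical flow
# per ordered pair of routers attached to the same logical switch.
arp_resolve_flow_estimate() {
    n=$1
    echo $(( n * (n - 1) ))
}

arp_resolve_flow_estimate 300   # prints 89700
arp_resolve_flow_estimate 30    # prints 870
```

Under that assumption, cutting the router count by 10x cuts this class of flows by about 100x, which is consistent with the expectation that fewer routers should perform much better.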

Comment 5 Daniel Alvarez Sanchez 2020-06-18 08:08:30 UTC
After investigation it looks like the main culprit at this point is the processing of Port Groups with lots of ports. Numan's working on the incremental processing bits around Port Groups.

Comment 6 Daniel Alvarez Sanchez 2020-06-18 08:15:00 UTC
(In reply to Daniel Alvarez Sanchez from comment #5)
> After investigation it looks like the main culprit at this point is the
> processing of Port Groups with lots of ports. Numan's working on the
> incremental processing bits around Port Groups.

Sorry, Dumitru (assignee) is working on it.

Comment 10 Jianlin Shi 2020-07-13 08:05:52 UTC
Tested with the following script:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.111.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.111.25
systemctl restart ovn-controller
ovn-nbctl pg-add pg
ovn-nbctl --type=port-group acl-add pg from-lport 1001 "inport == @pg && ip" drop
ovn-nbctl --type=port-group acl-add pg to-lport 1001 "outport == @pg && ip" drop
ovn-nbctl acl-list pg


for switch_id in {1..100}
do
        ovn-nbctl ls-add ls$switch_id
        for port in {1..5}
        do
                ovn-nbctl lsp-add ls$switch_id lsp${switch_id}_$port
                ovs-vsctl add-port br-int p${switch_id}_$port  -- set interface p${switch_id}_$port type=internal
                ovs-vsctl set interface p${switch_id}_$port external_ids:iface-id=lsp${switch_id}_$port
        done
done

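# Start with an empty port list and print the start time.
: > /tmp/ls_ports
date
for switch_id in {1..100}
do
        for port in {1..5}
        do
                echo lsp${switch_id}_$port >> /tmp/ls_ports
        done
        # Re-set the port group to all ports collected so far and wait
        # until the hypervisor has processed the change.
        pg_ports=$(xargs < /tmp/ls_ports)
        ovn-nbctl --wait=hv pg-set-ports pg $pg_ports
done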


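# Poll once per second until the flow count on br-int stops changing,
# then print the end time.
set -x
flow_num_before=0
sleep 1
while :
do
    flow_num=$(ovs-ofctl dump-flows br-int | wc -l)
    [ "$flow_num" -eq "$flow_num_before" ] && break
    flow_num_before=$flow_num
    sleep 1
done
date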
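
The polling loop at the end of the script could also be wrapped in a reusable function that reports the elapsed time itself. The following is a minimal sketch, not part of the tested script: the name `wait_for_stable_count` and its argument order are made up here, and it treats any command that prints one line per item as the thing to sample.

```shell
# wait_for_stable_count INTERVAL CMD [ARGS...]
# Run CMD every INTERVAL seconds until two consecutive runs print the same
# number of lines, then print "<line count> <elapsed seconds>".
# (Hypothetical helper, sketched for illustration.)
wait_for_stable_count() {
    interval=$1
    shift
    start=$(date +%s)
    prev=-1
    while :
    do
        count=$("$@" | wc -l)
        count=$((count))          # strip any padding some wc versions add
        [ "$count" -eq "$prev" ] && break
        prev=$count
        sleep "$interval"
    done
    end=$(date +%s)
    echo "$count $((end - start))"
}

# Example on a host with a running OVS (not executed here):
#   wait_for_stable_count 1 ovs-ofctl dump-flows br-int
```

As in the original loop, a count that merely pauses for one interval is taken as stable, so the reported time is a lower-resolution estimate rather than an exact convergence time.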

result on ovn2.13.0-37:

[root@hp-dl380pg8-12 bz1818128]# bash try.sh 
from-lport  1001 (inport == @pg && ip) drop
  to-lport  1001 (outport == @pg && ip) drop
Mon Jul 13 03:59:16 EDT 2020
<=== start time

+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1
+ :     
++ ovs-ofctl dump-flows br-int
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 11524 ']'
+ break
+ date
Mon Jul 13 03:59:52 EDT 2020
<=== end time

<=== it takes about 36 seconds

[root@hp-dl380pg8-12 bz1818128]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-2.13.0-37.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-central-2.13.0-37.el8fdp.x86_64
openvswitch2.13-2.13.0-41.el8fdb.x86_64
ovn2.13-host-2.13.0-37.el8fdp.x86_64

result on ovn2.13.0-39:

[root@hp-dl380pg8-12 bz1818128]# bash try.sh               
from-lport  1001 (inport == @pg && ip) drop
  to-lport  1001 (outport == @pg && ip) drop                                                                                                                         
Mon Jul 13 04:02:45 EDT 2020  

<=== start time
                       
+ flow_num_before=0                        
+ sleep 1  
+ :
++ ovs-ofctl dump-flows br-int
++ wc -l                                                                           
+ flow_num=11524                                                                
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524
+ sleep 1                                          
+ :                                            
++ ovs-ofctl dump-flows br-int
++ wc -l                             
+ flow_num=11524          
+ '[' 11524 -eq 11524 ']'
+ break                                                             
+ date                                                                                                           
Mon Jul 13 04:02:59 EDT 2020   
<=== end time

<=== it takes about 14 seconds, a big improvement over the 36 seconds on ovn2.13.0-37
                                                                        
[root@hp-dl380pg8-12 bz1818128]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-2.13.0-39.el8fdp.x86_64                   
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-host-2.13.0-39.el8fdp.x86_64
openvswitch2.13-2.13.0-41.el8fdb.x86_64
ovn2.13-central-2.13.0-39.el8fdp.x86_64
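
The elapsed times quoted above can be re-derived from the `date` lines in the two logs. This is a small sketch with made-up helper names; it assumes both timestamps fall on the same day and only looks at the HH:MM:SS part.

```shell
# hms_to_seconds HH:MM:SS: convert a wall-clock time to seconds since midnight.
hms_to_seconds() {
    echo "$1" | LC_ALL=C awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# elapsed_seconds START END: difference between two same-day times, in seconds.
elapsed_seconds() {
    echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}

elapsed_seconds 03:59:16 03:59:52   # ovn2.13.0-37: prints 36
elapsed_seconds 04:02:45 04:02:59   # ovn2.13.0-39: prints 14
```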

Comment 11 Jianlin Shi 2020-07-13 08:07:41 UTC
Port group records on ovn2.13.0-39:

[root@hp-dl380pg8-12 bz1818128]# ovn-sbctl list port_group
_uuid               : c6a805f1-58f2-4540-83e5-e3643af96bc3
name                : "2_pg_test"
ports               : [lsp2]

_uuid               : ea5830ea-0787-484e-b649-fbd9c9011b7b
name                : "1_pg_test"
ports               : [lsp1]

On ovn2.13.0-37:

[root@hp-dl380pg8-12 bz1818128]# ovn-sbctl list port_group
_uuid               : ac6305f9-1514-42af-9c28-67f57038b0a5
name                : pg_test
ports               : [lsp1, lsp2]
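
In the -39 output the single `pg_test` record is split into `1_pg_test` and `2_pg_test`. To map such southbound names back to the original port group name, something like the sketch below could be used. It assumes the prefix is always a run of digits followed by an underscore, which is inferred from the two records shown here rather than confirmed in this report; the function name is hypothetical.

```shell
# sb_pg_base_name NAME: strip a leading "<digits>_" prefix, if present.
# (Assumes the southbound naming pattern seen in the -39 output above.)
sb_pg_base_name() {
    sb_name=$1
    prefix=${sb_name%%_*}
    case $prefix in
        "$sb_name" | '' | *[!0-9]*) printf '%s\n' "$sb_name" ;;   # no numeric prefix
        *) printf '%s\n' "${sb_name#*_}" ;;
    esac
}

sb_pg_base_name 1_pg_test   # prints pg_test
sb_pg_base_name 2_pg_test   # prints pg_test
sb_pg_base_name pg_test     # prints pg_test (already unprefixed, as on -37)

# On a live system the names could be fed in from, for example:
#   ovn-sbctl --bare --columns=name list port_group
```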

Comment 12 Jianlin Shi 2020-07-13 08:21:00 UTC
Also tested on the RHEL 7 version. Result on ovn2.13.0-37:

[root@dell-per740-12 bz818128]# bash try.sh                                                           
from-lport  1001 (inport == @pg && ip) drop
  to-lport  1001 (outport == @pg && ip) drop
Mon Jul 13 04:14:33 EDT 2020                                                                          
+ flow_num_before=0
+ sleep 1
+ :
++ ovs-ofctl dump-flows br-int                                                                        
++ wc -l
+ flow_num=11524
+ '[' 11524 -eq 0 ']'
+ flow_num_before=11524                                                                               
+ sleep 1                                                                                             
+ :                                                                                                   
++ ovs-ofctl dump-flows br-int                                                                        
++ wc -l                                                                                              
+ flow_num=11524                                                                                      
+ '[' 11524 -eq 11524 ']'                                                                             
+ break                                                                                               
+ date                                                                                                
Mon Jul 13 04:15:02 EDT 2020    

<=== about 29 seconds
                                                                      
[root@dell-per740-12 bz818128]# ls                                                                    
cleanup.sh  no  ovn2.13.0-39  setup.sh  test.sh  try.sh
[root@dell-per740-12 bz818128]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.13-2.13.0-37.el7fdp.x86_64                                                                       
openvswitch2.13-2.13.0-10.el7fdp.x86_64                                                               
ovn2.13-central-2.13.0-37.el7fdp.x86_64                                                               
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-host-2.13.0-37.el7fdp.x86_64

Result on ovn2.13.0-39:
[root@dell-per740-12 bz818128]# bash try.sh                                                           
from-lport  1001 (inport == @pg && ip) drop
  to-lport  1001 (outport == @pg && ip) drop                                                          
Mon Jul 13 04:19:21 EDT 2020                                                                          
+ flow_num_before=0
+ sleep 1
+ :                                                                                                   
++ ovs-ofctl dump-flows br-int                                                                        
++ wc -l                                                                                              
+ flow_num=11524                                                   
+ '[' 11524 -eq 0 ']'                                       
+ flow_num_before=11524        
+ sleep 1                                                                                             
+ :                                                                                                   
++ ovs-ofctl dump-flows br-int                                                                        
++ wc -l                            
+ flow_num=11524                                                                                      
+ '[' 11524 -eq 11524 ']'                                                                             
+ break                                                                       
+ date                  
Mon Jul 13 04:19:32 EDT 2020  

<=== about 11 seconds
                                                
[root@dell-per740-12 bz818128]# rpm -qa | grep -E "openvswitch|ovn"           
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch                  
ovn2.13-central-2.13.0-39.el7fdp.x86_64
openvswitch2.13-2.13.0-10.el7fdp.x86_64                                                               
ovn2.13-2.13.0-39.el7fdp.x86_64                                               
ovn2.13-host-2.13.0-39.el7fdp.x86_64                                          
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
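
For a side-by-side view of the improvement, the before/after durations can be turned into a speedup factor. The sketch below uses the elapsed seconds implied by the logged `date` timestamps (RHEL 8: 03:59:16 to 03:59:52 and 04:02:45 to 04:02:59; RHEL 7: 04:14:33 to 04:15:02 and 04:19:21 to 04:19:32). The helper name is made up for illustration.

```shell
# speedup BEFORE AFTER: print BEFORE / AFTER to one decimal place.
# (Hypothetical helper for comparing two durations in seconds.)
speedup() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", before / after }'
}

speedup 36 14   # RHEL 8, -37 vs -39: prints 2.6
speedup 29 11   # RHEL 7, -37 vs -39: prints 2.6
```

Both platforms show a similar gain of about 2.6x on this single-run test, so the figures are indicative rather than statistically robust.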

Comment 15 spower 2020-07-14 18:50:13 UTC
This issue has conditional approval for the OSP16.1 Z1 release. It must be shipped or in the CDN before July 29th. If not, we will move it to TM=Z2.

Comment 17 errata-xmlrpc 2020-07-27 05:11:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3150

