Bug 1053114

Summary: [vdsm] [Scalability] When host is loaded with networks - addNetwork and getVdsCaps takes a lot of time to return.
Product: Red Hat Enterprise Virtualization Manager
Reporter: pagupta
Component: vdsm
Assignee: Barak <bazulay>
Status: CLOSED ERRATA
QA Contact: Yuri Obshansky <yobshans>
Severity: high
Docs Contact:
Priority: high
Version: 3.2.0
CC: bazulay, danken, dnaori, iheim, juwu, lpeer, mavital, mgoldboi, nyechiel, pagupta, srevivo, tdosek, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: x86_64
OS: Linux
Whiteboard: network
Fixed In Version: vt1.3, 4.16.0-1.el6_5
Doc Type: Bug Fix
Doc Text:
Previously, extracting information on networks took a long time when there were multiple networks defined on the host. Using a host with 200+ networks was very slow or impossible. Now, the code has been refactored with attention to asymptotic time efficiency, so that 1000 networks are workable.
Story Points: ---
Clone Of: 714421
Environment:
Last Closed: 2015-02-11 21:10:03 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 714421
Bug Blocks: 612978, 1142923, 1156165
Attachments:
Description            Flags
getVdsCaps Graph       none
RT_AddNetwork graph    none

Comment 2 pagupta 2014-01-15 08:44:14 UTC
*** Bug 1052976 has been marked as a duplicate of this bug. ***

Comment 3 Dan Kenigsberg 2014-03-04 22:03:54 UTC
Toni has made significant progress, and all his patches are merged into the master branch. However, they are quite intrusive, and I would prefer not to backport them to the 3.4 branch.

Unless this issue is extremely urgent for a customer, I suggest waiting for rhev-3.5.

Comment 6 Yuri Obshansky 2015-01-04 09:59:48 UTC
I verified this bug on RHEV-M 3.5.0-0.27.el6ev (build vt13.5)
RHEL - 6Server - 6.6.0.2.el6
KVM - 0.12.1.2 - 2.448.el6
libvirt-0.10.2-46.el6_6.1
vdsm-4.16.8.1-4.el6ev

First of all, I changed the default oVirt configuration value vdsHeartbeatInSeconds from 10 sec to 60 sec.
Otherwise the host became non-functional (stuck in the Connecting state forever).

I created 1000 vlans, attached them to the host, and ran a simple script that runs the getVdsCaps command 100 times and measures the time of each call:
#!/bin/bash
# Call getVdsCaps 100 times and log the duration of each call in seconds.
for x in {1..100}; do
        STARTTIME=$(date +%s.%N)
        vdsClient -s 0 getVdsCaps
        ENDTIME=$(date +%s.%N)
        # Elapsed time, truncated to milliseconds
        TIMEDIFF=$(echo "$ENDTIME - $STARTTIME" | bc | awk -F"." '{print $1"."substr($2,1,3)}')
        echo "$TIMEDIFF" >> getVdsCaps_1000_networks.csv
done

I got the following results:
- min: 21.76 sec
- average: 59.28 sec
- max: 3268.48 sec
While the script was running, two calls were very slow: 340.66 sec and 3268.48 sec.
See attached graph: getVdsCaps_graph.jpg
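
For anyone repeating the measurement, here is a minimal sketch of how min, average and max could be derived from a file of timings such as getVdsCaps_1000_networks.csv (one value in seconds per line). The function name summarize_timings is hypothetical, and this is not necessarily how the figures above were calculated.

```shell
#!/bin/bash
# Hypothetical helper: summarize timings given one value (seconds) per line.
# Reads the files named as arguments, or stdin when none are given.
summarize_timings() {
    awk '
        NF == 0 { next }                     # skip blank lines
        {
            v = $1 + 0                       # force numeric interpretation
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v
            n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "count=%d min=%.2f avg=%.2f max=%.2f\n", n, min, sum / n, max
        }
    ' "$@"
}
```

Usage: summarize_timings getVdsCaps_1000_networks.csv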

I measured the response time of REST API calls while populating the vlans:
- Create Networks: /api/networks/
- Attach Network to Cluster: /api/clusters/${cl_id}/networks
- Attach Network to Host NIC: /api/hosts/${host_id}/nics/${dummy_id}/attach
Here are the results (msec):
                              Count   Average   90%     Min   Max
Create Networks               1000    45        55      29    1132
Attach Network to Cluster     1000    51        63      33    190
Attach Network to Host NIC    1000    18116     35448   808   41075
See attached graph: RT_AddNetwork.png
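
The comment does not say how the 90% column was computed. As a rough sketch, assuming the nearest-rank definition of a percentile and an input of one response time per line, it could be reproduced like this (the function name percentile is hypothetical, and p is expected to be an integer from 1 to 100):

```shell
#!/bin/bash
# Hypothetical helper: nearest-rank percentile of one-value-per-line input.
# Usage: percentile P [FILE...]   (reads stdin when no file is given)
percentile() {
    local p=$1
    shift
    sort -n "$@" | awk -v p="$p" '
        { v[NR] = $1 }                       # values arrive in ascending order
        END {
            if (NR == 0) exit 1
            # Nearest rank: smallest k such that k >= p/100 * N
            k = int((p * NR + 99) / 100)
            if (k < 1) k = 1
            print v[k]
        }'
}
```

Usage: percentile 90 response_times_msec.txt (the file name is only an example). Other percentile definitions, for example ones that interpolate between neighbouring samples, can give slightly different values on the same data.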

Comment 7 Yuri Obshansky 2015-01-04 10:00:46 UTC
Created attachment 975935 [details]
getVdsCaps  Graph

Comment 8 Yuri Obshansky 2015-01-04 10:01:24 UTC
Created attachment 975936 [details]
RT_AddNetwork graph

Comment 10 errata-xmlrpc 2015-02-11 21:10:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html