Bug 662643

Summary: [vdsm] [service] [scale] getVdsCaps returns after ~2 minutes (98 running vms)
Product: Red Hat Enterprise Linux 6
Reporter: Haim <hateya>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED WORKSFORME
QA Contact: yeylon <yeylon>
Severity: high
Docs Contact:
Priority: low
Version: 6.1
CC: abaron, hateya, iheim, mgoldboi, Rhev-m-bugs, srevivo, yeylon, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-01-16 11:52:50 UTC
Type: ---
Attachments:
Description Flags
vdsm logs. none

Description Haim 2010-12-13 13:36:06 UTC
Created attachment 468372 [details]
vdsm logs.

Description of problem:

While running small-scale testing on a system with 4 hosts, 5 storage domains, and about 200 running VMs, I see some disturbing behavior: the getVdsCaps command takes a long time to return (2m 9s; see below for the full output).

This causes the host to flip between non-responsive and non-operational (see no event for host goes non-responsive); then, due to other scale bugs, it fails to connect to the storage server.


[root@nott-vds4 ~]# time vdsClient -s 0 getVdsCaps
          


        HBAInventory = {'iSCSI': [{'InitiatorName': 'iqn.1994-05.com.redhat:b5b3c72ebe3'}], 'FC': []}
        ISCSIInitiatorName = iqn.1994-05.com.redhat:b5b3c72ebe3
        bondings = {'bond4': {'cfg': {}, 'netmask': '', 'addr': '', 'slaves': []}, 'bond0': {'cfg': {}, 'netmask': '', 'addr': '', 'slaves': []}, 'bond1': {'cfg': {}, 'netmask': '', 'addr': '', 'slaves': []}, 'bond2': {'cfg': {}, 'netmask': '', 'addr': '', 'slaves': []}, 'bond3': {'cfg': {}, 'netmask': '', 'addr': '', 'slaves': []}}
        clusterLevels = ['2.3']
        cpuCores = 16
        cpuFlags = fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,xtopology,nonstop_tsc,aperfmperf,pni,dtes64,monitor,ds_cpl,vmx,est,tm2,ssse3,cx16,xtpr,pdcm,dca,sse4_1,sse4_2,popcnt,lahf_lm,ida,tpr_shadow,vnmi,flexpriority,ept,vpid,model_486,model_pentium,model_pentium2,model_pentium3,model_pentiumpro,model_qemu32,model_coreduo,model_core2duo,model_n270,model_Conroe,model_Penryn,model_Nehalem,model_Opteron_G1
        cpuModel = Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
        cpuSockets = 2
        cpuSpeed = 2266.611
        emulatedMachines = ['pc', 'rhel6.0.0', 'rhel5.5.0', 'rhel5.4.4', 'rhel5.4.0']
        guestOverhead = 65
        hooks = {}
        kvmEnabled = true
        lastClient = 10.35.115.13
        lastClientIface = rhevm
        management_ip = 
        memSize = 32098
        networks = {'rhevm': {'cfg': {'DELAY': '0', 'NM_CONTROLLED': 'yes', 'BOOTPROTO': 'dhcp', 'DEVICE': 'rhevm', 'TYPE': 'Bridge', 'ONBOOT': 'yes'}, 'netmask': '255.255.255.0', 'stp': 'off', 'ports': ['eth0', 'vnet0', 'vnet1', 'vnet2', 'vnet3', 'vnet4', 'vnet5', 'vnet6', 'vnet7', 'vnet8', 'vnet9', 'vnet10', 'vnet11', 'vnet12', 'vnet13', 'vnet14', 'vnet15', 'vnet16', 'vnet17', 'vnet18', 'vnet19', 'vnet20', 'vnet21', 'vnet22', 'vnet23', 'vnet24', 'vnet25', 'vnet26', 'vnet27', 'vnet28', 'vnet29', 'vnet30', 'vnet31', 'vnet32', 'vnet33', 'vnet34', 'vnet35', 'vnet36', 'vnet37', 'vnet38', 'vnet39', 'vnet40', 'vnet41', 'vnet42', 'vnet43', 'vnet44', 'vnet45', 'vnet46', 'vnet47', 'vnet48', 'vnet49', 'vnet50', 'vnet51', 'vnet52', 'vnet53', 'vnet54', 'vnet55', 'vnet56', 'vnet57', 'vnet58', 'vnet59', 'vnet60', 'vnet61', 'vnet62', 'vnet63', 'vnet64', 'vnet65', 'vnet66', 'vnet67', 'vnet68', 'vnet69', 'vnet70', 'vnet71', 'vnet72', 'vnet73', 'vnet74', 'vnet75', 'vnet76', 'vnet77', 'vnet78', 'vnet79', 'vnet80', 'vnet81', 'vnet82', 'vnet83', 'vnet84', 'vnet85', 'vnet86', 'vnet87', 'vnet88', 'vnet89', 'vnet90', 'vnet91', 'vnet92', 'vnet93', 'vnet94', 'vnet95', 'vnet96', 'vnet97', 'vnet98', 'vnet99', 'vnet100', 'vnet101', 'vnet102', 'vnet103', 'vnet104', 'vnet105', 'vnet106', 'vnet107', 'vnet108', 'vnet109', 'vnet110', 'vnet111', 'vnet112', 'vnet113', 'vnet114', 'vnet115', 'vnet116', 'vnet117', 'vnet118', 'vnet119', 'vnet120', 'vnet121', 'vnet122', 'vnet123', 'vnet124', 'vnet125', 'vnet126', 'vnet127', 'vnet128', 'vnet129', 'vnet130', 'vnet131', 'vnet132', 'vnet133', 'vnet134', 'vnet135', 'vnet136', 'vnet137', 'vnet138', 'vnet139', 'vnet140', 'vnet141', 'vnet142', 'vnet143', 'vnet144', 'vnet145', 'vnet146', 'vnet147', 'vnet148', 'vnet149', 'vnet150', 'vnet151', 'vnet152', 'vnet153', 'vnet154', 'vnet155', 'vnet156', 'vnet157', 'vnet158', 'vnet159', 'vnet160', 'vnet161', 'vnet162', 'vnet163', 'vnet164', 'vnet165', 'vnet166', 'vnet167', 'vnet168', 'vnet169', 'vnet170', 'vnet171', 
'vnet172', 'vnet173', 'vnet174', 'vnet175', 'vnet176', 'vnet177', 'vnet178', 'vnet179', 'vnet180', 'vnet181', 'vnet182', 'vnet183', 'vnet184', 'vnet185', 'vnet186', 'vnet187', 'vnet188', 'vnet189'], 'addr': '10.35.115.13'}, 'virbr0': {'cfg': {}, 'netmask': '255.255.255.0', 'stp': 'on', 'ports': [], 'addr': '192.168.122.1'}}
        nics = {'eth1': {'hwaddr': '78:E7:D1:E4:8C:53', 'netmask': '', 'speed': 0, 'addr': ''}, 'eth0': {'hwaddr': '78:E7:D1:E4:8C:52', 'netmask': '', 'speed': 1000, 'addr': ''}}
        operatingSystem = {'release': '6.0.0.37.el6', 'version': '6Server', 'name': 'RHEL'}
        packages = [{'release': '71.el6', 'buildtime': '1283321164', 'version': '2.6.32', 'name': 'kernel'}, {'release': '71.7.1.el6', 'buildtime': '1288168562', 'version': '2.6.32', 'name': 'kernel'}, {'release': '2.113.el6_0.3', 'buildtime': '1287060573', 'version': '0.12.1.2', 'name': 'qemu-kvm'}, {'release': '2.113.el6_0.3', 'buildtime': '1287060573', 'version': '0.12.1.2', 'name': 'qemu-img'}, {'release': '29.el6', 'buildtime': '1291650662', 'version': '4.9', 'name': 'vdsm'}, {'release': '15.el6', 'buildtime': '1280936485', 'version': '0.4.2', 'name': 'spice-server'}, {'release': '28.el6.jd', 'buildtime': '1291370811', 'version': '0.8.1', 'name': 'libvirt'}]
        reservedMem = 256
        software_revision = 29
        software_version = 4.9
        supportedProtocols = ['2.2', '2.3']
        supportedRHEVMs = ['2.3']
        uuid = 38373035-3536-4247-3830-333334343957_78:e7:d1:e4:8c:52
        version_name = Snow Man
        vlans = {}
        vmTypes = ['kvm', 'qemu']

real    2m9.309s
user    0m0.160s
sys     0m0.042s
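
To make the measurement above repeatable, the timing can be scripted. The following is a minimal sketch, not part of vdsm: it wraps an arbitrary command line and reports wall-clock time, roughly what `time` reports as `real`. The helper names `time_command` and `time_vdsclient_verbs` are hypothetical; only the `vdsClient -s 0 <verb>` invocation and the verb names are taken from this report.

```python
import subprocess
import time


def time_command(argv, timeout=None):
    """Run argv and return (elapsed_seconds, return_code).

    Output is discarded because only the latency is of interest here.
    """
    start = time.monotonic()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return time.monotonic() - start, completed.returncode


def time_vdsclient_verbs(verbs=("getVdsCaps", "getVdsStats")):
    """Time each vdsClient verb once, using the invocation from this report.

    Assumes vdsClient is on PATH and that '-s 0' targets the local host.
    """
    results = {}
    for verb in verbs:
        results[verb] = time_command(["vdsClient", "-s", "0", verb])
    return results
```

On a host where vdsClient is installed, calling `time_vdsclient_verbs()` and printing the returned dictionary would show the getVdsCaps and getVdsStats latencies side by side, which makes it easier to see whether the slowdown is specific to getVdsCaps.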

Attached all vdsm logs (17!).

More info:

1) getVdsStats returns in 2 seconds!
2) vdsm CPU usage fluctuates between 100% and 140%
3) vdsm memory usage hovers around 50%
4) [root@nott-vds4 vdsm]# virsh list |wc -l 
98
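
A note on the count above: `virsh list` normally prints a header row and a dashed separator before the domain rows, and usually a trailing blank line, so `wc -l` counts more lines than there are domains. If that layout holds on this host, the 98 lines correspond to slightly fewer than 98 running domains. The sketch below counts only the domain rows; the function name `count_domains` and the sample text are hypothetical and are not taken from this host.

```python
def count_domains(virsh_list_output):
    """Count domain rows in `virsh list` output.

    Assumes the usual layout: a header row, a dashed separator line,
    then one row per domain. Blank lines are ignored.
    """
    count = 0
    in_table = False
    for line in virsh_list_output.splitlines():
        stripped = line.strip()
        if not in_table:
            # Everything up to and including the separator is header.
            if stripped.startswith("---"):
                in_table = True
            continue
        if stripped:
            count += 1
    return count


# Hypothetical sample output, for illustration only.
SAMPLE = """\
 Id Name                 State
----------------------------------
  1 vm-001               running
  2 vm-002               running

"""
```

For this sample, `count_domains(SAMPLE)` returns 2, while the text contains 5 newline-terminated lines, which is what `wc -l` would report.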

Comment 1 Haim 2010-12-13 13:38:16 UTC
vdsm-4.9-29.el6.x86_64
libvirt-0.8.1-28.el6.jd.x86_64

Comment 5 Dan Kenigsberg 2011-01-16 11:52:50 UTC
Please reopen when this reproduces.