Created attachment 897917 [details]
script to create 100 SDs

Description of problem:
High CPU usage attributed to 'vdsm' after setting up 100 NFS storage domains.

Version-Release number of selected component (if applicable):
is36.4
vdsm-4.13.2-0.17.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Set up an NFS DC with 3 hosts (not sure if we really need 3 hosts)
2. Create 100 NFS storage domains
3. Run "top" on one of the hosts and check vdsm

Actual results:
==========
 7589 vdsm       0 -20 3537m  55m 6532 S 249.4  0.6   2540:04 vdsm

Expected results:

Additional info:
Created attachment 897918 [details] logs
Created attachment 897920 [details] create_sd.py
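The attached create_sd.py is not inlined in this bug; for reference, a minimal sketch of what such a script might look like using the oVirt Python SDK 3.x (ovirtsdk). The engine URL, credentials, host, DC name, and NFS export paths below are all placeholders, not values from the actual attachment:

    #!/usr/bin/env python
    # Hypothetical sketch of a "create N NFS storage domains" script,
    # assuming oVirt Python SDK 3.x and an NFS server exporting one
    # directory per domain. All names and addresses are placeholders.
    from ovirtsdk.api import API
    from ovirtsdk.xml import params

    api = API(url='https://engine.example.com/api',
              username='admin@internal',
              password='password',
              insecure=True)

    for i in range(100):
        name = 'nfs_sd_%03d' % i
        # Create the storage domain on one of the hosts.
        api.storagedomains.add(params.StorageDomain(
            name=name,
            type_='data',
            host=params.Host(name='host1'),            # assumed host name
            storage=params.Storage(
                type_='nfs',
                address='nfs.example.com',             # assumed NFS server
                path='/exports/%s' % name)))           # assumed export path
        # Attach it to the data center so it becomes active.
        dc = api.datacenters.get('nfs_dc')             # assumed DC name
        dc.storagedomains.add(api.storagedomains.get(name))
        print('created and attached %s' % name)

    api.disconnect()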
How many cores are in the machine that shows 249% CPU usage?
4 cores 1 socket
Some more info from the machine:

cpu: Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
release: Red Hat Enterprise Linux Server release 6.5 Beta (Santiago)
last yum update: 2014-05-04 (missing a lot of updates)
Please repeat this test with a sane number of storage domains - we have customers using 30-40 storage domains, and it would be useful to see how the system behaves in normal conditions to evaluate the severity of this issue.
I reproduced this partially using a master (2014-05-21) setup with 30 NFS storage domains. I don't get the extreme CPU usage reported by Aharon, only somewhat high CPU usage of about 20% out of 800%.

Attached are profiles showing where time is spent on this setup on the SPM.
Created attachment 898046 [details] profile results sorted by time
Created attachment 898047 [details]
profile results sorted by cumulative time
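For reference, profiles like the attached ones can be inspected with the standard-library pstats module. A minimal sketch, assuming the profiler output was dumped to a file named vdsm.prof (a hypothetical name):

    # Inspect a saved cProfile dump; 'vdsm.prof' is a placeholder filename.
    import pstats

    stats = pstats.Stats('vdsm.prof')
    stats.sort_stats('time').print_stats(20)        # top 20 by own (internal) time
    stats.sort_stats('cumulative').print_stats(20)  # top 20 by cumulative time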
The high CPU usage is caused by an inefficient implementation of the mount-related code, which has O(N^2) complexity. NfsStorageDomain.selftest is responsible for 267 seconds of the total 458 seconds of CPU time (58%).
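To illustrate the complexity issue (this is a hypothetical sketch, not the actual vdsm code): if each domain's selftest re-reads and scans the full mount table to find its own mount, then checking N domains scans N entries N times, i.e. O(N^2) overall. Parsing the mount table once per check cycle and reusing it brings this back to O(N):

    # Hypothetical illustration of the complexity, not the actual vdsm code.

    def read_mounts():
        # One O(N) pass over the kernel mount table.
        with open('/proc/mounts') as f:
            return [line.split()[1] for line in f]  # mount points only

    # O(N^2): every selftest re-reads and re-scans the whole table.
    def selftest_slow(domains):
        for dom in domains:
            if dom.mountpoint not in read_mounts():
                raise RuntimeError('%s is not mounted' % dom.mountpoint)

    # O(N): read the table once, then each membership test is O(1).
    def selftest_fast(domains):
        mounted = set(read_mounts())
        for dom in domains:
            if dom.mountpoint not in mounted:
                raise RuntimeError('%s is not mounted' % dom.mountpoint)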
Set severity to medium and scheduled for 3.5.0, since with a normal setup (30 storage domains) this is not a major issue. This is also not a regression; the code responsible for it dates from 2012.
Marina, can you tell us what is a common number of storage domains in the field? Do we support systems with more than 30-40 NFS storage domains?
Without requirement guidelines, these kinds of bugs are pointless.

Sean - we need a concrete definition of the size of environment we need to support, and of the hardware we require customers to have for it.
Aharon/Gil - we need input on what QA is able to test.

[In any event, 100 SDs sounds like a use case we'll never see in the field, and if we do, the first action item would be to consolidate them.]
Nir,

You asked me to set up 100 SDs in comment #7 of https://bugzilla.redhat.com/show_bug.cgi?id=1095907.

Anyway, if this is supported we need to fix it; if it is not, we need to block the option to add SDs beyond the supported number. I think it is up to PM to decide, and then we should continue accordingly.

Sean?
(In reply to Aharon Canan from comment #16)
> You asked me to set 100 SDs in comment #7 from
> https://bugzilla.redhat.com/show_bug.cgi?id=1095907

In https://bugzilla.redhat.com/show_bug.cgi?id=1095907#c6 I asked for "30 iSCSI storage domains".

In https://bugzilla.redhat.com/show_bug.cgi?id=1095907#c7 I suggested creating "lot of (100?) mounts".

Sorry if that was not clear.
Aharon, can you test the attached patch with your setup?
We do not have resources for integration testing right now.
Bug verified on RHEV-M 3.5.0-0.22.el6ev:
RHEL - 6Server - 6.6.0.2.el6
libvirt-0.10.2-46.el6_6.1
vdsm-4.16.7.6-1.el6ev

Created 100 NFS storage domains and checked top on a host:

top - 18:52:14 up 12 days, 9:09, 2 users, load average: 1.29, 1.21, 1.15
Tasks: 1343 total, 1 running, 1341 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.3%us, 1.0%sy, 0.0%ni, 97.9%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 396875340k total, 6112184k used, 390763156k free, 159604k buffers
Swap: 16383996k total, 0k used, 16383996k free, 1931216k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
45620 vdsm   0 -20 33.6g 121m 9700 S 62.4  0.0 171:36.09 vdsm

The bug did not reproduce.