661321 – [vdsm] [scale] vdsm CPU consumption goes between 180-400 when running 100 vms



Summary: [vdsm] [scale] vdsm CPU consumption goes between 180-400 when running 100 vms

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	6.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Federico Simoncelli
QA Contact:	Haim
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	683044
Depends On:	687907
Blocks:

Reported:	2010-12-08 14:55 UTC by Haim
Modified:	2014-01-13 00:48 UTC
CC List:	10 users
Fixed In Version:	vdsm-4.9-62
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-12-06 07:03:37 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2011:1782	0	normal	SHIPPED_LIVE	new packages: vdsm	2011-12-06 11:55:51 UTC

Description Haim 2010-12-08 14:55:23 UTC

Description of problem:

vdsm CPU consumption fluctuates between 180% and 400% when 100 VMs are running.
libvirt CPU consumption moves between 20% and 60% during that time.
During that time there are about 237 threads.

Attached captures:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17327 vdsm      15  -5 10.6g 289m 6220 S 257.0  0.9 233:24.79 vdsm

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17327 vdsm      15  -5 10.6g 290m 6220 S 301.2  0.9 234:35.71 vdsm

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17327 vdsm      15  -5 10.6g 291m 6220 S 341.7  0.9 236:59.49 vdsm

Danken started Python profiling on the service, and it seems that select.select takes a lot of CPU time.
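Profiling a Python daemon to see which calls dominate CPU time can be sketched with the standard-library cProfile and pstats modules. This is a minimal sketch, not the profiling setup actually used in this bug: `busy_poll` is a hypothetical workload that calls `select.select` with a zero timeout in a loop (a poll that never blocks), chosen only so that select shows up in the report. It is not a diagnosis of vdsm.

```python
import cProfile
import io
import pstats
import select
import socket


def busy_poll(iterations: int) -> int:
    """Hypothetical workload: poll one end of a socket pair with a zero timeout.

    Nothing is ever written to the other end, so the socket never becomes
    readable; a zero timeout makes select.select return immediately, so the
    loop spins instead of sleeping.
    """
    a, b = socket.socketpair()
    try:
        ready = 0
        for _ in range(iterations):
            readable, _, _ = select.select([a], [], [], 0)
            ready += len(readable)
        return ready
    finally:
        a.close()
        b.close()


def profile_report(func, *args, top: int = 5) -> str:
    """Run func under cProfile and return the `top` entries sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(pstats.SortKey.TIME).print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(busy_poll, 10000))
```

Sorting by internal time (`tottime`) rather than cumulative time puts built-ins such as `select.select` near the top when they themselves burn the CPU, instead of the callers that wrap them.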

Steps to reproduce:

1) Run about 100 VMs.
2) Run top.
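The %CPU values above 100 in the captures are expected on a multi-core host: by default top reports a process's CPU usage relative to a single CPU, so 257% means roughly 2.6 cores' worth of CPU time. A minimal Linux-only sketch of that calculation follows, assuming the `utime` and `stime` fields of `/proc/<pid>/stat` (fields 14 and 15, counted in clock ticks). It samples the counters twice and divides the tick delta by the elapsed wall time; the helper names are illustrative.

```python
import os
import time


def read_cpu_ticks(pid: int) -> int:
    """Return utime + stime (in clock ticks) for pid from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so only split the text after the last ')'. That text starts at field 3.
    fields = data[data.rindex(")") + 2:].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(tick_delta: int, elapsed_s: float, clk_tck: int) -> float:
    """CPU usage as a percentage of ONE CPU, so multithreaded processes can exceed 100."""
    return tick_delta * 100.0 / (clk_tck * elapsed_s)


def sample_cpu_percent(pid: int, interval_s: float = 1.0) -> float:
    """Sample a process's CPU counters twice, interval_s apart, and return %CPU."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    t0, ticks0 = time.monotonic(), read_cpu_ticks(pid)
    time.sleep(interval_s)
    t1, ticks1 = time.monotonic(), read_cpu_ticks(pid)
    return cpu_percent(ticks1 - ticks0, t1 - t0, clk_tck)
```

For example, 257 ticks consumed in a 1-second window at 100 ticks per second gives `cpu_percent(257, 1.0, 100) == 257.0`, the same figure as the first capture.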

vdsm-4.9-28.el6.x86_64
libvirt-0.8.1-28.el6.x86_64

Comment 4 Ayal Baron 2011-02-09 21:40:27 UTC

The patch partially fixes this (it reduces consumption from 400% to 200%).
We need to investigate further and see what is still consuming so much CPU.
Currently each VM is sampled every second, but 40 samples per second should not, in general, take so much CPU (although we should consider reducing the sampling frequency).
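To illustrate how the per-VM sampling interval drives the aggregate sampling rate discussed in this comment, the following is a hypothetical single-threaded scheduler simulated on a virtual clock (no real sleeping, no libvirt calls). It is not vdsm's sampling code; `simulate_samples`, the VM names, and the intervals are illustrative assumptions. Each VM sits in one heap keyed by its next due time.

```python
import heapq


def simulate_samples(vm_ids, interval_s: float, window_s: float) -> int:
    """Count how many samples a periodic per-VM sampler takes in window_s seconds.

    Hypothetical model: every VM is first sampled at t=0 and then every
    interval_s seconds; one heap ordered by due time replaces a timer per VM.
    """
    heap = [(0.0, vm) for vm in vm_ids]
    heapq.heapify(heap)
    samples = 0
    while heap and heap[0][0] < window_s:
        due, vm = heapq.heappop(heap)
        samples += 1  # a real sampler would query this VM's stats here
        heapq.heappush(heap, (due + interval_s, vm))
    return samples


if __name__ == "__main__":
    vms = [f"vm{i}" for i in range(100)]
    for interval in (1.0, 2.5, 5.0):
        total = simulate_samples(vms, interval, window_s=10.0)
        print(f"interval={interval}s -> {total / 10.0:.0f} samples/s")
```

In this model the aggregate rate is simply the number of VMs divided by the interval, so halving the sampling frequency halves the number of samples taken per second.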

Comment 5 Federico Simoncelli 2011-02-28 17:53:45 UTC

I am not able to reproduce this issue with vdsm-4.9-51.
All 100 VMs are running, and each has one virtual disk (on NFS) attached.

# vdsClient -s 0 list table | wc -l
100

# top -b | head
top - 19:24:04 up 4 days,  7:53,  2 users,  load average: 1.25, 2.01, 1.26
Tasks: 702 total,  39 running, 663 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32868712k total,  2653236k used, 30215476k free,   102092k buffers
Swap: 16383992k total,        0k used, 16383992k free,   682960k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2147 root      20   0  689m  15m 4680 S 42.7  0.0  13:12.67 libvirtd
23607 vdsm      15  -5 10.3g 186m 6796 S 39.0  0.6  15:09.43 vdsm
 2442 qemu      20   0  266m  12m 2964 S 14.9  0.0   1:18.08 qemu-kvm

The VMs have no guest OS running, but that shouldn't affect my test.
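Comparing the top snapshots posted in this bug is easier if the %CPU column is extracted programmatically. A minimal sketch, assuming the procps `top -b` layout shown above (a process-table header starting with `PID`, a `%CPU` column, and `COMMAND` as the last column with no spaces in it); `cpu_by_command` is a hypothetical helper.

```python
def cpu_by_command(top_output: str) -> dict:
    """Map COMMAND -> %CPU from `top -b` output (the first row wins per command)."""
    lines = top_output.splitlines()
    # Locate the process-table header, e.g. "  PID USER ... %CPU %MEM TIME+ COMMAND".
    for i, line in enumerate(lines):
        cols = line.split()
        if cols and cols[0] == "PID" and "%CPU" in cols:
            cpu_idx = cols.index("%CPU")
            rows = lines[i + 1:]
            break
    else:
        raise ValueError("no top process-table header found")

    result = {}
    for row in rows:
        fields = row.split()
        if len(fields) <= cpu_idx:
            continue  # skip blank or truncated lines
        command = fields[-1]  # assumes COMMAND is last and contains no spaces
        result.setdefault(command, float(fields[cpu_idx]))
    return result


if __name__ == "__main__":
    sample = (
        "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND\n"
        " 2147 root      20   0  689m  15m 4680 S 42.7  0.0  13:12.67 libvirtd\n"
        "23607 vdsm      15  -5 10.3g 186m 6796 S 39.0  0.6  15:09.43 vdsm\n"
    )
    print(cpu_by_command(sample))
```

Because the header is located by name rather than by position, the summary lines that `top -b` prints first (uptime, tasks, Cpu(s), memory) are skipped automatically.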

Comment 6 Haim 2011-02-28 17:59:31 UTC

(In reply to comment #5)

Please test with a block device rather than NFS: I tested it 3 times, with both FCP and iSCSI, and got the same results each time. I never tested with NFS.

Comment 13 Ayal Baron 2011-04-13 08:18:10 UTC

*** Bug 683044 has been marked as a duplicate of this bug. ***

Comment 14 Haim 2011-04-28 13:43:52 UTC

On build vdsm-4.9-62, with about 90 VMs running on the machine, vdsm CPU consumption doesn't go above 20%.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13207 vdsm      15  -5 10.2g 212m 6580 S 17.1  0.7   5:38.23 vdsm
13709 vdsm      15  -5 1409m  25m 1628 S  0.0  0.1   0:00.69 vdsm
13204 vdsm      15  -5  9212  684  500 S  0.0  0.0   0:00.00 respawn
13710 vdsm      15  -5 1473m  25m 1400 S  0.0  0.1   0:00.00 vdsm


[root@nott-vds2 nfswork]# virsh list |wc -l 
91
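`wc -l` counts every line virsh prints, including its column header and dashed separator (and typically a trailing blank line), so the raw count slightly overstates the number of running domains. A minimal sketch of a stricter count, assuming the usual `virsh list` table layout (header, dashed separator, then one row per domain); `count_domains` is a hypothetical helper, and the sample output in the test is illustrative, not captured from this host.

```python
def count_domains(virsh_list_output: str) -> int:
    """Count domain rows in `virsh list` output, skipping header, separator and blank lines."""
    count = 0
    for line in virsh_list_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line (virsh usually ends its table with one)
        if fields[0] == "Id":
            continue  # column header: " Id Name State"
        if set(line.strip()) == {"-"}:
            continue  # dashed separator under the header
        count += 1
    return count
```

For a table with two running domains, `wc -l` would report 5 lines while `count_domains` returns 2.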

Comment 15 errata-xmlrpc 2011-12-06 07:03:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html
