Bug 886933
| Summary: | High disk usage when both libvirt and virt-manager are opened | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | g.danti |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.3 | CC: | acathrow, bili, dyasny, dyuan, g.danti, gsun, jdenemar, mzhan, rwu, tlavigne |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-0.10.2-1.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-02-21 07:28:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (g.danti, 2012-12-13 15:53:33 UTC)
The upstream fix was done in 0.9.12:

```
commit 31796e2c1c8187b6b76a58d43f3bc28e030223ee
Author: Jiri Denemark <jdenemar>
Date:   Fri Apr 6 19:42:34 2012 +0200

    qemu: Avoid excessive calls to qemuDomainObjSaveJob()

    As reported by Daniel Berrangé, we have a huge performance regression
    for virDomainGetInfo() due to the change which makes virDomainEndJob()
    save the XML status file every time it is called. Previous to that
    change, 2000 calls to virDomainGetInfo() took ~2.5 seconds. After that
    change, 2000 calls to virDomainGetInfo() take 2 *minutes* 45 secs.

    We made the change to be able to recover from libvirtd restart in the
    middle of a job. However, only destroy and async jobs are taken care
    of. Thus it makes more sense to only save domain state XML when these
    jobs are started/stopped.
```

Further to that, in the latest libvirt+KVM we do not even need to query the balloon driver when calling virDomainGetInfo(), since QEMU gives us async notifications of balloon changes. This avoids the performance issue entirely.

---

Hi, thanks for the quick reply. Will this fix be backported to RHEL 6.x? In the meantime, is it safe to use tmpfs for the /var/run/libvirt/qemu directory (a mount sketch appears after the test results below)? Thanks.

---

pkgs:

```
libvirt-0.10.2-12.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.337.el6.x86_64
kernel-2.6.32-341.el6.x86_64
```

steps:

1. Set the log level to 1:

```
log_outputs="1:file:/tmp/libvirtd.log"
```

then restart libvirtd.

2. Prepare two guests paused at grub:

```
# virsh list
 Id    Name                State
-------------------------------------
 17    T1                  paused
 18    libvirt_test_api    paused
```

3. Prepare a script:

```sh
# vim getinfo.sh
#!/bin/sh

# Busy-wait until 2000 virDomainGetInfo() calls appear in the debug log.
while true; do
    count=$(grep virDomainGetInfo /tmp/libvirtd.log | wc -l)
    if [ "$count" == "2000" ]; then
        exit
    fi
done
```

```
# time ./getinfo.sh

real    16m38.126s
user    10m28.336s
sys     8m12.531s
```

2000 calls to virDomainGetInfo() for 2 guests take 16 *minutes* 38 secs.

4. Calculate disk write traffic:

```
# dstat -d 1 60
```

Manual calculation shows write traffic of about 6k per second (a per-process measurement sketch also follows these comments).

---

(In reply to comment #4)
> 2. prepare two guests paused at grub
> # virsh list
>  Id    Name                State
> -------------------------------------
>  17    T1                  paused
>  18    libvirt_test_api    paused

virt-manager is running here.

pkgs:

```
libvirt-0.9.10-21.el6_3.7.x86_64
```

steps: same as comment #4.

3.

```
# time ./getinfo.sh

real    16m35.678s
user    10m27.248s
sys     7m54.820s
```

2000 calls to virDomainGetInfo() for 2 guests take 16 *minutes* 35 secs.

4. Write traffic is 328k per second.

Also tested with libvirt-0.9.10-21.el6_3.6.x86_64; the result is similar to 3.7.
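Regarding the tmpfs question above, here is a minimal mount sketch, as an untested illustration rather than a recommendation. It assumes the directory only holds transient per-domain status XML (nothing in it needs to survive a reboot); adjust the size and mode options to match the existing directory.

```sh
# Untested sketch: back /var/run/libvirt/qemu with tmpfs so the frequent
# status-XML rewrites never reach the disk. Do this with no guests running,
# because the mount hides whatever files are already in the directory.
service libvirtd stop
mount -t tmpfs -o size=16m,mode=0700 tmpfs /var/run/libvirt/qemu
service libvirtd start

# Rough /etc/fstab equivalent to make the mount persist across reboots:
# tmpfs  /var/run/libvirt/qemu  tmpfs  size=16m,mode=0700  0 0
```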
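The dstat figures in these comments are system-wide, so other writers can inflate them. One way to attribute the traffic to libvirtd itself is to sample the kernel's per-process I/O counters in /proc. A minimal sketch, assuming a single libvirtd instance and a 60-second window:

```sh
#!/bin/sh
# Sample libvirtd's cumulative write_bytes from /proc/<pid>/io twice,
# 60 seconds apart, and report the average per-process write rate.
PID=$(pidof libvirtd)
BEFORE=$(awk '/^write_bytes/ {print $2}' /proc/$PID/io)
sleep 60
AFTER=$(awk '/^write_bytes/ {print $2}' /proc/$PID/io)
echo "libvirtd wrote $(( (AFTER - BEFORE) / 60 )) bytes/sec on average"
```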
---

I have reproduced the bug following the steps in the bug description with libvirt-0.9.10-21.el6_3.6.x86_64. However, I get only about 800k of write traffic per second for 8 VMs after the guests have been running for a long time (stopped at the grub kernel-selection menu, of course), and about 200k per VM when they are just in the running state. It is not as severe as the bug description suggests. virt-manager was kept open the whole time.

```
# virsh list --all
 Id    Name      State
---------------------------
 1     a         running
 2     b         running
 3     c         running
 4     d         running
 5     r6u1      running
 6     r6u2      running
 7     r6u3      running
 8     test      running
```

They are all sitting at the grub kernel-selection menu.

```
# dstat --disk
-dsk/total-
 read  writ
   0   744k
   0   776k
   0   772k
   0   748k
   0   744k
   0   748k
   0   756k
   0   748k
   0   756k
   0   796k
   0   804k
   0   772k
   0   772k
   0   748k
   0   848k
   0   776k
   0   760k
   0   752k
   0   840k
   0   828k
   0   828k
   0   912k
   0   772k
   0   752k
   0   744k
   0   752k
   0   744k
```

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html
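For anyone re-verifying on an updated host, virsh dominfo issues a virDomainGetInfo() call (among others) per invocation, so the measurement can also be driven without grepping the debug log. A rough sketch, assuming a guest named T1 as in comment #4; note that virsh process startup dominates the wall-clock time here, so the dstat write traffic is the more meaningful number to compare:

```sh
#!/bin/sh
# Issue 2000 virDomainGetInfo() calls via virsh dominfo and time the loop.
# Run `dstat -d 1` in another terminal to watch the status-file writes.
time for i in $(seq 1 2000); do
    virsh dominfo T1 > /dev/null
done
```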