Red Hat Bugzilla – Bug 886933
High disk usage when both libvirt and virt-manager are opened
Last modified: 2013-02-21 02:28:38 EST
Description of problem:
When both libvirtd and virt-manager are running and one or more virtual machines are started, the host system continuously rewrites some .xml files under the /var/run/libvirt/qemu system directory. These repeated writes hit the host's physical storage at the same rate as virt-manager's statistics update interval, generating, depending on which statistics virt-manager is collecting (cpu/mem only, or disk and network as well), about 500-800 KB of write traffic per started virtual machine. This means that a system with 2 virtual machines will issue ~1.5 MB of write traffic to the physical disks per second, while a system with 20 VMs will issue traffic in excess of 15 MB/s. This is independent of the guest OS state - the writes happen even while the guest is sitting inside grub.

Version-Release number of selected component (if applicable):
libvirt-python-0.9.10-21.el6_3.6.x86_64
libvirt-client-0.9.10-21.el6_3.6.x86_64
libvirt-0.9.10-21.el6_3.6.x86_64
virt-manager-0.9.0-14.el6.x86_64

How reproducible:

Steps to Reproduce:
1. open a command prompt and issue the dstat --disk command
2. start libvirtd and open virt-manager
3. start a Linux virtual machine and stop it at the grub screen
4. see how dstat reports high disk usage

Actual results:
This high write traffic can impair host and guest disk performance.

Expected results:
As disk bandwidth is a precious resource, libvirtd and virt-manager should not put so much load on the host's disks.

Additional info:
Moving /var/run/libvirt/qemu inside /dev/shm (or mounting a tmpfs in its location) bypasses the problem by moving the repeated writes into main system RAM. As the FHS indicates that the content of /var/run should be renewed on each boot, the solutions above seem acceptable as a temporary workaround; a sketch of the tmpfs mount follows below.
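For reference, a minimal sketch of the tmpfs workaround described above; the 16m size and the fstab line are illustrative assumptions, not values taken from this report:

# one-shot mount over the status directory (stop the guests first)
mount -t tmpfs -o size=16m tmpfs /var/run/libvirt/qemu

# optional /etc/fstab entry to make the mount survive reboots (assumed layout)
tmpfs  /var/run/libvirt/qemu  tmpfs  size=16m  0  0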
The upstream fix was done in 0.9.12:

commit 31796e2c1c8187b6b76a58d43f3bc28e030223ee
Author: Jiri Denemark <jdenemar@redhat.com>
Date:   Fri Apr 6 19:42:34 2012 +0200

    qemu: Avoid excessive calls to qemuDomainObjSaveJob()

    As reported by Daniel Berrangé, we have a huge performance regression
    for virDomainGetInfo() due to the change which makes virDomainEndJob()
    save the XML status file every time it is called. Previous to that
    change, 2000 calls to virDomainGetInfo() took ~2.5 seconds. After that
    change, 2000 calls to virDomainGetInfo() take 2 *minutes* 45 secs. We
    made the change to be able to recover from libvirtd restart in the
    middle of a job. However, only destroy and async jobs are taken care
    of. Thus it makes more sense to only save domain state XML when these
    jobs are started/stopped.

Further to that, in latest libvirt+kvm, we don't even need to query the balloon driver when calling virDomainGetInfo(), since QEMU gives us async notifications of balloon changes. This avoids the performance issue entirely.
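One way to confirm that a given libvirt source tree contains this commit (a sketch that assumes a git checkout of upstream libvirt):

git describe --contains 31796e2c1c8187b6b76a58d43f3bc28e030223ee
# prints the earliest tag containing the commit if the fix is present, and errors out otherwise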
Hi, thanks for the quick reply. Will this fix be backported to RHEL 6.x? In the meantime, is it safe to use tmpfs to store the /var/run/libvirt/qemu directory? Thanks.
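Not an official answer, but one quick way to see whether an installed RHEL libvirt build carries the backport is to search the package changelog for this bug number (assuming the errata references it, which is the usual convention):

rpm -q --changelog libvirt | grep 886933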
pkgs:
libvirt-0.10.2-12.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.337.el6.x86_64
kernel-2.6.32-341.el6.x86_64

steps:
1. set log level to 1

log_outputs="1:file:/tmp/libvirtd.log"

then restart libvirtd

2. prepare two guests paused at grub

# virsh list
 Id    Name                           State
----------------------------------------------------
 17    T1                             paused
 18    libvirt_test_api               paused

3. prepare a script

# vim getinfo.sh
#!/bin/sh

while true; do
    count=$(grep virDomainGetInfo -n /tmp/libvirtd.log | wc -l)
    if [ "$count" == "2000" ]; then
        exit
    fi
done

# time ./getinfo.sh

real    16m38.126s
user    10m28.336s
sys     8m12.531s

2000 calls to virDomainGetInfo() on 2 guests take 16 *minutes* 38 secs.

4. calculate disk write traffic

# dstat -d 1 60

A manual calculation shows the write traffic is about 6k per second.
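The getinfo.sh script above only counts calls issued by whatever is polling libvirtd (virt-manager in this setup); a hedged alternative is to generate the 2000 calls directly, since virsh dominfo goes through virDomainGetInfo() (guest names taken from the virsh list output above):

# issue 2000 virDomainGetInfo() calls, 1000 per guest, and time them
time for i in $(seq 1 1000); do
    virsh dominfo T1 > /dev/null
    virsh dominfo libvirt_test_api > /dev/null
done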
(In reply to comment #4)
> 2. prepare two guests paused at grub
> # virsh list
>  Id    Name                           State
> ----------------------------------------------------
>  17    T1                             paused
>  18    libvirt_test_api               paused

virt-manager is running here.
pkgs:
libvirt-0.9.10-21.el6_3.7.x86_64

steps: same as comment #4

3.
# time ./getinfo.sh

real    16m35.678s
user    10m27.248s
sys     7m54.820s

2000 calls to virDomainGetInfo() on 2 guests take 16 *minutes* 35 secs.

4. write traffic is 328k per second.

Also tested with libvirt-0.9.10-21.el6_3.6.x86_64; the result is similar to 3.7.
I have reproduced the bug as described in the Steps in the bug description with the package libvirt-0.9.10-21.el6_3.6.x86_64. However, I get about 800k of write traffic per second for 8 VMs after the eight guests have been running for a long time (stopped at the grub kernel selection menu, of course), and when they are simply in the running state the traffic is about 200k per VM. It is not as severe as the bug description. virt-manager is kept open.

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     a                              running
 2     b                              running
 3     c                              running
 4     d                              running
 5     r6u1                           running
 6     r6u2                           running
 7     r6u3                           running
 8     test                           running

They are all hanging at the grub kernel selection menu.

# dstat --disk
-dsk/total-
 read  writ
   0   744k
   0   776k
   0   772k
   0   748k
   0   744k
   0   748k
   0   756k
   0   748k
   0   756k
   0   796k
   0   804k
   0   772k
   0   772k
   0   748k
   0   848k
   0   776k
   0   760k
   0   752k
   0   840k
   0   828k
   0   828k
   0   912k
   0   772k
   0   752k
   0   744k
   0   752k
   0   744k
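Rather than eyeballing the writ column, the per-second average can be computed from dstat's CSV output; a rough sketch in which the number of skipped header lines (7) and the 60-sample run are assumptions about this dstat version:

dstat --disk --output /tmp/disk.csv 1 60 > /dev/null
# column 2 of the CSV is bytes written per second; skip the CSV header block
awk -F, 'NR > 7 && $2 != "" { sum += $2; n++ } END { printf "avg write: %.0f bytes/s\n", sum/n }' /tmp/disk.csv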
Move to VERIFIED per Comment 4.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0276.html