Bug 831149

Summary: virt-manager causes iowait, due to rewriting XML files repeatable
Product: Red Hat Enterprise Linux 6 Reporter: Jesper Brouer <jbrouer>
Component: libvirtAssignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.3CC: acathrow, berrange, crobinso, dyasny, dyuan, gsun, jbrouer, jdenemar, mzhan, rwu, veillard, weizhan, ydu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.9.13-3.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 07:17:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jesper Brouer 2012-06-12 10:07:41 UTC
Description of problem:

 When opening virt-manager I see a high iowait, on fairly idle system
 with 8 KVM hosts (which are all idle).  When closing virt-manager, to
 IO-wait disappears (vmstat says 0 "wa")

 Further analysis show, that the IO is caused by re-creating the XML
 files (under /var/run/libvirt/qemu/*.xml) every 2 sec, which seems a
 waist of time as the files don't change.

 Notice, that the files are written by libvirtd (by creating a .new
 files, and moving it).  So, I expect this behavior is caused by a bad
 interaction between virt-manager and libvirtd.

Version-Release number of selected component (if applicable):

 # rpm -q libvirt virt-manager
 libvirt-0.9.10-11.el6.x86_64
 virt-manager-0.9.0-11.el6.x86_64

How reproducible: Always

Steps to Reproduce:

1. While running commands, observe IO-wait via 'vmstat 1' column "wa"
2. Run: strace -f -p `pgrep libvirtd` -s 512 -e trace=rename
3. Start virt-manager
4. Observe all the syscall rename calls from the strace
5. Stop virt-manager
6. Observe the rename syscalls stops

Actual results:

 Starting virt-manager causes IO-waits, due to rewriting XML files repeatable

Expected results:

 Starting virt-manager should not case IO-waits, and XML files should
 only be written when something change in them.

Additional info:

Tracking this down was a long process, here are some hints that others might find usable.

Normally you can "just" look at processes in state "D" Uninterruptible sleep (usually IO) via the "ps" command:

  ps ax | awk '$3 ~ /^D/ { print $0 }'

But it just returned/pointed at the process: "[jbd2/sda6-8]" from this info I could NOT connect it with "libvirtd"

I used blktrace and blkparse on the device /dev/sda6, to figure out the it was libvirtd that caused to IO.

As I had mounted /dev/sda6 on /var, I found an easier method of finding files, that was modified within the last minute:

 find /var/  -mmin -1 

This find command pointed to both /var/run/libvirt/qemu/*.xml and /var/lib/libvirt/images/*.img.  Looking at the files via the stat command, it was clear that it was the XML files that changed too often (the img files were only updated every 30 sec).
The following command will help you spot the changes:

 watch -d 'for f in /var/run/libvirt/qemu/*.xml /var/lib/libvirt/images/*.img; do echo -ne "File $f:\t"; stat $f | grep Modify ; done'

When running the watch command, like this:
 watch -d 'for f in /var/run/libvirt/qemu/* /var/lib/libvirt/images/*; do echo -ne "File $f:\t"; stat $f | grep Modify ; done'

I realized that files named "*.xml.new" appeared, then I used strace to trace renaming of files via:

  strace -f -p `pgrep libvirtd` -s 512 -e trace=rename

Comment 1 Daniel Berrangé 2012-06-12 10:16:26 UTC
This is a libvirt bug - every time we run a QEMU monitor command we save the XML state under /var/run/libvirt/qemu. virt-manager triggers a monitor command every time it asks for updated guest status. Latest upstream libvirt fixed this issue

 commit 31796e2c1c8187b6b76a58d43f3bc28e030223ee
 Author: Jiri Denemark <jdenemar>
 Date:   Fri Apr 6 19:42:34 2012 +0200

    qemu: Avoid excessive calls to qemuDomainObjSaveJob()
    
    As reported by Daniel Berrangé, we have a huge performance regression
    for virDomainGetInfo() due to the change which makes virDomainEndJob()
    save the XML status file every time it is called. Previous to that
    change, 2000 calls to virDomainGetInfo() took ~2.5 seconds. After that
    change, 2000 calls to virDomainGetInfo() take 2 *minutes* 45 secs.
    
    We made the change to be able to recover from libvirtd restart in the
    middle of a job. However, only destroy and async jobs are taken care of.
    Thus it makes more sense to only save domain state XML when these jobs
    are started/stopped.

Comment 5 Jiri Denemark 2012-06-12 11:29:19 UTC
The commit Daniel mentioned in comment 1 is upstream as v0.9.11-40-g31796e2.

Comment 6 Jesper Brouer 2012-06-14 08:23:44 UTC
A work-around to mitigate this issue is to lower how often virt-manager updates its status.

Select menu:  "Edit"
 -> Choose: Preferences
    (new window pops up)
    -> Goto fan "Stats"
       Lower:  "Update status every"

Comment 10 Wayne Sun 2012-07-24 10:21:57 UTC
pkgs:
libvirt-0.9.13-3.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.297.el6_3.x86_64
kernel-2.6.32-280.el6.x86_64

prepare a running domain
# virsh list
 Id    Name                           State
----------------------------------------------------
 3     aaa                            running

steps:
1. run vmstat -1 in one terminal
2. run strace on second terminal

# strace -f -p `pidof libvirtd` -s 512 -e trace=rename
Process 25638 attached with 11 threads - interrupt to quit

3. start virt-manager
3.1
# virt-manager

3.2
Destroy domain aaa through virt-manager

what's show up in strace:
[pid 25648] rename("/var/run/libvirt/qemu/aaa.xml.new", "/var/run/libvirt/qemu/aaa.xml") = 0

At same time the wa value stay low.

3.3
Start domain aaa through virt-manager

what's show up in strace:
[pid 25645] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 25642] rename("/var/run/libvirt/qemu/aaa.xml.new", "/var/run/libvirt/qemu/aaa.xml") = 0
Process 26127 attached
Process 26128 attached
[pid 26124] --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid 25642] rename("/var/run/libvirt/qemu/aaa.xml.new", "/var/run/libvirt/qemu/aaa.xml") = 0


Also, the wa value stay low.

Test with libvirt-0.9.10-21.el6.x86_64, the rename continuously pop up in strace. 
So this is fixed now.

Comment 11 errata-xmlrpc 2013-02-21 07:17:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html