Created attachment 505649 [details]
libvirtd.log

Description of problem:
When using vdsm to hibernate (virDomainSave) 4 busy VMs with 2GB of memory each (80% usage), vdsm cannot read from storage and fences itself. vdsm should have the option to run virDomainSave with the O_DIRECT flag on the destination file in order to minimize the cache effect.

Version-Release number of selected component (if applicable):
libvirt-0.9.1-1.el6.x86_64
vdsm-4.9-75.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Hibernate 4 VMs, each with 2GB of memory at 80% usage, via vdsm.

Actual results:
vdsm fences itself.

Expected results:
The VMs are saved without vdsm losing access to storage.

Additional info:
Attached vdsm log and libvirtd log.
Created attachment 505651 [details]
vdsm log
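(As an illustration of the requested behavior: once the save API grows a cache-bypass flag, a management application such as vdsm could ask for it directly. The sketch below uses virDomainSaveFlags and VIR_DOMAIN_SAVE_BYPASS_CACHE, the names eventually added in libvirt 0.9.4; error reporting is reduced to a bare return code.)

/* Minimal sketch: save a domain while asking libvirt to bypass the
 * host page cache.  Assumes libvirt >= 0.9.4, which provides
 * virDomainSaveFlags and VIR_DOMAIN_SAVE_BYPASS_CACHE. */
#include <libvirt/libvirt.h>

int save_domain_bypass_cache(const char *uri, const char *name, const char *path)
{
    virConnectPtr conn = virConnectOpen(uri);
    if (!conn)
        return -1;

    virDomainPtr dom = virDomainLookupByName(conn, name);
    if (!dom) {
        virConnectClose(conn);
        return -1;
    }

    /* NULL dxml keeps the domain XML unchanged; the flag asks libvirt
     * to open the destination file with O_DIRECT internally. */
    int ret = virDomainSaveFlags(dom, path, NULL, VIR_DOMAIN_SAVE_BYPASS_CACHE);

    virDomainFree(dom);
    virConnectClose(conn);
    return ret;
}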
I'm not sure how practical it is to simply add O_DIRECT in libvirt, since O_DIRECT places various alignment restrictions on the buffers used for I/O.

[quote open(2)]
The O_DIRECT flag may impose alignment restrictions on the length and address of userspace buffers and the file offset of I/Os. In Linux alignment restrictions vary by file system and kernel version and might be absent entirely. However there is currently no file system-independent interface for an application to discover these restrictions for a given file or file system. Some file systems provide their own interfaces for doing so, for example the XFS_IOC_DIOINFO operation in xfsctl(3).

Under Linux 2.4, transfer sizes, and the alignment of the user buffer and the file offset, must all be multiples of the logical block size of the file system. Under Linux 2.6, alignment to 512-byte boundaries suffices.
[/quote]

If there is a compression program configured, then I'm fairly sure that gzip, bzip, etc. won't be expecting these alignment restrictions. If there is no compression program, then QEMU will be writing directly, and again I'm fairly sure QEMU's migration code won't be expecting alignment restrictions. So while O_DIRECT could be desirable, we shouldn't expect it to be a trivial addition.
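To make the constraint concrete, here is a small hypothetical C fragment (not from libvirt or qemu): with O_DIRECT, the buffer address, the transfer length, and the file offset generally all have to be block-aligned, which is exactly what gzip or QEMU's migration code would not normally guarantee. The 512-byte figure follows the Linux 2.6 rule quoted above; real filesystems may differ.

#define _GNU_SOURCE           /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int o_direct_alignment_demo(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0)
        return -1;

    char unaligned[512];
    memset(unaligned, 0, sizeof(unaligned));
    /* Usually fails with EINVAL: a stack buffer is rarely 512-byte aligned. */
    ssize_t rc = write(fd, unaligned, sizeof(unaligned));

    void *aligned;
    if (posix_memalign(&aligned, 512, 512) == 0) {
        memset(aligned, 0, 512);
        /* Aligned address, aligned length, aligned file offset: accepted. */
        rc = write(fd, aligned, 512);
        free(aligned);
    }

    close(fd);
    return rc < 0 ? -1 : 0;
}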
Dan's point that the application doing the actual writing has to be aware of the alignment constraints is indeed problematic. Perhaps this bug should be cloned to qemu so that on direct writes, qemu honors the O_DIRECT constraints itself; but I don't think it is worth trying to force O_DIRECT semantics onto compression programs.

So for the compression case, we would probably have to rearrange things in libvirt so that if O_DIRECT is required, the compression program writes to a pipe managed by libvirt, and libvirt then turns around and does the aligned writes, instead of the compression program writing directly to disk. I suppose that if we do that for the compression case, we can also do it for the uncompressed case (qemu writes to a pipe instead of directly to disk).

At any rate, introducing the intermediate pipe so that libvirt honors O_DIRECT semantics adds extra pipe I/O, but it should still reduce file system cache pollution from the disk-based I/O, so this approach might be feasible.
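A rough sketch of that relay idea, just to illustrate its shape (this is not libvirt's actual code; the block size, the 512-byte alignment, and the zero-padding of the final block are assumptions): the compression program or qemu writes into an ordinary pipe, and a loop on the libvirt side copies from the pipe into an aligned bounce buffer and issues the O_DIRECT writes.

#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 65536               /* any multiple of 512 would do */

/* Copy everything from pipe_fd to direct_fd, where direct_fd was opened
 * with O_DIRECT by the caller.  The final partial block is zero-padded to
 * keep the write aligned; a real helper would truncate the file back to
 * the exact data length afterwards. */
int relay_pipe_to_direct(int pipe_fd, int direct_fd)
{
    void *buf;
    if (posix_memalign(&buf, 512, BLOCK) != 0)
        return -1;

    int eof = 0;
    while (!eof) {
        size_t filled = 0;
        /* Accumulate a full block first: reads from a pipe are often short. */
        while (filled < BLOCK) {
            ssize_t got = read(pipe_fd, (char *)buf + filled, BLOCK - filled);
            if (got < 0) {
                if (errno == EINTR)
                    continue;
                free(buf);
                return -1;
            }
            if (got == 0) {
                eof = 1;          /* writer closed its end of the pipe */
                break;
            }
            filled += (size_t)got;
        }
        if (filled == 0)
            break;

        size_t len = (filled + 511) & ~(size_t)511;
        memset((char *)buf + filled, 0, len - filled);

        if (write(direct_fd, buf, len) != (ssize_t)len) {
            free(buf);
            return -1;
        }
    }

    free(buf);
    return 0;
}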
Raising severity and flagging as a blocker, as this is critical to RHEVM.
See also bug 634653 for the restore direction.
Upstream work complete as of:

commit 28d182506acda4dda6aa610bc44abb7a16f36c55
Author: Eric Blake <eblake>
Date:   Thu Jul 14 17:22:53 2011 -0600

    save: support bypass-cache flag in libvirt-guests init script

    libvirt-guests is a perfect use case for bypassing the file system
    cache - lots of filesystem traffic done at system shutdown, where
    caching is pointless, and startup, where reading large files only
    once just gets in the way. Make this a configurable option in the
    init script, but defaulting to existing behavior.

    * tools/libvirt-guests.sysconf (BYPASS_CACHE): New variable.
    * tools/libvirt-guests.init.sh (start, suspend_guest): Use it.
1. Set BYPASS_CACHE=1 in /etc/sysconfig/libvirt-guests
2. /etc/init.d/libvirt-guests start|stop

This works. From code inspection, O_DIRECT is enabled when --bypass-cache is given.

As for the original bug, it is hard to reproduce on my local machine with vdsm installed. David Naori will help with this.
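(For reference, the virsh --bypass-cache option exercised in the test below maps onto the same flag at the C API level; a minimal sketch, assuming libvirt >= 0.9.4 and an already-obtained virDomainPtr:)

#include <libvirt/libvirt.h>

/* Equivalent of "virsh managedsave --bypass-cache <guest>": ask libvirt
 * to avoid polluting the host page cache while writing the save image. */
int managedsave_bypass_cache(virDomainPtr dom)
{
    return virDomainManagedSave(dom, VIR_DOMAIN_SAVE_BYPASS_CACHE);
}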
Tested the effect of the O_DIRECT flag:

1. Before "virsh managedsave <guest>", flush and free the caches:
# sync
# echo 3 > /proc/sys/vm/drop_caches
# cat /proc/meminfo
MemTotal: 3887796 kB
MemFree: 2793728 kB
Buffers: 536 kB
Cached: 153556 kB
...

2. Execute "virsh managedsave <guest>" without --bypass-cache and check the value of Cached again; it becomes larger than before:
# cat /proc/meminfo
MemTotal: 3887796 kB
MemFree: 2808660 kB
Buffers: 8448 kB
Cached: 461136 kB

3. Clean the cache and run "virsh managedsave <guest>" with --bypass-cache, then check the value of Cached; it barely changes:
# cat /proc/meminfo
MemTotal: 3887796 kB
MemFree: 3121988 kB
Buffers: 6392 kB
Cached: 163428 kB
Tested with:
libvirt-0.9.4-0rc1.1.el6.x86_64
vdsm-4.9-86.el6.x86_64
RHEVM 3.0.0.0

[root@rhevm-test vdsm]# echo 3 > /proc/sys/vm/drop_caches
[root@rhevm-test vdsm]# head -n4 /proc/meminfo
MemTotal: 8061616 kB
MemFree: 6529192 kB
Buffers: 1276 kB
Cached: 79276 kB

Suspend the guest via RHEVM:

[root@rhevm-test vdsm]# head -n4 /proc/meminfo
MemTotal: 8061616 kB
MemFree: 6527232 kB
Buffers: 5760 kB
Cached: 86280 kB

From the above we can see that virDomainSave opens the destination file with the O_DIRECT flag, so no memory bandwidth is spent on copies between userspace memory and the kernel cache.

Setting bug status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html