Bug 714752

Summary: [Libvirt] virDomainSave should open destination file with O_DIRECT flag
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.1
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: David Naori <dnaori>
Assignee: Eric Blake <eblake>
QA Contact: Virtualization Bugs <virt-bugs>
CC: abaron, berrange, cpelland, dallan, danken, dnaori, dyuan, gren, mgoldboi, mzhan, nzhang, rwu, vbian, veillard, ykaul
Target Milestone: beta
Fixed In Version: libvirt-0.9.4-0rc1
Doc Type: Bug Fix
Last Closed: 2011-12-06 11:15:19 UTC
Attachments:
  libvirtd.log
  vdsm log

Description David Naori 2011-06-20 15:35:41 UTC
Created attachment 505649 [details]
libvirtd.log

Description of problem:

When using vdsm to hibernate (virDomainSave) 4 busy VMs with 2GB of memory (80% usage), vdsm can't read from storage and fences itself.

vdsm should have the option to run virDomainSave so that the destination file is opened with the O_DIRECT flag, in order to minimize cache effects.

Version-Release number of selected component (if applicable):
libvirt-0.9.1-1.el6.x86_64
vdsm-4.9-75.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Hibernate 4 VMs with 2GB of memory at 80% usage via vdsm.

Actual results:
vdsm fences itself.

Expected results:

Additional info:
attached vdsm log and libvirtd log.

Comment 1 David Naori 2011-06-20 15:38:53 UTC
Created attachment 505651 [details]
vdsm log

Comment 3 Daniel Berrangé 2011-06-20 16:31:51 UTC
I'm not sure how practical it is to simply add O_DIRECT in libvirt, since O_DIRECT places various alignment restrictions on buffers used for I/O:

[quote open(2)]
       The O_DIRECT flag may impose alignment restrictions on the length
       and address of userspace buffers and the file offset of I/Os.  In
       Linux alignment restrictions vary by file system and kernel version
       and might be absent entirely.  However there is currently no file
       system-independent interface for an application to discover these
       restrictions for a given file or file system.  Some file systems
       provide their own interfaces for doing so, for example the
       XFS_IOC_DIOINFO operation in xfsctl(3).

       Under Linux 2.4, transfer sizes, and the alignment of the user
       buffer and the file offset must all be multiples of the logical
       block size of the file system.  Under Linux 2.6, alignment to
       512-byte boundaries suffices.
[/quote]

If there is a compression program configured, then I'm fairly sure that gzip, bzip, etc. won't be expecting these alignment restrictions.

If there is no compression program, then QEMU will be writing directly, and again I'm fairly sure QEMU's migration code won't be expecting alignment restrictions.

So while O_DIRECT could be desirable, we shouldn't expect it to be a trivial addition.
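
To make the constraint concrete: a writer using an O_DIRECT descriptor has to use aligned buffer addresses and aligned transfer sizes, which ordinary stream writers such as compressors do not do. A minimal sketch in C, assuming 512-byte alignment suffices (the file name and buffer size are arbitrary placeholders):

    /* o_direct_write.c: sketch of the alignment rules O_DIRECT imposes. */
    #define _GNU_SOURCE            /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define ALIGN 512              /* assumed; real code must not hard-code this */

    int main(void)
    {
        int fd = open("save-image.bin", O_WRONLY | O_CREAT | O_DIRECT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        /* The buffer address must be aligned ... */
        if (posix_memalign(&buf, ALIGN, 64 * 1024) != 0) { close(fd); return 1; }
        memset(buf, 0, 64 * 1024);

        /* ... and so must the transfer size; a plain write(fd, buf, 1000)
         * would typically fail with EINVAL on most filesystems. */
        if (write(fd, buf, 64 * 1024) < 0)
            perror("write");

        free(buf);
        close(fd);
        return 0;
    }

That odd-sized writes fail outright is why simply adding O_DIRECT to the existing save path is not enough.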

Comment 5 Eric Blake 2011-06-22 13:56:59 UTC
Dan's point about the application doing the actual writing having to be aware of alignment constraints is indeed problematic.  Perhaps this bug should be cloned to qemu so that on direct writes, qemu is indeed honoring O_DIRECT constraints; but I don't think it is worth trying to force O_DIRECT semantics onto compression programs.

So for the compression case, we would probably have to rearrange things in libvirt so that if O_DIRECT is required, then the compression program writes to a pipe managed by libvirt, and libvirt then turns around and does the aligned writes, instead of the compression program directly writing to disk.  I suppose if we do that for the compression case that we can also do it for the direct case (qemu writes to a pipe instead of direct to disk).

At any rate, introducing the intermediate pipe so that libvirt honors O_DIRECT semantics will add additional pipe I/O, but should reduce file system cache pollution by minimizing disk-based I/O, so this might be possible.
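
A minimal sketch of the relay pattern described above (not the code that was eventually merged): the producer, QEMU or a compressor, writes into an ordinary pipe with no alignment rules, and a helper drains the pipe into an aligned buffer and issues the aligned O_DIRECT writes. The function name and block size are assumptions, and a real implementation would also have to record the true data length, since the final block is zero-padded.

    /* Hypothetical relay: unaligned producer -> pipe -> aligned O_DIRECT writes. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK 512                 /* assumed O_DIRECT alignment */
    #define BUFSZ (1024 * BLOCK)      /* 512 KiB relay buffer */

    /* Read from pipe_fd (an ordinary pipe) and write to out_fd, which the
     * caller opened with O_DIRECT.  Returns 0 on success, -1 on error. */
    static int relay(int pipe_fd, int out_fd)
    {
        void *buf;
        if (posix_memalign(&buf, BLOCK, BUFSZ) != 0)
            return -1;

        for (;;) {
            size_t got = 0;
            ssize_t n = 0;

            /* Fill the buffer as far as possible; pipe reads may be short. */
            while (got < BUFSZ &&
                   (n = read(pipe_fd, (char *)buf + got, BUFSZ - got)) > 0)
                got += n;
            if (n < 0)
                goto error;
            if (got == 0)
                break;                /* EOF on the pipe */

            /* Round the transfer size up to the alignment and zero the tail. */
            size_t padded = (got + BLOCK - 1) / BLOCK * BLOCK;
            memset((char *)buf + got, 0, padded - got);
            if (write(out_fd, buf, padded) != (ssize_t)padded)
                goto error;
            if (n == 0)
                break;                /* producer finished mid-buffer */
        }
        free(buf);
        return 0;

    error:
        free(buf);
        return -1;
    }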

Comment 7 Ayal Baron 2011-06-29 07:19:32 UTC
Raising severity and flagging blocker as this is critical to RHEVM.

Comment 8 Eric Blake 2011-07-12 17:30:24 UTC
See also bug 634653 for the restore direction.

Comment 9 Eric Blake 2011-07-26 12:56:57 UTC
Upstream work complete as of:

commit 28d182506acda4dda6aa610bc44abb7a16f36c55
Author: Eric Blake <eblake>
Date:   Thu Jul 14 17:22:53 2011 -0600

    save: support bypass-cache flag in libvirt-guests init script
    
    libvirt-guests is a perfect use case for bypassing the file system
    cache - lots of filesystem traffic done at system shutdown, where
    caching is pointless, and startup, where reading large files only
    once just gets in the way.  Make this a configurable option in the
    init script, but defaulting to existing behavior.
    
    * tools/libvirt-guests.sysconf (BYPASS_CACHE): New variable.
    * tools/libvirt-guests.init.sh (start, suspend_guest): Use it.

Comment 12 Gunannan Ren 2011-07-28 10:28:22 UTC
1. Set BYPASS_CACHE=1 in /etc/sysconfig/libvirt-guests.
2. Run /etc/init.d/libvirt-guests start|stop.
It works.
Code inspection confirms that O_DIRECT is used when --bypass-cache is given.

The original problem is hard to reproduce on my local machine with vdsm installed.
David Naori will help with this.
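
For a management application such as vdsm that drives libvirt through the API rather than through virsh or the init script, the same behaviour should be reachable via the bypass-cache save flag on virDomainSaveFlags (VIR_DOMAIN_SAVE_BYPASS_CACHE, added in the same 0.9.4 timeframe). A minimal sketch, with the guest name and target path as placeholders and error handling kept to a minimum:

    /* Sketch: request a cache-bypassing save through the C API.
     * "guest1" and the save path are placeholders. */
    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen("qemu:///system");
        if (!conn)
            return 1;

        virDomainPtr dom = virDomainLookupByName(conn, "guest1");
        if (!dom) {
            virConnectClose(conn);
            return 1;
        }

        /* Ask libvirt to bypass the host page cache (O_DIRECT on the
         * destination) while writing the save image. */
        if (virDomainSaveFlags(dom, "/var/lib/libvirt/save/guest1.save",
                               NULL, VIR_DOMAIN_SAVE_BYPASS_CACHE) < 0)
            fprintf(stderr, "save failed\n");

        virDomainFree(dom);
        virConnectClose(conn);
        return 0;
    }

Built with something like: gcc save.c $(pkg-config --cflags --libs libvirt).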

Comment 14 Gunannan Ren 2011-07-28 12:18:57 UTC
Test the effect of the O_DIRECT flag:

1. Before running "virsh managedsave <guest>", flush and free the caches:

# sync
# echo 3 > /proc/sys/vm/drop_caches 
# cat /proc/meminfo 
MemTotal:        3887796 kB
MemFree:         2793728 kB
Buffers:             536 kB
Cached:           153556 kB
...

2. Execute "virsh managedsave <guest>" without --bypass-cache and check the value of Cached again; it is noticeably larger than before.

# cat /proc/meminfo 
MemTotal:        3887796 kB
MemFree:         2808660 kB
Buffers:            8448 kB
Cached:           461136 kB

3. Clean the caches and run "virsh managedsave <guest>" with --bypass-cache, then check the value of Cached; it barely changed.

# cat /proc/meminfo 
MemTotal:        3887796 kB
MemFree:         3121988 kB
Buffers:            6392 kB
Cached:           163428 kB
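
For repeated runs, the before/after check of Cached can be automated; a small, purely illustrative helper that prints the Cached: value in kB:

    /* Print the "Cached:" value from /proc/meminfo (in kB). */
    #include <stdio.h>

    int main(void)
    {
        FILE *fp = fopen("/proc/meminfo", "r");
        char line[128];
        long kb = -1;

        if (!fp) {
            perror("/proc/meminfo");
            return 1;
        }
        while (fgets(line, sizeof(line), fp)) {
            if (sscanf(line, "Cached: %ld kB", &kb) == 1)
                break;
        }
        fclose(fp);
        printf("%ld\n", kb);
        return kb < 0;
    }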

Comment 15 Vivian Bian 2011-07-29 09:20:05 UTC
Tested with:
libvirt-0.9.4-0rc1.1.el6.x86_64
vdsm-4.9-86.el6.x86_64
RHEVM 3.0.0.0

[root@rhevm-test vdsm]# echo 3 > /proc/sys/vm/drop_caches 
[root@rhevm-test vdsm]# head -n4 /proc/meminfo 
MemTotal:        8061616 kB
MemFree:         6529192 kB
Buffers:            1276 kB
Cached:            79276 kB

Suspend the guest via RHEVM.

[root@rhevm-test vdsm]# head -n4 /proc/meminfo 
MemTotal:        8061616 kB
MemFree:         6527232 kB
Buffers:            5760 kB
Cached:            86280 kB


From the above we can see that when virDomainSave opens the destination file with the O_DIRECT flag, almost no memory bandwidth is spent on copies between userspace memory and the kernel page cache (Cached grows only slightly).

So, setting the bug status to VERIFIED.

Comment 16 errata-xmlrpc 2011-12-06 11:15:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html