Bug 714752 - [Libvirt] virDomainSave should open destination file with O_DIRECT flag
Summary: [Libvirt] virDomainSave should open destination file with O_DIRECT flag
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: beta
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-06-20 15:35 UTC by David Naori
Modified: 2011-12-06 11:15 UTC
CC List: 15 users

Fixed In Version: libvirt-0.9.4-0rc1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 11:15:19 UTC
Target Upstream Version:


Attachments
libvirtd.log (42.30 KB, application/binary) - 2011-06-20 15:35 UTC, David Naori
vdsm log (52.11 KB, application/binary) - 2011-06-20 15:38 UTC, David Naori


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1513 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2011-12-06 01:23:30 UTC

Description David Naori 2011-06-20 15:35:41 UTC
Created attachment 505649 [details]
libvirtd.log

Description of problem:

When using vdsm to hibernate (virDomainSave) 4 busy VMs, each with 2GB of memory (80% usage), vdsm cannot read from storage and fences itself.

vdsm should have the option to run virDomainSave so that the destination file is opened with the O_DIRECT flag, in order to minimize the effect on the host page cache.
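
For illustration, a minimal sketch of what such a knob could look like from a management application's side, assuming the virDomainSaveFlags() entry point and the VIR_DOMAIN_SAVE_BYPASS_CACHE flag that later libvirt releases expose; the connection URI, guest name and save path are placeholders:

/* save_bypass.c: hedged sketch of a cache-bypassing save.  Assumes the
 * virDomainSaveFlags() entry point and VIR_DOMAIN_SAVE_BYPASS_CACHE flag
 * that later libvirt releases expose; URI, guest name and save path are
 * placeholders. */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn) {
        fprintf(stderr, "failed to connect to the hypervisor\n");
        return 1;
    }

    virDomainPtr dom = virDomainLookupByName(conn, "guest1");
    if (!dom) {
        fprintf(stderr, "domain not found\n");
        virConnectClose(conn);
        return 1;
    }

    /* ask the driver to open the destination with O_DIRECT so the save
     * image does not pollute the host page cache */
    if (virDomainSaveFlags(dom, "/var/lib/libvirt/save/guest1.save",
                           NULL, VIR_DOMAIN_SAVE_BYPASS_CACHE) < 0)
        fprintf(stderr, "save failed\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}

Built with something like: gcc save_bypass.c $(pkg-config --cflags --libs libvirt)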

Version-Release number of selected component (if applicable):
libvirt-0.9.1-1.el6.x86_64
vdsm-4.9-75.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Hibernate 4 VMs, each with 2GB of memory at 80% usage, via vdsm.

Actual results:
vdsm fences itself.

Expected results:

Additional info:
Attached vdsm log and libvirtd log.

Comment 1 David Naori 2011-06-20 15:38:53 UTC
Created attachment 505651 [details]
vdsm log

Comment 3 Daniel Berrangé 2011-06-20 16:31:51 UTC
I'm not sure how practical it is to simply add O_DIRECT in libvirt, since O_DIRECT places various alignment restrictions on the buffers used for I/O:

[quote open(2)]
       The O_DIRECT flag may impose alignment restrictions on the length
       and address of userspace buffers and the file offset of I/Os.  In
       Linux alignment restrictions vary by file system and kernel version
       and might be absent entirely.  However there is currently no file
       system-independent interface for an application to discover these
       restrictions for a given file or file system.  Some file systems
       provide their own interfaces for doing so, for example the
       XFS_IOC_DIOINFO operation in xfsctl(3).

       Under Linux 2.4, transfer sizes, and the alignment of the user
       buffer and the file offset must all be multiples of the logical
       block size of the file system.  Under Linux 2.6, alignment to
       512-byte boundaries suffices.
[/quote]

If there is a compression program configured, then I'm fairly sure that gzip, bzip2, etc. won't be expecting these alignment restrictions.

If there is no compression program, then QEMU will be writing directly, and again I'm fairly sure QEMU's migration code won't be expecting alignment restrictions.

So while O_DIRECT could be desirable, we shouldn't expect it to be a trivial addition.
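
To make the quoted restrictions concrete, a small hedged demonstration (not libvirt code) of why a writer unaware of O_DIRECT tends to fail: an aligned buffer with a block-sized write goes through, while an odd-sized write of the kind a compressor or migration stream naturally produces is typically rejected with EINVAL:

#define _GNU_SOURCE             /* needed for O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 512

int main(void)
{
    int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0) {
        close(fd);
        return 1;
    }
    memset(buf, 'x', ALIGN);

    /* aligned address, aligned length, aligned offset: accepted */
    if (write(fd, buf, ALIGN) < 0)
        perror("aligned write");

    /* odd-sized write, as a compressor or migration stream would emit:
     * typically fails with EINVAL while O_DIRECT is in effect */
    if (write(fd, buf, 100) < 0)
        perror("unaligned write");

    free(buf);
    close(fd);
    return 0;
}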

Comment 5 Eric Blake 2011-06-22 13:56:59 UTC
Dan's point about the application doing the actual writing having to be aware of alignment constraints is indeed problematic.  Perhaps this bug should be cloned to qemu so that on direct writes, qemu is indeed honoring O_DIRECT constraints; but I don't think it is worth trying to force O_DIRECT semantics onto compression programs.

So for the compression case, we would probably have to rearrange things in libvirt so that if O_DIRECT is required, then the compression program writes to a pipe managed by libvirt, and libvirt then turns around and does the aligned writes, instead of the compression program directly writing to disk.  I suppose if we do that for the compression case that we can also do it for the direct case (qemu writes to a pipe instead of direct to disk).

At any rate, introducing the intermediate pipe so that libvirt honors O_DIRECT semantics will add additional pipe I/O, but should reduce file system cache pollution by minimizing disk-based I/O, so this might be possible.
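
A hedged sketch of the pipe-and-realign idea described above (this is not libvirt's actual implementation; drain_pipe_direct, the 512-byte block size and the 64 KiB chunk size are illustrative assumptions): the compressor or QEMU writes its unaligned stream into a pipe, and a helper drains the pipe into an O_DIRECT destination using an aligned buffer, padding only the final block so every write stays aligned.

#define _GNU_SOURCE             /* needed for O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 512               /* assumed O_DIRECT alignment */
#define CHUNK (64 * 1024)       /* aligned buffer size, multiple of BLOCK */

/* Drain everything arriving on pipe_fd into an O_DIRECT destination file,
 * doing only block-aligned writes.  Returns 0 on success, -1 on error. */
int drain_pipe_direct(int pipe_fd, const char *path)
{
    int dest = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (dest < 0)
        return -1;

    void *buf;
    if (posix_memalign(&buf, BLOCK, CHUNK) != 0) {
        close(dest);
        return -1;
    }

    for (;;) {
        /* fill the aligned buffer as far as the writer allows */
        size_t got = 0;
        while (got < CHUNK) {
            ssize_t n = read(pipe_fd, (char *)buf + got, CHUNK - got);
            if (n < 0)
                goto error;
            if (n == 0)                 /* writer closed its end */
                break;
            got += (size_t)n;
        }
        if (got == 0)
            break;

        /* pad the final partial block so the write length stays aligned;
         * a real implementation would remember the true length and
         * truncate the padding away afterwards */
        size_t padded = (got + BLOCK - 1) / BLOCK * BLOCK;
        memset((char *)buf + got, 0, padded - got);
        if (write(dest, buf, padded) != (ssize_t)padded)
            goto error;
        if (got < CHUNK)                /* short fill means EOF */
            break;
    }

    free(buf);
    close(dest);
    return 0;

error:
    free(buf);
    close(dest);
    return -1;
}

Since every write length is a multiple of BLOCK, the file offset also stays aligned; only the tail padding has to be trimmed once the stream is complete.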

Comment 7 Ayal Baron 2011-06-29 07:19:32 UTC
Raising severity and flagging blocker as this is critical to RHEVM.

Comment 8 Eric Blake 2011-07-12 17:30:24 UTC
See also bug 634653 for the restore direction.

Comment 9 Eric Blake 2011-07-26 12:56:57 UTC
Upstream work complete as of:

commit 28d182506acda4dda6aa610bc44abb7a16f36c55
Author: Eric Blake <eblake>
Date:   Thu Jul 14 17:22:53 2011 -0600

    save: support bypass-cache flag in libvirt-guests init script
    
    libvirt-guests is a perfect use case for bypassing the file system
    cache - lots of filesystem traffic done at system shutdown, where
    caching is pointless, and startup, where reading large files only
    once just gets in the way.  Make this a configurable option in the
    init script, but defaulting to existing behavior.
    
    * tools/libvirt-guests.sysconf (BYPASS_CACHE): New variable.
    * tools/libvirt-guests.init.sh (start, suspend_guest): Use it.

Comment 12 Gunannan Ren 2011-07-28 10:28:22 UTC
1. Set BYPASS_CACHE=1 in /etc/sysconfig/libvirt-guests
2. Run /etc/init.d/libvirt-guests start|stop
It works.
Code inspection confirms that O_DIRECT is used when --bypass-cache is given.

As for the original bug, it is hard to reproduce on my local machine with vdsm installed; David Naori will help with this.

Comment 14 Gunannan Ren 2011-07-28 12:18:57 UTC
Testing the effect of the O_DIRECT flag:

1. Before running virsh managedsave <guest>, flush and free the caches:

# sync
# echo 3 > /proc/sys/vm/drop_caches 
# cat /proc/meminfo 
MemTotal:        3887796 kB
MemFree:         2793728 kB
Buffers:             536 kB
Cached:           153556 kB
...

2. Execute "virsh managedsave <guest>" without --bypass-cache, then check the value of Cached again; it is much larger than before:

# cat /proc/meminfo 
MemTotal:        3887796 kB
MemFree:         2808660 kB
Buffers:            8448 kB
Cached:           461136 kB

3. Drop the caches again and run virsh managedsave <guest> with --bypass-cache, then check the value of Cached; this time it barely changes:

# cat /proc/meminfo 
MemTotal:        3887796 kB
MemFree:         3121988 kB
Buffers:            6392 kB
Cached:           163428 kB

Comment 15 Vivian Bian 2011-07-29 09:20:05 UTC
Tested with:
libvirt-0.9.4-0rc1.1.el6.x86_64
vdsm-4.9-86.el6.x86_64
RHEVM 3.0.0.0

[root@rhevm-test vdsm]# echo 3 > /proc/sys/vm/drop_caches 
[root@rhevm-test vdsm]# head -n4 /proc/meminfo 
MemTotal:        8061616 kB
MemFree:         6529192 kB
Buffers:            1276 kB
Cached:            79276 kB

suspend guest via RHEVM

[root@rhevm-test vdsm]# head -n4 /proc/meminfo 
MemTotal:        8061616 kB
MemFree:         6527232 kB
Buffers:            5760 kB
Cached:            86280 kB


From the above we can see that when virDomainSave opens the destination file with the O_DIRECT flag, no memory bandwidth is spent on copies between userspace memory and the kernel page cache.

So, setting bug status to VERIFIED.

Comment 16 errata-xmlrpc 2011-12-06 11:15:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html

