Bug 1305886 (RHEV_IO_NATIVE_EVERYWHERE)

Summary: [RFE] Move to IO=native everywhere (AIO is processing only 1 request, even if >1 requests in virtio - RHEV-M)
Product: Red Hat Enterprise Virtualization Manager Reporter: Karen Noel <knoel>
Component: RFEs    Assignee: Adam Litke <alitke>
Status: CLOSED NOTABUG QA Contact: Raz Tamir <ratamir>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0.0    CC: amureini, areis, atheurer, gklein, huding, jen, jherrman, juzhang, knoel, lsurette, mst, nsoffer, psuriset, rbalakri, srevivo, stefanha, tnisan, virt-bugs, virt-maint, xfu, ykaul, ylavi
Target Milestone: ovirt-4.1.0-alpha    Keywords: FutureFeature, Performance
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
When QEMU was configured with the aio=native parameter, KVM virtual machines were slowed significantly. With this update, asynchronous I/O (AIO) can correctly process more than one request at a time, and using aio=native no longer has a negative impact on guest performance.
Story Points: ---
Clone Of: 1243548 Environment:
Last Closed: 2016-12-06 07:57:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1243548    
Bug Blocks:    

Description Karen Noel 2016-02-09 13:55:49 UTC
+++ This bug was initially created as a clone of Bug #1243548 +++

Description of problem:

While running I/O workloads on a RHEL 7.1 VM on a RHEL 7.1 host using qemu-kvm-rhev:
We realised that aio=native performs about 50% worse than aio=threads. There are several requests to process, but AIO is processing them one at a time.

SystemTap trace of the virtio-blk requests:
virtio_queue_notify: ts=1436461565480052,vdev=140628771509224,n=0,vq=140628770978384
virtio_blk_handle_read: ts=1436461565480071,req=140628771310288,sector=0,nsectors=8
virtio_blk_handle_read: ts=1436461565480134,req=140628771703808,sector=8,nsectors=8
virtio_blk_handle_read: ts=1436461565480150,req=140628771788928,sector=16,nsectors=8
virtio_blk_handle_read: ts=1436461565480165,req=140628771838192,sector=24,nsectors=8
virtio_blk_handle_read: ts=1436461565480179,req=140628818049344,sector=32,nsectors=8
virtio_blk_handle_read: ts=1436461565480193,req=140628818098608,sector=40,nsectors=8
virtio_blk_handle_read: ts=1436461565480207,req=140628818147872,sector=48,nsectors=8
virtio_blk_handle_read: ts=1436461565480221,req=140628818197136,sector=56,nsectors=8
virtio_blk_handle_read: ts=1436461565480271,req=140628818246400,sector=64,nsectors=8


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.  Start a RHEL 7.1 VM on a RHEL 7.1 host with qemu-kvm-rhev
2.  Trigger an I/O workload in the VM that reads from a disk in chunks, for example a 4K sequential read (see the sketch after these steps)
3.  Monitor SystemTap traces for io_submit calls
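
A minimal reproduction sketch (the guest device path, fio parameters, and the stap one-liner are illustrative assumptions, not taken from the original report):

    # Inside the guest: 4K sequential reads with queue depth > 1
    fio --name=seqread --filename=/dev/vdb --rw=read --bs=4k \
        --ioengine=libaio --iodepth=32 --direct=1 \
        --runtime=60 --time_based

    # On the host: print one line per io_submit(2) call (e.g. from the QEMU process)
    stap -e 'probe syscall.io_submit { printf("%s %s(%s)\n", execname(), name, argstr) }'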

Actual results:

AIO processes only one request at a time

Expected results:


AIO should batch I/O requests.

Additional info:

--- Additional comment from Pradeep Kumar Surisetty on 2015-11-11 23:20:11 EST ---

Stefan has provided an upstream fix for this.

Commit id:  fc73548e444ae3239f6cef44a5200b5d2c3e85d1

    The raw-posix block driver implements Linux AIO batching so multiple
    requests can be submitted with a single io_submit(2) system call.
    Batching is currently only used by virtio-scsi and
    virtio-blk-data-plane.
    
    Enable batching for regular virtio-blk so the number of io_submit(2)
    system calls is reduced for workloads with queue depth > 1.
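
For context, a minimal qemu-kvm sketch of the configuration this commit targets (regular virtio-blk on a raw block device with Linux AIO); the device path and memory/CPU sizing are illustrative assumptions:

    # aio=native requires O_DIRECT, i.e. cache=none (or cache=directsync)
    qemu-kvm -m 2048 -smp 2 \
        -drive file=/dev/sdb,if=virtio,format=raw,cache=none,aio=native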

--- Additional comment from Pradeep Kumar Surisetty on 2015-11-22 22:16:49 EST ---

But if we go with aio=threads, users would have to take a huge performance hit, especially on XFS.

With files too, I see native performing better, especially on XFS.

Multi VM: 

http://psuriset.github.io/pbench-graphs/multi_vm_xfs_ssd_native_vs_threads_raw_sync_iodepth_1_jobs_32.html

Single VM: 

http://psuriset.github.io/pbench-graphs/single_vm_xfs_ssd_native_vs_threads_raw_sync_iodepth_1_jobs_32.html

multi vm/qcow2:

http://psuriset.github.io/pbench-graphs/multi_vm_xfs_ssd_native_vs_threads_qcow2_sync_iodepth_1_jobs_32.html

single vm/qcow2

http://psuriset.github.io/pbench-graphs/single_vm_xfs_ssd_native_vs_threads_qcow2_sync_iodepth_1_jobs_32.html

--- Additional comment from Stefan Hajnoczi on 2015-11-26 00:15:28 EST ---

(In reply to Pradeep Kumar Surisetty from comment #7)
> But if we go with aio=threads, user would have to take huge performance
> impact especially xfs
> 
> 
> With files too, i see native performing better. Especially xfs. 
> 
> Multi VM: 
> 
> http://psuriset.github.io/pbench-graphs/
> multi_vm_xfs_ssd_native_vs_threads_raw_sync_iodepth_1_jobs_32.html

ext4 behaves completely differently.  Have you filed a bug against XFS?

Comment 1 Allon Mureinik 2016-12-01 16:10:57 UTC
After discussing with Yaniv K, we should just stop configuring this and trust libvirt/qemu to have sane defaults.

Comment 2 Nir Soffer 2016-12-01 23:21:53 UTC
Karen, RHV is using this logic when creating libvirt xml:

For block device:

    <driver cache="none" error_policy="stop"
            io="native" name="qemu" type="raw"/>

For files:

    <driver cache="none" error_policy="stop"
            io="threads" name="qemu" type="raw"/>

What is the recommended configuration? Or should we let libvirt decide?

Comment 4 Stefan Hajnoczi 2016-12-02 11:32:03 UTC
For block devices io="native" is consistently the best choice.

For files the results are mixed.  QEMU and libvirt leave the choice up to the user.  They do not automatically pick the best option (because it's not possible to know the answer in general).

io="native" tends to perform well on local files although the results are not always consistent.  On remote file systems like NFS io="threads" has been the recommendation.

Comment 7 Nir Soffer 2016-12-02 15:46:38 UTC
(In reply to Stefan Hajnoczi from comment #4)
> For block devices io="native" is consistently the best choice.
> 
> For files the results are mixed.  QEMU and libvirt leave the choice up to
> the user.  They do not automatically pick the best option (because it's not
> possible to know the answer in general).

So what is the result of not specifying the io attribute?

    <driver cache="none" error_policy="stop"
            name="qemu" type="raw"/>

Does it always use "native", or can the behavior change based on
other conditions?

> io="native" tends to perform well on local files although the results are
> not always consistent.  

We don't normally use local files, although we can optimize the local file
case to use io="threads".

> On remote file systems like NFS io="threads" has
> been the recommendation.

This is the common case when using file-based storage.

Comment 8 Stefan Hajnoczi 2016-12-05 14:16:17 UTC
(In reply to Nir Soffer from comment #7)
> (In reply to Stefan Hajnoczi from comment #4)
> > For block devices io="native" is consistently the best choice.
> > 
> > For files the results are mixed.  QEMU and libvirt leave the choice up to
> > the user.  They do not automatically pick the best option (because it's not
> > possible to know the answer in general).
> 
> So what is the result of not specifying the io attribute?
> 
>     <driver cache="none" error_policy="stop"
>             name="qemu" type="raw"/>
> 
> Does it use always "native", or the behavior can changed based on 
> other conditions?

When <driver io=> is omitted QEMU always defaults to aio=threads.
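
To illustrate the equivalence at the QEMU level (the drive path is hypothetical): leaving out the io attribute results in a -drive option without an aio= suboption, which behaves the same as spelling out aio=threads:

    # QEMU's built-in default for the aio suboption is "threads",
    # so these two drive definitions are equivalent:
    qemu-kvm -drive file=/var/lib/images/disk.img,format=raw,cache=none
    qemu-kvm -drive file=/var/lib/images/disk.img,format=raw,cache=none,aio=threads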

Comment 9 Allon Mureinik 2016-12-06 04:31:26 UTC
Yaniv, according to the discussion here, it seems the premise of this BZ is wrong. Should we close it?