Bug 1404303 - RFE: virtio-blk/scsi polling mode (QEMU)
Summary: RFE: virtio-blk/scsi polling mode (QEMU)
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 7.4
Assignee: Stefan Hajnoczi
QA Contact: CongLi
URL:
Whiteboard:
Keywords: FutureFeature
Depends On: 1425700
Blocks: 1395265 1404308 1404318 1404322
Reported: 2016-12-13 15:05 UTC by Ademar Reis
Modified: 2017-08-02 03:35 UTC (History)
Last Closed: 2017-08-01 23:39:45 UTC


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:2392
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: qemu-kvm-rhev security, bug fix, and enhancement update
Last Updated: 2017-08-01 20:04:36 UTC

Description Ademar Reis 2016-12-13 15:05:22 UTC
From the patch cover letter:

Recent performance investigation work done by Karl Rister shows that the
guest->host notification takes around 20 us.  This is more than the "overhead"
of QEMU itself (e.g. block layer).

One way to avoid the costly exit is to use polling instead of notification.
The main drawback of polling is that it consumes CPU resources.  In order to
benefit performance the host must have extra CPU cycles available on physical
CPUs that aren't used by the guest.

This is an experimental AioContext polling implementation.  It adds a polling
callback into the event loop.  Polling functions are implemented for virtio-blk
virtqueue guest->host kick and Linux AIO completion.

The -object iothread,poll-max-ns=NUM parameter sets the number of nanoseconds
to poll before entering the usual blocking poll(2) syscall.  Try setting this
parameter to the time from old request completion to new virtqueue kick.  By
default no polling is done so you must set this parameter to get busy polling.
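
The poll-then-block idea described above can be sketched roughly as follows. This is only an illustration in Python, not QEMU's C implementation; run_once, poll_handlers and block are hypothetical names chosen for this sketch.

```python
import time

def run_once(poll_handlers, poll_max_ns, block):
    """Busy-poll for up to poll_max_ns, then fall back to blocking.

    poll_handlers: callables that return True when they found work
                   (stand-ins for the virtqueue kick and Linux AIO checks).
    block:         callable standing in for the blocking poll(2) path.
    Returns "polled" if polling found work, otherwise "blocked".
    """
    deadline = time.monotonic_ns() + poll_max_ns
    while poll_max_ns > 0 and time.monotonic_ns() < deadline:
        # Build a list first so every handler runs on each iteration.
        if any([handler() for handler in poll_handlers]):
            return "polled"
    # Nothing arrived within the polling window: use the usual blocking wait.
    block()
    return "blocked"
```

With poll_max_ns=0 the loop body never runs and the function goes straight to the blocking path, which matches the "no polling by default" behaviour of this patch series version.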

Current patch series (v4):
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00148.html

Comment 2 Stefan Hajnoczi 2017-01-16 14:39:17 UTC
AioContext polling was merged in commit e92fbc753df4fab9ee524b5ea07a51bee8b6bae4 on January 5th 2017.  It will be included in the QEMU 2.9 release.

Comment 3 Ademar Reis 2017-02-03 17:37:10 UTC
And it's now even enabled by default upstream:

commit cdd7abfdba9287a289c404dfdcb02316f9ffee7d
Author: Stefan Hajnoczi <stefanha@redhat.com>
Date:   Thu Jan 26 17:01:19 2017 +0000

    iothread: enable AioContext polling by default


I'm moving the BZ back to assigned because Stefan will backport all these patches to our current qemu-kvm-rhev package (QEMU-2.8) for early testing by QE. This BZ will be used to track it.

Comment 4 Stefan Hajnoczi 2017-02-06 14:18:04 UTC
I have posted a backport for RHEL 7.4 qemu-kvm-rhev.

Comment 5 Miroslav Rezanina 2017-02-10 13:56:44 UTC
Fix included in qemu-kvm-rhev-2.8.0-4.el7

Comment 7 Stefan Hajnoczi 2017-02-17 10:58:27 UTC
Questions from Cong Li via email:

> 1. What's the reasonable value of poll-max-ns ?

qemu-kvm-rhev uses a default value of 32 microseconds.  Benchmarks showed that 16 or 32 microseconds produce good results overall.

I think any power of 2 from 4 to 64 microseconds could be interesting.  The ioeventfd latency that AioContext polling solves is around 20-30 microseconds, so it's unlikely that much higher numbers will improve performance.

Most users should not need to set the poll-max-ns parameter.
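
For anyone who does want to sweep the values suggested above, here is a small Python sketch that lists the powers of two from 4 to 64 microseconds in nanoseconds and formats the matching -object argument. The helper names and the iothread0 id are assumptions made for this example.

```python
def poll_max_ns_candidates(min_us=4, max_us=64):
    """Powers of two between min_us and max_us, converted to nanoseconds."""
    values = []
    us = min_us
    while us <= max_us:
        values.append(us * 1000)  # poll-max-ns is given in nanoseconds
        us *= 2
    return values

def iothread_object(poll_max_ns, iothread_id="iothread0"):
    """Format the -object argument for a given poll-max-ns value."""
    return f"iothread,id={iothread_id},poll-max-ns={poll_max_ns}"

for ns in poll_max_ns_candidates():
    print("-object", iothread_object(ns))
```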

> 2. How to check value poll-max-ns ?
>    If set poll-max-ns = 0, how to check polling mode is disabled?

Trace the poll_grow and poll_shrink trace events.  They are only emitted when polling is enabled.

> 3. Is there any tool to check the performance improvement?
>    (I'm not from the performance team, I will ask them for help
>    if necessary)

Yes, fio is the standard benchmarking tool.  It generates disk I/O and reports the performance results.

The AioContext polling latency improvement is most significant with a fast disk like an NVMe SSD drive.  Request latency is reduced.

A single thread of random 4 KB reads is expected to perform better with polling enabled than with polling disabled:

AioContext polling enabled (it's enabled by default):
-drive if=none,id=drive0,file=test.img,format=raw,cache=none,aio=native -object iothread,id=iothread0 -device virtio-blk-pci,iothread=iothread0,drive=drive0

AioContext polling disabled:
-drive if=none,id=drive0,file=test.img,format=raw,cache=none,aio=native -object iothread,id=iothread0,poll-max-ns=0 -device virtio-blk-pci,iothread=iothread0,drive=drive0
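
A possible way to generate the single-threaded 4 KB random read workload inside the guest is sketched below in Python. The guest device path /dev/vdb, the 60 second runtime, the libaio engine and direct I/O are assumptions for this example; only the block size, the single job and the queue depth of 1 come from this thread.

```python
import shlex

def fio_randread_argv(filename, runtime_s=60):
    """Build a fio command line for one job of 4 KB random reads."""
    return [
        "fio",
        "--name=randread-4k",
        f"--filename={filename}",
        "--rw=randread",       # random reads
        "--bs=4k",             # 4 KB requests
        "--numjobs=1",         # a single thread
        "--iodepth=1",         # one request in flight
        "--ioengine=libaio",   # assumption: Linux AIO in the guest
        "--direct=1",          # assumption: bypass the guest page cache
        f"--runtime={runtime_s}",
        "--time_based",
    ]

# /dev/vdb is a placeholder for the virtio-blk disk under test.
print(shlex.join(fio_randread_argv("/dev/vdb")))
```

Run the same job once against each of the two QEMU command lines above and compare the reported latency and IOPS.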

> 4. For events poll_grow and poll_shrink, how to check them?

poll_grow/poll_shrink events tell you that polling is enabled and the self-tuning algorithm is adjusting the polling interval.

I don't think it's necessary to verify these values.  If the algorithm does something wrong then that would be apparent from the performance results.
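
To give a feel for what poll_grow and poll_shrink report, here is a rough Python sketch of a self-tuning rule. This bug does not describe the algorithm, so the rules below (start at 4000 ns, double while the wait still fits within poll-max-ns, reset to zero when the wait exceeds it) are an assumption loosely modelled on my reading of upstream QEMU and should not be taken as the exact implementation.

```python
def adjust_poll_ns(poll_ns, block_ns, poll_max_ns, grow=2, shrink=0, start_ns=4000):
    """Return (new_poll_ns, event) after one event loop iteration.

    poll_ns:     current polling window in nanoseconds
    block_ns:    how long the loop actually waited for the next event
    poll_max_ns: upper bound from -object iothread,poll-max-ns=NUM
    event:       None, "poll_grow" or "poll_shrink"
    """
    if poll_max_ns == 0 or block_ns <= poll_ns:
        # Polling disabled, or the current window already caught the event.
        return poll_ns, None
    if block_ns > poll_max_ns:
        # The event took too long to arrive: polling would only burn CPU.
        new_poll_ns = poll_ns // shrink if shrink else 0
        return new_poll_ns, "poll_shrink"
    if poll_ns < poll_max_ns:
        # A slightly longer window would have caught the event.
        new_poll_ns = poll_ns * grow if poll_ns else start_ns
        return min(new_poll_ns, poll_max_ns), "poll_grow"
    return poll_ns, None
```

Note that with poll_max_ns=0 the function never returns an event, consistent with the statement that the trace events are only emitted when polling is enabled.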

Comment 8 Quan Wenli 2017-02-20 03:10:52 UTC
Hi yama, please help check the performance part. Thanks.

> 
> > 3. Is there any tool to check the performance improvement?
> >    (I'm not from the performance team, I will ask them for help
> >    if necessary)
> 
> Yes, fio is the standard benchmarking tool.  It generates disk I/O and
> reports the performance results.
> 
> The AioContext polling latency improvement is most significant with a fast
> disk like an NVMe SSD drive.  Request latency is reduced.
> 
> A single thread of random 4 KB reads is expected to perform better with
> polling enabled than with polling disabled:
> 
> AioContext polling enabled (it's enabled by default):
> -drive if=none,id=drive0,file=test.img,format=raw,cache=none,aio=native
> -object iothread,id=iothread0 -device
> virtio-blk-pci,iothread=iothread0,drive=drive0
> 
> AioContext polling disabled:
> -drive if=none,id=drive0,file=test.img,format=raw,cache=none,aio=native
> -object iothread,id=iothread0,poll-max-ns=0 -device
> virtio-blk-pci,iothread=iothread0,drive=drive0
>

Comment 9 Quan Wenli 2017-02-20 06:13:00 UTC
(In reply to Quan Wenli from comment #8)

yama, there are two things you need to test:

 1. Confirm that there is no regression between qemu-kvm-rhev-2.8.0-3.el7 and qemu-kvm-rhev-2.8.0-4.el7.
 2. Measure the performance improvement with AioContext polling enabled in qemu-kvm-rhev-2.8.0-4.el7.

Comment 10 Fam Zheng 2017-02-22 07:33:26 UTC
We need one extra patch to fix a virtio-scsi CPU usage regression, but AFAICT the issue doesn't affect virtio-blk (which is what Stefan's instructions in comment 7 cover).

Comment 11 Yanhui Ma 2017-03-02 03:20:05 UTC
Performance comparison between AioContext polling enabled (poll-max-ns uses the default value) and AioContext polling disabled (poll-max-ns=0) on qemu-kvm-rhev-2.8.0-4.el7.x86_64.

For the localfs and fusion-io backends, the results are almost the same with AioContext polling enabled and disabled. Our NVMe SSD is still being purchased; once it is ready (April or May), I will test performance with it.


localfs:
raw+blk
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/localfs/raw.virtio_blk.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
raw+scsi
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/localfs/raw.virtio_scsi.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
qcow2+blk
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/localfs/qcow2.virtio_blk.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
qcow2+scsi
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/localfs/qcow2.virtio_scsi.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
fusion-io:
raw+blk
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/fusion-io/raw.virtio_blk.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
raw+scsi
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/fusion-io/raw.virtio_scsi.smp2.virtio_net.RHEL.*.*.x86_64.fio.html

Comment 12 Stefan Hajnoczi 2017-03-16 08:18:05 UTC
(In reply to Yanhui Ma from comment #11)

When I look at the results it appears the "enabled" and "disabled" links are swapped.  "enabled" is running with poll-max-ns=0.  "disabled" is running without a poll-max-ns option (so it will be enabled by default).  Is my interpretation correct?

Does this mean that the data is swapped and the improvement/regression reported in the results are actually the opposite?

Another question about the benchmarks: Is there localfs raw aio=native data?  The localfs raw results I looked at use aio=threads.  That is useful but it would be nice to collect aio=native results too.

Thanks,
Stefan

Comment 13 Yanhui Ma 2017-03-16 09:05:11 UTC
(In reply to Stefan Hajnoczi from comment #12)
> When I look at the results it appears the "enabled" and "disabled" links are
> swapped.  "enabled" is running with poll-max-ns=0.  "disabled" is running
> without a poll-max-ns option (so it will be enabled by default).  Is my
> interpretation correct?
> 

Yes, you are right. I am sorry for writing the wrong link names. I have corrected them.
  
> Does this mean that the data is swapped and the improvement/regression
> reported in the results are actually the opposite?
> 

The first group of data is from polling disabled, and the second group is from polling enabled by default.

> Another question about the benchmarks: Is there localfs raw aio=native data?
> The localfs raw results I looked at use aio=threads.  That is useful but it
> would be nice to collect aio=native results too.
> 
There is no localfs raw aio=native data yet, but I will collect it and post an update ASAP.
> Thanks,
> Stefan

Comment 14 Yanhui Ma 2017-03-21 02:28:36 UTC
(In reply to Yanhui Ma from comment #13)

> > Another question about the benchmarks: Is there localfs raw aio=native data?
> > The localfs raw results I looked at use aio=threads.  That is useful but it
> > would be nice to collect aio=native results too.
> > 
> There is not localfs raw aio=native data now, but I will collect it and
> update it asap.
> > Thanks,
> > Stefan

Here is the localfs raw aio=native data:
raw+blk:
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/localfs/raw+native/raw.virtio_blk.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
raw+scsi:
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/localfs/raw+native/raw.virtio_scsi.smp2.virtio_net.RHEL.*.*.x86_64.fio.html

There is no obvious difference between polling disabled and polling enabled.

Comment 15 Stefan Hajnoczi 2017-04-04 13:39:21 UTC
Thanks!  I have asked the perf team to also try the qemu-kvm-rhev-2.8.0-4.el7 RPM.  This will show whether there is an issue with the RPM or whether the performance difference is due to the host hardware.

Comment 17 Stefan Hajnoczi 2017-05-03 15:22:11 UTC
I have yet to see performance numbers from the Perf Team.  I'll update the BZ when I have more information.

Comment 19 Yanhui Ma 2017-05-11 02:20:27 UTC
(In reply to Yanhui Ma from comment #11)

Adding NVMe backend results:
raw+blk
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/NVMe/raw.virtio_blk.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
raw+scsi
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/NVMe/raw.virtio_scsi.smp2.virtio_net.RHEL.*.*.x86_64.fio.html

For NVMe, the results are almost the same with AioContext polling enabled and disabled.


Comment 20 Fam Zheng 2017-05-11 03:18:03 UTC
Hi Yanhui, I notice you always use threads=16. Can you do a threads=1 test?

Comment 21 Yanhui Ma 2017-05-11 04:49:29 UTC
(In reply to Fam Zheng from comment #20)
> Hi Yanhui, I notice you always use threads=16. Can you do a threads=1 test?

OK, I will give it a try.

Comment 22 Stefan Hajnoczi 2017-05-16 12:54:26 UTC
Karl Rister from the performance team collected the following results on NVMe:

http://pbench.perf.lab.eng.bos.redhat.com/results/dhcp31-124/fio_nvme0n1-nvme1n1-1-job__iodepth-1__virtio-blk-dp__qemu-kvm-rhev-2.8.0-3.el7_2017-05-03_14:50:21/summary-result.html

http://pbench.perf.lab.eng.bos.redhat.com/results/dhcp31-124/fio_nvme0n1-nvme1n1-1-job__iodepth-1__virtio-blk-dp__qemu-kvm-rhev-2.8.0-4.el7_2017-05-04_19:37:43/summary-result.html

It shows a noticeable improvement with smaller block sizes (4-32 KB).

I wanted to share the results immediately.  Will discuss more with Karl and compare against Yanhui's results.

Comment 23 Yanhui Ma 2017-05-18 07:50:04 UTC
(In reply to Fam Zheng from comment #20)
> Hi Yanhui, I notice you always use threads=16. Can you do a threads=1 test?

Here are the NVMe results with threads=1:
raw+blk
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/NVMe/1threads/repeat/raw.virtio_blk.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
raw+scsi
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/NVMe/1threads/raw.virtio_scsi.smp2.virtio_net.RHEL.*.*.x86_64.fio.html

Results are still almost the same between AioContext polling enabled and AioContext polling disabled.

Comment 24 Yanhui Ma 2017-05-18 08:04:00 UTC
(In reply to Stefan Hajnoczi from comment #22)

Oh, it does indeed show an improvement.
Is it a comparison between qemu-kvm-rhev-2.8.0-3.el7 and qemu-kvm-rhev-2.8.0-4.el7? What are the QEMU command line and test steps?

Our results only compare AioContext polling enabled (poll-max-ns uses the default value) with AioContext polling disabled (poll-max-ns=0) on the same qemu-kvm-rhev-2.8.0-4.el7.x86_64 or a later version.

Comment 25 Stefan Hajnoczi 2017-05-22 15:24:35 UTC
(In reply to Yanhui Ma from comment #24)

Interesting.  There is a difference between qemu-kvm-rhev-2.8.0-4 with poll-max-ns=0 and qemu-kvm-rhev-2.8.0-3.  In -4 with poll-max-ns=0 we always poll at least once because it's so cheap and may allow us to avoid system calls.  In -3 we do not poll at all.

Please try qemu-kvm-rhev-2.8.0-3 so we can compare -4 with polling, -4 poll-max-ns=0, and -3.  Thanks!
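
The three-way comparison requested here can be written down as a small test matrix. The Python sketch below is only a bookkeeping aid; the function name and the iothread0 id are assumptions, and the behaviour column simply restates the explanation given in this comment.

```python
def comparison_matrix(iothread_id="iothread0"):
    """List the three runs as (package, -object argument, expected behaviour)."""
    base = f"iothread,id={iothread_id}"
    return [
        ("qemu-kvm-rhev-2.8.0-4", base, "polling enabled by default"),
        ("qemu-kvm-rhev-2.8.0-4", base + ",poll-max-ns=0", "polls once, then blocks"),
        ("qemu-kvm-rhev-2.8.0-3", base, "no polling at all"),
    ]

for package, obj, behaviour in comparison_matrix():
    print(f"{package:24} -object {obj:40} # {behaviour}")
```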

Comment 26 Yanhui Ma 2017-05-23 02:04:39 UTC
(In reply to Stefan Hajnoczi from comment #25)
OK, I will give it a try.

Comment 29 CongLi 2017-05-24 15:13:46 UTC
Thanks Ademar.

Based on comment 16 and recent data plane functional testing, the polling mode feature (both disabled and enabled) works well, so I am setting this bug to 'VERIFIED'.


Thanks.

Comment 31 Yanhui Ma 2017-06-05 05:12:29 UTC
(In reply to Stefan Hajnoczi from comment #25)

Here is the comparison between qemu-kvm-rhev-2.8.0-3 and -4:
raw+blk:
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/NVMe/qemu2.8.0-3and2.8.0-4/raw.virtio_blk.smp2.virtio_net.RHEL.*.*.x86_64.fio.html
raw+scsi:
http://kvm-perf.englab.nay.redhat.com/results/request/bug1404303/NVMe/qemu2.8.0-3and2.8.0-4/raw.virtio_scsi.smp2.virtio_net.RHEL.*.*.x86_64.fio.html

qemu-kvm-rhev-2.8.0-3:
-device virtio-scsi-pci,id=virtio_scsi_pci0,iothread=iothread1,addr=0x4 \
-object iothread,id=iothread1 \
-drive file='/dev/nvme0n1',if=none,id=virtio-scsi2-id0,media=disk,cache=none,snapshot=off,format=raw,aio=native \

-4:
-device virtio-scsi-pci,id=virtio_scsi_pci0,iothread=iothread1,addr=0x4 \
-object iothread,id=iothread1 \
-drive file='/dev/nvme0n1',if=none,id=virtio-scsi2-id0,media=disk,cache=none,snapshot=off,format=raw,aio=native \

The results are almost the same between qemu-kvm-rhev-2.8.0-3 and -4 with default polling. The previous results in comment 19 also show the same performance with default polling enabled and with polling disabled.

Comment 32 Stefan Hajnoczi 2017-06-06 15:07:49 UTC
(In reply to Yanhui Ma from comment #31)
Thanks for the info.  I think this hardware doesn't benefit.

Comment 34 errata-xmlrpc 2017-08-01 23:39:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

