Bug 1265927 - commit "virtio-blk: introduce multiread" introduces a performance regression for 4k block size writes on an NFS backend
commit "virtio-blk: introduce multiread" introduces the performance regressio...
Status: ASSIGNED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64 Linux
Priority: high    Severity: high
Target Milestone: rc
Target Release: 7.4
Assigned To: Kevin Wolf
QA Contact: Yanhui Ma
Keywords: Regression
Depends On:
Blocks: 1401400 1469590 1473046 1288337
Reported: 2015-09-24 03:39 EDT by Yanhui Ma
Modified: 2017-07-28 09:16 EDT
CC: 22 users


Attachments: None
Description Yanhui Ma 2015-09-24 03:39:28 EDT
Description of problem:

In the NFS-backend block performance test, there is a ~26% throughput regression with commit "virtio-blk: introduce multiread" in the 4k block size write scenario.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Boot up the guest with one data disk on the NFS backend (see the note after step 3 on the assumed NFS mount):

numactl \
    -m 1 /home/qemu-kvm/x86_64-softmmu/qemu-system-x86_64  \
    -S \
    -name 'virt-tests-vm1' \
    -nodefaults \
    -chardev socket,id=hmp_id_humanmonitor1,path=/tmp/monitor-humanmonitor1-20150924-013103-vH7oCHFs,server,nowait \
    -mon chardev=hmp_id_humanmonitor1,mode=readline \
    -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20150924-013103-vH7oCHFs,server,nowait \
    -device isa-serial,chardev=serial_id_serial1 \
    -chardev socket,id=seabioslog_id_20150924-013103-vH7oCHFs,path=/tmp/seabios-20150924-013103-vH7oCHFs,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20150924-013103-vH7oCHFs,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x3 \
    -drive file='/usr/local/autotest/tests/virt/shared/data/images/RHEL-Server-7.2-64.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,snapshot=off,format=qcow2,aio=threads \
    -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,bootindex=0 \
    -drive file='/usr/local/autotest/tests/virt/shared/data/images/storage2.qcow2',index=2,if=none,id=drive-virtio-disk2,media=disk,cache=none,snapshot=off,format=qcow2,aio=threads \
    -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,bootindex=1 \
    -device virtio-net-pci,netdev=id6XIuwv,mac='9a:37:37:37:37:9e',bus=pci.0,addr=0x6,id='idem9I0u' \
    -netdev tap,id=id6XIuwv,vhost=on,vhostfd=27,fd=26 \
    -m 4096 \
    -smp 2,cores=1,threads=1,sockets=2 \
    -cpu 'Westmere' \
    -M pc \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -vga cirrus \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off   \
    -device sga \
    -enable-kvm
2. In the host:
#numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 8175 MB
node 0 free: 7645 MB
node 1 cpus: 4 5 6 7
node 1 size: 8192 MB
node 1 free: 7503 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

Pin the vhost thread to CPU 4.
Pin the two vCPU threads to CPU 5 and CPU 6, respectively (one possible way to do this is sketched after step 3).

3. In the guest:

#mkfs.xfs /dev/vdb
#mount /dev/vdb /mnt
#fio --rw=write --bs=4k --iodepth=1 --runtime=1m --direct=1 --filename=/mnt/write_4k_1 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result &> /dev/null
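
Notes on steps 1 and 2 (the commands below are illustrative sketches only, not
taken from the original test setup):

For step 1, the image directory used in the qemu command line is assumed to be
an NFS mount on the host; the actual NFS server and export are not recorded in
this report, so the following is only a hypothetical example:

    # hypothetical NFS server and export; only the mount point is from the report
    mount -t nfs -o rw,vers=3 nfs-server.example.com:/export/images \
        /usr/local/autotest/tests/virt/shared/data/images

For step 2, one possible way to do the pinning (the thread IDs are placeholders;
the vhost thread ID can be found with "ps -eLf | grep vhost" and the vCPU thread
IDs with the monitor command "info cpus"):

    taskset -pc 4 <vhost_tid>    # pin the vhost thread to host CPU 4 (node 1)
    taskset -pc 5 <vcpu0_tid>    # pin the vCPU 0 thread to host CPU 5 (node 1)
    taskset -pc 6 <vcpu1_tid>    # pin the vCPU 1 thread to host CPU 6 (node 1)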

Actual results:

Here are the results: w/ the bad commit vs w/o the bad commit
http://10.66.94.121/results/regression/2015-w39/firstbad.html

Expected results:

no regression

Additional info:
Comment 4 Ademar Reis 2015-09-24 17:14:41 EDT
For reference, this is the commit pointed out as introducing this performance regression:

commit 95f7142abc86a916682bd735aecd90172ffa0d30
Author: Peter Lieven <pl@kamp.de>
Date:   Mon Feb 2 14:52:21 2015 +0100

    virtio-blk: introduce multiread
    
    this patch finally introduces multiread support to virtio-blk. While
    multiwrite support was there for a long time, read support was missing.
    
    The complete merge logic is moved into virtio-blk.c which has
    been the only user of request merging ever since. This is required
    to be able to merge chunks of requests and immediately invoke callbacks
    for those requests. Secondly, this is required to switch to
    direct invocation of coroutines which is planned at a later stage.
    
    The following benchmarks show the performance of running fio with
    4 worker threads on a local ram disk. The numbers show the average
    of 10 test runs after 1 run as warmup phase.
    
                  |        4k        |       64k        |        4k
    MB/s          | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
    --------------+--------+---------+--------+---------+--------+--------
    master        | 1221   | 1187    | 4178   | 4114    | 1745   | 1213
    multiread     | 1829   | 1189    | 4639   | 4110    | 1894   | 1216
    
    Signed-off-by: Peter Lieven <pl@kamp.de>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>


Given that it's marked as a blocker, it should be investigated with high priority. I'm assigning it to Kevin.
Comment 6 Kevin Wolf 2015-09-25 11:11:23 EDT
Quick update from my side: I tried to reproduce this, and at first it indeed
looked as if I could, but now it seems that my results are too unstable to be
significant. I'll have to work some more on my benchmarking environment.

I also compared the old and new versions of the code and couldn't find any
suspicious differences. The principle of what the new code does isn't much
different from the old, so it might be an implementation detail that makes the
difference.

The only relatively big change I could see is that requests are now completed
individually instead of waiting for all requests of the same batch to complete.
This should be an important fix for latency (though QE's benchmark results
contradict that), but it might actually hurt throughput if we now get many small
completion events instead of a few events that each cover multiple requests.
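
One way to check this from the outside would be to compare the notification
rate between the two builds while fio runs, for example by counting KVM exits
on the host and virtio-blk interrupts in the guest (the QEMU PID below is a
placeholder, and the exact virtio interrupt line name depends on the guest):

    # on the host: record KVM exit statistics for the QEMU process during the run
    perf kvm stat record -p <qemu_pid> sleep 60
    perf kvm stat report

    # in the guest: compare the interrupt counts of the data disk's virtio queues
    # before and after the fio run
    grep virtio /proc/interrupts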
Comment 7 Kevin Wolf 2015-09-28 10:12:02 EDT
After a few more benchmarks, it looks to me as if the "batch completion" of
the whole multiwrite array plays a role for performance indeed. I tried a hack
that doesn't complete virtio-blk requests until there is no request in flight
any more. This did help the throughput in this test case, but obviously
randomly delaying completion of requests (which must hurt in terms of latency)
is not something we want to do as a fix.

Another thing I noticed is that virtio-blk dataplane notifies the guest of
completions only in a BH, which results in kind of a batch completion for
requests that complete more or less at the same time, without introducing
arbitrary delays. This seems much more reasonable to do for the non-dataplane
case as well, and I did see some difference, but it didn't help quite as much.

Anyway, for this specific bug report, QE has the setup that the problem was
originally found on. So I would ask you to give the following scratch build
a try (it adopts the dataplane approach to notification for non-dataplane):

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9894116
Comment 8 Yanhui Ma 2015-09-29 00:34:23 EDT
(In reply to Kevin Wolf from comment #7)
> After a few more benchmarks, it looks to me as if the "batch completion" of
> the whole multiwrite array plays a role for performance indeed. I tried a hack
> that doesn't complete virtio-blk requests until there is no request in flight
> any more. This did help the throughput in this test case, but obviously
> randomly delaying completion of requests (which must hurt in terms of latency)
> is not something we want to do as a fix.
> 
> Another thing I noticed is that virtio-blk dataplane notifies the guest of
> completions only in a BH, which results in kind of a batch completion for
> requests that complete more or less at the same time, without introducing
> arbitrary delays. This seems much more reasonable to do for the non-dataplane
> case as well, and I did see some difference, but it didn't help quite as much.
> 
> Anyway, for this specific bug report, QE has the setup that the problem was
> originally found on. So I would ask you to give the following scratch build
> a try (it adopts the dataplane approach to notification for non-dataplane):
> 
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9894116

Here are the results with the above scratch build:

### kvm-userspace-ver : qemu-kvm-rhev-2.3.0-26.el7.bz1265927_bh.x86_64
### kvm_version : 3.10.0-306.el7.x86_64
### guest-kernel-ver :3.10.0-306.el7.x86_64
Category:write
blocksize:4k

Iodepth| BW(MB/S)|  IOPS| Latency(ms)| Host_CPU|  BW/CPU|  KVM_Exits|  Util%
      1|    44.55| 11403|        1.40|    18.59|    2.40|    3255257|  99.91
      8|    43.00| 11006|       11.63|    16.84|    2.55|    3391894|  99.90



Based on the above results, the regression still exists.
Comment 9 Kevin Wolf 2015-09-29 05:54:20 EDT
Looks like no change at all on your setup. Then I'm afraid we won't be able to
fix this in 7.2.

Can you try the following build to get some more data points? It has the patch
that enforces batch completion, but which I said won't be suitable for
inclusion. If this doesn't work either, I might have to find some entirely
different cause for the regression.

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9897363
Comment 12 Yanhui Ma 2015-10-08 05:14:19 EDT
(In reply to Kevin Wolf from comment #9)
> Looks like no change at all on your setup. Then I'm afraid we won't be able to
> fix this in 7.2.
> 
> Can you try the following build to get some more data points? It has the patch
> that enforces batch completion, but which I said won't be suitable for
> inclusion. If this doesn't work either, I might have to find some entirely
> different cause for the regression.
> 
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9897363

The host is busy. Once it is idle, I will update results asap.
Comment 13 Yanhui Ma 2015-10-26 04:19:18 EDT
(In reply to Yanhui Ma from comment #12)
> (In reply to Kevin Wolf from comment #9)
> > Looks like no change at all on your setup. Then I'm afraid we won't be able
> > to fix this in 7.2.
> > 
> > Can you try the following build to get some more data points? It has the
> > patch that enforces batch completion, but which I said won't be suitable for
> > inclusion. If this doesn't work either, I might have to find some entirely
> > different cause for the regression.
> > 
> > http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9897363
> 
> The host is busy. Once it is idle, I will update results asap.

Hi Kevin Wolf,
I have been busy with other things, sorry for the delay.
The build (http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9897363) has expired. Could you provide a new one? Do I still need to test it?
Comment 15 Kevin Wolf 2015-11-19 09:22:35 EST
(In reply to Yanhui Ma from comment #13)
> I have been busy with other things, sorry for the delay.
> The build (http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9897363) has
> expired. Could you provide a new one? Do I still need to test it?

Sorry for taking so much time before providing a new build. Please try this one:
https://brewweb.devel.redhat.com/taskinfo?taskID=10124453
Comment 16 Yanhui Ma 2015-11-20 05:14:09 EST
(In reply to Kevin Wolf from comment #15)
> (In reply to Yanhui Ma from comment #13)
> > I have been busy with other things, sorry for the delay.
> > The build (http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9897363) has
> > expired. Could you provide a new one? Do I still need to test it?
> 
> Sorry for taking so much time before providing a new build. Please try this
> one:
> https://brewweb.devel.redhat.com/taskinfo?taskID=10124453

Here are the results with the above scratch build:

### kvm-userspace-ver : qemu-kvm-rhev-2.3.0-26.el7.bz1265927_gc1.x86_64
### kvm_version : 3.10.0-327.el7.x86_64
### guest-kernel-ver :3.10.0-306.el7.x86_64
Category:write
Block_size| Iodepth| Threads| BW(MB/S)|  IOPS| Latency(ms)| Host_CPU| BW/CPU| KVM_Exits| Util%
         4|       1|      16|    35.06|  8975|        1.78|     8.67|   4.04|    787472| 96.75
         4|       8|      16|    52.20| 13364|        9.58|     8.23|   6.34|    659240| 98.32



The regression still exists with iodepth 1, but BW with iodepth 8 is OK.
Comment 18 Ademar Reis 2016-01-15 07:27:50 EST
(In reply to Yanhui Ma from comment #0)

> Here are the results: w/ the bad commit vs w/o the bad commit
> http://10.66.94.121/results/regression/2015-w39/firstbad.html
> 

Can you please attach these results to the BZ? We need to compare the previous and current data to understand the regression, but the link above appears to be dead.
Comment 19 Yanhui Ma 2016-01-18 20:37:38 EST
(In reply to Ademar Reis from comment #18)
> (In reply to Yanhui Ma from comment #0)
> 
> > Here are the results: w/ the bad commit vs w/o the bad commit
> > http://10.66.94.121/results/regression/2015-w39/firstbad.html
> > 
> 
> Can you please attach these results to the BZ? We need to compare the
> previous and current data to understand the regression, but the link above
> appears to be dead.

There is something wrong with server 10.66.94.121. Once it is working again, I will attach the results.
Comment 20 Yanhui Ma 2016-01-19 03:32:59 EST
(In reply to Ademar Reis from comment #18)
> (In reply to Yanhui Ma from comment #0)
> 
> > Here are the results: w/ the bad commit vs w/o the bad commit
> > http://10.66.94.121/results/regression/2015-w39/firstbad.html
> > 
> 
> Can you please attach these results to the BZ? We need to compare the
> previous and current data to understand the regression, but the link above
> appears to be dead.

Now you can visit the link.
Comment 21 Ademar Reis 2016-08-07 14:57:57 EDT
There have been several changes upstream and we believe the performance has changed (hopefully improved). Please test again with a more recent version of QEMU. Deferring to 7.4 for now.
Comment 27 Yanhui Ma 2017-06-13 01:42:54 EDT
The results in comment 25 show that the regression still exists, so I am moving the BZ back to ASSIGNED. Please correct me if anything is wrong.
