Bug 1819253

Summary: Fix overzealous I/O request splitting performance regression
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.8
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Reporter: Stefan Hajnoczi <stefanha>
Assignee: Maxim Levitsky <mlevitsk>
QA Contact: Yanhui Ma <yama>
CC: atheurer, chayang, dshaks, krister, mimehta, mlevitsk, smalleni, virt-maint, yama
Fixed In Version: qemu-kvm-rhev-2.12.0-47.el7
Type: Bug
Last Closed: 2020-08-03 07:23:25 UTC

Description Stefan Hajnoczi 2020-03-31 14:54:26 UTC
Description of problem:

RHEL 7 qemu-kvm-rhev splits I/O requests unnecessarily.  This causes performance degradations in sequential I/O benchmarks with large request sizes.  In this case qemu-kvm actually performs better than qemu-kvm-rhev!
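
To put the splitting in perspective (illustrative arithmetic only, not a measurement from this report): when qemu caps its requests at the host block device's max_sectors_kb, a single 1 MiB guest read at a 4 KiB limit becomes 256 separate host reads instead of one, which is consistent with the throughput collapse seen in the measurements below.

    # illustrative arithmetic: host I/Os per 1 MiB guest read when split at a 4 KiB limit
    echo $(( (1024 * 1024) / (4 * 1024) ))    # prints 256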

Maxim Levitsky fixed this upstream in commit 867eccfed84f96b54f4a432c510a02c2ce03b430 ("file-posix: Use max transfer length/segment count only for SCSI passthrough").

This commit needs to be backported.

How reproducible:
100%

Comment 2 Stefan Hajnoczi 2020-03-31 15:05:43 UTC
Steps to Reproduce:
1. Configure a fully allocated raw disk image (qemu-img create -f raw -o preallocation=full test.img 10G) and attach it to the guest via virtio-blk as /dev/vdb.
2. dd count=100000 if=/dev/vdb of=/dev/null iflag=direct
3. echo 256 > /sys/block/vdb/queue/max_sectors_kb
4. dd count=100000 if=/dev/vdb of=/dev/null iflag=direct

Expected results:
Performance in step 2 is the same as or better than in step 4.

Actual results:
Performance in step 2 is worse than in step 4.

Comment 3 Maxim Levitsky 2020-03-31 16:05:43 UTC
I'll take that bug.

Comment 4 Maxim Levitsky 2020-04-20 14:43:15 UTC
For QE and/or for my future self if I need to reproduce this again:
The given steps to reproduce have many mistakes and omissions, so here are the steps I used:


1. Have a raw block device.
   I used nesting: the outer VM attaches to a file, the guest inside it sees that file as a block device, and the nested VM passes it through to the L2 guest.

   Outer VM uses this: 
  -blockdev node-name=test_disk,driver=file,discard=unmap,aio=native,cache.direct=on,filename=./test.img
  -device virtio-blk-pci,id=test-blk0,drive=test_disk,bootindex=-1

  and the file was created as Stefan said: qemu-img create -f raw -o preallocation=full test.img 10G


2. Create a nested VM that attaches to the raw block device from step 1. I used this:
   (the VM type doesn't matter; I used a RHEL 7 VM)

   -blockdev node-name=test_disk,driver=host_device,discard=unmap,aio=native,cache.direct=on,filename=/dev/vda
   -device virtio-blk-pci,id=test-blk0,drive=test_disk,bootindex=-1
   

3. in the nested VM run 
   dd if=/dev/vda of=/dev/null bs=1M iflag=direct

4. Now in the outer VM do:
   echo 256 > /sys/block/vda/queue/max_sectors_kb
   (you can use a lower number for even worse performance; a small helper for this step is sketched below)

5. repeat (3)
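
A small helper for step 4, run inside the outer VM (a convenience sketch only; the script name is made up here, and it assumes the passed-through disk shows up in the outer VM as /dev/vda, as in the command lines above):

    #!/bin/bash
    # set_max_sectors.sh <kb> (hypothetical helper): change the queue limit on
    # the disk that is passed through to the nested VM and read it back to confirm
    set -e
    kb=${1:?usage: $0 <max_sectors_kb>}
    echo "$kb" > /sys/block/vda/queue/max_sectors_kb
    cat /sys/block/vda/queue/max_sectors_kb

After each change, rerun the dd from step 3 in the nested VM.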


The results on my laptop:

Without patch:

512 (default):
10737418240 bytes (11 GB) copied, 12.6746 s, 847 MB/s

128:
10737418240 bytes (11 GB) copied, 18.5417 s, 579 MB/s

32:
10737418240 bytes (11 GB) copied, 41.8474 s, 257 MB/s

4:
10737418240 bytes (11 GB) copied, 210.66 s, 51.0 MB/s


With patch:
512 (default):
10737418240 bytes (11 GB) copied, 12.185 s, 881 MB/s


128:
10737418240 bytes (11 GB) copied, 10.339 s, 1.0 GB/s

32:
10737418240 bytes (11 GB) copied, 11.0392 s, 973 MB/s

4:
10737418240 bytes (11 GB) copied, 16.226 s, 662 MB/s


I'll send the backport soon.

Comment 10 Yanhui Ma 2020-05-11 08:45:25 UTC
Hi Maxim,

Could you help check my test steps? There seems to be no obvious performance difference between qemu-kvm-rhev-2.12.0-46 and qemu-kvm-rhev-2.12.0-47.

Thanks!

Both the host and the L1 guest use qemu-kvm-rhev-2.12.0-46.el7.x86_64.

1. qemu-img create -f raw -o preallocation=full test.img 10G

2. Boot the L1 guest on the host with the following command line:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -device pvpanic,ioport=0x505,id=idQ9nMBz  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -blockdev node-name=file_image1,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/home/kvm_autotest_root/images/rhel79-64-virtiol1.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -blockdev node-name=test_disk,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/home/kvm_autotest_root/images/test.img \
    -device virtio-blk-pci,id=test-blk0,drive=test_disk \
    -device virtio-net-pci,mac=9a:77:78:79:7a:7b,id=id3JKpIv,vectors=4,netdev=idmenXSw,bus=pci.0  \
    -netdev tap,id=idmenXSw,vhost=on \
    -m 15360  \
    -smp 16,maxcpus=16,cores=8,threads=1,sockets=2  \
    -cpu 'Westmere',+kvm_pv_unhalt,vmx \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm -monitor stdio

3. Boot the L2 guest inside the L1 guest with the following command line:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -device pvpanic,ioport=0x505,id=idQ9nMBz  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -blockdev node-name=file_image1,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/root/rhel79-64-virtio.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -blockdev node-name=test_disk,driver=host_device,discard=unmap,aio=native,cache.direct=on,filename=/dev/vdb \
    -device virtio-blk-pci,id=test-blk0,drive=test_disk \
    -device virtio-net-pci,id=id3JKpIv,vectors=4,netdev=idmenXSw,bus=pci.0  \
    -netdev tap,id=idmenXSw,vhost=on \
    -m 8G  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Westmere',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm -monitor stdio

4. in the L2 guest run 
   dd if=/dev/vdb of=/dev/null bs=1M iflag=direct

5. in the L1 guest do:
   echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb

6. repeat 4.

Test results:
512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9342 s, 830 MB/s

256
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9695 s, 828 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 10.9139 s, 984 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 10.7093 s, 1.0 GB/s

4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 13.8723 s, 774 MB/s

Both the host and the L1 guest use qemu-kvm-rhev-2.12.0-47.el7.x86_64, and I tested again with the above steps.

Test results:

512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9055 s, 832 MB/s

256
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.8108 s, 909 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.1274 s, 965 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.6974 s, 918 MB/s

4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 14.1666 s, 758 MB/s

Comment 11 Maxim Levitsky 2020-05-11 09:05:17 UTC
'echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb' is the mistake.

You need to change the block queue limits on the device that you pass to the L2 guest,
which is /dev/vdb (and it happens to be /dev/vdb in the L2 guest as well).
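
Concretely, that means running something like this in the L1 guest (256 is just one of the example values):

    echo 256 > /sys/block/vdb/queue/max_sectors_kb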

Comment 12 Yanhui Ma 2020-05-11 13:13:49 UTC
(In reply to Maxim Levitsky from comment #11)
> 'echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb' is the mistake.
> 
> You need to change the block queue limits on the  device that you pass to
> the L2 guest,
> which is /dev/vdb (and it happens to be /dev/vdb in the L2 guest as well).

Sorry, I wrote it wrong; I actually ran 'echo 256/128/32/4 > /sys/block/vdb/queue/max_sectors_kb'.

Comment 13 Maxim Levitsky 2020-05-11 13:54:35 UTC
Did you restart the L2 guest after writing to /sys/block/vdb/queue/max_sectors_kb in the L1 guest?

Comment 14 Yanhui Ma 2020-05-11 14:50:37 UTC
(In reply to Maxim Levitsky from comment #13)
> Did you restart the L2 guest after writing to
> /sys/block/vdb/queue/max_sectors_kb in the L1 guest?

No, I haven't restarted the L2 guest. Do I need to restart it?

Comment 15 Maxim Levitsky 2020-05-11 16:04:50 UTC
Yes you need to restart the L2 guest.

Sorry for not mentioning this at the beginning, I didn't pay attention to this detail.

The block max segment size is picked up by qemu only when it creates the 'host_device' block device instance.
In theory you can hotunplug/plug that block device from L2 guest as well, but restarting it is just easier.
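
Putting it together, a correct per-limit re-test from the L1 guest looks roughly like this (a sketch only; ./start_l2.sh is an assumed wrapper around the L2 qemu-kvm command line from comment 10, not an existing script):

    for kb in 512 256 128 32 4; do
        echo "$kb" > /sys/block/vdb/queue/max_sectors_kb  # limit on the disk passed to L2
        ./start_l2.sh  # start the L2 guest so qemu re-reads the limit;
                       # run the dd inside it, then shut it down to continue the loop
    done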

Comment 16 Yanhui Ma 2020-05-12 03:16:45 UTC
(In reply to Maxim Levitsky from comment #15)
> Yes you need to restart the L2 guest.
> 
> Sorry for not mentioning this at the beginning, I didn't pay attention to
> this detail.
> 
> The block max segment size is picked up by qemu only when it creates the
> 'host_device' block device instance.
> In theory you can hotunplug/plug that block device from L2 guest as well,
> but restarting it is just easier.

Thanks very much for your help! I re-tested, and here are the test results:


qemu-kvm-rhev-2.12.0-46.el7.x86_64
512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 14.2687 s, 753 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 26.4747 s, 406 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 49.9726 s, 215 MB/s


4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 133.068 s, 80.7 MB/s

qemu-kvm-rhev-2.12.0-47.el7.x86_64
512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 13.7435 s, 781 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.1444 s, 884 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.7501 s, 842 MB/s

4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 15.5929 s, 689 MB/s

Now I think the bug can be set to VERIFIED status.

Comment 17 Maxim Levitsky 2020-05-12 07:34:45 UTC
Looks fine to me.

Comment 20 errata-xmlrpc 2020-08-03 07:23:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3267