Bug 1819253 - Fix overzealous I/O request splitting performance regression
Summary: Fix overzealous I/O request splitting performance regression
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.8
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Maxim Levitsky
QA Contact: Yanhui Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-31 14:54 UTC by Stefan Hajnoczi
Modified: 2020-08-03 07:24 UTC
CC List: 9 users

Fixed In Version: qemu-kvm-rhev-2.12.0-47.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-03 07:23:25 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
Red Hat Product Errata RHSA-2020:3267 (last updated 2020-08-03 07:24:40 UTC)

Description Stefan Hajnoczi 2020-03-31 14:54:26 UTC
Description of problem:

RHEL 7 qemu-kvm-rhev splits I/O requests unnecessarily. This causes performance degradation in sequential I/O benchmarks with large request sizes. In this case qemu-kvm actually performs better than qemu-kvm-rhev!

Maxim Levitsky fixed this bug upstream in commit 867eccfed84f96b54f4a432c510a02c2ce03b430 ("file-posix: Use max transfer length/segment count only for SCSI passthrough").

This commit needs to be backported.
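
A quick way to check whether a given host already carries the backport is to compare the installed build against the fixed build (a minimal sketch; the 2.12.0-47.el7 build number comes from the Fixed In Version field of this bug, and the changelog grep is only a heuristic):

  # Print the installed build; 2.12.0-47.el7 or later should contain the backport.
  rpm -q qemu-kvm-rhev

  # Heuristic check: the backport may appear in the package changelog,
  # depending on how the changelog entry was worded.
  rpm -q --changelog qemu-kvm-rhev | grep -i "max transfer length"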

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. 
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Stefan Hajnoczi 2020-03-31 15:05:43 UTC
Steps to Reproduce:
1. Create a fully allocated raw disk image (qemu-img create -f raw -o preallocation=full test.img 10G) and attach it to the guest via virtio-blk as /dev/vdb.
2. dd count=100000 if=/dev/vdb of=/dev/null iflag=direct
3. echo 256 > /sys/block/vdb/queue/max_sectors_kb
4. dd count=100000 if=/dev/vdb of=/dev/null iflag=direct

Expected results:
Performance in Step 4 is the same as or better than in Step 2.

Actual results:
Performance in Step 4 is worse than in Step 2.

Comment 3 Maxim Levitsky 2020-03-31 16:05:43 UTC
I'll take that bug.

Comment 4 Maxim Levitsky 2020-04-20 14:43:15 UTC
For QE, and/or for my future self if I need to reproduce this again:
The given steps to reproduce have many mistakes and omissions, so here are the steps I used:


1. Have a raw block device.
   I used nesting: the outer VM attaches to a file, the outer guest sees that file as a block device, and the nested VM passes that block device through to the inner guest.

   The outer VM uses this (a consolidated sketch of the full invocation follows the step list):
  -blockdev node-name=test_disk,driver=file,discard=unmap,aio=native,cache.direct=on,filename=./test.img
  -device virtio-blk-pci,id=test-blk0,drive=test_disk,bootindex=-1

  and the file was created as Stefan said: qemu-img create -f raw -o preallocation=full test.img 10G


2. Create a nested VM that attaches to the raw block device from step 1. I used this:
   (the VM type doesn't matter, but I used a RHEL7 VM)

   -blockdev node-name=test_disk,driver=host_device,discard=unmap,aio=native,cache.direct=on,filename=/dev/vda
   -device virtio-blk-pci,id=test-blk0,drive=test_disk,bootindex=-1
   

3. in the nested VM run 
   dd if=/dev/vda of=/dev/null bs=1M iflag=direct

4. now in the outer VM do:
   echo 256 > /sys/block/vda/queue/max_sectors_kb
   (you can use a lower number for even worse performance)

5. repeat (3)
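
For reference, a consolidated sketch of the outer-VM invocation from step 1 (the boot-disk image name, memory size, and CPU count below are placeholders, not values taken from this report):

  # Create the fully preallocated raw test image, as in step 1.
  qemu-img create -f raw -o preallocation=full test.img 10G

  # Hypothetical full outer-VM command line combining the -blockdev/-device
  # fragments from step 1 with a placeholder boot disk.
  /usr/libexec/qemu-kvm \
      -enable-kvm -m 4G -smp 4 \
      -blockdev node-name=os_file,driver=file,filename=./rhel7-outer.qcow2 \
      -blockdev node-name=os_disk,driver=qcow2,file=os_file \
      -device virtio-blk-pci,drive=os_disk,bootindex=0 \
      -blockdev node-name=test_disk,driver=file,discard=unmap,aio=native,cache.direct=on,filename=./test.img \
      -device virtio-blk-pci,id=test-blk0,drive=test_disk,bootindex=-1

Inside the outer guest the test disk then shows up as /dev/vda, which step 2 passes through to the inner guest as a host_device.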


The results on my laptop:

Without patch:

512(default):
10737418240 bytes (11 GB) copied, 12.6746 s, 847 MB/s

128:
10737418240 bytes (11 GB) copied, 18.5417 s, 579 MB/s

32:
10737418240 bytes (11 GB) copied, 41.8474 s, 257 MB/s

4:
10737418240 bytes (11 GB) copied, 210.66 s, 51.0 MB/s


With patch:
512(default):
10737418240 bytes (11 GB) copied, 12.185 s, 881 MB/s


128:
10737418240 bytes (11 GB) copied, 10.339 s, 1.0 GB/s

32:
10737418240 bytes (11 GB) copied, 11.0392 s, 973 MB/s

4:
10737418240 bytes (11 GB) copied, 16.226 s, 662 MB/s


I'll send the backport soon.

Comment 10 Yanhui Ma 2020-05-11 08:45:25 UTC
Hi Maxim,

Could you help check my test steps? There seems to be no obvious performance difference between qemu-kvm-rhev-2.12.0-46 and qemu-kvm-rhev-2.12.0-47.

Thanks!

Both host and L1 guest use qemu-kvm-rhev-2.12.0-46.el7.x86_64.

1. qemu-img create -f raw -o preallocation=full test.img 10G

2. booting L1 guest on host with following command line:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -device pvpanic,ioport=0x505,id=idQ9nMBz  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -blockdev node-name=file_image1,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/home/kvm_autotest_root/images/rhel79-64-virtiol1.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -blockdev node-name=test_disk,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/home/kvm_autotest_root/images/test.img \
    -device virtio-blk-pci,id=test-blk0,drive=test_disk \
    -device virtio-net-pci,mac=9a:77:78:79:7a:7b,id=id3JKpIv,vectors=4,netdev=idmenXSw,bus=pci.0  \
    -netdev tap,id=idmenXSw,vhost=on \
    -m 15360  \
    -smp 16,maxcpus=16,cores=8,threads=1,sockets=2  \
    -cpu 'Westmere',+kvm_pv_unhalt,vmx \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm -monitor stdio

3. booting L2 guest in L1 guest:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -device pvpanic,ioport=0x505,id=idQ9nMBz  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -blockdev node-name=file_image1,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/root/rhel79-64-virtio.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -blockdev node-name=test_disk,driver=host_device,discard=unmap,aio=native,cache.direct=on,filename=/dev/vdb \
    -device virtio-blk-pci,id=test-blk0,drive=test_disk \
    -device virtio-net-pci,id=id3JKpIv,vectors=4,netdev=idmenXSw,bus=pci.0  \
    -netdev tap,id=idmenXSw,vhost=on \
    -m 8G  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Westmere',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm -monitor stdio

4. in the L2 guest run 
   dd if=/dev/vdb of=/dev/null bs=1M iflag=direct

5. in the L1 guest do:
   echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb

6. repeat 4.

Test results:
512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9342 s, 830 MB/s

256
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9695 s, 828 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 10.9139 s, 984 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 10.7093 s, 1.0 GB/s

4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 13.8723 s, 774 MB/s

Both host and L1 guest use qemu-kvm-rhev-2.12.0-47.el7.x86_64 and test it again with above steps.

Test results:

512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9055 s, 832 MB/s

256
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.8108 s, 909 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.1274 s, 965 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.6974 s, 918 MB/s

4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 14.1666 s, 758 MB/s

Comment 11 Maxim Levitsky 2020-05-11 09:05:17 UTC
'echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb' -- this is the mistake.

You need to change the block queue limits on the device that you pass to the L2 guest,
which is /dev/vdb (and it happens to be /dev/vdb in the L2 guest as well).

Comment 12 Yanhui Ma 2020-05-11 13:13:49 UTC
(In reply to Maxim Levitsky from comment #11)
> 'echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb' -- this is the mistake.
> 
> You need to change the block queue limits on the device that you pass to
> the L2 guest,
> which is /dev/vdb (and it happens to be /dev/vdb in the L2 guest as well).

Sorry, I wrote it wrong; I actually ran 'echo 256/128/32/4 > /sys/block/vdb/queue/max_sectors_kb'.

Comment 13 Maxim Levitsky 2020-05-11 13:54:35 UTC
Have you restarted the L2 guest after you write to /sys/block/vdb/queue/max_sectors_kb in L2 guest?

Comment 14 Yanhui Ma 2020-05-11 14:50:37 UTC
(In reply to Maxim Levitsky from comment #13)
> Have you restarted the L2 guest after you write to
> /sys/block/vdb/queue/max_sectors_kb in L2 guest?

No, I haven't restarted the L2 guest. Do I need to restart the L2 guest?

Comment 15 Maxim Levitsky 2020-05-11 16:04:50 UTC
Yes, you need to restart the L2 guest.

Sorry for not mentioning this at the beginning; I didn't pay attention to this detail.

The block max segment size is picked up by qemu only when it creates the 'host_device' block device instance.
In theory you could hot-unplug and re-plug that block device from the L2 guest as well, but restarting it is just easier.
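
Putting the exchange above together, a minimal sketch of one iteration of the corrected procedure (assuming the test disk is /dev/vdb in both the L1 and L2 guests, as in comment 10):

  # In the L1 guest: lower the queue limit on the device that is passed to L2.
  echo 128 > /sys/block/vdb/queue/max_sectors_kb

  # Restart the L2 guest (or hot-unplug/re-plug its virtio-blk device) so that
  # qemu re-creates the 'host_device' blockdev and picks up the new limit.

  # Then, inside the L2 guest, rerun the benchmark:
  dd if=/dev/vdb of=/dev/null bs=1M iflag=direct

The restart (or unplug/replug) is what forces qemu to re-open /dev/vdb and re-read max_sectors_kb.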

Comment 16 Yanhui Ma 2020-05-12 03:16:45 UTC
(In reply to Maxim Levitsky from comment #15)
> Yes, you need to restart the L2 guest.
> 
> Sorry for not mentioning this at the beginning; I didn't pay attention to
> this detail.
> 
> The block max segment size is picked up by qemu only when it creates the
> 'host_device' block device instance.
> In theory you could hot-unplug and re-plug that block device from the L2
> guest as well, but restarting it is just easier.

Thanks very much for your help! I re-tested it, and here are the test results:


qemu-kvm-rhev-2.12.0-46.el7.x86_64
512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 14.2687 s, 753 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 26.4747 s, 406 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 49.9726 s, 215 MB/s


4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 133.068 s, 80.7 MB/s

qemu-kvm-rhev-2.12.0-47.el7.x86_64
512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 13.7435 s, 781 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.1444 s, 884 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.7501 s, 842 MB/s

4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 15.5929 s, 689 MB/s

Now I think the bug can be moved to VERIFIED status.

Comment 17 Maxim Levitsky 2020-05-12 07:34:45 UTC
Looks fine to me.

Comment 20 errata-xmlrpc 2020-08-03 07:23:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3267

