Bug 1819253
| Summary: | Fix overzealous I/O request splitting performance regression | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Stefan Hajnoczi <stefanha> |
| Component: | qemu-kvm-rhev | Assignee: | Maxim Levitsky <mlevitsk> |
| Status: | CLOSED ERRATA | QA Contact: | Yanhui Ma <yama> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.8 | CC: | atheurer, chayang, dshaks, krister, mimehta, mlevitsk, smalleni, virt-maint, yama |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-rhev-2.12.0-47.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-08-03 07:23:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Steps to Reproduce:
1. Configure a fully allocated raw disk image (qemu-img create -f raw -o preallocation=full test.img 10G) using virtio-blk as /dev/vdb.
1. dd count=100000 if=/dev/vdb of=/dev/null iflag=direct
2. echo 256 >/proc/sys/vbd/queue/max_sector_kb
3. dd count=100000 if=/dev/vdb of=/dev/null iflag=direct

Expected results:
Performance in Step 1 is the same or better than Step 3.

Actual results:
Performance in Step 1 is worse than Step 3.

I'll take that bug.

For QE, and/or for my future self if I need to reproduce this again: the given steps to reproduce have many mistakes and omissions, so here are the steps I used:

1. Have a raw block device. I used nesting, so the outer VM attaches to a file and the inner VM sees that file as a block device and passes it to the inner guest. The outer VM uses this:

-blockdev node-name=test_disk,driver=file,discard=unmap,aio=native,cache.direct=on,filename=./test.img \
-device virtio-blk-pci,id=test-blk0,drive=test_disk,bootindex=-1

and the file was created as Stefan said:

qemu-img create -f raw -o preallocation=full test.img 10G

2. Create a nested VM that attaches to the raw block device from step 1. (The VM type doesn't matter, but I used a RHEL 7 VM.) I used this:

-blockdev node-name=test_disk,driver=host_device,discard=unmap,aio=native,cache.direct=on,filename=/dev/vda \
-device virtio-blk-pci,id=test-blk0,drive=test_disk,bootindex=-1

3. In the nested VM, run:

dd if=/dev/vda of=/dev/null bs=1M iflag=direct

4. Now, in the outer VM, do:

echo 256 > /sys/block/vda/queue/max_sectors_kb

(you can use a lower number for even worse performance)

5. Repeat step 3.

The results on my laptop:

Without patch:
512 (default): 10737418240 bytes (11 GB) copied, 12.6746 s, 847 MB/s
128: 10737418240 bytes (11 GB) copied, 18.5417 s, 579 MB/s
32: 10737418240 bytes (11 GB) copied, 41.8474 s, 257 MB/s
4: 10737418240 bytes (11 GB) copied, 210.66 s, 51.0 MB/s

With patch:
512 (default): 10737418240 bytes (11 GB) copied, 12.185 s, 881 MB/s
128: 10737418240 bytes (11 GB) copied, 10.339 s, 1.0 GB/s
32: 10737418240 bytes (11 GB) copied, 11.0392 s, 973 MB/s
4: 10737418240 bytes (11 GB) copied, 16.226 s, 662 MB/s

I'll send the backport soon.

Hi Maxim,
Could you help check my test steps? There seems to be no obvious performance difference between qemu-kvm-rhev-2.12.0-46 and qemu-kvm-rhev-2.12.0-47.
Thanks!
Both host and L1 guest use qemu-kvm-rhev-2.12.0-46.el7.x86_64.
1. qemu-img create -f raw -o preallocation=full test.img 10G
2. Boot the L1 guest on the host with the following command line:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-machine pc \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-device pvpanic,ioport=0x505,id=idQ9nMBz \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
-blockdev node-name=file_image1,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/home/kvm_autotest_root/images/rhel79-64-virtiol1.qcow2 \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,file=file_image1 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
-blockdev node-name=test_disk,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/home/kvm_autotest_root/images/test.img \
-device virtio-blk-pci,id=test-blk0,drive=test_disk \
-device virtio-net-pci,mac=9a:77:78:79:7a:7b,id=id3JKpIv,vectors=4,netdev=idmenXSw,bus=pci.0 \
-netdev tap,id=idmenXSw,vhost=on \
-m 15360 \
-smp 16,maxcpus=16,cores=8,threads=1,sockets=2 \
-cpu 'Westmere',+kvm_pv_unhalt,vmx \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm -monitor stdio
3. Boot the L2 guest inside the L1 guest:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-machine pc \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-device pvpanic,ioport=0x505,id=idQ9nMBz \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
-blockdev node-name=file_image1,driver=file,discard=unmap,aio=native,cache.direct=on,filename=/root/rhel79-64-virtio.qcow2 \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,file=file_image1 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
-blockdev node-name=test_disk,driver=host_device,discard=unmap,aio=native,cache.direct=on,filename=/dev/vdb \
-device virtio-blk-pci,id=test-blk0,drive=test_disk \
-device virtio-net-pci,id=id3JKpIv,vectors=4,netdev=idmenXSw,bus=pci.0 \
-netdev tap,id=idmenXSw,vhost=on \
-m 8G \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'Westmere',+kvm_pv_unhalt \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm -monitor stdio
4. In the L2 guest, run:
dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
5. In the L1 guest, run:
echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb
6. Repeat step 4.
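The loop below is an editor's sketch of steps 4 to 6, not part of the original comment. It assumes the L2 guest is reachable over SSH as root@l2guest (a hypothetical address), and, as the discussion further down explains, the limit has to be set on the device that is passed to the L2 guest (/dev/vdb), with the L2 guest restarted after each change before QEMU picks up the new limit.

# Editor's sketch, run in the L1 guest; "l2guest" is a hypothetical address for the L2 guest.
for kb in 256 128 32 4; do
    echo "$kb" > /sys/block/vdb/queue/max_sectors_kb   # limit on the device passed to L2
    # Restart the L2 guest here so QEMU re-reads the queue limits (see the comments below),
    # then repeat the measurement inside it:
    ssh root@l2guest 'dd if=/dev/vdb of=/dev/null bs=1M iflag=direct'
done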
Test results:
512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9342 s, 830 MB/s
256
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9695 s, 828 MB/s
128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 10.9139 s, 984 MB/s
32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 10.7093 s, 1.0 GB/s
4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 13.8723 s, 774 MB/s
Both the host and the L1 guest were updated to qemu-kvm-rhev-2.12.0-47.el7.x86_64 and tested again with the above steps.
Test results:
512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.9055 s, 832 MB/s
256
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.8108 s, 909 MB/s
128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.1274 s, 965 MB/s
32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 11.6974 s, 918 MB/s
4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 14.1666 s, 758 MB/s
'echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb' — this is the mistake. You need to change the block queue limits on the device that you pass to the L2 guest, which is /dev/vdb (and it happens to be /dev/vdb in the L2 guest as well).

(In reply to Maxim Levitsky from comment #11)
> 'echo 256/128/32/4 > /sys/block/vda/queue/max_sectors_kb' this is the mistake
>
> You need to change the block queue limits on the device that you pass to
> the L2 guest,
> which is /dev/vdb (and it happens to be /dev/vdb in the L2 guest as well).

Sorry, I wrote it wrong; I actually ran 'echo 256/128/32/4 > /sys/block/vdb/queue/max_sectors_kb'.

Have you restarted the L2 guest after you write to /sys/block/vdb/queue/max_sectors_kb in the L2 guest?

(In reply to Maxim Levitsky from comment #13)
> Have you restarted the L2 guest after you write to
> /sys/block/vdb/queue/max_sectors_kb in L2 guest?

No, I haven't restarted the L2 guest. Do I need to restart the L2 guest?

Yes, you need to restart the L2 guest.

Sorry for not mentioning this at the beginning; I didn't pay attention to this detail.

The block max segment size is picked up by QEMU only when it creates the 'host_device' block device instance. In theory you can hot-unplug/plug that block device from the L2 guest as well, but restarting it is just easier.

(In reply to Maxim Levitsky from comment #15)
> Yes you need to restart the L2 guest.
>
> Sorry for not mentioning this at the beginning, I didn't pay attention to
> this detail.
>
> The block max segment size is picked up by qemu only when it creates the
> 'host_device' block device instance.
> In theory you can hotunplug/plug that block device from L2 guest as well,
> but restarting it is just easier.

Thanks very much for your help! I re-tested it; here are the test results:

qemu-img-rhev-2.12.0-46.el7.x86_64

512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 14.2687 s, 753 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 26.4747 s, 406 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 49.9726 s, 215 MB/s

4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 133.068 s, 80.7 MB/s

qemu-img-rhev-2.12.0-47.el7.x86_64

512
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 13.7435 s, 781 MB/s

128
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.1444 s, 884 MB/s

32
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 12.7501 s, 842 MB/s

4
[root@vm-74-48 ~]# dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 15.5929 s, 689 MB/s

Now I think the bug can be set to VERIFIED status.

Looks fine to me.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:3267
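For future reference, the verified procedure boils down to the following sequence. This is an editor's sketch distilled from comments #11 through #15, not text from the original report; the VM start commands are the ones given in steps 2 and 3 above.

# In the L1 guest: lower the queue limit on the device that is passed through to the L2 guest.
echo 4 > /sys/block/vdb/queue/max_sectors_kb
# Restart the L2 guest (or hot-unplug/replug its virtio-blk test device) so that QEMU
# re-creates the 'host_device' blockdev and picks up the new limit.
# Then, inside the L2 guest, repeat the measurement:
dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
# Without the fix (qemu-kvm-rhev-2.12.0-46) throughput collapses as the limit shrinks;
# with the fix (qemu-kvm-rhev-2.12.0-47) it stays close to the default-limit result.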
Description of problem:
RHEL 7 qemu-kvm-rhev splits I/O requests unnecessarily. This causes performance degradation in sequential I/O benchmarks with large request sizes. In this case qemu-kvm actually performs better than qemu-kvm-rhev!

Maxim Levitsky fixed this bug in commit 867eccfed84f96b54f4a432c510a02c2ce03b430 ("file-posix: Use max transfer length/segment count only for SCSI passthrough"). This commit needs to be backported.

How reproducible: 100%
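A quick way to check whether an installed build already carries the backport, based on the "Fixed In Version" field above (an editor's sketch, not part of the original report):

# On the RHEL 7 host or L1 guest: the fix ships in qemu-kvm-rhev-2.12.0-47.el7 and later.
rpm -q qemu-kvm-rhev
# Upstream, the corresponding commit can be inspected in a QEMU git checkout:
git show 867eccfed84f96b54f4a432c510a02c2ce03b430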