Bug 1876455
| Summary: | ~50% disk performance drop when comparing RHELAV-8.3 with RHELAV-8.2 on Power8 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Zhenyu Zhang <zhenyzha> |
| Component: | qemu-kvm | Assignee: | Virtualization Maintenance <virt-maint> |
| qemu-kvm sub component: | Devices | QA Contact: | Xujun Ma <xuma> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | jinzhao, juzhang, lvivier, ngu, qzhang, stefanha, virt-maint, yama |
| Version: | unspecified | Keywords: | Regression, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-07 07:27:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 2
Laurent Vivier
2020-09-21 18:26:29 UTC
(In reply to Laurent Vivier from comment #2)
> (In reply to Zhenyu Zhang from comment #0)
> ...
> > How reproducible:
> > Always
> >
> > Steps to Reproduce:
> > 1. on host:
> > qemu-img create -f qcow2 /mnt/storage2.qcow2 40G
> >
> > 2. boot a guest with following cmd line:
> > MALLOC_PERTURB_=1 numactl \
> > -m 16 /usr/libexec/qemu-kvm \
> ...
> > -m 29696 \
>
> You have -m twice.

Sorry, I misread. That "-m 16" means numactl allocates all the memory from node 16, but there is no node 16 (and CPUs 64 and 65 are on node 0).

(In reply to Laurent Vivier from comment #2)
> Could you run the test with a raw image rather than a qcow2?
> It will help to know if the problem is with the file format or with the
> scheduler.

Okay, testing; the results will be updated later.

The regression seems to be introduced by:
commit c9b7d9ec21dfca716f0bb3b68dee75660d86629c
Author: Denis Plotnikov <dplotnikov>
Date: Fri Feb 14 10:46:48 2020 +0300
virtio: increase virtqueue size for virtio-scsi and virtio-blk
The goal is to reduce the number of requests issued by a guest on
1M reads/writes. This raises performance by up to 4% for that kind of
disk access pattern.
The maximum chunk size to be used for guest disk access is
limited by the seg_max parameter, which represents the maximum number of
pieces in the scatter-gather list of one guest disk request.
Since seg_max is virtqueue_size dependent, increasing the virtqueue
size increases seg_max, which, in turn, increases the maximum size
of data that can be read from or written to a guest disk.
More details in the original problem statement:
https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html
Suggested-by: Denis V. Lunev <den>
Signed-off-by: Denis Plotnikov <dplotnikov>
Message-id: 20200214074648.958-1-dplotnikov
Signed-off-by: Stefan Hajnoczi <stefanha>
A simple workaround would be to restore the original parameter with:
... -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,virtqueue_size=128 ...
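The commit above bumps the default for both virtio-scsi and virtio-blk, so the corresponding override for a virtio-blk disk would presumably be its queue-size property (an illustration only, not from the original comment; the id/drive names below are hypothetical):
... -device virtio-blk-pci,id=disk1,drive=drive_disk1,queue-size=128 ...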
Stefan, any idea what is happening here?
qemu-4.2.0: write: IOPS=214, BW=857KiB/s (877kB/s)(50.8MiB/60754msec); 0 zone resets
qemu-5.1.0: write: IOPS=147, BW=591KiB/s (605kB/s)(35.1MiB/60775msec); 0 zone resets
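As a quick sanity check that the override actually takes effect, a minimal sketch (not part of the original report; the device name sdb is taken from the guest command lines above, and the expected values assume QEMU derives seg_max as the virtqueue size minus 2):
# On the host, in the QEMU monitor started with "-monitor stdio",
# "info qtree" lists the virtio-scsi-pci properties, including the
# virtqueue_size actually in use:
(qemu) info qtree
# Inside the guest, the scatter-gather limit derived from seg_max shows up
# as max_segments; roughly 126 is expected with virtqueue_size=128 and
# roughly 254 with the new default of 256:
cat /sys/block/sdb/queue/max_segments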
Zhenyu, could you check that the regression disappears with the "virtqueue_size=128" parameter on RHEL-AV-8.3.0? Could you also check whether we have the regression on an x86_64 host? Thank you.

(In reply to Laurent Vivier from comment #6)
> could you check the regression disappears with the "virtqueue_size=128"
> parameter with RHEL-AV-8.3.0?
>
> could you check if we have also the regression on x86_64 host?

Okay, got it. Because Power8 hosts are scarce, I only borrowed this one recently. In addition, I will try on the x86 platform. Results updated today.

Through the following test, the "virtqueue_size=128" parameter has little effect on performance.

1. On host:
rm -rf /mnt/storage2.raw && /usr/bin/qemu-img create -f raw /mnt/storage2.raw 40G

2. Boot a guest with the following command line:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine pseries \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 24576 \
-smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
-cpu 'host' \
-chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0,nowait \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,virtqueue_size=128 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/test/os.raw,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=raw,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-blockdev node-name=file_disk1,driver=file,aio=threads,filename=/mnt/storage2.raw,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_disk1,driver=raw,cache.direct=on,cache.no-flush=off,file=file_disk1 \
-device scsi-hd,id=disk1,drive=drive_disk1,write-cache=on \
-device virtio-net-pci,mac=9a:d7:d6:87:5c:cf,id=idV6n32I,netdev=idZ3QcRa,bus=pci.0,addr=0x5 \
-netdev tap,id=idZ3QcRa \
-vnc :20 \
-rtc base=utc,clock=host \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio

3. Use mkfs to create an xfs partition in the guest and run fio:
mkfs.xfs /dev/sdb && mount /dev/sdb /mnt
fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/read_4k_8 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result

With "virtqueue_size=128" parameter:
[w=20.7MiB/s][w=5297 IOPS]
[w=22.2MiB/s][w=5696 IOPS]
[w=22.7MiB/s][w=5811 IOPS]
[w=25.9MiB/s][w=6637 IOPS]
======= average value ========== 22.9MiB/s

Without "virtqueue_size=128" parameter:
[w=18.8MiB/s][w=4808 IOPS]
[w=14.2MiB/s][w=3634 IOPS]
[w=24.7MiB/s][w=6330 IOPS]
[w=20.8MiB/s][w=5324 IOPS]
======= average value ========== 19.6MiB/s ---------> performance degradation of 14%

(In reply to Zhenyu Zhang from comment #9)
> Through the following test, the "virtqueue_size=128" parameter has little
> effect on performance
>
> 1. On host:
> rm -rf /mnt/storage2.raw && /usr/bin/qemu-img create -f raw
> /mnt/storage2.raw 40G
>
> 2. Boot a guest with the following command line:
> /usr/libexec/qemu-kvm \

Where is the numactl command?

> -name 'avocado-vt-vm1' \
> -sandbox on \
> -machine pseries \
> -nodefaults \
> -device VGA,bus=pci.0,addr=0x2 \
> -m 24576 \

You have changed the memory size.

> -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \

You have changed the number of vCPUs. Did you pin them all to host CPUs?

As you changed the test conditions (I think because the host has been changed), the results are not comparable.
Please, for a given host and a given configuration, run all the tests under the same conditions:
- rhel-av-8.2.0
- rhel-av-8.3.0
- rhel-av-8.3.0 with virtqueue_size=128

(In reply to Laurent Vivier from comment #10)
> As you changed the test conditions (I think because the host has been
> changed), the results are not comparable.

Hi Laurent,

Because the host borrowed from beaker cannot always be guaranteed to be the same one, the hardware configuration will differ slightly. But my approach every time is to perform the same steps on the same host, so that each comparison is valid; if it is a software issue, different hardware should be able to reproduce it.

> Please, for a given host and a given configuration, run all the tests in the
> same condition:
>
> - rhel-av-8.2.0
> - rhel-av-8.3.0
> - rhel-av-8.3.0 with virtqueue_size=128

Okay, I will borrow the same host, test again, and update the results later.

Denis Lunev from Virtuozzo is aware of this bug and is investigating. It is not POWER-specific; there are also regressions on x86.
https://www.mail-archive.com/qemu-devel@nongnu.org/msg742007.html

Both the virtqueue size and virtio-blk's seg-max configuration space field were changed. The seg_max commit is here:

commit 1bf8a989a566b2ba41c197004ec2a02562a766a4
Author: Denis Plotnikov <dplotnikov>
Date: Fri Dec 20 17:09:04 2019 +0300
virtio: make seg_max virtqueue size dependent

How is the performance when you set -device virtio-blk-pci,seg-max-adjust=off,queue-size=256?

If the performance is fine then we know the issue is caused by 1bf8a989a566b2ba41c197004ec2a02562a766a4.

If the performance is bad then we know the issue is caused by c9b7d9ec21dfca716f0bb3b68dee75660d86629c.

(Because these patches depend on each other, git-bisect is not enough to figure out which one causes the problem.)

*** Bug 1859048 has been marked as a duplicate of this bug. ***

According to comment 12, setting "Hardware" to "All".

(In reply to Stefan Hajnoczi from comment #12)
> How is the performance when you set -device
> virtio-blk-pci,seg-max-adjust=off,queue-size=256?
>
> If the performance is fine then we know the issue is caused by
> 1bf8a989a566b2ba41c197004ec2a02562a766a4.
>
> If the performance is bad then we know the issue is caused by
> c9b7d9ec21dfca716f0bb3b68dee75660d86629c.
>
> (Because these patches depend on each other git-bisect is not enough to
> figure out which one causes the problem.)

Hi Stefan,

On the same qemu version, setting seg-max-adjust=off,queue-size=256 has no great impact on performance. Below are my test steps; please let me know if I missed something.
qemu version: qemu-kvm-5.1.0-13.module+el8.3.0+8382+afc3bbea
host kernel: 4.18.0-240.el8.ppc64le
guest kernel: 4.18.0-240.el8.ppc64le
SLOF: SLOF-20200717-1.gite18ddad8.module+el8.3.0+7638+07cf13d2.noarch
hostname: ibm-p8-kvm-01-qe.khw1.lab.eng.bos.redhat.com

1. On host:
rm -rf /mnt/storage2.raw && /usr/bin/qemu-img create -f raw /mnt/storage2.raw 40G

2. Boot a guest with the following command line:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine pseries \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 24576 \
-smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
-cpu 'host' \
-chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0,nowait \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/test/os.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4,seg-max-adjust=off,queue-size=256 \
-blockdev node-name=file_disk1,driver=file,aio=threads,filename=/mnt/storage2.raw,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_disk1,driver=raw,cache.direct=on,cache.no-flush=off,file=file_disk1 \
-device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,bus=pci.0,addr=0x5,seg-max-adjust=off,queue-size=256 \
-device virtio-net-pci,mac=9a:d7:d6:87:5c:cf,id=idV6n32I,netdev=idZ3QcRa,bus=pci.0,addr=0x6 \
-netdev tap,id=idZ3QcRa \
-vnc :20 \
-rtc base=utc,clock=host \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio

3. Use mkfs to create an xfs partition in the guest and run fio:
mkfs.xfs /dev/vdb && mount /dev/vdb /mnt
fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/read_4k_8 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result

[w=11.8MiB/s][w=3009 IOPS]
[w=10.9MiB/s][w=2794 IOPS]
[w=9494KiB/s][w=2373 IOPS]
[w=12.9MiB/s][w=3307 IOPS]
[w=9886KiB/s][w=2471 IOPS]
======= With seg-max-adjust=off,queue-size=256 average value ========== 10.98MiB/s

[w=10.2MiB/s][w=2602 IOPS]
[w=11.3MiB/s][w=2903 IOPS]
[w=9.00MiB/s][w=2559 IOPS]
[w=12.7MiB/s][w=3242 IOPS]
[w=9670KiB/s][w=2417 IOPS]
======= Without seg-max-adjust=off,queue-size=256 average value ========== 10.56MiB/s

(In reply to Stefan Hajnoczi from comment #12)
> Denis Lunev from Virtuozzo is aware of this bug and is investigating. It is
> not POWER-specific, there are also regressions on x86.
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg742007.html

Stefan,

do you know if Denis has provided a fix to this problem? Thanks

Hi Laurent,
There is another problem in RHEL-AV 8.4.0: throughput with num_queues=8 is worse than with num_queues=1.
But this problem did not appear on the x86 platform.
Throughput with queue-size=1024 is also worse than with queue-size=128, again with no problem on x86.
I know that these performance problems are caused by multiple patches.
Do we need to open new bugs to track these issues separately so that we can get a clear solution?
Kernel 4.18.0-255.el8.ppc64le
qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9
How reproducible:
100%
Steps to Reproduce:
1. Boot the guest:
# /usr/libexec/qemu-kvm \
-S \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine pseries \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 4096 \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'host' \
-chardev socket,server,nowait,path=/var/tmp/avocado__1dmrhlq/monitor-qmpmonitor1-20201130-230458-KG1Zttbl,id=qmp_id_qmpmonitor1 \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,server,nowait,path=/var/tmp/avocado__1dmrhlq/monitor-catch_monitor-20201130-230458-KG1Zttbl,id=qmp_id_catch_monitor \
-mon chardev=qmp_id_catch_monitor,mode=control \
-chardev socket,server,nowait,path=/var/tmp/avocado__1dmrhlq/serial-serial0-20201130-230458-KG1Zttbl,id=chardev_serial0 \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kar/vt_test_images/rhel840-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device virtio-scsi-pci,id=virtio_scsi_pci1,num_queues=1,bus=pci.0,addr=0x5 \ ==========================> set queue=1
-blockdev node-name=file_stg0,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/var/lib/avocado/data/avocado-vt/stg0.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_stg0,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_stg0 \
-device scsi-hd,id=stg0,bus=virtio_scsi_pci1.0,drive=drive_stg0,write-cache=on \
-device virtio-scsi-pci,id=virtio_scsi_pci2,num_queues=8,bus=pci.0,addr=0x6 \ ==========================> set queue=8
-blockdev node-name=file_stg1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/var/lib/avocado/data/avocado-vt/stg1.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_stg1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_stg1 \
-device scsi-hd,id=stg1,bus=virtio_scsi_pci2.0,drive=drive_stg1,write-cache=on \
-device virtio-net-pci,mac=9a:bd:09:82:1b:9c,id=idaH0RDh,netdev=idGfoXe6,bus=pci.0,addr=0x7 \
-netdev tap,id=idGfoXe6,vhost=on,vhostfd=21,fd=18 \
-vnc :0 \
-rtc base=utc,clock=host \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm
2. In the guest, use dd to test each disk's performance 5 times, discarding the first (warm-up) result to get a more accurate figure; a minimal sketch of this measurement loop follows the results below.
23:07:55 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdb bs=1M count=5000 oflag=direct
23:09:16 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
23:10:38 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdb bs=1M count=5000 oflag=direct
23:11:59 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
23:13:19 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdb bs=1M count=5000 oflag=direct
23:14:40 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
23:16:01 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdb bs=1M count=5000 oflag=direct
23:17:21 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
23:18:43 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdb bs=1M count=5000 oflag=direct
23:20:03 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
23:21:24 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdb bs=1M count=5000 oflag=direct
23:22:45 DEBUG| Sending command: time dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
3. Compare the results of the two disks:
23:24:06 INFO | 402.08 ==========================> set queue=1
23:24:06 INFO | 405.15 ==========================> set queue=8
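For reference, a minimal sketch of that measurement loop outside the Avocado harness (an assumption, not the original automation: it relies on GNU /usr/bin/time being installed in the guest, and sdb/sdc are the num_queues=1 and num_queues=8 disks from the command line above):

for dev in sdb sdc; do
    # run dd six times per disk, keeping only the elapsed seconds reported by time
    for i in 1 2 3 4 5 6; do
        /usr/bin/time -f "%e" dd if=/dev/zero of=/dev/$dev bs=1M count=5000 oflag=direct 2>&1 | tail -n 1
    done | tail -n +2 | awk -v d="$dev" '{sum += $1} END {printf "%s: %.2f s total over 5 runs\n", d, sum}'
    # tail -n +2 drops the first (warm-up) run before summing the remaining five
done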
(In reply to Laurent Vivier from comment #17)
> (In reply to Stefan Hajnoczi from comment #12)
> > Denis Lunev from Virtuozzo is aware of this bug and is investigating. It is
> > not POWER-specific, there are also regressions on x86.
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg742007.html
>
> Stefan,
>
> do you know if Denis has provided a fix to this problem?

No news. I think it will be necessary to solve this ourselves.

Let's leave this BZ open because it contains information on how to trigger the issue on POWER. That may come in handy for investigating or verifying it. Feel free to assign it back to virt-maintainers, if you like.

Placing in backlog (ITR '---' instead of '8.4.0') since no solution appears imminent.

Bulk update: move RHEL-AV bugs to RHEL 9. If it is necessary to resolve this in RHEL 8, clone it to the current RHEL 8 release.

It is the same problem as bug 1871187 and cannot be reproduced stably; this can be closed the same way as that bug.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.