Bug 1871187
| Summary: | ~50% disk performance drop when comparing rhel8.3.0 with rhel8.2.0 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Zhenyu Zhang <zhenyzha> |
| Component: | qemu-kvm | Assignee: | Laurent Vivier <lvivier> |
| qemu-kvm sub component: | General | QA Contact: | Xujun Ma <xuma> |
| Status: | CLOSED WORKSFORME | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | dgibson, gkurz, jinzhao, juzhang, ldoktor, lvivier, ngu, qzhang, virt-maint, yama |
| Version: | 8.3 | Keywords: | Regression, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | ppc64le | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-05 08:49:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 1
Michael Roth
2020-08-28 19:13:29 UTC
Zhenyu, are you still able to reproduce this bug and confirm it's a different issue from bug 1745823?

(In reply to David Gibson from comment #2)
> Zhenyu,
>
> Are you still able to reproduce this bug and confirm it's a different issue
> from bug 1745823?

David, I'm not sure. I will test further to determine whether it comes from the same issue as bug 1745823 and update the results later.

I'm not able to reproduce the problem reliably (write, 4k, 1, 16).

qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.ppc64le
write: IOPS=5495, BW=21.5MiB/s (22.5MB/s)(1288MiB/60005msec)
write: IOPS=2321, BW=9287KiB/s (9510kB/s)(544MiB/60002msec)
write: IOPS=721, BW=2888KiB/s (2957kB/s)(169MiB/60027msec)
write: IOPS=3917, BW=15.3MiB/s (16.0MB/s)(918MiB/60001msec)

qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950
write: IOPS=2407, BW=9629KiB/s (9860kB/s)(564MiB/60025msec)
write: IOPS=2560, BW=10.0MiB/s (10.5MB/s)(600MiB/60026msec)
write: IOPS=710, BW=2843KiB/s (2911kB/s)(167MiB/60025msec)
write: IOPS=6739, BW=26.3MiB/s (27.6MB/s)(1580MiB/60027msec)

I think the difference visible in the comment 0 results may depend on host behavior (CPU load or IRQ balancing) and not on the QEMU version. Did you run the tests several times for both QEMU versions? What is the purpose of attaching vCPUs to node 0 and memory to node 8? The Polarion test case RHEL7-10493 doesn't describe this (and it uses a raw image).

So I ran several dozen tests. First, the fix for the virtqueue IRQ affinity doesn't improve the reliability of the results (under the same conditions we can get 11k IOPS or 50k IOPS). For reference, on my host system the results are very stable:

fio --rw=write --bs=4k --iodepth=1 --runtime=1m --direct=1 --filename=/mnt/write_4k_1 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result &> /dev/null

RHEL-8.4 host 4.18.0-270.el8.ppc64le
write: IOPS=74.1k, BW=290MiB/s (304MB/s)(16.0GiB/60001msec); 0 zone resets
write: IOPS=74.6k, BW=291MiB/s (306MB/s)(17.1GiB/60001msec); 0 zone resets
write: IOPS=74.3k, BW=290MiB/s (305MB/s)(17.0GiB/60001msec); 0 zone resets
write: IOPS=74.3k, BW=290MiB/s (304MB/s)(17.0GiB/60002msec); 0 zone resets

For the guest I think there is a problem, not a performance drop but the poor reliability of the results; with qemu-5.2.0, however, the results seem more stable:

host qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.ppc64le
write: IOPS=51.5k, BW=201MiB/s (211MB/s)(11.8GiB/60001msec); 0 zone resets
write: IOPS=48.9k, BW=191MiB/s (200MB/s)(11.2GiB/60002msec); 0 zone resets
write: IOPS=49.3k, BW=193MiB/s (202MB/s)(11.3GiB/60001msec); 0 zone resets
write: IOPS=50.5k, BW=197MiB/s (207MB/s)(11.6GiB/60002msec); 0 zone resets
write: IOPS=54.1k, BW=211MiB/s (222MB/s)(12.4GiB/60002msec); 0 zone resets

With qemu-4.2.0, we can get 14k IOPS or 38k IOPS for the same test:

qemu-kvm-4.2.0-40.module+el8.4.0+9278+dd53883d
write: IOPS=13.7k, BW=53.3MiB/s (55.9MB/s)(3200MiB/60003msec); 0 zone resets
write: IOPS=24.0k, BW=93.8MiB/s (98.4MB/s)(5631MiB/60003msec); 0 zone resets
write: IOPS=21.3k, BW=83.1MiB/s (87.1MB/s)(4986MiB/60002msec); 0 zone resets
write: IOPS=23.7k, BW=92.4MiB/s (96.9MB/s)(5543MiB/60003msec); 0 zone resets
write: IOPS=38.6k, BW=151MiB/s (158MB/s)(9051MiB/60001msec); 0 zone resets

But we have the same kind of problem with qemu-2.12.0, so this is not a regression:

qemu-kvm-2.12.0-99.module+el8.2.0+7988+c1d02dbb.4
write: IOPS=9583, BW=37.4MiB/s (39.3MB/s)(2246MiB/60002msec); 0 zone resets
write: IOPS=13.0k, BW=54.6MiB/s (57.2MB/s)(3275MiB/60002msec); 0 zone resets
write: IOPS=58.3k, BW=228MiB/s (239MB/s)(13.3GiB/60001msec); 0 zone resets
write: IOPS=12.3k, BW=48.1MiB/s (50.5MB/s)(2888MiB/60003msec); 0 zone resets
write: IOPS=15.3k, BW=59.9MiB/s (62.8MB/s)(3595MiB/60003msec); 0 zone resets

I tried to see the impact of the aio parameter on performance; aio=threads gives better results:

upstream v5.2.0
aio=threads
write: IOPS=47.7k, BW=186MiB/s (195MB/s)(10.9GiB/60001msec); 0 zone resets
write: IOPS=48.7k, BW=190MiB/s (199MB/s)(11.1GiB/60002msec); 0 zone resets
write: IOPS=37.1k, BW=145MiB/s (152MB/s)(8702MiB/60002msec); 0 zone resets
aio=native
write: IOPS=30.4k, BW=119MiB/s (125MB/s)(7129MiB/60002msec); 0 zone resets
write: IOPS=33.2k, BW=130MiB/s (136MB/s)(7788MiB/60003msec); 0 zone resets
write: IOPS=24.4k, BW=95.2MiB/s (99.8MB/s)(5713MiB/60003msec); 0 zone resets
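For reference, a minimal sketch of how the two aio modes map to the QEMU command line; the image path, IDs, and device model here are hypothetical (not the exact values used in the runs above), and aio=native needs O_DIRECT, i.e. cache=none / cache.direct=on:

```sh
# Hypothetical fragments: only the -drive/-device lines differ between the
# two runs; everything else on the command line stays identical.

# aio=threads: thread-pool based AIO, works with any cache mode.
qemu-system-ppc64 ... \
    -drive file=/var/lib/libvirt/images/guest.raw,if=none,id=drive0,format=raw,cache=none,aio=threads \
    -device virtio-blk-pci,drive=drive0

# aio=native: Linux native AIO, requires O_DIRECT (cache=none / cache.direct=on).
qemu-system-ppc64 ... \
    -drive file=/var/lib/libvirt/images/guest.raw,if=none,id=drive0,format=raw,cache=none,aio=native \
    -device virtio-blk-pci,drive=drive0
```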
So I've tried to get more reliable results using a more controlled environment (a sketch of the setup follows the list):
- use a raw disk image placed directly on a logical volume (to bypass the filesystem at the host level)
- add CPU affinity before booting the guest: each vCPU is bound to a host CPU from the same NUMA node
- bind the QEMU memory to the NUMA node of those CPUs
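A minimal sketch of this kind of setup, assuming node 0; the volume group, sizes, and CPU counts are examples, not the exact values used in the tests:

```sh
# Hypothetical setup: all names (vg_test/lv_guest, node number, sizes) are
# illustrative only.

# Carve out a logical volume and hand it to the guest as a raw block device,
# bypassing the host filesystem.
lvcreate -L 40G -n lv_guest vg_test

# Show which host CPUs belong to NUMA node 0.
numactl --hardware | grep 'node 0 cpus'

# Start QEMU with its memory and threads confined to node 0; the vCPU threads
# can then be pinned one-to-one to host CPUs of that node (e.g. with taskset
# on the thread IDs reported by "info cpus" in the QEMU monitor).
numactl --cpunodebind=0 --membind=0 \
    qemu-system-ppc64 -machine pseries -smp 16 -m 16G \
    -drive file=/dev/vg_test/lv_guest,if=none,id=drive0,format=raw,cache=none \
    -device virtio-blk-pci,drive=drive0 \
    ...
```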
And I reran the tests for several upstream versions of QEMU:

v2.12.0
write: IOPS=19.2k, BW=74.9MiB/s (78.5MB/s)(4494MiB/60006msec); 0 zone resets
write: IOPS=9984, BW=39.0MiB/s (40.9MB/s)(2340MiB/60002msec); 0 zone resets
write: IOPS=15.1k, BW=59.2MiB/s (62.1MB/s)(3551MiB/60003msec); 0 zone resets
write: IOPS=11.0k, BW=43.1MiB/s (45.2MB/s)(2586MiB/60003msec); 0 zone resets

v3.0.0
write: IOPS=19.4k, BW=75.8MiB/s (79.4MB/s)(4546MiB/60004msec); 0 zone resets
write: IOPS=11.3k, BW=44.0MiB/s (46.2MB/s)(2641MiB/60002msec); 0 zone resets
write: IOPS=13.4k, BW=52.4MiB/s (54.9MB/s)(3144MiB/60003msec); 0 zone resets

v3.1.0
write: IOPS=9889, BW=38.6MiB/s (40.5MB/s)(2318MiB/60002msec); 0 zone resets
write: IOPS=16.7k, BW=65.2MiB/s (68.4MB/s)(3913MiB/60002msec); 0 zone resets
write: IOPS=14.7k, BW=57.5MiB/s (60.3MB/s)(3449MiB/60003msec); 0 zone resets
write: IOPS=14.1k, BW=55.2MiB/s (57.9MB/s)(3311MiB/60003msec); 0 zone resets

v4.0.0
write: IOPS=42.9k, BW=168MiB/s (176MB/s)(9.82GiB/60001msec); 0 zone resets
write: IOPS=14.6k, BW=57.2MiB/s (59.0MB/s)(3432MiB/60005msec); 0 zone resets
write: IOPS=56.7k, BW=221MiB/s (232MB/s)(12.0GiB/60002msec); 0 zone resets

v4.1.0
write: IOPS=12.6k, BW=49.1MiB/s (51.5MB/s)(2947MiB/60003msec); 0 zone resets
write: IOPS=11.1k, BW=43.4MiB/s (45.5MB/s)(2605MiB/60003msec); 0 zone resets
write: IOPS=16.4k, BW=64.2MiB/s (67.3MB/s)(3854MiB/60004msec); 0 zone resets
write: IOPS=14.3k, BW=55.7MiB/s (58.4MB/s)(3344MiB/60003msec); 0 zone resets

v4.2.0
write: IOPS=59.9k, BW=234MiB/s (245MB/s)(13.7GiB/60003msec); 0 zone resets
write: IOPS=23.6k, BW=92.0MiB/s (96.5MB/s)(5523MiB/60004msec); 0 zone resets
write: IOPS=12.4k, BW=48.4MiB/s (50.7MB/s)(2902MiB/60002msec); 0 zone resets

v5.0.0
write: IOPS=56.9k, BW=222MiB/s (233MB/s)(13.0GiB/60001msec); 0 zone resets
write: IOPS=16.8k, BW=65.7MiB/s (68.9MB/s)(3943MiB/60003msec); 0 zone resets
write: IOPS=28.4k, BW=111MiB/s (116MB/s)(6651MiB/60003msec); 0 zone resets

v5.1.0
write: IOPS=32.5k, BW=127MiB/s (133MB/s)(7606MiB/60002msec); 0 zone resets
write: IOPS=16.0k, BW=66.2MiB/s (69.4MB/s)(3973MiB/60003msec); 0 zone resets

v5.2.0
write: IOPS=38.6k, BW=151MiB/s (158MB/s)(9042MiB/60002msec); 0 zone resets
write: IOPS=45.2k, BW=176MiB/s (185MB/s)(10.3GiB/60001msec); 0 zone resets
write: IOPS=46.9k, BW=183MiB/s (192MB/s)(10.7GiB/60002msec); 0 zone resets
write: IOPS=22.9k, BW=89.5MiB/s (93.8MB/s)(5369MiB/60002msec); 0 zone resets

We can see a big performance improvement between v3.1.0 and v4.0.0 (but the results are not reliable), a regression between v4.0.0 and v4.1.0, and again an improvement with v4.2.0, still with unreliable results (either 60k IOPS or 12k IOPS...).

Hello Laurent, have you considered using a ramdisk for testing purposes?

(In reply to Lukas Doktor from comment #19)
> Hello Laurent, have you considered using a ramdisk for testing purposes?

I will do that for my next tests. For the moment, I think the problem comes from the fact that we run fio with 16 jobs spread over 64 CPUs, but we have only 1 virtqueue to process the jobs, so the CPUs are fighting each other for access to the virtqueue. v5.2.0 introduced a new behavior where the number of virtqueues is equal to the number of CPUs. That could explain why v5.2 behaves better, but it is not enough to remove the high variability from one run to another (on the host the results are very stable).

Deferred to RHEL 8.5.0 as I don't have a fix for the moment.

Seems unlikely we'll get this ready for 8.5 either, deferring to backlog.

Sometimes I can't reproduce this problem at all on some hosts; it seems to be related to the host environment. We can close this bug for now because the results are unstable.
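As a side note on the multiqueue behavior mentioned above: QEMU 5.2 makes the number of virtio-blk queues follow the number of CPUs by default, while on older versions a similar effect can be obtained by setting the queue count explicitly on the device. A hypothetical fragment (drive ID and queue count are examples only):

```sh
# Hypothetical fragment: with 16 fio jobs, giving the virtio-blk device one
# queue per vCPU (or per job) reduces contention on a single virtqueue.
-device virtio-blk-pci,drive=drive0,num-queues=16
```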