
Bug 1871187

Summary: ~50% disk performance drop when comparing rhel8.3.0 with rhel8.2.0
Product: Red Hat Enterprise Linux 8
Reporter: Zhenyu Zhang <zhenyzha>
Component: qemu-kvm
qemu-kvm sub component: General
Assignee: Laurent Vivier <lvivier>
QA Contact: Xujun Ma <xuma>
Docs Contact:
Status: CLOSED WORKSFORME
Severity: high
Priority: high
CC: dgibson, gkurz, jinzhao, juzhang, ldoktor, lvivier, ngu, qzhang, virt-maint, yama
Version: 8.3
Keywords: Regression, Triaged
Target Milestone: rc
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: ppc64le
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-05 08:49:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 Michael Roth 2020-08-28 19:13:29 UTC
Is this the same issue mentioned in this earlier bug?

https://bugzilla.redhat.com/show_bug.cgi?id=1745823

Comment 2 David Gibson 2020-08-31 06:29:05 UTC
Zhenyu,

Are you still able to reproduce this bug and confirm it's a different issue from bug 1745823?

Comment 3 Zhenyu Zhang 2020-08-31 06:49:27 UTC
(In reply to David Gibson from comment #2)
> Zhenyu,
> 
> Are you still able to reproduce this bug and confirm it's a different issue
> from bug 1745823?

David,

I'm not sure.

I will test further to determine whether it is the same issue as bug 1745823.

I will update with results later.

Comment 6 Laurent Vivier 2020-09-30 09:56:00 UTC
I'm not able to reproduce the problem reliably (rw=write, bs=4k, iodepth=1, numjobs=16).

qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.ppc64le
  write: IOPS=5495, BW=21.5MiB/s (22.5MB/s)(1288MiB/60005msec)
  write: IOPS=2321, BW=9287KiB/s (9510kB/s)(544MiB/60002msec)
  write: IOPS=721, BW=2888KiB/s (2957kB/s)(169MiB/60027msec)
  write: IOPS=3917, BW=15.3MiB/s (16.0MB/s)(918MiB/60001msec)

qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950
  write: IOPS=2407, BW=9629KiB/s (9860kB/s)(564MiB/60025msec)
  write: IOPS=2560, BW=10.0MiB/s (10.5MB/s)(600MiB/60026msec)
  write: IOPS=710, BW=2843KiB/s (2911kB/s)(167MiB/60025msec)
  write: IOPS=6739, BW=26.3MiB/s (27.6MB/s)(1580MiB/60027msec)

I think the difference we see in the comment 0 results may depend on host behavior (CPU load or IRQ balancing) rather than on the QEMU version.

Did you run the tests several times for both QEMU versions?

What is the purpose of attaching vCPUs to node 0 and memory to node 8?

The Polarion test case RHEL7-10493 doesn't describe this (and it uses a raw image).
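
For reference, a split binding like the one questioned above (vCPUs on one host node, memory on another) is typically expressed along these lines; this is illustrative only, and the reporter's actual command line is in comment 0:

  # Illustrative sketch: pin QEMU's CPUs to host node 0 but its memory allocations to node 8
  numactl --cpunodebind=0 --membind=8 /usr/libexec/qemu-kvm ...

  # or, inside QEMU itself, with a bound memory backend (size is illustrative):
  -object memory-backend-ram,id=mem0,size=16G,host-nodes=8,policy=bind \
  -numa node,memdev=mem0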

Comment 17 Laurent Vivier 2021-01-14 12:55:19 UTC
So I ran several dozen tests...

First, the fix for the virtqueue IRQ affinity doesn't improve the reliability of the results (under the same conditions we can get either 11k IOPS or 50k IOPS).

For reference, on my host system, results are very stable:

fio --rw=write --bs=4k --iodepth=1 --runtime=1m --direct=1 --filename=/mnt/write_4k_1 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result &> /dev/null
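
For reference, the options above break down as follows (annotation only, no change to the workload):

  #   --rw=write --bs=4k --iodepth=1           sequential 4 KiB writes, queue depth 1 per job
  #   --numjobs=16 --thread                    16 fio workers run as threads of one process
  #   --direct=1 --ioengine=libaio             O_DIRECT I/O submitted through Linux libaio
  #   --size=512MB --runtime=1m --time_based   each job targets 512 MB but runs for a fixed 60 s
  #   --group_reporting --output=/tmp/fio_result   aggregate all jobs into a single report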

RHEL-8.4

host 4.18.0-270.el8.ppc64le

  write: IOPS=74.1k, BW=290MiB/s (304MB/s)(16.0GiB/60001msec); 0 zone resets
  write: IOPS=74.6k, BW=291MiB/s (306MB/s)(17.1GiB/60001msec); 0 zone resets
  write: IOPS=74.3k, BW=290MiB/s (305MB/s)(17.0GiB/60001msec); 0 zone resets
  write: IOPS=74.3k, BW=290MiB/s (304MB/s)(17.0GiB/60002msec); 0 zone resets


For the guest, I think there is a problem, but it is not so much a performance drop as the reliability of the results; with qemu-5.2.0 the results seem more stable:

host  qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.ppc64le

  write: IOPS=51.5k, BW=201MiB/s (211MB/s)(11.8GiB/60001msec); 0 zone resets
  write: IOPS=48.9k, BW=191MiB/s (200MB/s)(11.2GiB/60002msec); 0 zone resets
  write: IOPS=49.3k, BW=193MiB/s (202MB/s)(11.3GiB/60001msec); 0 zone resets
  write: IOPS=50.5k, BW=197MiB/s (207MB/s)(11.6GiB/60002msec); 0 zone resets
  write: IOPS=54.1k, BW=211MiB/s (222MB/s)(12.4GiB/60002msec); 0 zone resets

With qemu-4.2.0, we can get anywhere from 14k to 38k IOPS for the same test:

qemu-kvm-4.2.0-40.module+el8.4.0+9278+dd53883d

  write: IOPS=13.7k, BW=53.3MiB/s (55.9MB/s)(3200MiB/60003msec); 0 zone resets
  write: IOPS=24.0k, BW=93.8MiB/s (98.4MB/s)(5631MiB/60003msec); 0 zone resets
  write: IOPS=21.3k, BW=83.1MiB/s (87.1MB/s)(4986MiB/60002msec); 0 zone resets
  write: IOPS=23.7k, BW=92.4MiB/s (96.9MB/s)(5543MiB/60003msec); 0 zone resets
  write: IOPS=38.6k, BW=151MiB/s (158MB/s)(9051MiB/60001msec); 0 zone resets

But we have the same kind of problem with qemu-2.12.0, so this is not a regression:

qemu-kvm-2.12.0-99.module+el8.2.0+7988+c1d02dbb.4

  write: IOPS=9583, BW=37.4MiB/s (39.3MB/s)(2246MiB/60002msec); 0 zone resets
  write: IOPS=13.0k, BW=54.6MiB/s (57.2MB/s)(3275MiB/60002msec); 0 zone resets
  write: IOPS=58.3k, BW=228MiB/s (239MB/s)(13.3GiB/60001msec); 0 zone resets
  write: IOPS=12.3k, BW=48.1MiB/s (50.5MB/s)(2888MiB/60003msec); 0 zone resets
  write: IOPS=15.3k, BW=59.9MiB/s (62.8MB/s)(3595MiB/60003msec); 0 zone resets

I tried to see the impact of the aio parameter on performance; aio=threads gives better results than aio=native (how the setting is passed is sketched after the numbers below):

upstream v5.2.0

aio=threads

  write: IOPS=47.7k, BW=186MiB/s (195MB/s)(10.9GiB/60001msec); 0 zone resets
  write: IOPS=48.7k, BW=190MiB/s (199MB/s)(11.1GiB/60002msec); 0 zone resets
  write: IOPS=37.1k, BW=145MiB/s (152MB/s)(8702MiB/60002msec); 0 zone resets

aio=native

  write: IOPS=30.4k, BW=119MiB/s (125MB/s)(7129MiB/60002msec); 0 zone resets
  write: IOPS=33.2k, BW=130MiB/s (136MB/s)(7788MiB/60003msec); 0 zone resets
  write: IOPS=24.4k, BW=95.2MiB/s (99.8MB/s)(5713MiB/60003msec); 0 zone resets
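
For reference, the two settings compared above are chosen on the disk definition; a minimal sketch, with an illustrative drive path and id:

  # aio=threads: I/O issued by a host thread pool (QEMU's default)
  -drive file=/dev/vg0/guest-disk,format=raw,if=none,id=drive0,cache=none,aio=threads
  # aio=native: Linux AIO (io_submit); requires O_DIRECT, i.e. cache=none
  -drive file=/dev/vg0/guest-disk,format=raw,if=none,id=drive0,cache=none,aio=native
  -device virtio-blk-pci,drive=drive0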

Comment 18 Laurent Vivier 2021-01-14 13:00:16 UTC
So I've tried to get more reliable results using a more controlled environment (a rough command sketch follows the list):

- use a raw disk image directly on a logical volume (to bypass the filesystem at the host level)

- add CPU affinity before booting the guest: each vCPU is bound to a host CPU from the same NUMA node

- bind QEMU memory to the NUMA node of those CPUs
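
A minimal sketch of that kind of setup, assuming a volume group named vg0 and host NUMA node 0 (names and sizes are illustrative, and the per-vCPU thread pinning is simplified here to pinning the whole QEMU process):

  # Raw logical volume used directly as the guest disk (no host filesystem in the I/O path)
  lvcreate -L 20G -n guest-disk vg0

  # Keep QEMU's CPUs and memory allocations on a single host NUMA node
  numactl --cpunodebind=0 --membind=0 \
      /usr/libexec/qemu-kvm ... \
      -drive file=/dev/vg0/guest-disk,format=raw,if=none,id=drive0,cache=none \
      -device virtio-blk-pci,drive=drive0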

And I reran tests for several upstream versions of QEMU:

v2.12.0

  write: IOPS=19.2k, BW=74.9MiB/s (78.5MB/s)(4494MiB/60006msec); 0 zone resets
  write: IOPS=9984, BW=39.0MiB/s (40.9MB/s)(2340MiB/60002msec); 0 zone resets
  write: IOPS=15.1k, BW=59.2MiB/s (62.1MB/s)(3551MiB/60003msec); 0 zone resets
  write: IOPS=11.0k, BW=43.1MiB/s (45.2MB/s)(2586MiB/60003msec); 0 zone resets

v3.0.0

  write: IOPS=19.4k, BW=75.8MiB/s (79.4MB/s)(4546MiB/60004msec); 0 zone resets
  write: IOPS=11.3k, BW=44.0MiB/s (46.2MB/s)(2641MiB/60002msec); 0 zone resets
  write: IOPS=13.4k, BW=52.4MiB/s (54.9MB/s)(3144MiB/60003msec); 0 zone resets

v3.1.0

  write: IOPS=9889, BW=38.6MiB/s (40.5MB/s)(2318MiB/60002msec); 0 zone resets
  write: IOPS=16.7k, BW=65.2MiB/s (68.4MB/s)(3913MiB/60002msec); 0 zone resets
  write: IOPS=14.7k, BW=57.5MiB/s (60.3MB/s)(3449MiB/60003msec); 0 zone resets
  write: IOPS=14.1k, BW=55.2MiB/s (57.9MB/s)(3311MiB/60003msec); 0 zone resets

v4.0.0

  write: IOPS=42.9k, BW=168MiB/s (176MB/s)(9.82GiB/60001msec); 0 zone resets
  write: IOPS=14.6k, BW=57.2MiB/s (59.0MB/s)(3432MiB/60005msec); 0 zone resets
  write: IOPS=56.7k, BW=221MiB/s (232MB/s)(12.0GiB/60002msec); 0 zone resets

v4.1.0

  write: IOPS=12.6k, BW=49.1MiB/s (51.5MB/s)(2947MiB/60003msec); 0 zone resets
  write: IOPS=11.1k, BW=43.4MiB/s (45.5MB/s)(2605MiB/60003msec); 0 zone resets
  write: IOPS=16.4k, BW=64.2MiB/s (67.3MB/s)(3854MiB/60004msec); 0 zone resets
  write: IOPS=14.3k, BW=55.7MiB/s (58.4MB/s)(3344MiB/60003msec); 0 zone resets

v4.2.0

  write: IOPS=59.9k, BW=234MiB/s (245MB/s)(13.7GiB/60003msec); 0 zone resets
  write: IOPS=23.6k, BW=92.0MiB/s (96.5MB/s)(5523MiB/60004msec); 0 zone resets
  write: IOPS=12.4k, BW=48.4MiB/s (50.7MB/s)(2902MiB/60002msec); 0 zone resets

v5.0.0

  write: IOPS=56.9k, BW=222MiB/s (233MB/s)(13.0GiB/60001msec); 0 zone resets
  write: IOPS=16.8k, BW=65.7MiB/s (68.9MB/s)(3943MiB/60003msec); 0 zone resets
  write: IOPS=28.4k, BW=111MiB/s (116MB/s)(6651MiB/60003msec); 0 zone resets

v5.1.0

  write: IOPS=32.5k, BW=127MiB/s (133MB/s)(7606MiB/60002msec); 0 zone resets
  write: IOPS=16.0k, BW=66.2MiB/s (69.4MB/s)(3973MiB/60003msec); 0 zone resets

v5.2.0

  write: IOPS=38.6k, BW=151MiB/s (158MB/s)(9042MiB/60002msec); 0 zone resets
  write: IOPS=45.2k, BW=176MiB/s (185MB/s)(10.3GiB/60001msec); 0 zone resets
  write: IOPS=46.9k, BW=183MiB/s (192MB/s)(10.7GiB/60002msec); 0 zone resets
  write: IOPS=22.9k, BW=89.5MiB/s (93.8MB/s)(5369MiB/60002msec); 0 zone resets

We can see a big performance improvement between v3.1.0 and v4.0.0 (but the results are not reliable), a regression between v4.0.0 and v4.1.0, and again an improvement with v4.2.0, but with no reliable results (it can be either 60k IOPS or 12k IOPS...).

Comment 19 Lukáš Doktor 2021-01-15 07:24:20 UTC
Hello Laurent, have you considered using a ramdisk for testing purposes?

Comment 20 Laurent Vivier 2021-01-15 08:54:22 UTC
(In reply to Lukas Doktor from comment #19)
> Hello Laurent, have you considered using a ramdisk for testing purposes?

I will do that for my next tests.
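
For reference, two common ways to back the guest disk with host RAM, purely illustrative (not what produced the numbers in this bug):

  # Option 1: a brd ramdisk block device (rd_size is in KiB, so this is 4 GiB -> /dev/ram0)
  modprobe brd rd_nr=1 rd_size=4194304

  # Option 2: a raw image file on tmpfs
  mount -t tmpfs -o size=4G tmpfs /mnt/ram
  qemu-img create -f raw /mnt/ram/disk.img 4G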

For the moment, I think the problem comes from the fact that we use fio with 16 jobs spread across 64 vCPUs, but we have only 1 virtqueue to process the requests.

So I think the vCPUs are fighting each other for access to the virtqueue. v5.2.0 introduced a new behavior where the number of virtqueues is equal to the number of vCPUs. That could explain why v5.2 behaves better, but it is not enough to remove the high variability from one run to another (on the host the results are very stable).
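
For reference, on QEMU versions before that change the same effect can be requested explicitly on the disk device; a sketch, assuming the drive id used earlier and a 64-vCPU guest:

  # One virtqueue per vCPU instead of the old single-queue default
  -device virtio-blk-pci,drive=drive0,num-queues=64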

Comment 21 Laurent Vivier 2021-02-01 10:42:14 UTC
Deferred to RHEL 8.5.0 as I don't have a fix for the moment.

Comment 22 David Gibson 2021-05-26 05:47:40 UTC
Seems unlikely we'll get this ready for 8.5 either, deferring to backlog.

Comment 24 Xujun Ma 2021-07-05 07:32:14 UTC
I can't reproduce this problem on some hosts; it seems to be related to the host environment.
We can close this bug for now because of the unstable results.