Bug 2227799

Summary: [Azure][RHEL-9][CVM][Performance] Disk IOPS lower than ordinary VM
Product: Red Hat Enterprise Linux 9 Reporter: Li Tian <litian>
Component: kernel Assignee: Vitaly Kuznetsov <vkuznets>
kernel sub component: Hyper-V QA Contact: Li Tian <litian>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: andavis, bdas, litian, vkuznets, xuli, yacao, yuxisun
Version: 9.3   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-08-03 09:00:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Li Tian 2023-07-31 14:28:11 UTC
Description of problem:
Running the fio command below to test disk read performance on an Azure Confidential VM (CVM) shows IOPS that are dramatically lower than on an ordinary VM.

# fio --filename=/dev/sdb --direct=1 --rw=read --sync=0 --time_based --bs=4k --size=4096M --numjobs=1 --iodepth=1 --runtime=30 --group_reporting --name=test --thread --ioengine=libaio

...
bw (  KiB/s): min= 3296, max= 3544, per=100.00%, avg=3444.20, stdev=49.35, samples=59
   iops        : min=  824, max=  886, avg=861.05, stdev=12.34, samples=59
  lat (usec)   : 750=86.91%, 1000=11.57%
  lat (msec)   : 2=1.46%, 4=0.05%, 10=0.01%
...
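When many runs have to be compared, the IOPS summary line can be pulled out of the fio text output with a script. The following is only a minimal sketch, assuming the human-readable summary format shown above; the function name `parse_iops` and the regular expression are illustrative and not part of fio.

```python
import re

# Matches fio's human-readable "iops : min=..., max=..., avg=..." summary line.
IOPS_LINE = re.compile(
    r"iops\s*:\s*min=\s*(?P<min>[\d.]+),\s*max=\s*(?P<max>[\d.]+),\s*avg=\s*(?P<avg>[\d.]+)"
)


def parse_iops(fio_output: str) -> dict:
    """Return min/max/avg IOPS from the first iops summary line, as floats."""
    match = IOPS_LINE.search(fio_output)
    if match is None:
        raise ValueError("no iops summary line found")
    return {key: float(value) for key, value in match.groupdict().items()}


# Sample text copied from the CVM result in this report.
sample = """\
bw (  KiB/s): min= 3296, max= 3544, per=100.00%, avg=3444.20, stdev=49.35, samples=59
   iops        : min=  824, max=  886, avg=861.05, stdev=12.34, samples=59
"""

print(parse_iops(sample))  # {'min': 824.0, 'max': 886.0, 'avg': 861.05}
```

For anything beyond a quick check, fio's `--output-format=json` is likely a more robust source than scraping the text output.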

Version-Release number of selected component (if applicable):
5.14.0-340.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
The VM sizes I used are Standard_DC16ads_v5 and Standard_DC96ads_v5. The tests were run on the temporary storage.
1. # fio --filename=/dev/sdb --direct=1 --rw=read --sync=0 --time_based --bs=4k --size=4096M --numjobs=1 --iodepth=1 --runtime=30 --group_reporting --name=test --thread --ioengine=libaio

Actual results:
iops        : min=  824, max=  886, avg=861.05, stdev=12.34, samples=59

Expected results:
Tested on Standard_D4ds_v5:
iops        : min=10734, max=18974, avg=13713.49, stdev=1538.76, samples=59
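To put the gap between the actual and expected results into numbers, here is a small sketch that computes the slowdown factor and the average wall time per I/O. It assumes that with `--iodepth=1` and `--numjobs=1` only one request is in flight at a time, so the time per I/O is simply the inverse of the IOPS figure. The helper `compare_iops` is a hypothetical name used only for this illustration.

```python
def compare_iops(observed: float, expected: float) -> dict:
    """Summarise how far the observed IOPS fall short of the expected IOPS."""
    return {
        "slowdown": expected / observed,
        # One request in flight at a time, so time per I/O = 1 / IOPS.
        "observed_us_per_io": 1e6 / observed,
        "expected_us_per_io": 1e6 / expected,
    }


# Average IOPS from the CVM run and from the Standard_D4ds_v5 run.
result = compare_iops(observed=861.05, expected=13713.49)

print(f"slowdown: {result['slowdown']:.1f}x")
print(f"CVM time per I/O: {result['observed_us_per_io']:.0f} us")
print(f"ordinary VM time per I/O: {result['expected_us_per_io']:.0f} us")
```

With the averages reported here, the CVM is roughly 16 times slower, spending a little over a millisecond per 4k read where the ordinary VM spends well under 100 microseconds.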

Additional info:
1. Bisected the issue to kernel build 5.14.0-195.el9:
5.14.0-194.el9.x86_64
   bw (  KiB/s): min=25256, max=56160, per=99.98%, avg=50932.88, stdev=6357.32, samples=59
   iops        : min= 6314, max=14040, avg=12733.22, stdev=1589.33, samples=59

5.14.0-195.el9.x86_64
   bw (  KiB/s): min= 1072, max= 3584, per=100.00%, avg=3452.24, stdev=320.44, samples=59
   iops        : min=  268, max=  896, avg=863.05, stdev=80.11, samples=59


2. The latest RHEL 8.9 also has this issue, but it is not as severe:
   bw (  KiB/s): min=14816, max=36472, per=100.00%, avg=25072.03, stdev=4593.09, samples=59
   iops        : min= 3704, max= 9118, avg=6268.00, stdev=1148.27, samples=59

3. Pbench test suite with kernel-5.14.0-340.el9.x86_64:
http://pbench.perf.lab.eng.bos.redhat.com/users/virt-perftest-test/EC2::Standard-DC96ads-v5/fio_Azure_9.3.202307190_x86_64_gen2_localssd_quick_D230723T234533/fio_bs_4_1024_iod_1_njobs_1_read_write_rw_2023.07.24T03.45.33/result.html

4. Not sure whether this BZ is related, but it was introduced in the same version:
https://bugzilla.redhat.com/show_bug.cgi?id=2215362
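As a sanity check on the per-kernel numbers in items 1 and 2, the sketch below verifies that each reported bandwidth matches IOPS multiplied by the 4 KiB block size, and computes the IOPS drop between builds 194 and 195. The figures are copied from the fio summaries above. The helper names and the 0.1% tolerance are assumptions made for this illustration.

```python
BLOCK_SIZE_KIB = 4  # from --bs=4k

# (average bandwidth in KiB/s, average IOPS) copied from the fio summaries.
runs = {
    "5.14.0-194.el9": (50932.88, 12733.22),
    "5.14.0-195.el9": (3452.24, 863.05),
    "RHEL 8.9 (latest)": (25072.03, 6268.00),
}


def bandwidth_consistent(bw_kib_s: float, iops: float, tolerance: float = 0.001) -> bool:
    """True if bandwidth agrees with IOPS * block size within the tolerance."""
    return abs(bw_kib_s - iops * BLOCK_SIZE_KIB) <= tolerance * bw_kib_s


def drop_percent(baseline_iops: float, iops: float) -> float:
    """Percentage of IOPS lost relative to the baseline."""
    return 100.0 * (1.0 - iops / baseline_iops)


for name, (bw, iops) in runs.items():
    print(f"{name}: bandwidth consistent with IOPS = {bandwidth_consistent(bw, iops)}")

drop = drop_percent(runs["5.14.0-194.el9"][1], runs["5.14.0-195.el9"][1])
print(f"IOPS drop from 194.el9 to 195.el9: {drop:.1f}%")
```

All three runs are internally consistent, and the step from 194.el9 to 195.el9 costs a little over 93% of the average IOPS.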

Comment 1 Li Tian 2023-07-31 14:32:14 UTC
Hi Vitaly, just want to bring this one to your attention. It seems to have appeared in the same minor version as https://bugzilla.redhat.com/show_bug.cgi?id=2215362

Comment 2 Vitaly Kuznetsov 2023-08-02 17:15:09 UTC
This regression bisects to

commit 1d45e77775b808dcd7716f6b8219ee12f73d1855
Author: Nico Pache <npache>
Date:   Wed Nov 2 08:54:46 2022 -0600

    x86/PAT: Have pat_enabled() properly reflect state when running on Xen
    
    commit 72cbc8f04fe2fa93443c0fcccb7ad91dfea3d9ce
    Author: Jan Beulich <jbeulich>
    Date:   Thu Apr 28 16:50:29 2022 +0200
    
        x86/PAT: Have pat_enabled() properly reflect state when running on Xen

So I think we need 

commit 90b926e68f500844dff16b5bcea178dc55cf580a
Author: Juergen Gross <jgross>
Date:   Tue Jan 10 07:54:27 2023 +0100

    x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case

to fix the problem. Let me run some tests.