Bug 1772738

Summary: kvm nx_huge_pages_recovery_ratio=0 is needed to meet KVM-RT low latency requirement
Product: Red Hat Enterprise Linux 8 Reporter: Pei Zhang <pezhang>
Component: kernel-rt Assignee: Clark Williams <williams>
kernel-rt sub component: KVM QA Contact: Pei Zhang <pezhang>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: bhu, chayang, jinzhao, jlelli, juzhang, knoel, mstowell, pbonzini, trix, virt-maint
Version: 8.2   
Target Milestone: rc   
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-rt-4.18.0-173.rt13.30.el8 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1772894 (view as bug list) Environment:
Last Closed: 2020-04-28 15:25:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1640832, 1722609, 1772894    

Description Pei Zhang 2019-11-15 02:37:13 UTC
Description of problem:
With the fixes for a series of CVE bugs applied (the bugs will be listed in the next comment), we need to set the kvm module parameter nx_huge_pages_recovery_ratio=0 to keep KVM-RT latency low.
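As a rough sketch of how the parameter could be pinned to 0 on a host, the helper below prints a modprobe.d options line. The function name kvm_rt_options_line and the file name kvm-rt.conf are hypothetical, chosen only for illustration. The commented commands are one possible way to apply the value, not necessarily what the eventual fix does.

```shell
# Sketch: print a modprobe.d "options" line for the kvm module.
# The first argument is the ratio to set; it defaults to 0.
kvm_rt_options_line() {
    printf 'options kvm nx_huge_pages_recovery_ratio=%s\n' "${1:-0}"
}

# Persistent setting, applied the next time the kvm module is loaded
# (kvm-rt.conf is a hypothetical file name):
#   kvm_rt_options_line | sudo tee /etc/modprobe.d/kvm-rt.conf
#
# Runtime setting on an already loaded module (needs root):
#   echo 0 | sudo tee /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
```

The same value can also be passed on the kernel command line as kvm.nx_huge_pages_recovery_ratio=0.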

Version-Release number of selected component (if applicable):
microcode_ctl-2.1-47.8.el7_6.x86_64
kernel-rt-3.10.0-957.38.2.rt56.952.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.7.x86_64
tuned-2.10.0-6.el7_6.4.noarch
libvirt-4.5.0-10.el7_6.14.x86_64

How reproducible:
100% 

Steps to Reproduce:
1. Run the three standard KVM-RT test scenarios.


Actual results:
cyclictest 1h testing: max latency is 5585us.

Expected results:
cyclictest 24h testing: max latency should be < 40us.

Additional info:

Comment 4 Paolo Bonzini 2019-11-15 13:13:52 UTC
Patch at https://patchwork.kernel.org/patch/11242035/

Comment 5 Paolo Bonzini 2019-11-15 13:16:14 UTC
*** Bug 1772653 has been marked as a duplicate of this bug. ***

Comment 6 Beth Uptagrafft 2019-12-04 20:36:17 UTC
*** Bug 1779458 has been marked as a duplicate of this bug. ***

Comment 8 Juri Lelli 2020-01-25 10:01:05 UTC
Patches merged in kernel-rt-4.18.0-173.rt13.30.el8.

Comment 11 Pei Zhang 2020-02-06 00:51:00 UTC
Verified with kernel-rt-4.18.0-176.rt13.33.el8.x86_64.

1. The default value of kvm nx_huge_pages_recovery_ratio is 0, as expected.

# systool -vm kvm | grep nx_huge
    nx_huge_pages_recovery_ratio= "0"
    nx_huge_pages       = "Y"
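The same check can be scripted without systool by reading the module parameter from sysfs. This is a minimal sketch: the function names are hypothetical, and the optional directory argument is an assumption added so that the check can be pointed at a different location.

```shell
# Read a kvm module parameter from sysfs.
# $1: parameter name
# $2: optional parameters directory (defaults to the kvm module's sysfs dir)
kvm_param() {
    cat "${2:-/sys/module/kvm/parameters}/$1"
}

# Succeeds only when NX huge page recovery is disabled (ratio is 0).
# $1: optional parameters directory, passed through to kvm_param
nx_recovery_disabled() {
    [ "$(kvm_param nx_huge_pages_recovery_ratio "$1")" = "0" ]
}
```

On a host with the kvm module loaded, `nx_recovery_disabled && echo OK` would report whether the default matches the value shown above.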


2. KVM-RT acceptance testing passes.

With the mitigation on, the 1h cyclictest max latency is 31us.


==Results==
(1)Single VM with 1 rt vCPU:
# Min Latencies: 00008
# Avg Latencies: 00009
# Max Latencies: 00028

(2)Single VM with 8 rt vCPUs:
# Min Latencies: 00008 00012 00012 00012 00012 00012 00012 00012
# Avg Latencies: 00010 00012 00012 00012 00012 00012 00012 00012
# Max Latencies: 00024 00025 00024 00029 00027 00025 00023 00028

(3)Multiple VMs each with 1 rt vCPU:
- VM1
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00028

- VM2
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00026

- VM3
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00031

- VM4
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00025
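The pass/fail decision on results like these can be sketched as a small filter over the cyclictest summary lines. The function names are hypothetical, and the default limit of 40 is taken from the expected result in the description. The sketch assumes the values are zero-padded microseconds, as in the output above.

```shell
# Print the largest value found on any "# Max Latencies:" line on stdin.
# Exits non-zero when no such line is present.
max_latency() {
    awk '/^# Max Latencies:/ {
             found = 1
             # Fields 1-3 are "#", "Max", "Latencies:"; values start at 4.
             for (i = 4; i <= NF; i++) if ($i + 0 > max) max = $i + 0
         }
         END { if (!found) exit 1; print max + 0 }'
}

# Succeeds when the worst-case latency is below the limit in microseconds.
# $1: optional limit (defaults to 40)
latency_ok() {
    worst=$(max_latency) || return 1
    [ "$worst" -lt "${1:-40}" ]
}
```

For example, `latency_ok < results.txt` would succeed for the results above, where results.txt is a hypothetical file holding the summary lines.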


==Versions==
kernel-rt-4.18.0-176.rt13.33.el8.x86_64
tuned-2.13.0-3.el8.noarch
microcode_ctl-20191115-4.el8.x86_64
qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64


==Details of this testing==
- Host kernel line:
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-176.rt13.33.el8.x86_64 root=/dev/mapper/rhel_dell--per430--09-root ro crashkernel=auto resume=/dev/mapper/rhel_dell--per430--09-swap rd.lvm.lv=rhel_dell-per430-09/root rd.lvm.lv=rhel_dell-per430-09/swap console=ttyS0,115200n81 skew_tick=1 isolcpus=1,3,5,7,9,11,13,15,17,19,12,14,16,18 intel_pstate=disable nosoftlockup nohz=on nohz_full=1,3,5,7,9,11,13,15,17,19,12,14,16,18 rcu_nocbs=1,3,5,7,9,11,13,15,17,19,12,14,16,18 default_hugepagesz=1G iommu=pt intel_iommu=on tsc=nowatchdog


- Testing info of three test cases:
(1)Single VM with 1 rt vCPU:
Test started at:     2020-02-05 15:19:28 Wednesday
Kernel cmdline:      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-176.rt13.33.el8.x86_64 root=/dev/mapper/rhel_vm--73--232-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto resume=/dev/mapper/rhel_vm--73--232-swap rd.lvm.lv=rhel_vm-73-232/root rd.lvm.lv=rhel_vm-73-232/swap skew_tick=1 isolcpus=1 intel_pstate=disable nosoftlockup nohz=on nohz_full=1 rcu_nocbs=1 default_hugepagesz=1G iommu=pt intel_iommu=on tsc=nowatchdog
X86 debug pts:       pti_enable= ibpb_enabled= ibrs_enabled= retp_enabled=
Machine:             vm-73-232.lab.eng.pek2.redhat.com
CPU:                 Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Test duration(plan): 1h
Test ended at:       2020-02-05 16:19:30 Wednesday
cyclictest cmdline:  taskset -c 1 cyclictest -m -q -p95 -D 1h -h60 -t 1 -a 1 -i 200
cyclictest results: 

# Min Latencies: 00008
# Avg Latencies: 00009
# Max Latencies: 00028


(2)Single VM with 8 rt vCPUs:
Test started at:     2020-02-05 17:35:43 Wednesday
Kernel cmdline:      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-176.rt13.33.el8.x86_64 root=/dev/mapper/rhel_vm--74--38-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto resume=/dev/mapper/rhel_vm--74--38-swap rd.lvm.lv=rhel_vm-74-38/root rd.lvm.lv=rhel_vm-74-38/swap skew_tick=1 isolcpus=2,3,4,5,6,7,8,9 intel_pstate=disable nosoftlockup nohz=on nohz_full=2,3,4,5,6,7,8,9 rcu_nocbs=2,3,4,5,6,7,8,9 default_hugepagesz=1G iommu=pt intel_iommu=on tsc=nowatchdog
X86 debug pts:       pti_enable= ibpb_enabled= ibrs_enabled= retp_enabled=
Machine:             vm-74-38.lab.eng.pek2.redhat.com
CPU:                 Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Test duration(plan): 1h
Test ended at:       2020-02-05 18:35:44 Wednesday
cyclictest cmdline:  taskset -c 2,3,4,5,6,7,8,9 cyclictest -m -q -p95 -D 1h -h60 -t 8 -a 2,3,4,5,6,7,8,9 -i 200
cyclictest results: 

# Min Latencies: 00008 00012 00012 00012 00012 00012 00012 00012
# Avg Latencies: 00010 00012 00012 00012 00012 00012 00012 00012
# Max Latencies: 00024 00025 00024 00029 00027 00025 00023 00028


(3)Multiple VMs each with 1 rt vCPU:
- VM1
Test started at:     2020-02-05 20:35:22 Wednesday
Kernel cmdline:      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-176.rt13.33.el8.x86_64 root=/dev/mapper/rhel-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap skew_tick=1 isolcpus=1 intel_pstate=disable nosoftlockup nohz=on nohz_full=1 rcu_nocbs=1 default_hugepagesz=1G iommu=pt intel_iommu=on tsc=nowatchdog
X86 debug pts:       pti_enable= ibpb_enabled= ibrs_enabled= retp_enabled=
Machine:             bootp-73-75-32.lab.eng.pek2.redhat.com
CPU:                 Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Test duration(plan): 1h
Test ended at:       2020-02-05 21:35:24 Wednesday
cyclictest cmdline:  taskset -c 1 cyclictest -m -q -p95 -D 1h -h60 -t 1 -a 1 -i 200
cyclictest results: 

# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00028


- VM2
Test started at:     2020-02-05 20:35:22 Wednesday
Kernel cmdline:      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-176.rt13.33.el8.x86_64 root=/dev/mapper/rhel-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap skew_tick=1 isolcpus=1 intel_pstate=disable nosoftlockup nohz=on nohz_full=1 rcu_nocbs=1 default_hugepagesz=1G iommu=pt intel_iommu=on tsc=nowatchdog
X86 debug pts:       pti_enable= ibpb_enabled= ibrs_enabled= retp_enabled=
Machine:             vm-73-207.lab.eng.pek2.redhat.com
CPU:                 Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Test duration(plan): 1h
Test ended at:       2020-02-05 21:35:24 Wednesday
cyclictest cmdline:  taskset -c 1 cyclictest -m -q -p95 -D 1h -h60 -t 1 -a 1 -i 200
cyclictest results: 

# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00026


- VM3
Test started at:     2020-02-05 20:35:22 Wednesday
Kernel cmdline:      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-176.rt13.33.el8.x86_64 root=/dev/mapper/rhel-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap skew_tick=1 isolcpus=1 intel_pstate=disable nosoftlockup nohz=on nohz_full=1 rcu_nocbs=1 default_hugepagesz=1G iommu=pt intel_iommu=on tsc=nowatchdog
X86 debug pts:       pti_enable= ibpb_enabled= ibrs_enabled= retp_enabled=
Machine:             bootp-73-75-34.lab.eng.pek2.redhat.com
CPU:                 Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Test duration(plan): 1h
Test ended at:       2020-02-05 21:35:24 Wednesday
cyclictest cmdline:  taskset -c 1 cyclictest -m -q -p95 -D 1h -h60 -t 1 -a 1 -i 200
cyclictest results: 

# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00031


- VM4
Test started at:     2020-02-05 20:35:22 Wednesday
Kernel cmdline:      BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-176.rt13.33.el8.x86_64 root=/dev/mapper/rhel_vm--74--133-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto resume=/dev/mapper/rhel_vm--74--133-swap rd.lvm.lv=rhel_vm-74-133/root rd.lvm.lv=rhel_vm-74-133/swap skew_tick=1 isolcpus=1 intel_pstate=disable nosoftlockup nohz=on nohz_full=1 rcu_nocbs=1 default_hugepagesz=1G iommu=pt intel_iommu=on tsc=nowatchdog
X86 debug pts:       pti_enable= ibpb_enabled= ibrs_enabled= retp_enabled=
Machine:             vm-74-133.lab.eng.pek2.redhat.com
CPU:                 Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Test duration(plan): 1h
Test ended at:       2020-02-05 21:35:24 Wednesday
cyclictest cmdline:  taskset -c 1 cyclictest -m -q -p95 -D 1h -h60 -t 1 -a 1 -i 200
cyclictest results: 

# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00025

So this bug has been fixed. Moving to 'VERIFIED'.

Comment 14 errata-xmlrpc 2020-04-28 15:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1567