Bug 1779455

Field | Value
---|---
Summary: | There is a latency spike (99us) with kvm nx_huge_pages_recovery_ratio=0 nx_huge_pages=Y
Product: | Red Hat Enterprise Linux 7
Component: | kernel-rt
kernel-rt sub component: | KVM
Status: | CLOSED DUPLICATE
Severity: | high
Priority: | high
Version: | 7.8
Keywords: | Reopened
Reporter: | Pei Zhang <pezhang>
Assignee: | Luis Claudio R. Goncalves <lgoncalv>
QA Contact: | Pei Zhang <pezhang>
CC: | bhu, chayang, jinzhao, juzhang, knoel, lcapitulino, lgoncalv, mtosatti, pbonzini, peterx, trix, virt-maint, williams
Hardware: | Unspecified
OS: | Unspecified
Type: | Bug
Target Milestone: | rc
Last Closed: | 2020-01-03 09:08:32 UTC
Clones: | 1779458 (view as bug list)
Bug Blocks: | 1672377, 1779458
Description: Pei Zhang, 2019-12-04 02:30:32 UTC
Hi Peter, Luiz, Marcelo, Paolo,

With kvm nx_huge_pages_recovery_ratio=0 nx_huge_pages=N, the latency looks good: the 6h cyclictest max latency is 29us (the Meltdown/Spectre mitigations were enabled in these runs: pti_enable=1 ibpb_enabled=1 ibrs_enabled=0 retp_enabled=1). 3/3 PASS.

Run1: 6h cyclictest testing results
(2) Single VM with 8 rt vCPUs (max latency is 22):
# Min Latencies: 00006 00006 00006 00006 00006 00006 00006 00006
# Avg Latencies: 00008 00007 00006 00007 00007 00007 00007 00007
# Max Latencies: 00022 00020 00018 00018 00018 00018 00019 00019

Run2: 6h cyclictest testing results (max latency is 29)
(2) Single VM with 8 rt vCPUs:
# Min Latencies: 00006 00007 00007 00007 00007 00007 00006 00006
# Avg Latencies: 00008 00007 00007 00007 00007 00007 00007 00007
# Max Latencies: 00023 00019 00020 00022 00019 00019 00019 00029

Run3: 6h cyclictest testing results (max latency is 23)
(2) Single VM with 8 rt vCPUs:
# Min Latencies: 00006 00007 00007 00007 00007 00006 00006 00006
# Avg Latencies: 00008 00007 00007 00007 00007 00007 00007 00006
# Max Latencies: 00023 00019 00018 00019 00019 00019 00018 00018

Beaker jobs:
https://beaker.engineering.redhat.com/jobs/3939499
https://beaker.engineering.redhat.com/jobs/3939500
https://beaker.engineering.redhat.com/jobs/3938095

So besides nx_huge_pages_recovery_ratio=0, we also need nx_huge_pages=N for the expected latency results.

I'll submit 24h jobs to confirm this conclusion. Testing results will be updated soon.
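The runs above exercise two KVM module parameters plus the RHEL 7 runtime mitigation switches. As a point of reference, the sketch below shows where those knobs live; the sysfs/debugfs paths are the standard ones, but the exact cyclictest command line used in these Beaker jobs is not recorded in this bug, so the flags shown are an assumption.

```sh
# KVM parameters under test (standard sysfs locations)
grep . /sys/module/kvm/parameters/nx_huge_pages \
       /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio

# RHEL 7 Meltdown/Spectre runtime switches, matching the values quoted
# above (pti=1, ibpb=1, ibrs=0, retpoline=1); requires debugfs mounted
grep . /sys/kernel/debug/x86/pti_enabled \
       /sys/kernel/debug/x86/ibpb_enabled \
       /sys/kernel/debug/x86/ibrs_enabled \
       /sys/kernel/debug/x86/retp_enabled

# Illustrative 6h cyclictest run in the guest, one thread per rt vCPU;
# with -h, cyclictest prints the "# Min/Avg/Max Latencies" summary lines
# seen in the results above (flags are an assumption, not the QE command)
cyclictest -m -q -p95 -t 8 -a 0-7 -D 6h -i 200 -h 60
```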
> Well, a guest switching code between 2MB -> 4K (say khugepaged), without a
> TLB flush, can cause this condition, as I understand.
>
> And in that case, a buggy guest can crash the host.
>
> Am I missing something?
That would also be true of a bare-metal Linux system, and the memory management subsystem was audited. So it's only about untrusted guests. Still, the default should be nx_huge_pages=1.
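For context, nx_huge_pages is KVM's mitigation for the iTLB multihit issue sketched in the quote above: a guest remapping executable code from a 2MB to a 4K page without a TLB flush. On kernels carrying the mitigation, its status and both module parameters are visible and writable at runtime; a minimal sketch, assuming the standard sysfs paths:

```sh
# Mitigation status for the issue discussed above; reports
# "KVM: Mitigation: Split huge pages" while nx_huge_pages is active
cat /sys/devices/system/cpu/vulnerabilities/itlb_multihit

# Both parameters are runtime-writable, so a latency test can flip
# them without reloading kvm.ko or rebooting
echo N > /sys/module/kvm/parameters/nx_huge_pages
echo 0 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
```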
Karen,

The latency spike happened because the upstream patch to set nx_huge_pages_recovery_ratio=0 on realtime kernels was not backported. Now it has been backported, which means the spike should be gone.

What is worrying is the fact that all instruction pages are now cached by 4K TLB entries, and the recovery thread is disabled. This might slow down certain workloads.

Regarding whether or not to enable the security mitigation, Intel mentions:

"Once these updates are applied, it may be appropriate for some customers to consider additional steps. This includes customers who cannot guarantee that trusted software is running on their system(s) and are using Simultaneous Multi-Threading (SMT). In these cases, customers should consider how they utilize SMT for their particular workload(s), guidance from their OS and VMM software providers, and the security threat model for their particular environment. Because these factors will vary considerably by customer, Intel is not recommending that Intel® HT be disabled, and it's important to understand that doing so does not alone provide protection against MDS."

I think providing a "trusted_code=Y/N" tunable to control Spectre/Meltdown and the vulnerability above is useful. I don't think untrusted code runs on most of these Telco deployments.
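The trusted_code=Y/N tunable proposed above does not exist in any kernel; it is only a suggestion in this comment. Purely to illustrate the intended semantics, a hypothetical shell helper that gangs the existing knobs together might look like the sketch below (the paths are the real RHEL 7 interfaces; the helper, its name, and the restored default values are assumptions):

```sh
#!/bin/sh
# Hypothetical "trusted_code" helper: one switch for the mitigations
# discussed in this bug. Not an existing kernel parameter.
trusted_code() {
    case "$1" in
    Y)  # All code on the host is trusted: drop the costly mitigations.
        echo N > /sys/module/kvm/parameters/nx_huge_pages
        echo 0 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
        echo 0 > /sys/kernel/debug/x86/pti_enabled
        echo 0 > /sys/kernel/debug/x86/ibpb_enabled
        ;;
    N)  # Untrusted code may run: restore the protective defaults
        # (60 is the usual nx_huge_pages_recovery_ratio default).
        echo Y > /sys/module/kvm/parameters/nx_huge_pages
        echo 60 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
        echo 1 > /sys/kernel/debug/x86/pti_enabled
        echo 1 > /sys/kernel/debug/x86/ibpb_enabled
        ;;
    esac
}

trusted_code "${1:-N}"
```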
(In reply to Pei Zhang from comment #14)
> So besides nx_huge_pages_recovery_ratio=0, we also need nx_huge_pages=N for
> the expected latency results.

Testing with kvm nx_huge_pages_recovery_ratio=0 nx_huge_pages=N, the 24h cyclictest testing looks good. Max latency is 34us, no more spikes.

==Results==

(1) Single VM with 1 rt vCPU:
# Min Latencies: 00006
# Avg Latencies: 00008
# Max Latencies: 00023

(2) Single VM with 8 rt vCPUs:
# Min Latencies: 00006 00006 00006 00006 00006 00006 00006 00006
# Avg Latencies: 00008 00007 00007 00007 00007 00007 00007 00008
# Max Latencies: 00021 00019 00019 00018 00018 00018 00019 00028

(3) Multiple VMs, each with 1 rt vCPU:
- VM1
# Min Latencies: 00006
# Avg Latencies: 00008
# Max Latencies: 00023
- VM2
# Min Latencies: 00006
# Avg Latencies: 00008
# Max Latencies: 00020
- VM3
# Min Latencies: 00006
# Avg Latencies: 00008
# Max Latencies: 00021
- VM4
# Min Latencies: 00006
# Avg Latencies: 00008
# Max Latencies: 00034

==Versions==
kernel-rt-3.10.0-957.43.1.rt56.957.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.7.x86_64
rt-tests-1.0-12.el7.x86_64
qemu-kvm-common-rhev-2.12.0-18.el7_6.7.x86_64
tuned-2.10.0-6.el7_6.4.noarch
microcode_ctl-2.1-47.12.el7_6.x86_64
qemu-kvm-tools-rhev-2.12.0-18.el7_6.7.x86_64
libvirt-4.5.0-10.el7_6.15.x86_64

Thanks a lot for testing, Pei!

Scenarios summary:
(1) With kvm nx_huge_pages_recovery_ratio=0 nx_huge_pages=Y: max latency is 99us
(2) With kvm nx_huge_pages_recovery_ratio=0 nx_huge_pages=N: max latency is 34us

With scenario (1), enabling the nx_huge_pages mitigation causes the max latency to be 99us, which is not acceptable. Following Marcelo's suggestion by mail, I agree to re-open this bz for further discussion.

(In reply to Pei Zhang from comment #26)
> (1) With kvm nx_huge_pages_recovery_ratio=0 nx_huge_pages=Y: max latency is 99us
> (2) With kvm nx_huge_pages_recovery_ratio=0 nx_huge_pages=N: max latency is 34us

Pei, I think the conclusion is that we want nx_huge_pages=N in addition to nx_huge_pages_recovery_ratio=0. Correct? If yes, then would you open a new BZ and keep this one as a dupe? I think this got too confusing at this point (not your fault!!) and just asking for nx_huge_pages=N will simplify it.

*** This bug has been marked as a duplicate of bug 1772894 ***
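If the outcome is a standing request for nx_huge_pages=N plus nx_huge_pages_recovery_ratio=0, the setting would normally be made persistent rather than echoed into sysfs after every boot. A sketch of the two usual mechanisms (the modprobe.d file name is arbitrary):

```sh
# Option 1: module options, applied whenever kvm.ko is loaded
cat > /etc/modprobe.d/kvm-nx-huge-pages.conf <<'EOF'
options kvm nx_huge_pages=N nx_huge_pages_recovery_ratio=0
EOF

# Option 2: kernel command line, if kvm is built in or loaded early;
# append to GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate:
#   kvm.nx_huge_pages=N kvm.nx_huge_pages_recovery_ratio=0
grub2-mkconfig -o /boot/grub2/grub.cfg
```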