Bug 1757165

Summary: 8 vCPU guest need max latency < 20 us with stress [RT-8.2]
Product: Red Hat Enterprise Linux 8
Reporter: Beth Uptagrafft <bhu>
Component: kernel-rt
Assignee: Marcelo Tosatti <mtosatti>
kernel-rt sub component: KVM
QA Contact: Pei Zhang <pezhang>
Status: CLOSED ERRATA
Severity: high
Priority: urgent
CC: bhu, broskos, chayang, cww, daolivei, derli, eelena, fiezzi, hhuang, jhsiao, jianzzha, jinzhao, jlelli, juzhang, kabbott, lcapitulino, mtosatti, ngu, peterx, pezhang, pvaanane, snagar, sputhenp, virt-maint, williams
Version: 8.2
Flags: pvaanane: needinfo-
Target Milestone: rc
Target Release: 8.2
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-rt-4.18.0-148.rt13.5.el8
Clone Of: 1690543
Clones: 1813007
Last Closed: 2020-04-28 15:25:32 UTC
Type: Bug
Bug Blocks: 1640832, 1680412, 1722609, 1813007    

Comment 5 Marcelo Tosatti 2019-11-12 12:37:47 UTC
Pei, do you have the "nowatchdog" kernel command line option set, on both host and guest?

Comment 6 Pei Zhang 2019-11-12 12:44:58 UTC
(In reply to Marcelo Tosatti from comment #5)
> Pei, do you have the "nowatchdog" kernel command line option set, on both
> host and guest?

Hi Marcelo,

Yes, nowatchdog is set on both the host and guest kernel command lines.

Best regards,

Pei

Comment 7 Marcelo Tosatti 2019-11-12 19:27:56 UTC
(In reply to Pei Zhang from comment #6)
> (In reply to Marcelo Tosatti from comment #5)
> > Pei, do you have the "nowatchdog" kernel command line option set, on both
> > host and guest?
> 
> Yes, nowatchdog is set on both the host and guest kernel command lines.

Hi Pei,

Can you please provide access to the machine?

My tests show max=17us.

TIA

Comment 20 Pei Zhang 2020-02-28 15:56:41 UTC
Verified with kernel-rt-4.18.0-184.rt13.42.el8.x86_64:

KVM-RT testing result: PASS. The 24h cyclictest max latency is 21us.

==Results== (cyclictest latencies in us)
(1) Single VM with 1 rt vCPU:
# Min Latencies: 00004
# Avg Latencies: 00005
# Max Latencies: 00018

(2) Single VM with 8 rt vCPUs:
# Min Latencies: 00004 00004 00004 00005 00004 00005 00005 00005
# Avg Latencies: 00005 00005 00005 00005 00005 00005 00005 00005
# Max Latencies: 00017 00014 00015 00015 00016 00019 00016 00014

(3) Multiple VMs, each with 1 rt vCPU:
- VM1
# Min Latencies: 00004
# Avg Latencies: 00004
# Max Latencies: 00021

- VM2
# Min Latencies: 00004
# Avg Latencies: 00005
# Max Latencies: 00017

- VM3
# Min Latencies: 00004
# Avg Latencies: 00004
# Max Latencies: 00016

- VM4
# Min Latencies: 00004
# Avg Latencies: 00004
# Max Latencies: 00017


==Versions==
kernel-rt-4.18.0-184.rt13.42.el8.x86_64
tuned-2.13.0-5.el8.noarch
microcode_ctl-20191115-4.el8.x86_64
rt-tests-1.5-18.el8.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64

So this bug is fixed. Moving to 'VERIFIED'.

Comment 25 Luiz Capitulino 2020-03-02 02:23:21 UTC
(In reply to Pei Zhang from comment #20)
> Verified with kernel-rt-4.18.0-184.rt13.42.el8.x86_64:
> 
> KVM-RT testing result: PASS. The 24h cyclictest max latency is 21us.
> 
> ==Results==
> (1)Single VM with 1 rt vCPU:
> # Min Latencies: 00004
> # Avg Latencies: 00005
> # Max Latencies: 00018
> 
> (2)Single VM with 8 rt vCPUs:
> # Min Latencies: 00004 00004 00004 00005 00004 00005 00005 00005
> # Avg Latencies: 00005 00005 00005 00005 00005 00005 00005 00005
> # Max Latencies: 00017 00014 00015 00015 00016 00019 00016 00014
> 
> (3)Multiple VMs each with 1 rt vCPU:
> - VM1
> # Min Latencies: 00004
> # Avg Latencies: 00004
> # Max Latencies: 00021

Pei, since we got a max latency greater than 20us in
this case, maybe this BZ has failed verification?

I mean, for KVM-RT automated testing, having a max
latency of about 20us is good enough. However, this
BZ is about achieving less than 20us.

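The distinction drawn here (about 20us is fine for automated KVM-RT testing, but this BZ requires strictly less than 20us) can be made mechanical: take the largest value from every "# Max Latencies:" line of a cyclictest summary and compare it against a strict limit. A minimal Python sketch, assuming the summary is available as plain text in the format shown in Comment 20; the function names and the LIMIT_US constant are illustrative, not part of rt-tests or any existing tool:

```python
import re

# Strict requirement from this bug's summary: max latency must be < 20 us.
LIMIT_US = 20


def overall_max_latency(summary: str) -> int:
    """Return the largest value found on any '# Max Latencies:' line (in us)."""
    values = []
    for line in summary.splitlines():
        match = re.match(r"\s*#\s*Max Latencies:\s*(.*)", line)
        if match:
            # One zero-padded column per measurement thread, e.g. "00017 00014".
            values.extend(int(tok) for tok in match.group(1).split())
    if not values:
        raise ValueError("no '# Max Latencies:' lines found")
    return max(values)


def passes_strict(summary: str, limit_us: int = LIMIT_US) -> bool:
    """True only if every reported max latency is strictly below limit_us."""
    return overall_max_latency(summary) < limit_us


vm1 = """# Min Latencies: 00004
# Avg Latencies: 00004
# Max Latencies: 00021"""
print(overall_max_latency(vm1), passes_strict(vm1))  # 21 False
```

Applied to the Comment 20 numbers, the 8-vCPU run (worst column 19us) passes the strict check, while VM1 in the multi-VM run (21us) does not, which is exactly why that run cannot verify this BZ.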

Comment 26 Pei Zhang 2020-03-02 03:01:45 UTC
(In reply to Luiz Capitulino from comment #25)
...
> > 
> > (3)Multiple VMs each with 1 rt vCPU:
> > - VM1
> > # Min Latencies: 00004
> > # Avg Latencies: 00004
> > # Max Latencies: 00021
> 
> Pei, since we get max latency greater than 20us in
> this case, maybe this BZ has failed verification?
> 
> I mean, for KVM-RT automated testing, having a max
> latency of about 20us is good enough. However, this
> BZ is about achieving less than 20us.
> 

Hi Luiz,

Makes sense. I agree we should require strict latency results to verify this bug.

This run was tested with the default /sys/kernel/ktimer_lockless_check value of 0. I'll retest with /sys/kernel/ktimer_lockless_check set to 1 to see whether the results can reach <=20us, and will update this bug with the results soon.

I think I need to move this bug back to ON_QA status now.

Thank you.

Best regards,

Pei

Comment 27 Pei Zhang 2020-03-05 01:08:21 UTC
With "# echo 1 > /sys/kernel/ktimer_lockless_check", the slight spike (1us) is gone.

[PASS] With mitigation off, the 24h cyclictest max latency is 19us.

Highlights:
1. This run was tested with "# echo 1 > /sys/kernel/ktimer_lockless_check" on both host and guest.

2. This run was tested with "isolcpus=managed_irq," manually added to /boot/grub2/grub.cfg on both host and guest.
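Highlight 1 has to be applied identically on the host and inside every guest, so it can help to script it with a read-back check. A minimal Python sketch, assuming it runs as root on a kernel-rt build that exposes /sys/kernel/ktimer_lockless_check as described in this comment; set_ktimer_lockless_check is a hypothetical helper, not an existing tool, and it does not cover the isolcpus change in highlight 2:

```python
from pathlib import Path

# Path taken from this comment; assumed to exist on the kernel-rt build under test.
KTIMER_KNOB = Path("/sys/kernel/ktimer_lockless_check")


def set_ktimer_lockless_check(value: int = 1, knob: Path = KTIMER_KNOB) -> int:
    """Hypothetical helper: same effect as `echo <value> > knob`, then verify it."""
    # Writing the real sysfs file presumably requires root.
    knob.write_text(f"{value}\n")
    current = int(knob.read_text().strip())
    if current != value:
        raise RuntimeError(f"{knob} reads {current}, expected {value}")
    return current
```

Calling set_ktimer_lockless_check() once on the host and once in each guest performs the same change as the echo command above, and raises instead of silently continuing if the value read back differs.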


==Results== (cyclictest latencies in us)
(1) Single VM with 1 rt vCPU:
# Min Latencies: 00005
# Avg Latencies: 00006
# Max Latencies: 00019

(2) Single VM with 8 rt vCPUs:
# Min Latencies: 00005 00007 00007 00007 00007 00007 00007 00007
# Avg Latencies: 00005 00007 00008 00007 00007 00007 00007 00007
# Max Latencies: 00017 00015 00018 00014 00016 00015 00015 00015

(3) Multiple VMs, each with 1 rt vCPU:
- VM1
# Min Latencies: 00004
# Avg Latencies: 00005
# Max Latencies: 00015

- VM2
# Min Latencies: 00004
# Avg Latencies: 00005
# Max Latencies: 00015

- VM3
# Min Latencies: 00004
# Avg Latencies: 00005
# Max Latencies: 00015

- VM4
# Min Latencies: 00004
# Avg Latencies: 00005
# Max Latencies: 00019

==Versions==
kernel-rt-4.18.0-185.rt13.43.el8.x86_64
tuned-2.13.0-5.el8.noarch
microcode_ctl-20191115-4.el8.x86_64
rt-tests-1.5-18.el8.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64

Comment 29 Pei Zhang 2020-03-06 01:23:55 UTC
As shown in Comment 27 and Comment 28, the 24h cyclictest max latency is < 20us in both runs (2/2 PASS).

Move this bug to 'VERIFIED'.

Comment 32 Luiz Capitulino 2020-03-09 18:35:50 UTC
*** Bug 1761735 has been marked as a duplicate of this bug. ***

Comment 43 errata-xmlrpc 2020-04-28 15:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1567