1905799 – stalld enablement for KVM-RT

Summary: stalld enablement for KVM-RT

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	kernel-rt
Sub Component:
Version:	8.4
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Luiz Capitulino
QA Contact:	Pei Zhang
Docs Contact:
URL:
Whiteboard:
Depends On:	1983167
Blocks:	1825271 1883636

Reported:	2020-12-09 05:41 UTC by Pei Zhang
Modified:	2022-02-23 03:30 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-01-21 17:44:55 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	pm-rhel: mirror+

Comment 27 Pei Zhang 2021-05-18 08:53:04 UTC

Thanks, Nitesh and Clark, for the comments above. I tested with the minimum value, -r 8000. The 12h oslat max latency is 56us.


Testing update(2/2):

1) Test oslat with -w memmove -m 16K, stalld with -r 8000:

12h oslat max latency is 56us.

(1)Single VM with 1 rt vCPU:
     Maximum:	 49 (us)

(2)Single VM with 8 rt vCPUs:
     Maximum:	 45 44 46 45 44 44 48 47 (us)

(3)Multiple VMs each with 1 rt vCPU:
- VM1
     Maximum:	 12 (us)

- VM2
     Maximum:	 56 (us)

- VM3
     Maximum:	 50 (us)

- VM4
     Maximum:	 51 (us)
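
The overall worst case can be read off the "Maximum:" lines above mechanically. The following is a minimal sketch, assuming the oslat output format shown in this comment; the scenario labels and helper names are hypothetical and not part of oslat or the test harness.

```python
import re

# "Maximum:" lines copied from the results above. The dictionary keys are
# hypothetical labels chosen for this sketch.
RESULTS = {
    "single_vm_1_vcpu": "     Maximum:\t 49 (us)",
    "single_vm_8_vcpus": "     Maximum:\t 45 44 46 45 44 44 48 47 (us)",
    "multi_vm1": "     Maximum:\t 12 (us)",
    "multi_vm2": "     Maximum:\t 56 (us)",
    "multi_vm3": "     Maximum:\t 50 (us)",
    "multi_vm4": "     Maximum:\t 51 (us)",
}


def parse_maximum(line):
    """Return the per-CPU maximum latencies (us) from one 'Maximum:' line."""
    match = re.match(r"\s*Maximum:\s+([\d\s]+)\(us\)", line)
    if match is None:
        raise ValueError(f"not a Maximum line: {line!r}")
    return [int(value) for value in match.group(1).split()]


def worst_case(results):
    """Return (scenario, latency) for the largest maximum across scenarios."""
    per_scenario = {name: max(parse_maximum(line)) for name, line in results.items()}
    worst = max(per_scenario, key=per_scenario.get)
    return worst, per_scenario[worst]


if __name__ == "__main__":
    scenario, latency = worst_case(RESULTS)
    print(f"worst case: {scenario} at {latency} us")
```

With the figures above, the worst case is VM2 in the multiple-VM scenario, which matches the 56us reported for the 12h run.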



==oslat cmd==
scenario (1) and (3): /home/nfv-virt-rt-kvm/tools/oslat --cpu-list 1 --rtprio 1 --runtime 12h -w memmove -m 16K
scenario (2): /home/nfv-virt-rt-kvm/tools/oslat --cpu-list 2,3,4,5,6,7,8,9 --rtprio 1 --runtime 12h -w memmove -m 16K

==Versions==
kernel-rt-4.18.0-305.rt7.72.el8.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
tuned-2.15.0-2.el8.noarch
libvirt-7.0.0-14.module+el8.4.0+10886+79296686.x86_64
python3-libvirt-7.0.0-1.module+el8.4.0+9469+2eaf72bc.x86_64
microcode_ctl-20210216-1.el8.x86_64
rt-tests-1.10-3.el8.x86_64
stalld-1.9-2.el8.x86_64

Comment 29 Nitesh Narayan Lal 2021-05-18 14:41:35 UTC

Hi Pei,

Thank you for testing and for sharing the access.

Quick question, what is the maximum latency that you get without stalld but
with "-w memmove -m 16K"?

I just disabled stalld on your machine and ran a quick 5m test in the guest
and got a maximum of 42 us.

     Minimum:	 1 1 1 1 1 1 1 1 (us)
     Average:	 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
     Maximum:	 43 42 43 6 43 43 43 38 (us)
     Max-Min:	 42 41 42 5 42 42 42 37 (us)


Have we ever defined a baseline with -w memmove -m 16K? (Marcelo?)

Also, during your run, the only boosted process that I saw was swapper but
let's clarify the above first before drawing conclusions (IMHO).

Thanks

Comment 30 Pei Zhang 2021-05-18 14:57:15 UTC

(In reply to Nitesh Narayan Lal from comment #29)
> Hi Pei,
> 
> Thank you for testing and for sharing the access.
> 
> Quick question, what is the maximum latency that you get without stalld but
> with "-w memmove -m 16K"?
> 
> I just disabled stalld on your machine and ran a quick 5m test in the guest
> and got a maximum of 42 us.
> 
>      Minimum:	 1 1 1 1 1 1 1 1 (us)
>      Average:	 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
>      Maximum:	 43 42 43 6 43 43 43 38 (us)
>      Max-Min:	 42 41 42 5 42 42 42 37 (us)
>

Nitesh,

I think you need to disable stalld on both host and guest. 

In my previous 12h testing with "-w memmove -m 16K" and stalld disabled on both host and guest, the results were as follows:

==Results==
(1)Single VM with 1 rt vCPU:
     Maximum:	 11 (us)

(2)Single VM with 8 rt vCPUs:
     Maximum:	 11 20 20 20 20 20 19 20 (us)

(3)Multiple VMs each with 1 rt vCPU:
- VM1
     Maximum:	 12 (us)

- VM2
     Maximum:	 12 (us)

- VM3
     Maximum:	 11 (us)

- VM4
     Maximum:	 11 (us)
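
To make the difference explicit, these baseline figures can be set against the stalld-enabled figures from comment 27. This is only a sketch of the arithmetic: the numbers are copied from the two comments, and the dictionary keys and function name are hypothetical.

```python
# Per-scenario 12h oslat maxima (us) copied from this thread. The keys are
# hypothetical labels chosen for this sketch.
WITH_STALLD = {
    "single_vm_1_vcpu": [49],
    "single_vm_8_vcpus": [45, 44, 46, 45, 44, 44, 48, 47],
    "multi_vm": [12, 56, 50, 51],
}
WITHOUT_STALLD = {
    "single_vm_1_vcpu": [11],
    "single_vm_8_vcpus": [11, 20, 20, 20, 20, 20, 19, 20],
    "multi_vm": [12, 12, 11, 11],
}


def max_latency_increase(with_stalld, without_stalld):
    """Return, per scenario, the increase in worst-case latency (us)."""
    return {
        name: max(with_stalld[name]) - max(without_stalld[name])
        for name in with_stalld
    }


if __name__ == "__main__":
    for name, delta in max_latency_increase(WITH_STALLD, WITHOUT_STALLD).items():
        print(f"{name}: +{delta} us with stalld enabled")
```

On these numbers the worst-case latency rises by 38us, 28us, and 44us for the three scenarios respectively.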



==Versions==
kernel-rt-4.18.0-305.rt7.72.el8.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
tuned-2.15.0-2.el8.noarch
libvirt-7.0.0-14.module+el8.4.0+10886+79296686.x86_64
python3-libvirt-7.0.0-1.module+el8.4.0+9469+2eaf72bc.x86_64
microcode_ctl-20210216-1.el8.x86_64
rt-tests-1.10-3.el8.x86_64


==Reference Links==
- Beaker job:
https://beaker.engineering.redhat.com/jobs/5353998

Best regards,

Pei

> 
> Have we ever defined a baseline with -w memmove -m 16K? (Marcelo?)
> 
> Also, during your run, the only boosted process that I saw was swapper but
> let's clarify the above first before drawing conclusions (IMHO).
> 
> Thanks

Comment 31 Nitesh Narayan Lal 2021-05-18 15:00:06 UTC

Thanks, Pei for the quick response.
IMHO we don't need to install/enable stalld on the guest but I will try to
run some tests in your environment again by disabling it in the guest as well.

Comment 32 Marcelo Tosatti 2021-05-18 15:08:39 UTC

(In reply to Nitesh Narayan Lal from comment #29)
> Hi Pei,
> 
> Thank you for testing and for sharing the access.
> 
> Quick question, what is the maximum latency that you get without stalld but
> with "-w memmove -m 16K"?
> 
> I just disabled stalld on your machine and ran a quick 5m test in the guest
> and got a maximum of 42 us.
> 
>      Minimum:	 1 1 1 1 1 1 1 1 (us)
>      Average:	 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
>      Maximum:	 43 42 43 6 43 43 43 38 (us)
>      Max-Min:	 42 41 42 5 42 42 42 37 (us)
> 
> 
> Have we ever defined a baseline with -w memmove -m 16K? (Marcelo?)

It depends on HW specifics: processor, cache size, memory bandwidth, etc.

It's the time measured in:

a = rdtsc();
memmove(...)
b = rdtsc();

time = b-a in nanoseconds.
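
The same measurement pattern can be tried from user space. The sketch below is not oslat's implementation: it uses Python's time.perf_counter_ns() as a stand-in for rdtsc() and ctypes.memmove() for the copy, and it assumes that 16K (from "-m 16K") is the size of the buffer being moved. The function names are hypothetical.

```python
import ctypes
import time

# Assumption: "-m 16K" is taken here to be the size of the moved buffer.
BUFFER_SIZE = 16 * 1024


def time_memmove_ns(size=BUFFER_SIZE):
    """Time one memmove of `size` bytes and return the elapsed nanoseconds."""
    src = ctypes.create_string_buffer(size)
    dst = ctypes.create_string_buffer(size)
    start = time.perf_counter_ns()  # stand-in for a = rdtsc()
    ctypes.memmove(dst, src, size)
    end = time.perf_counter_ns()    # stand-in for b = rdtsc()
    return end - start


def max_memmove_ns(iterations=1000):
    """Return the longest single memmove observed over `iterations` runs."""
    return max(time_memmove_ns() for _ in range(iterations))


if __name__ == "__main__":
    print(f"max memmove time: {max_memmove_ns()} ns")
```

The absolute numbers from such a loop depend on the processor, cache size, and memory bandwidth, which is why a single baseline does not carry over between machines.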

>
> Also, during your run, the only boosted process that I saw was swapper but
> let's clarify the above first before drawing conclusions (IMHO).
> 
> Thanks

Comment 33 Nitesh Narayan Lal 2021-05-18 15:10:39 UTC

(In reply to Nitesh Narayan Lal from comment #31)
> Thanks, Pei for the quick response.
> IMHO we don't need to install/enable stalld on the guest but I will try to
> run some tests in your environment again by disabling it in the guest as
> well.

Interestingly enough, there are several kworkers that are boosted in the guest:

May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/2:1-141 might starve on CPU 2 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/3:1-142 might starve on CPU 3 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/4:1-143 might starve on CPU 4 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/6:1-145 might starve on CPU 6 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/7:1-146 might starve on CPU 7 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/8:1-147 might starve on CPU 8 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/9:1-148 might starve on CPU 9 (waiting for 15 seconds)
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/6:1-145 starved on CPU 6 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 145 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/4:1-143 starved on CPU 4 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 143 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/2:1-141 starved on CPU 2 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 141 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/8:1-147 starved on CPU 8 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 147 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/9:1-148 starved on CPU 9 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 148 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/7:1-146 starved on CPU 7 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/3:1-142 starved on CPU 3 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 146 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 142 using SCHED_DEADLINE

Have to check what these kworkers are doing.
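
As a starting point for that check, the stalld journal lines can be summarized per task. This is a rough sketch that assumes the message format shown in the log above; the regular expressions and the summarize() helper are hypothetical and not part of stalld.

```python
import re

# A few lines copied from the log above.
SAMPLE_LOG = """\
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/6:1-145 might starve on CPU 6 (waiting for 15 seconds)
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/6:1-145 starved on CPU 6 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 145 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/4:1-143 starved on CPU 4 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 143 using SCHED_DEADLINE
"""

# Patterns inferred from the sample lines only; other stalld versions may differ.
STARVED = re.compile(
    r"stalld\[\d+\]: (?P<task>\S+)-(?P<pid>\d+) starved on CPU (?P<cpu>\d+) "
    r"for (?P<secs>\d+) seconds"
)
BOOSTED = re.compile(r"stalld\[\d+\]: boosted pid (?P<pid>\d+) using (?P<policy>\S+)")


def summarize(log_text):
    """Map each pid to its task name, CPU, starvation time, and boost policy."""
    tasks = {}
    for line in log_text.splitlines():
        starved = STARVED.search(line)
        if starved:
            tasks[int(starved["pid"])] = {
                "task": starved["task"],
                "cpu": int(starved["cpu"]),
                "starved_secs": int(starved["secs"]),
                "boosted_with": None,
            }
            continue
        boosted = BOOSTED.search(line)
        if boosted:
            entry = tasks.setdefault(
                int(boosted["pid"]),
                {"task": None, "cpu": None, "starved_secs": None, "boosted_with": None},
            )
            entry["boosted_with"] = boosted["policy"]
    return tasks


if __name__ == "__main__":
    for pid, info in sorted(summarize(SAMPLE_LOG).items()):
        print(pid, info)
```

The summary only identifies which kworkers were boosted and on which CPU. Finding out what work they were running would need kernel-side tracing, which this sketch does not attempt.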

Comment 36 Daniel Bristot de Oliveira 2021-07-13 07:53:34 UTC

Hi Nitesh, did you find out which workers are running, and how to avoid them?

Comment 37 Nitesh Narayan Lal 2021-07-13 13:18:44 UTC

(In reply to Daniel Bristot de Oliveira from comment #36)
> Hi Nitesh, did you find out which workers are running, and how to avoid them?

Hi Daniel,

No, I didn't. I took a quick look at Pei's setup and that's how I found those kworkers in the guest. 
Marcelo was going to work on this enablement so maybe he did but not sure.


Thanks

Comment 38 Marcelo Tosatti 2021-07-13 13:30:34 UTC

(In reply to Nitesh Narayan Lal from comment #37)
> (In reply to Daniel Bristot de Oliveira from comment #36)
> > Hi Nitesh, did you find out which workers are running, and how to avoid them?
> 
> Hi Daniel,
> 
> No, I didn't. I took a quick look at Pei's setup and that's how I found
> those kworkers in the guest. 

One possibility is the vmstat workers (which will be triggered by oslat due to
mlock).

> Marcelo was going to work on this enablement so maybe he did but not sure.

Pei's testing found that the cyclictest/oslat latency, when executing in the
guest, with stalld on guest and host is ~= 100us.

So, as far as I understand, this BZ can be closed.

Comment 39 Marcelo Tosatti 2021-07-13 17:55:29 UTC

(In reply to Marcelo Tosatti from comment #38)
> (In reply to Nitesh Narayan Lal from comment #37)
> > (In reply to Daniel Bristot de Oliveira from comment #36)
> > > Hi Nitesh, did you find out which workers are running, and how to avoid them?
> > 
> > Hi Daniel,
> > 
> > No, I didn't. I took a quick look at Pei's setup and that's how I found
> > those kworkers in the guest. 
> 
> One possibility are vmstat workers (which will be triggered from oslat due
> to 
> mlock).
> 
> > Marcelo was going to work on this enablement so maybe he did but not sure.
> 
> Pei's testing found that the cyclictest/oslat latency, when executing in the
> guest, with stalld on guest and host is ~= 100us.
> 
> So, as far as i understand, this BZ can be closed.

That said, it's up to the customer whether to enable stalld or not.

Cisco currently does not enable it AFAIK (probably because stalld was written
after they created their deployment), so I suppose internal QA should match
that (which it does).

Nitesh, Pei, do you see any reason not to close this BZ?

Comment 40 Nitesh Narayan Lal 2021-07-13 18:31:09 UTC

(In reply to Marcelo Tosatti from comment #39)
> (In reply to Marcelo Tosatti from comment #38)
> > (In reply to Nitesh Narayan Lal from comment #37)
> > > (In reply to Daniel Bristot de Oliveira from comment #36)
> > > > Hi Nitesh, did you find out which workers are running, and how to avoid them?
> > > 
> > > Hi Daniel,
> > > 
> > > No, I didn't. I took a quick look at Pei's setup and that's how I found
> > > those kworkers in the guest. 
> > 
> > One possibility are vmstat workers (which will be triggered from oslat due
> > to 
> > mlock).
> > 
> > > Marcelo was going to work on this enablement so maybe he did but not sure.
> > 
> > Pei's testing found that the cyclictest/oslat latency, when executing in the
> > guest, with stalld on guest and host is ~= 100us.
> > 
> > So, as far as i understand, this BZ can be closed.
> 
> That said, its up to the customer whether to enable stalld or not.
> 
> Cisco currently does not enable it AFAIK (probably because stalld was written
> after they created their deployment), so i suppose internal QA should match
> with that (which it does).
>

Right
 
> Nitesh, Pei, do you see any reason not to close this BZ ?

IMHO we should at least check the work that is performed by these boosted
kworkers.

If we are already sure that it is only the vmstat work that is being queued
then that should be resolved once your kernel fix gets merged.

Another thing to consider is if there can be similar kworker starvation in
Karl's CNV-RT environment with oslat.

Thanks

Comment 41 Pei Zhang 2021-07-14 02:03:17 UTC

(In reply to Marcelo Tosatti from comment #39)
> (In reply to Marcelo Tosatti from comment #38)
> > (In reply to Nitesh Narayan Lal from comment #37)
> > > (In reply to Daniel Bristot de Oliveira from comment #36)
> > > > Hi Nitesh, did you find out which workers are running, and how to avoid them?
> > > 
> > > Hi Daniel,
> > > 
> > > No, I didn't. I took a quick look at Pei's setup and that's how I found
> > > those kworkers in the guest. 
> > 
> > One possibility are vmstat workers (which will be triggered from oslat due
> > to 
> > mlock).
> > 
> > > Marcelo was going to work on this enablement so maybe he did but not sure.
> > 
> > Pei's testing found that the cyclictest/oslat latency, when executing in the
> > guest, with stalld on guest and host is ~= 100us.
> > 
> > So, as far as i understand, this BZ can be closed.
> 
> That said, its up to the customer whether to enable stalld or not.
> 
> Cisco currently does not enable it AFAIK (probably because stalld was written
> after they created their deployment), so i suppose internal QA should match
> with that (which it does).
> 
> Nitesh, Pei, do you see any reason not to close this BZ ?

Marcelo, closing this BZ sounds OK to me. Since stalld is disabled by default, we don't need to do anything extra to keep the current latency performance. Thanks.

Comment 42 Marcelo Tosatti 2021-07-14 17:07:32 UTC

Closing based on comment #39.

If Cisco (or other customers) decide they prefer to enable stalld, then they can be
given the information gathered here (OS interruptions close to 100us when scheduling
a task out of the CPU and then scheduling qemu-kvm-vcpu back in).

Thanks Pei!

Comment 43 Nitesh Narayan Lal 2021-07-14 17:21:14 UTC

There are two scenarios possible FWIU:

- If you have stalld enabled and it is not boosting any kworker then there
  should not be any impact on the latency

- If stalld is boosting any kworker that is getting starved then we have an
  issue that can cause unexpected behavior which IMO is not desirable

Can you please confirm if the kworkers that are boosted in the guest are only
doing the vmstat update work and nothing else?

Thanks

Comment 44 Marcelo Tosatti 2021-07-14 17:39:07 UTC

(In reply to Nitesh Narayan Lal from comment #43)
> There are two scenarios possible FWIU:
> 
> - If you have stalld enabled and it is not boosting any kworker then there
>   should not be any impact on the latency

Correct.

> 
> - If stalld is boosting any kworker that is getting starved then we have an
>   issue that can cause unexpected behavior which IMO is not desirable

If the use-case requires < 40us maximum interruption (per window of time), then you have a problem.

If the use-case can accept, say 150us per second interruption, then the schedule-out/schedule-in 
of qemu-kvm-vcpu thread (isolated) is not a problem (and enabling stalld by default makes sense).

If the use-case requires < 40 us maximum interruption, but the user would like to enable stalld 
anyway, they can do so and monitor the logs.
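
The three cases above amount to comparing the observed worst-case interruption with the use-case's budget. A minimal sketch follows, assuming the 102us oslat maximum reported elsewhere in this bug as the observed value and ignoring the per-window aspect of the budget; the function name and return strings are hypothetical.

```python
# oslat max latency with stalld enabled, as reported in this bug.
OBSERVED_MAX_US = 102


def stalld_recommendation(budget_us, observed_us=OBSERVED_MAX_US):
    """Compare the observed worst-case interruption (us) with the budget (us)."""
    if observed_us < budget_us:
        return "enable stalld"
    return "stalld exceeds budget: leave disabled, or enable and monitor the logs"


if __name__ == "__main__":
    for budget in (40, 150):
        print(f"budget {budget} us: {stalld_recommendation(budget)}")
```

With a 150us budget the observed value fits, whereas with a 40us budget it does not, which corresponds to the first and second cases above.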

> 
> Can you please confirm if the kworkers that are boosted in the guest are only
> doing the vmstat update work and nothing else?

I am pretty sure they are (Pei is running oslat, and oslat only triggers vmstat_update).

Comment 45 Marcelo Tosatti 2021-07-16 17:17:14 UTC

(In reply to Nitesh Narayan Lal from comment #43)
> There are two scenarios possible FWIU:
> 
> - If you have stalld enabled and it is not boosting any kworker then there
>   should not be any impact on the latency
> 
> - If stalld is boosting any kworker that is getting starved then we have an
>   issue that can cause unexpected behavior which IMO is not desirable
> 
> Can you please confirm if the kworkers that are boosted in the guest are only
> doing the vmstat update work and nothing else?
> 
> Thanks

Had a conversation with Luiz and he came up with the following which makes sense:

Before recommending stalld to customers (which should involve an explanation of the consequences
of switching between tasks), we should make sure QE tests pass with < 40us when
stalld is enabled, under a "meaningful workload" (one which we expect the customer to use).

Reassigning to Luiz as well (because this is a task assigned to the team).

Comment 48 Nitesh Narayan Lal 2021-07-20 18:44:14 UTC

*** Bug 1921601 has been marked as a duplicate of this bug. ***

Comment 50 RHEL Program Management 2022-01-16 07:27:03 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 51 Luiz Capitulino 2022-01-17 15:38:54 UTC

We may need stalld for RT-CNV, reopening.

Comment 53 Marcelo Tosatti 2022-01-21 13:19:40 UTC

(In reply to Luiz Capitulino from comment #51)
> We may need stalld for RT-CNV, reopening.

Luiz,

Actual results:
Oslat max latency is 102us when enabling stalld service.

Do you remember the latencies that Daniel Froehlich mentioned? Would be 
good to have his slides.

Comment 55 Luiz Capitulino 2022-01-21 17:44:55 UTC

After discussing this issue with Marcelo, we decided to close it: as stated in comment 53, the oslat max latency with stalld is 102us, while the max latency threshold for a possible known RT-CNV customer is 150us.
