Bug 1905799
| Summary: | stalld enablement for KVM-RT | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Pei Zhang <pezhang> |
| Component: | kernel-rt | Assignee: | Luiz Capitulino <lcapitulino> |
| kernel-rt sub component: | KVM | QA Contact: | Pei Zhang <pezhang> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | low | ||
| Priority: | medium | CC: | bhu, chayang, daolivei, jinzhao, jlelli, juzhang, kcarcia, lcapitulino, mstowell, mtosatti, nilal, peterx, psahoo, virt-maint, williams |
| Version: | 8.4 | Keywords: | Reopened, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-01-21 17:44:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1983167 | ||
| Bug Blocks: | 1825271, 1883636 | ||
Hi Pei,
Thank you for testing and for sharing the access.
Quick question, what is the maximum latency that you get without stalld but
with "-w memmove -m 16K"?
I just disabled stalld on your machine and ran a quick 5m test in the guest
and got a maximum of 42 us.
Minimum: 1 1 1 1 1 1 1 1 (us)
Average: 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
Maximum: 43 42 43 6 43 43 43 38 (us)
Max-Min: 42 41 42 5 42 42 42 37 (us)
Have we ever defined a baseline with -w memmove -m 16K? (Marcelo?)
Also, during your run, the only boosted process that I saw was swapper but
let's clarify the above first before drawing conclusions (IMHO).
Thanks
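
For readers who want to compare runs like the one above, here is a minimal Python sketch that parses the oslat summary rows and reports the worst per-CPU maximum. The function name `parse_oslat_summary` and the regular expression are illustrative assumptions, not part of rt-tests; the sample text is copied from the comment above.

```python
import re

# Matches one oslat summary row, e.g. "Maximum: 43 42 43 6 43 43 43 38 (us)".
# The row labels are the four that appear in the output quoted in this bug.
ROW = re.compile(r"^\s*(Minimum|Average|Maximum|Max-Min):\s+(.*?)\s+\(us\)\s*$")


def parse_oslat_summary(text):
    """Return {row label: [per-CPU values as floats]} for each summary row found."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[match.group(1)] = [float(v) for v in match.group(2).split()]
    return rows


# Sample copied from the 5-minute run reported in the comment above.
SAMPLE = """\
Minimum: 1 1 1 1 1 1 1 1 (us)
Average: 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
Maximum: 43 42 43 6 43 43 43 38 (us)
Max-Min: 42 41 42 5 42 42 42 37 (us)
"""

summary = parse_oslat_summary(SAMPLE)
worst = max(summary["Maximum"])
print(f"worst-case latency across CPUs: {worst:.0f} us")
```

Note that the Maximum row peaks at 43 us while the Max-Min row peaks at 42 us, so the "42 us" quoted in the comment appears to correspond to the Max-Min row.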
(In reply to Nitesh Narayan Lal from comment #29)
> Hi Pei,
>
> Thank you for testing and for sharing the access.
>
> Quick question, what is the maximum latency that you get without stalld but
> with "-w memmove -m 16K"?
>
> I just disabled stalld on your machine and ran a quick 5m test in the guest
> and got a maximum of 42 us.
>
> Minimum: 1 1 1 1 1 1 1 1 (us)
> Average: 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
> Maximum: 43 42 43 6 43 43 43 38 (us)
> Max-Min: 42 41 42 5 42 42 42 37 (us)
>

Nitesh, I think you need to disable stalld on both host and guest.

In my previous 12h testing with "-w memmove -m 16K" and stalld disabled on both host and guest, the results were as follows:

==Results==
(1)Single VM with 1 rt vCPU:
Maximum: 11 (us)
(2)Single VM with 8 rt vCPUs:
Maximum: 11 20 20 20 20 20 19 20 (us)
(3)Multiple VMs each with 1 rt vCPU:
- VM1 Maximum: 12 (us)
- VM2 Maximum: 12 (us)
- VM3 Maximum: 11 (us)
- VM4 Maximum: 11 (us)

==Versions==
kernel-rt-4.18.0-305.rt7.72.el8.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
tuned-2.15.0-2.el8.noarch
libvirt-7.0.0-14.module+el8.4.0+10886+79296686.x86_64
python3-libvirt-7.0.0-1.module+el8.4.0+9469+2eaf72bc.x86_64
microcode_ctl-20210216-1.el8.x86_64
rt-tests-1.10-3.el8.x86_64

==Reference Links==
- Beaker job: https://beaker.engineering.redhat.com/jobs/5353998

Best regards,
Pei

>
> Have we ever defined a baseline with -w memmove -m 16K? (Marcelo?)
>
> Also, during your run, the only boosted process that I saw was swapper but
> let's clarify the above first before drawing conclusions (IMHO).
>
> Thanks

Thanks, Pei for the quick response.
IMHO we don't need to install/enable stalld on the guest but I will try to run some tests in your environment again by disabling it in the guest as well.

(In reply to Nitesh Narayan Lal from comment #29)
> Hi Pei,
>
> Thank you for testing and for sharing the access.
>
> Quick question, what is the maximum latency that you get without stalld but
> with "-w memmove -m 16K"?
>
> I just disabled stalld on your machine and ran a quick 5m test in the guest
> and got a maximum of 42 us.
>
> Minimum: 1 1 1 1 1 1 1 1 (us)
> Average: 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
> Maximum: 43 42 43 6 43 43 43 38 (us)
> Max-Min: 42 41 42 5 42 42 42 37 (us)
>
>
> Have we ever defined a baseline with -w memmove -m 16K? (Marcelo?)

It depends on HW specifics: processor, cache size, memory bandwidth, etc.

It's the time measured in:

a = rdtsc();
memmove(...)
b = rdtsc();
time = b-a

in nanoseconds.

>
> Also, during your run, the only boosted process that I saw was swapper but
> let's clarify the above first before drawing conclusions (IMHO).
>
> Thanks

(In reply to Nitesh Narayan Lal from comment #31)
> Thanks, Pei for the quick response.
> IMHO we don't need to install/enable stalld on the guest but I will try to
> run some tests in your environment again by disabling it in the guest as
> well.
Interestingly enough, there are several kworkers that are boosted in the guest:

May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/2:1-141 might starve on CPU 2 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/3:1-142 might starve on CPU 3 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/4:1-143 might starve on CPU 4 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/6:1-145 might starve on CPU 6 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/7:1-146 might starve on CPU 7 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/8:1-147 might starve on CPU 8 (waiting for 15 seconds)
May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/9:1-148 might starve on CPU 9 (waiting for 15 seconds)
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/6:1-145 starved on CPU 6 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 145 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/4:1-143 starved on CPU 4 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 143 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/2:1-141 starved on CPU 2 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 141 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/8:1-147 starved on CPU 8 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 147 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/9:1-148 starved on CPU 9 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 148 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/7:1-146 starved on CPU 7 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/3:1-142 starved on CPU 3 for 30 seconds
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 146 using SCHED_DEADLINE
May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 142 using SCHED_DEADLINE

Have to check what these kworkers are doing.

Hi Nitesh, did you find out which workers are running, and how to avoid them?

(In reply to Daniel Bristot de Oliveira from comment #36)
> Hi Nitesh, did you find out which workers are running, and how to avoid them?

Hi Daniel,

No, I didn't. I took a quick look at Pei's setup and that's how I found those kworkers in the guest.
Marcelo was going to work on this enablement so maybe he did, but I'm not sure.

Thanks

(In reply to Nitesh Narayan Lal from comment #37)
> (In reply to Daniel Bristot de Oliveira from comment #36)
> > Hi Nitesh, did you find out which workers are running, and how to avoid them?
>
> Hi Daniel,
>
> No, I didn't. I took a quick look at Pei's setup and that's how I found
> those kworkers in the guest.

One possibility is vmstat workers (which will be triggered from oslat due to mlock).

> Marcelo was going to work on this enablement so maybe he did but not sure.

Pei's testing found that the cyclictest/oslat latency, when executing in the guest, with stalld on guest and host is ~= 100us.

So, as far as I understand, this BZ can be closed.

(In reply to Marcelo Tosatti from comment #38)
> (In reply to Nitesh Narayan Lal from comment #37)
> > (In reply to Daniel Bristot de Oliveira from comment #36)
> > > Hi Nitesh, did you find out which workers are running, and how to avoid them?
> >
> > Hi Daniel,
> >
> > No, I didn't. I took a quick look at Pei's setup and that's how I found
> > those kworkers in the guest.
>
> One possibility is vmstat workers (which will be triggered from oslat due
> to mlock).
>
> > Marcelo was going to work on this enablement so maybe he did but not sure.
>
> Pei's testing found that the cyclictest/oslat latency, when executing in the
> guest, with stalld on guest and host is ~= 100us.
>
> So, as far as I understand, this BZ can be closed.

That said, it's up to the customer whether to enable stalld or not.

Cisco currently does not enable it AFAIK (probably because stalld was written after they created their deployment), so I suppose internal QA should match with that (which it does).

Nitesh, Pei, do you see any reason not to close this BZ?

(In reply to Marcelo Tosatti from comment #39)
> (In reply to Marcelo Tosatti from comment #38)
> > (In reply to Nitesh Narayan Lal from comment #37)
> > > (In reply to Daniel Bristot de Oliveira from comment #36)
> > > > Hi Nitesh, did you find out which workers are running, and how to avoid them?
> > >
> > > Hi Daniel,
> > >
> > > No, I didn't. I took a quick look at Pei's setup and that's how I found
> > > those kworkers in the guest.
> >
> > One possibility is vmstat workers (which will be triggered from oslat due
> > to mlock).
> >
> > > Marcelo was going to work on this enablement so maybe he did but not sure.
> >
> > Pei's testing found that the cyclictest/oslat latency, when executing in the
> > guest, with stalld on guest and host is ~= 100us.
> >
> > So, as far as I understand, this BZ can be closed.
>
> That said, it's up to the customer whether to enable stalld or not.
>
> Cisco currently does not enable it AFAIK (probably because stalld was written
> after they created their deployment), so I suppose internal QA should match
> with that (which it does).
>

Right

> Nitesh, Pei, do you see any reason not to close this BZ?

IMHO we should at least check the work that is performed by these boosted kworkers.
If we are already sure that it is only the vmstat work that is being queued then that should be resolved once your kernel fix gets merged.

Another thing to consider is whether there can be similar kworker starvation in Karl's CNV-RT environment with oslat.

Thanks

(In reply to Marcelo Tosatti from comment #39)
> (In reply to Marcelo Tosatti from comment #38)
> > (In reply to Nitesh Narayan Lal from comment #37)
> > > (In reply to Daniel Bristot de Oliveira from comment #36)
> > > > Hi Nitesh, did you find out which workers are running, and how to avoid them?
> > >
> > > Hi Daniel,
> > >
> > > No, I didn't. I took a quick look at Pei's setup and that's how I found
> > > those kworkers in the guest.
> >
> > One possibility is vmstat workers (which will be triggered from oslat due
> > to mlock).
> >
> > > Marcelo was going to work on this enablement so maybe he did but not sure.
> >
> > Pei's testing found that the cyclictest/oslat latency, when executing in the
> > guest, with stalld on guest and host is ~= 100us.
> >
> > So, as far as I understand, this BZ can be closed.
>
> That said, it's up to the customer whether to enable stalld or not.
>
> Cisco currently does not enable it AFAIK (probably because stalld was written
> after they created their deployment), so I suppose internal QA should match
> with that (which it does).
>
> Nitesh, Pei, do you see any reason not to close this BZ?

Marcelo, it sounds OK to me to close this BZ. Since stalld is disabled by default, we don't need to do anything extra to keep the current latency performance.

Thanks.

Closing based on comment #39.

If Cisco (or other customers) decide they prefer to enable stalld, then they can be given the information gathered here (of OS interruptions close to 100us when scheduling a task out of the CPU, then scheduling qemu-kvm-vcpu back in).

Thanks Pei!
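
For anyone revisiting the stalld journal output quoted earlier in this bug, the following Python sketch pairs each "starved" message with its "boosted pid" message so the boosted tasks can be listed by name and CPU. The regular expressions are derived only from the log lines shown in this thread and are an assumption about the message format, not a stable stalld interface; `summarize` is a hypothetical helper name.

```python
import re

# Patterns inferred from the log lines quoted in this bug (assumed format).
STARVED = re.compile(r"stalld\[\d+\]: (\S+)-(\d+) starved on CPU (\d+) for (\d+) seconds")
BOOSTED = re.compile(r"stalld\[\d+\]: boosted pid (\d+) using (\S+)")


def summarize(lines):
    """Return {pid: info} for every task reported as starved and/or boosted."""
    tasks = {}
    for line in lines:
        match = STARVED.search(line)
        if match:
            name, pid, cpu, secs = match.groups()
            info = tasks.setdefault(int(pid), {"boosted": False})
            info.update(name=name, cpu=int(cpu), starved_s=int(secs))
            continue
        match = BOOSTED.search(line)
        if match:
            info = tasks.setdefault(int(match.group(1)), {})
            info.update(boosted=True, policy=match.group(2))
    return tasks


# Three lines copied verbatim from the guest log in this bug.
LOG = [
    "May 18 22:06:52 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/2:1-141 might starve on CPU 2 (waiting for 15 seconds)",
    "May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: kworker/6:1-145 starved on CPU 6 for 30 seconds",
    "May 18 22:07:07 bootp-73-75-14.lab.eng.pek2.redhat.com stalld[1104]: boosted pid 145 using SCHED_DEADLINE",
]

tasks = summarize(LOG)
for pid, info in sorted(tasks.items()):
    print(pid, info.get("name", "?"), "CPU", info.get("cpu", "?"),
          "boosted" if info.get("boosted") else "not boosted")
```

"might starve" warnings are deliberately ignored here; only tasks that actually starved or were boosted are recorded. Identifying what work a boosted kworker was running (for example vmstat_update) would still need separate tracing, which this sketch does not attempt.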
There are two scenarios possible FWIU:

- If you have stalld enabled and it is not boosting any kworker then there should not be any impact on the latency

- If stalld is boosting any kworker that is getting starved then we have an issue that can cause unexpected behavior which IMO is not desirable

Can you please confirm if the kworkers that are boosted in the guest are only doing the vmstat update work and nothing else?

Thanks

(In reply to Nitesh Narayan Lal from comment #43)
> There are two scenarios possible FWIU:
>
> - If you have stalld enabled and it is not boosting any kworker then there
> should not be any impact on the latency

Correct.

>
> - If stalld is boosting any kworker that is getting starved then we have an
> issue that can cause unexpected behavior which IMO is not desirable

If the use-case requires < 40us maximum interruption (per window of time), then you have a problem.

If the use-case can accept, say 150us per second interruption, then the schedule-out/schedule-in of qemu-kvm-vcpu thread (isolated) is not a problem (and enabling stalld by default makes sense).

If the use-case requires < 40 us maximum interruption, but the user would like to enable stalld anyway, they can do so and monitor the logs.

>
> Can you please confirm if the kworkers that are boosted in the guest are only
> doing the vmstat update work and nothing else?

I am pretty sure they are (Pei is running oslat, and oslat only triggers vmstat_update).

(In reply to Nitesh Narayan Lal from comment #43)
> There are two scenarios possible FWIU:
>
> - If you have stalld enabled and it is not boosting any kworker then there
> should not be any impact on the latency
>
> - If stalld is boosting any kworker that is getting starved then we have an
> issue that can cause unexpected behavior which IMO is not desirable
>
> Can you please confirm if the kworkers that are boosted in the guest are only
> doing the vmstat update work and nothing else?
>
> Thanks

Had a conversation with Luiz and he came up with the following which makes sense:

Before recommending stalld to customers (which should involve explanation of the consequences of switching between tasks), should make sure QE tests pass with < 40us, when stalld is enabled, under a "meaningful workload" (which we expect the customer to use).

Reassigning to Luiz as well (because this is a task assigned to the team).

*** Bug 1921601 has been marked as a duplicate of this bug. ***

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

We may need stalld for RT-CNV, reopening.

(In reply to Luiz Capitulino from comment #51)
> We may need stalld for RT-CNV, reopening.

Luiz,

Actual results: Oslat max latency is 102us when enabling stalld service.

Do you remember the latencies that Daniel Froehlich mentioned? Would be good to have his slides.
After discussing this issue with Marcelo, we decided to close it: as stated in comment 53, the oslat max latency with stalld is 102us, while the max latency threshold for a possible known RT-CNV customer is 150us.
Thanks Nitesh and Clark for the above comments. I tried with the minimum value, -r 8000. The 12h oslat max latency is 56us.

Testing update (2/2):

1) Test oslat with -w memmove -m 16K, stalld with -r 8000: 12h oslat max latency is 56us.

(1)Single VM with 1 rt vCPU:
Maximum: 49 (us)
(2)Single VM with 8 rt vCPUs:
Maximum: 45 44 46 45 44 44 48 47 (us)
(3)Multiple VMs each with 1 rt vCPU:
- VM1 Maximum: 12 (us)
- VM2 Maximum: 56 (us)
- VM3 Maximum: 50 (us)
- VM4 Maximum: 51 (us)

==oslat cmd==
scenario (1) and (3):
/home/nfv-virt-rt-kvm/tools/oslat --cpu-list 1 --rtprio 1 --runtime 12h -w memmove -m 16K
scenario (2):
/home/nfv-virt-rt-kvm/tools/oslat --cpu-list 2,3,4,5,6,7,8,9 --rtprio 1 --runtime 12h -w memmove -m 16K

==Versions==
kernel-rt-4.18.0-305.rt7.72.el8.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
tuned-2.15.0-2.el8.noarch
libvirt-7.0.0-14.module+el8.4.0+10886+79296686.x86_64
python3-libvirt-7.0.0-1.module+el8.4.0+9469+2eaf72bc.x86_64
microcode_ctl-20210216-1.el8.x86_64
rt-tests-1.10-3.el8.x86_64
stalld-1.9-2.el8.x86_64
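
As a rough illustration of how these results relate to the two limits discussed in this bug (the "< 40us" QE target and the 150us threshold mentioned for a possible RT-CNV customer), here is a small Python sketch. The constant names, the `classify` helper, and the choice of strict "less than" comparisons are assumptions made for illustration; the per-scenario maxima are copied from the results above.

```python
# Limits taken from the discussion in this bug; strict "<" is an assumption.
QE_TARGET_US = 40        # "QE tests pass with < 40us"
CUSTOMER_LIMIT_US = 150  # threshold mentioned for a possible RT-CNV customer

# Per-scenario maxima (us) from the 12h run with stalld -r 8000 above.
RESULTS_US = {
    "single VM, 1 rt vCPU": [49],
    "single VM, 8 rt vCPUs": [45, 44, 46, 45, 44, 44, 48, 47],
    "4 VMs, 1 rt vCPU each": [12, 56, 50, 51],
}


def classify(max_us):
    """Place a worst-case latency relative to the two limits."""
    if max_us < QE_TARGET_US:
        return "within QE target"
    if max_us < CUSTOMER_LIMIT_US:
        return "above QE target, within customer threshold"
    return "above customer threshold"


for scenario, values in RESULTS_US.items():
    worst = max(values)
    print(f"{scenario}: max {worst} us -> {classify(worst)}")
```

Under these assumptions every scenario in this run lands above the 40us target but below the 150us threshold, which is consistent with the reasoning given for closing the bug.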