Bug 1833196
| Summary: | adaptive-tick appears to not be functioning as expected on RHEL 8.2 RT | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Troy Wilson <trwilson> | ||||||||
| Component: | kernel-rt | Assignee: | Juri Lelli <jlelli> | ||||||||
| kernel-rt sub component: | Scheduler | QA Contact: | Qiao Zhao <qzhao> | ||||||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||||||
| Severity: | medium | ||||||||||
| Priority: | unspecified | CC: | aklimov, atheurer, bhu, chuhu, daolivei, jlelli, jmario, jskarvad, kcarcia, lcapitulino, lgoncalv, mtosatti, pauld, peterx, pezhang, qzhao, williams | ||||||||
| Version: | 8.2 | Flags: | pm-rhel:
mirror+
|
||||||||
| Target Milestone: | rc | ||||||||||
| Target Release: | 8.0 | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | 4.18.0-233 | Doc Type: | If docs needed, set a value | ||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2020-11-04 02:25:17 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Bug Depends On: | 1833096 | ||||||||||
| Bug Blocks: | 1817732, 1823810 | ||||||||||
| Attachments: |
|
||||||||||
Hi Troy, did you already try running with the corresponding (non RT) RHEL kernel? Trying with RHEL might help us isolating where the problem is. Also, what about extending tracing to other interesting events (e.g. scheduling, other types or irqs)? So that we can collect more info about what caused the tick to start again. (In reply to Juri Lelli from comment #1) > Hi Troy, > > did you already try running with the corresponding (non RT) RHEL > kernel? Trying with RHEL might help us isolating where the problem > is. > > Also, what about extending tracing to other interesting events > (e.g. scheduling, other types or irqs)? So that we can collect > more info about what caused the tick to start again. I do not believe I am seeing the situation with a non-RT kernel [root@e25-h33-740xd ~]# perf stat -C 5 -e irq_vectors:local_timer_entry taskset --cpu-list 5 ./sysjitter/sysjitter --runtime 600 200 core_i: 5 threshold(ns): 200 cpu_mhz: 2096 runtime(ns): 599729456412 runtime(s): 599.729 int_n: 66 int_n_per_sec: 0.110 int_min(ns): 233 int_median(ns): 2935 int_mean(ns): 4474 int_90(ns): 8770 int_99(ns): 14707 int_999(ns): 14707 int_9999(ns): 14707 int_99999(ns): 14707 int_max(ns): 14707 int_total(ns): 295333 int_total(%): 0.000 Performance counter stats for 'CPU(s) 5': 35 irq_vectors:local_timer_entry 601.011019915 seconds time elapsed [root@e25-h33-740xd ~]# uname -a Linux e25-h33-740xd.alias.bos.scalelab.redhat.com 4.18.0-167.el8.x86_64 #1 SMP Sun Dec 15 01:24:23 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux [root@e25-h33-740xd ~]# I will look at tracing other events-- please let me know if there are any specific events you'd like traced. OK. I could replicate the issue on another test box. On RT, tick seems to still be disabled at some point, but then it is turned back on. Will first see if upstream shows the same behavior. Created attachment 1687600 [details]
RHEL trace with tick_stop events
Created attachment 1687601 [details]
RHEL-RT trace with tick_stop events
Created attachment 1687937 [details]
UPSTREAM (v5.6-rt) trace with tick_stop events
Upstream seems to be working OK. I noticed bz1833096. Wonder if the two issues are related. My current understanding is that we are missing a chance to
enter full_no_hz when softirqs are serviced. In RT, sched tick
activates ksoftirqd(s), the NOHZ SCHED dep is correctly cleared
after ksoftirqd goes to sleep (leaving only 1 task on the cpu, the
sysjitter one(s)), but tick stop is not evaluated again after this
happens.
Something like the following (not completely correct, but might be
a start) seems to fix the issue.
--->8---
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cfeab6ef0f163..d4c4f860562e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1894,10 +1894,12 @@ static inline void sched_update_tick_dependency(struct rq *rq)
if (!tick_nohz_full_cpu(cpu))
return;
- if (sched_can_stop_tick(rq))
+ if (sched_can_stop_tick(rq)) {
tick_nohz_dep_clear_cpu(cpu, TICK_DEP_BIT_SCHED);
- else
+ tick_nohz_irq_exit();
+ } else {
tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
+ }
}
#else
static inline int sched_tick_offload_init(void) { return 0; }
--->8---
Could you please test if things get better with following
kernel?
http://brew-task-repos.usersys.redhat.com/repos/scratch/jlelli/kernel-rt/4.18.0/198.rt5.10.el8.dyntick.1833196.test.cki.kt1/x86_64/
I've encountered a similar issue even using fifo:1 priority when testing with sysjitter2 (https://github.com/xzpeter/sysjitter2) in a RT virtual machine, where I can observe 1000hz * 10us latency conditionally. The brew in comment 8 seems to have fixed that for me (for a 1H run). [root@localhost ~]# ./sysjitter2 --cpu-list 2,3,4,5,6,7,8,9 --rtprio 1 --runtime 3600 Version: 2.01 core_i: 2 3 4 5 6 7 8 9 cpu_mhz: 2197 2197 2197 2197 2197 2197 2197 2197 001 (us): 93068541026 93068541249 93068541429 93068541411 93068541565 93068541296 93068541320 93068541409 002 (us): 51 51 51 51 51 51 51 51 003 (us): 2 2 2 2 2 2 2 2 004 (us): 4 4 8 3 26 3 4 2 005 (us): 40 39 35 41 18 42 40 41 006 (us): 1 1 1 0 0 0 1 1 007 (us): 0 0 0 0 0 0 0 0 008 (us): 0 0 0 0 0 0 0 0 009 (us): 0 0 0 0 0 0 0 0 010 (us): 0 0 0 0 0 0 0 0 011 (us): 0 0 0 0 0 0 0 0 012 (us): 0 0 0 0 0 0 0 0 013 (us): 0 0 0 0 0 0 0 0 014 (us): 0 0 0 0 0 0 0 0 015 (us): 0 0 0 0 0 0 0 0 016 (us): 0 0 0 0 0 0 0 0 017 (us): 0 0 0 0 0 0 0 0 018 (us): 0 0 0 0 0 0 0 0 019 (us): 0 0 0 0 0 0 0 0 020 (us): 0 0 0 0 0 0 0 0 021 (us): 0 0 0 0 0 0 0 0 022 (us): 0 0 0 0 0 0 0 0 023 (us): 0 0 0 0 0 0 0 0 024 (us): 0 0 0 0 0 0 0 0 025 (us): 0 0 0 0 0 0 0 0 026 (us): 0 0 0 0 0 0 0 0 027 (us): 0 0 0 0 0 0 0 0 028 (us): 0 0 0 0 0 0 0 0 029 (us): 0 0 0 0 0 0 0 0 030 (us): 0 0 0 0 0 0 0 0 031 (us): 0 0 0 0 0 0 0 0 032 (us): 0 0 0 0 0 0 0 0 maxlat: 6 6 6 5 5 5 6 6 (us) runtime: 3600.740 3600.740 3600.740 3600.740 3600.740 3600.740 3600.740 3600.740 (sec) The proposed fix in comment 8 also works for me: [root@e25-h33-740xd sysjitter]# ./sysjitter --runtime 600 --rtprio 95 --accept-cpuset --cores 6,8,10,12,14,16,18,20 --master-core 4 200 core_i: 6 8 10 12 14 16 18 20 threshold(ns): 200 200 200 200 200 200 200 200 cpu_mhz: 2096 2096 2096 2096 2096 2096 2096 2096 runtime(ns): 599734002777 599734002948 599734002886 599734003307 599734002693 599734002740 599734002851 599734002759 runtime(s): 599.734 599.734 599.734 599.734 599.734 599.734 599.734 599.734 int_n: 11 10 7 599315 8 7 5 6 int_n_per_sec: 0.018 0.017 0.012 999.301 0.013 0.012 0.008 0.010 int_min(ns): 208 211 225 205 243 211 216 203 int_median(ns): 284 326 638 2759 582 283 291 312 int_mean(ns): 15629734 10795656 6283980 2871 124946153 344 375 351 int_90(ns): 4072 107949676 43981736 3291 999562041 595 565 615 int_99(ns): 171920095 107949676 43981736 4104 999562041 595 565 615 int_999(ns): 171920095 107949676 43981736 4641 999562041 595 565 615 int_9999(ns): 171920095 107949676 43981736 5074 999562041 595 565 615 int_99999(ns): 171920095 107949676 43981736 5513 999562041 595 565 615 int_max(ns): 171920095 107949676 43981736 5647 999562041 595 565 615 int_total(ns): 171927085 107956573 43987864 1720828725 999569226 2415 1875 2106 int_total(%): 0.029 0.018 0.007 0.287 0.167 0.000 0.000 0.000 [root@e25-h33-740xd sysjitter]# RFC patch sent upstream https://lore.kernel.org/lkml/20200520140402.358880-1-juri.lelli@redhat.com/ *** Bug 1833096 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: kernel-rt security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4609 The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days |
Description of problem: Adaptive-tick appears to not be functioning as expected on RHEL 8.2 RT Version-Release number of selected component (if applicable): RHEL 8.2 Beta, 4.18.0-167.rt13.24.el8.x86_64 #1 SMP PREEMPT RT How reproducible: Very Steps to Reproduce: 1. boot kernel with nosmt nohz=on rcu_nocbs=<cores> nohz_full=<cores> 2. run workload with taskset matching cores specified above 3. observe interrupts/sec Actual results: 1000 int/sec Expected results: adaptive tick Additional info: Environment: [root@e25-h33-740xd ~]# uname -a Linux e25-h33-740xd.alias.bos.scalelab.redhat.com 4.18.0-167.rt13.24.el8.x86_64 #1 SMP PREEMPT RT Sun Dec 15 15:46:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux [root@e25-h33-740xd ~]# cat /etc/redhat-release Red Hat Enterprise Linux release 8.2 Beta (Ootpa) [root@e25-h33-740xd ~]# cat /proc/cmdline BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-167.rt13.24.el8.x86_64 root=/dev/mapper/vg_e25--h33--740xd-lv_root ro nofb quiet splash=quiet crashkernel=auto resume=/dev/mapper/vg_e25--h33--740xd-lv_swap rd.lvm.lv=vg_e25-h33-740xd/lv_root rd.lvm.lv=vg_e25-h33-740xd/lv_swap nohz=on nosoftlockup skew_tick=1 intel_pstate=disable intel_iommu=on iommu=pt rcu_nocbs=4-31 default_hugepagesz=1G hugepagesz=1G hugepages=10 nosmt nohz_full=4-31 [root@e25-h33-740xd ~]# cat /sys/devices/system/cpu/nohz_full 4-31 [root@e25-h33-740xd ~]# dmesg | grep dyntick [ 0.001000] NO_HZ: Full dynticks CPUs: 4-31. [root@e25-h33-740xd ~]# tuned-adm active Current active profile: cpu-partitioning [root@e25-h33-740xd ~]# cat /etc/tuned/cpu-partitioning-variables.conf # Examples: # isolated_cores=2,4-7 # isolated_cores=2-23 # isolated_cores=4-31 no_balance_cores=4-31 [root@e25-h33-740xd ~]# Reproduction: [root@e25-h33-740xd ~]# perf stat -C 4-31 -e irq_vectors:local_timer_entry taskset --cpu-list 4-31 ./sysjitter/sysjitter --runtime 30 200 core_i: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 threshold(ns): 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 cpu_mhz: 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 runtime(ns): 29950328431 29950326261 29950326962 29950327898 29950328522 29950328322 29950326619 29950326795 29950326774 29950327259 29950329591 29950327421 29950326418 29950328252 29950327661 29950326687 29950327558 29950327755 29950326878 29950326352 29950326875 29950327803 29950326263 29950326194 29950327019 29950329613 29950328285 29950327259 runtime(s): 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 int_n: 30050 30050 30050 30049 30049 30050 30050 30051 30050 30050 30050 30051 30050 30051 30050 30050 30050 30051 30050 30050 30050 30051 30050 30050 30050 30051 30050 30050 int_n_per_sec: 1003.328 1003.328 1003.328 1003.294 1003.294 1003.328 1003.328 1003.361 1003.328 1003.328 1003.328 1003.361 1003.328 1003.361 1003.328 1003.328 1003.328 1003.361 1003.328 1003.328 1003.328 1003.361 1003.328 1003.328 1003.328 1003.361 1003.328 1003.328 int_min(ns): 2196 2563 2213 2531 2170 405 2197 619 2075 2246 2192 544 2001 474 2041 2472 2273 266 2040 2284 2241 435 2012 2328 2166 516 1958 2331 int_median(ns): 5652 5640 5716 5530 5590 5506 5544 5447 5489 5413 5479 5382 5520 5433 5465 5408 5482 5423 5527 5423 5495 5415 5479 5415 5493 5424 5476 5414 int_mean(ns): 5663 5643 5729 5529 5604 5511 5547 5445 5496 5411 5478 5375 5522 5431 5464 5406 5485 5418 5534 5423 5499 5411 5481 5416 5496 5421 5479 5420 int_90(ns): 5818 5766 5906 5670 5774 5640 5691 5561 5663 5520 5625 5488 5664 5561 5605 5539 5641 5531 5690 5552 5636 5528 5624 5544 5638 5530 5624 5554 int_99(ns): 6067 5997 6148 5883 6012 5864 5927 5780 5923 5740 5860 5739 5904 5805 5846 5763 5872 5751 5918 5763 5875 5765 5871 5806 5879 5750 5871 5783 int_999(ns): 6308 6211 6470 6162 6325 6100 6184 5969 6142 5949 6126 5967 6142 6074 6010 5974 6116 5952 6134 6012 6069 6046 6134 6102 6111 5982 6094 6098 int_9999(ns): 8800 9522 9053 9609 9129 9496 8777 9726 8940 9362 9127 9217 8962 9380 8969 9721 9064 9237 9137 9218 8994 9405 9207 9344 9339 9481 8958 9346 int_99999(ns): 10171 11753 9681 13806 12161 13978 9564 11625 9692 11232 9487 9718 9792 11883 9646 10784 9433 11566 10078 11617 9852 11904 9692 12270 10251 11328 9461 9842 int_max(ns): 10171 11753 9681 13806 12161 13978 9564 11625 9692 11232 9487 9718 9792 11883 9646 10784 9433 11566 10078 11617 9852 11904 9692 12270 10251 11328 9461 9842 int_total(ns): 170185201 169598860 172168158 166161221 168398867 165642447 166698343 163648149 165179138 162625778 164624939 161559322 165965275 163239238 164225808 162458674 164839346 162832061 166329121 162987361 165246431 162630280 164732554 162770577 165185805 162923791 164658672 162890787 int_total(%): 0.568 0.566 0.575 0.555 0.562 0.553 0.557 0.546 0.552 0.543 0.550 0.539 0.554 0.545 0.548 0.542 0.550 0.544 0.555 0.544 0.552 0.543 0.550 0.543 0.552 0.544 0.550 0.544 Performance counter stats for 'CPU(s) 4-31': 871,129 irq_vectors:local_timer_entry 31.109521796 seconds time elapsed [root@e25-h33-740xd ~]# Cores 4-31 is 27 cores 31.1 seconds runtime 871,129 timer ticks = ~1037 ticks/sec per core [root@e25-h33-740xd ~]# perf stat -C 5 -e irq_vectors:local_timer_entry taskset --cpu-list 5 ./sysjitter/sysjitter --runtime 600 200 core_i: 5 threshold(ns): 200 cpu_mhz: 2096 runtime(ns): 599730049348 runtime(s): 599.730 int_n: 602347 int_n_per_sec: 1004.364 int_min(ns): 1955 int_median(ns): 5284 int_mean(ns): 5286 int_90(ns): 5363 int_99(ns): 5614 int_999(ns): 5826 int_9999(ns): 8361 int_99999(ns): 9633 int_max(ns): 11386 int_total(ns): 3184185454 int_total(%): 0.531 Performance counter stats for 'CPU(s) 5': 601,097 irq_vectors:local_timer_entry 601.095113026 seconds time elapsed [root@e25-h33-740xd ~]# Tracing shows tick_stop sysjitter-3797 [008] d.L.... 13747.698714: tick_stop: success=0 dependency=SCHED sysjitter-3798 [009] d.L.... 13747.698721: tick_stop: success=0 dependency=SCHED sysjitter-3798 [009] d.L.... 13747.698722: tick_stop: success=0 dependency=SCHED sysjitter-3799 [010] d.L.... 13747.698729: tick_stop: success=0 dependency=SCHED sysjitter-3799 [010] d.L.... 13747.698730: tick_stop: success=0 dependency=SCHED sysjitter-3800 [011] d.L.... 13747.698737: tick_stop: success=0 dependency=SCHED sysjitter-3800 [011] d.L.... 13747.698737: tick_stop: success=0 dependency=SCHED sysjitter-3801 [012] d.L.... 13747.698745: tick_stop: success=0 dependency=SCHED sysjitter-3801 [012] d.L.... 13747.698745: tick_stop: success=0 dependency=SCHED sysjitter-3802 [013] d.L.... 13747.698752: tick_stop: success=0 dependency=SCHED sysjitter-3802 [013] d.L.... 13747.698753: tick_stop: success=0 dependency=SCHED sysjitter-3803 [014] d.L.... 13747.698760: tick_stop: success=0 dependency=SCHED sysjitter-3803 [014] d.L.... 13747.698761: tick_stop: success=0 dependency=SCHED sysjitter-3804 [015] d.L.... 13747.698768: tick_stop: success=0 dependency=SCHED sysjitter-3804 [015] d.L.... 13747.698769: tick_stop: success=0 dependency=SCHED sysjitter-3805 [016] d.L.... 13747.698776: tick_stop: success=0 dependency=SCHED sysjitter-3805 [016] d.L.... 13747.698776: tick_stop: success=0 dependency=SCHED sysjitter-3806 [017] d.L.... 13747.698784: tick_stop: success=0 dependency=SCHED sysjitter-3806 [017] d.L.... 13747.698784: tick_stop: success=0 dependency=SCHED sysjitter-3807 [018] d.L.... 13747.698791: tick_stop: success=0 dependency=SCHED sysjitter-3807 [018] d.L.... 13747.698792: tick_stop: success=0 dependency=SCHED sysjitter-3808 [019] d.L.... 13747.698799: tick_stop: success=0 dependency=SCHED sysjitter-3808 [019] d.L.... 13747.698800: tick_stop: success=0 dependency=SCHED sysjitter-3809 [020] d.L.... 13747.698807: tick_stop: success=0 dependency=SCHED sysjitter-3809 [020] d.L.... 13747.698808: tick_stop: success=0 dependency=SCHED sysjitter-3810 [021] d.L.... 13747.698815: tick_stop: success=0 dependency=SCHED sysjitter-3810 [021] d.L.... 13747.698815: tick_stop: success=0 dependency=SCHED sysjitter-3811 [022] d.L.... 13747.698823: tick_stop: success=0 dependency=SCHED sysjitter-3811 [022] d.L.... 13747.698823: tick_stop: success=0 dependency=SCHED sysjitter-3812 [023] d.L.... 13747.698830: tick_stop: success=0 dependency=SCHED sysjitter-3812 [023] d.L.... 13747.698831: tick_stop: success=0 dependency=SCHED sysjitter-3813 [024] d.L.... 13747.698838: tick_stop: success=0 dependency=SCHED sysjitter-3813 [024] d.L.... 13747.698839: tick_stop: success=0 dependency=SCHED sysjitter-3814 [025] d.L.... 13747.698846: tick_stop: success=0 dependency=SCHED sysjitter-3814 [025] d.L.... 13747.698847: tick_stop: success=0 dependency=SCHED sysjitter-3815 [026] d.L.... 13747.698854: tick_stop: success=0 dependency=SCHED