Bug 1833196

Summary:

adaptive-tick appears to not be functioning as expected on RHEL 8.2 RT

Product:

Red Hat Enterprise Linux 8

Reporter:

Troy Wilson <trwilson>

Component:

kernel-rt

Assignee:

Juri Lelli <jlelli>

kernel-rt sub component:

Scheduler

QA Contact:

Qiao Zhao <qzhao>

Status:

CLOSED ERRATA

Docs Contact:

Severity:

medium

Priority:

unspecified

CC:

aklimov, atheurer, bhu, chuhu, daolivei, jlelli, jmario, jskarvad, kcarcia, lcapitulino, lgoncalv, mtosatti, pauld, peterx, pezhang, qzhao, williams

Version:

8.2

Target Milestone:

Target Release:

8.0

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

4.18.0-233

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-11-04 02:25:17 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

1833096

Bug Blocks:

1817732, 1823810

Attachments:

Description	Flags
RHEL trace with tick_stop events	none
RHEL-RT trace with tick_stop events	none
UPSTREAM (v5.6-rt) trace with tick_stop events	none

Description Troy Wilson 2020-05-08 03:23:31 UTC

Description of problem:
Adaptive-tick appears to not be functioning as expected on RHEL 8.2 RT

Version-Release number of selected component (if applicable):
RHEL 8.2 Beta,  4.18.0-167.rt13.24.el8.x86_64 #1 SMP PREEMPT RT

How reproducible:
Very

Steps to Reproduce:
1. boot kernel with nosmt nohz=on rcu_nocbs=<cores> nohz_full=<cores>
2. run workload with taskset matching cores specified above
3. observe interrupts/sec

Actual results:
1000 int/sec

Expected results:
adaptive tick

Additional info:

Environment:
 
[root@e25-h33-740xd ~]# uname -a
Linux e25-h33-740xd.alias.bos.scalelab.redhat.com 4.18.0-167.rt13.24.el8.x86_64 #1 SMP PREEMPT RT Sun Dec 15 15:46:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@e25-h33-740xd ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.2 Beta (Ootpa)
[root@e25-h33-740xd ~]# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-167.rt13.24.el8.x86_64 root=/dev/mapper/vg_e25--h33--740xd-lv_root ro nofb quiet splash=quiet crashkernel=auto resume=/dev/mapper/vg_e25--h33--740xd-lv_swap rd.lvm.lv=vg_e25-h33-740xd/lv_root rd.lvm.lv=vg_e25-h33-740xd/lv_swap nohz=on nosoftlockup skew_tick=1 intel_pstate=disable intel_iommu=on iommu=pt rcu_nocbs=4-31 default_hugepagesz=1G hugepagesz=1G hugepages=10 nosmt nohz_full=4-31
[root@e25-h33-740xd ~]# cat /sys/devices/system/cpu/nohz_full 
4-31
[root@e25-h33-740xd ~]# dmesg | grep dyntick
[    0.001000] NO_HZ: Full dynticks CPUs: 4-31.
[root@e25-h33-740xd ~]# tuned-adm active
Current active profile: cpu-partitioning
[root@e25-h33-740xd ~]# cat  /etc/tuned/cpu-partitioning-variables.conf 
# Examples:
# isolated_cores=2,4-7
# isolated_cores=2-23
#
isolated_cores=4-31
no_balance_cores=4-31
[root@e25-h33-740xd ~]# 

Reproduction:

[root@e25-h33-740xd ~]# perf stat -C 4-31 -e irq_vectors:local_timer_entry taskset --cpu-list 4-31 ./sysjitter/sysjitter --runtime 30  200
core_i: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
threshold(ns): 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200
cpu_mhz: 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096
runtime(ns): 29950328431 29950326261 29950326962 29950327898 29950328522 29950328322 29950326619 29950326795 29950326774 29950327259 29950329591 29950327421 29950326418 29950328252 29950327661 29950326687 29950327558 29950327755 29950326878 29950326352 29950326875 29950327803 29950326263 29950326194 29950327019 29950329613 29950328285 29950327259
runtime(s): 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950 29.950
int_n: 30050 30050 30050 30049 30049 30050 30050 30051 30050 30050 30050 30051 30050 30051 30050 30050 30050 30051 30050 30050 30050 30051 30050 30050 30050 30051 30050 30050
int_n_per_sec: 1003.328 1003.328 1003.328 1003.294 1003.294 1003.328 1003.328 1003.361 1003.328 1003.328 1003.328 1003.361 1003.328 1003.361 1003.328 1003.328 1003.328 1003.361 1003.328 1003.328 1003.328 1003.361 1003.328 1003.328 1003.328 1003.361 1003.328 1003.328
int_min(ns): 2196 2563 2213 2531 2170 405 2197 619 2075 2246 2192 544 2001 474 2041 2472 2273 266 2040 2284 2241 435 2012 2328 2166 516 1958 2331
int_median(ns): 5652 5640 5716 5530 5590 5506 5544 5447 5489 5413 5479 5382 5520 5433 5465 5408 5482 5423 5527 5423 5495 5415 5479 5415 5493 5424 5476 5414
int_mean(ns): 5663 5643 5729 5529 5604 5511 5547 5445 5496 5411 5478 5375 5522 5431 5464 5406 5485 5418 5534 5423 5499 5411 5481 5416 5496 5421 5479 5420
int_90(ns): 5818 5766 5906 5670 5774 5640 5691 5561 5663 5520 5625 5488 5664 5561 5605 5539 5641 5531 5690 5552 5636 5528 5624 5544 5638 5530 5624 5554
int_99(ns): 6067 5997 6148 5883 6012 5864 5927 5780 5923 5740 5860 5739 5904 5805 5846 5763 5872 5751 5918 5763 5875 5765 5871 5806 5879 5750 5871 5783
int_999(ns): 6308 6211 6470 6162 6325 6100 6184 5969 6142 5949 6126 5967 6142 6074 6010 5974 6116 5952 6134 6012 6069 6046 6134 6102 6111 5982 6094 6098
int_9999(ns): 8800 9522 9053 9609 9129 9496 8777 9726 8940 9362 9127 9217 8962 9380 8969 9721 9064 9237 9137 9218 8994 9405 9207 9344 9339 9481 8958 9346
int_99999(ns): 10171 11753 9681 13806 12161 13978 9564 11625 9692 11232 9487 9718 9792 11883 9646 10784 9433 11566 10078 11617 9852 11904 9692 12270 10251 11328 9461 9842
int_max(ns): 10171 11753 9681 13806 12161 13978 9564 11625 9692 11232 9487 9718 9792 11883 9646 10784 9433 11566 10078 11617 9852 11904 9692 12270 10251 11328 9461 9842
int_total(ns): 170185201 169598860 172168158 166161221 168398867 165642447 166698343 163648149 165179138 162625778 164624939 161559322 165965275 163239238 164225808 162458674 164839346 162832061 166329121 162987361 165246431 162630280 164732554 162770577 165185805 162923791 164658672 162890787
int_total(%): 0.568 0.566 0.575 0.555 0.562 0.553 0.557 0.546 0.552 0.543 0.550 0.539 0.554 0.545 0.548 0.542 0.550 0.544 0.555 0.544 0.552 0.543 0.550 0.543 0.552 0.544 0.550 0.544
 
 Performance counter stats for 'CPU(s) 4-31':
 
           871,129      irq_vectors:local_timer_entry                                   
 
      31.109521796 seconds time elapsed
 
[root@e25-h33-740xd ~]#

Cores 4-31 is 27 cores
31.1 seconds runtime
871,129 timer ticks
= ~1037 ticks/sec per core


[root@e25-h33-740xd ~]# perf stat -C 5 -e irq_vectors:local_timer_entry taskset --cpu-list 5 ./sysjitter/sysjitter --runtime 600  200
core_i: 5
threshold(ns): 200
cpu_mhz: 2096
runtime(ns): 599730049348
runtime(s): 599.730
int_n: 602347
int_n_per_sec: 1004.364
int_min(ns): 1955
int_median(ns): 5284
int_mean(ns): 5286
int_90(ns): 5363
int_99(ns): 5614
int_999(ns): 5826
int_9999(ns): 8361
int_99999(ns): 9633
int_max(ns): 11386
int_total(ns): 3184185454
int_total(%): 0.531
 
 Performance counter stats for 'CPU(s) 5':
 
           601,097      irq_vectors:local_timer_entry                                   
 
     601.095113026 seconds time elapsed
 
[root@e25-h33-740xd ~]#

Tracing shows tick_stop

       sysjitter-3797  [008] d.L.... 13747.698714: tick_stop: success=0 dependency=SCHED
       sysjitter-3798  [009] d.L.... 13747.698721: tick_stop: success=0 dependency=SCHED
       sysjitter-3798  [009] d.L.... 13747.698722: tick_stop: success=0 dependency=SCHED
       sysjitter-3799  [010] d.L.... 13747.698729: tick_stop: success=0 dependency=SCHED
       sysjitter-3799  [010] d.L.... 13747.698730: tick_stop: success=0 dependency=SCHED
       sysjitter-3800  [011] d.L.... 13747.698737: tick_stop: success=0 dependency=SCHED
       sysjitter-3800  [011] d.L.... 13747.698737: tick_stop: success=0 dependency=SCHED
       sysjitter-3801  [012] d.L.... 13747.698745: tick_stop: success=0 dependency=SCHED
       sysjitter-3801  [012] d.L.... 13747.698745: tick_stop: success=0 dependency=SCHED
       sysjitter-3802  [013] d.L.... 13747.698752: tick_stop: success=0 dependency=SCHED
       sysjitter-3802  [013] d.L.... 13747.698753: tick_stop: success=0 dependency=SCHED
       sysjitter-3803  [014] d.L.... 13747.698760: tick_stop: success=0 dependency=SCHED
       sysjitter-3803  [014] d.L.... 13747.698761: tick_stop: success=0 dependency=SCHED
       sysjitter-3804  [015] d.L.... 13747.698768: tick_stop: success=0 dependency=SCHED
       sysjitter-3804  [015] d.L.... 13747.698769: tick_stop: success=0 dependency=SCHED
       sysjitter-3805  [016] d.L.... 13747.698776: tick_stop: success=0 dependency=SCHED
       sysjitter-3805  [016] d.L.... 13747.698776: tick_stop: success=0 dependency=SCHED
       sysjitter-3806  [017] d.L.... 13747.698784: tick_stop: success=0 dependency=SCHED
       sysjitter-3806  [017] d.L.... 13747.698784: tick_stop: success=0 dependency=SCHED
       sysjitter-3807  [018] d.L.... 13747.698791: tick_stop: success=0 dependency=SCHED
       sysjitter-3807  [018] d.L.... 13747.698792: tick_stop: success=0 dependency=SCHED
       sysjitter-3808  [019] d.L.... 13747.698799: tick_stop: success=0 dependency=SCHED
       sysjitter-3808  [019] d.L.... 13747.698800: tick_stop: success=0 dependency=SCHED
       sysjitter-3809  [020] d.L.... 13747.698807: tick_stop: success=0 dependency=SCHED
       sysjitter-3809  [020] d.L.... 13747.698808: tick_stop: success=0 dependency=SCHED
       sysjitter-3810  [021] d.L.... 13747.698815: tick_stop: success=0 dependency=SCHED
       sysjitter-3810  [021] d.L.... 13747.698815: tick_stop: success=0 dependency=SCHED
       sysjitter-3811  [022] d.L.... 13747.698823: tick_stop: success=0 dependency=SCHED
       sysjitter-3811  [022] d.L.... 13747.698823: tick_stop: success=0 dependency=SCHED
       sysjitter-3812  [023] d.L.... 13747.698830: tick_stop: success=0 dependency=SCHED
       sysjitter-3812  [023] d.L.... 13747.698831: tick_stop: success=0 dependency=SCHED
       sysjitter-3813  [024] d.L.... 13747.698838: tick_stop: success=0 dependency=SCHED
       sysjitter-3813  [024] d.L.... 13747.698839: tick_stop: success=0 dependency=SCHED
       sysjitter-3814  [025] d.L.... 13747.698846: tick_stop: success=0 dependency=SCHED
       sysjitter-3814  [025] d.L.... 13747.698847: tick_stop: success=0 dependency=SCHED
       sysjitter-3815  [026] d.L.... 13747.698854: tick_stop: success=0 dependency=SCHED

Comment 1 Juri Lelli 2020-05-11 06:14:14 UTC

Hi Troy,

did you already try running with the corresponding (non RT) RHEL
kernel? Trying with RHEL might help us isolating where the problem
is.

Also, what about extending tracing to other interesting events
(e.g. scheduling, other types or irqs)? So that we can collect
more info about what caused the tick to start again.

Comment 2 Troy Wilson 2020-05-11 16:00:06 UTC

(In reply to Juri Lelli from comment #1)
> Hi Troy,
> 
> did you already try running with the corresponding (non RT) RHEL
> kernel? Trying with RHEL might help us isolating where the problem
> is.
> 
> Also, what about extending tracing to other interesting events
> (e.g. scheduling, other types or irqs)? So that we can collect
> more info about what caused the tick to start again.

I do not believe I am seeing the situation with a non-RT kernel

[root@e25-h33-740xd ~]# perf stat -C 5 -e irq_vectors:local_timer_entry taskset --cpu-list 5 ./sysjitter/sysjitter --runtime 600  200
core_i: 5
threshold(ns): 200
cpu_mhz: 2096
runtime(ns): 599729456412
runtime(s): 599.729
int_n: 66
int_n_per_sec: 0.110
int_min(ns): 233
int_median(ns): 2935
int_mean(ns): 4474
int_90(ns): 8770
int_99(ns): 14707
int_999(ns): 14707
int_9999(ns): 14707
int_99999(ns): 14707
int_max(ns): 14707
int_total(ns): 295333
int_total(%): 0.000

 Performance counter stats for 'CPU(s) 5':

                35      irq_vectors:local_timer_entry                                   

     601.011019915 seconds time elapsed

[root@e25-h33-740xd ~]# uname -a
Linux e25-h33-740xd.alias.bos.scalelab.redhat.com 4.18.0-167.el8.x86_64 #1 SMP Sun Dec 15 01:24:23 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@e25-h33-740xd ~]#

I will look at tracing other events-- please let me know if there are any specific events you'd like traced.

Comment 3 Juri Lelli 2020-05-12 09:23:37 UTC

OK. I could replicate the issue on another test box.

On RT, tick seems to still be disabled at some point, but then it
is turned back on.

Will first see if upstream shows the same behavior.

Comment 4 Juri Lelli 2020-05-12 09:26:23 UTC

Created attachment 1687600 [details]
RHEL trace with tick_stop events

Comment 5 Juri Lelli 2020-05-12 09:27:16 UTC

Created attachment 1687601 [details]
RHEL-RT trace with tick_stop events

Comment 6 Juri Lelli 2020-05-13 07:07:55 UTC

Created attachment 1687937 [details]
UPSTREAM (v5.6-rt) trace with tick_stop events

Comment 7 Juri Lelli 2020-05-13 07:09:24 UTC

Upstream seems to be working OK.

I noticed bz1833096. Wonder if the two issues are related.

Comment 8 Juri Lelli 2020-05-14 11:02:37 UTC

My current understanding is that we are missing a chance to
enter full_no_hz when softirqs are serviced. In RT, sched tick
activates ksoftirqd(s), the NOHZ SCHED dep is correctly cleared
after ksoftirqd goes to sleep (leaving only 1 task on the cpu, the
sysjitter one(s)), but tick stop is not evaluated again after this
happens.

Something like the following (not completely correct, but might be
a start) seems to fix the issue.

--->8---
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cfeab6ef0f163..d4c4f860562e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1894,10 +1894,12 @@ static inline void sched_update_tick_dependency(struct rq *rq)
        if (!tick_nohz_full_cpu(cpu))
                return;

-       if (sched_can_stop_tick(rq))
+       if (sched_can_stop_tick(rq)) {
                tick_nohz_dep_clear_cpu(cpu, TICK_DEP_BIT_SCHED);
-       else
+               tick_nohz_irq_exit();
+       } else {
                tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
+       }
 }
 #else
 static inline int sched_tick_offload_init(void) { return 0; }
--->8---

Could you please test if things get better with following
kernel?

http://brew-task-repos.usersys.redhat.com/repos/scratch/jlelli/kernel-rt/4.18.0/198.rt5.10.el8.dyntick.1833196.test.cki.kt1/x86_64/

Comment 9 Peter Xu 2020-05-18 14:01:35 UTC

I've encountered a similar issue even using fifo:1 priority when testing with sysjitter2 (https://github.com/xzpeter/sysjitter2) in a RT virtual machine, where I can observe 1000hz * 10us latency conditionally.  The brew in comment 8 seems to have fixed that for me (for a 1H run).

    [root@localhost ~]# ./sysjitter2 --cpu-list 2,3,4,5,6,7,8,9 --rtprio 1 --runtime 3600

        Version: 2.01

        core_i:    2 3 4 5 6 7 8 9
        cpu_mhz:    2197 2197 2197 2197 2197 2197 2197 2197
        001 (us):    93068541026 93068541249 93068541429 93068541411 93068541565 93068541296 93068541320 93068541409
        002 (us):    51 51 51 51 51 51 51 51
        003 (us):    2 2 2 2 2 2 2 2
        004 (us):    4 4 8 3 26 3 4 2
        005 (us):    40 39 35 41 18 42 40 41
        006 (us):    1 1 1 0 0 0 1 1
        007 (us):    0 0 0 0 0 0 0 0
        008 (us):    0 0 0 0 0 0 0 0
        009 (us):    0 0 0 0 0 0 0 0
        010 (us):    0 0 0 0 0 0 0 0
        011 (us):    0 0 0 0 0 0 0 0
        012 (us):    0 0 0 0 0 0 0 0
        013 (us):    0 0 0 0 0 0 0 0
        014 (us):    0 0 0 0 0 0 0 0
        015 (us):    0 0 0 0 0 0 0 0
        016 (us):    0 0 0 0 0 0 0 0
        017 (us):    0 0 0 0 0 0 0 0
        018 (us):    0 0 0 0 0 0 0 0
        019 (us):    0 0 0 0 0 0 0 0
        020 (us):    0 0 0 0 0 0 0 0
        021 (us):    0 0 0 0 0 0 0 0
        022 (us):    0 0 0 0 0 0 0 0
        023 (us):    0 0 0 0 0 0 0 0
        024 (us):    0 0 0 0 0 0 0 0
        025 (us):    0 0 0 0 0 0 0 0
        026 (us):    0 0 0 0 0 0 0 0
        027 (us):    0 0 0 0 0 0 0 0
        028 (us):    0 0 0 0 0 0 0 0
        029 (us):    0 0 0 0 0 0 0 0
        030 (us):    0 0 0 0 0 0 0 0
        031 (us):    0 0 0 0 0 0 0 0
        032 (us):    0 0 0 0 0 0 0 0
        maxlat:    6 6 6 5 5 5 6 6 (us)
        runtime:    3600.740 3600.740 3600.740 3600.740 3600.740 3600.740 3600.740 3600.740 (sec)

Comment 10 Troy Wilson 2020-05-18 15:29:20 UTC

The proposed fix in comment 8 also works for me:

[root@e25-h33-740xd sysjitter]# ./sysjitter --runtime 600 --rtprio 95 --accept-cpuset --cores 6,8,10,12,14,16,18,20 --master-core 4 200
core_i: 6 8 10 12 14 16 18 20
threshold(ns): 200 200 200 200 200 200 200 200
cpu_mhz: 2096 2096 2096 2096 2096 2096 2096 2096
runtime(ns): 599734002777 599734002948 599734002886 599734003307 599734002693 599734002740 599734002851 599734002759
runtime(s): 599.734 599.734 599.734 599.734 599.734 599.734 599.734 599.734
int_n: 11 10 7 599315 8 7 5 6
int_n_per_sec: 0.018 0.017 0.012 999.301 0.013 0.012 0.008 0.010
int_min(ns): 208 211 225 205 243 211 216 203
int_median(ns): 284 326 638 2759 582 283 291 312
int_mean(ns): 15629734 10795656 6283980 2871 124946153 344 375 351
int_90(ns): 4072 107949676 43981736 3291 999562041 595 565 615
int_99(ns): 171920095 107949676 43981736 4104 999562041 595 565 615
int_999(ns): 171920095 107949676 43981736 4641 999562041 595 565 615
int_9999(ns): 171920095 107949676 43981736 5074 999562041 595 565 615
int_99999(ns): 171920095 107949676 43981736 5513 999562041 595 565 615
int_max(ns): 171920095 107949676 43981736 5647 999562041 595 565 615
int_total(ns): 171927085 107956573 43987864 1720828725 999569226 2415 1875 2106
int_total(%): 0.029 0.018 0.007 0.287 0.167 0.000 0.000 0.000
[root@e25-h33-740xd sysjitter]#

Comment 11 Juri Lelli 2020-05-20 14:06:43 UTC

RFC patch sent upstream

https://lore.kernel.org/lkml/20200520140402.358880-1-juri.lelli@redhat.com/

Comment 12 Marcelo Tosatti 2020-05-25 16:34:15 UTC

*** Bug 1833096 has been marked as a duplicate of this bug. ***

Comment 53 errata-xmlrpc 2020-11-04 02:25:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel-rt security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4609

Comment 54 Red Hat Bugzilla 2023-09-15 01:29:32 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days