Bug 438205

Summary: high cputime values for softirq-timer
Product: Red Hat Enterprise MRG
Reporter: IBM Bug Proxy <bugproxy>
Component: realtime-kernel
Assignee: Steven Rostedt <srostedt>
Status: CLOSED NOTABUG
Severity: medium
Priority: low
Version: beta
CC: bhu, lgoncalv, williams
Hardware: x86_64
OS: All
Doc Type: Bug Fix
Last Closed: 2009-09-23 19:29:54 UTC
Attachments:
  patch to move calc_load and update_wall_time out of softirq-timer context (no flags)
  patch to move calc_load and update_wall_time out of softirq-timer context (no flags)

Description IBM Bug Proxy 2008-03-19 17:49:54 UTC
=Comment: #0=================================================
John G. Stultz <jstultz.com> - 2008-03-18 14:26 EDT
Problem description:

Alan noticed oddly high cputime values for
softirq-timer when running Sripathi's 2.6.24.1-24ibmrt kernel. I've reproduced
this with that kernel, and wanted to open this bug.

I'm now working to reproduce the issue with the 2.6.24.3-29ibmrt2.0 kernel.
=Comment: #1=================================================
John G. Stultz <jstultz.com> - 2008-03-18 14:58 EDT
So I've managed to reproduce this issue on an idle system running the
2.6.24.3-29ibm2.0 kernel.

$ ps -eLo  bsdtime,rtprio,comm| sort | tail
  0:00      - xinetd
  0:00      - yum-updatesd
  0:01     50 softirq-timer/1
  0:01     50 softirq-timer/2
  0:01     50 softirq-timer/3
  0:01      - hald
  0:01      - udevd
  0:03      - init
  0:04     50 softirq-timer/0
  TIME RTPRIO COMMAND

[jstultz@elm3b207 ~]$ uptime
 14:54:40 up 34 min,  1 user,  load average: 0.01, 0.02, 0.00


So from here we can see softirq-timer/0 has reportedly run for 4 minutes in the
34 minutes of uptime since the system booted (more than 10% of the cputime).
The softirq-timers on the other CPUs are not as high, but still up there (~3%).


=Comment: #3=================================================
John G. Stultz <jstultz.com> - 2008-03-18 22:43 EDT
Looking at the code, I suspect the deferred update_wall_time and calc_load
work that is done in the softirq-timer may be partly responsible. In this case
the xtime_lock must be re-acquired, which could cause added contention.

I'm testing a patch that moves the update_wall_time() and calc_load()
functionality back into the hard-timer_interrupt context. This may increase
worst case hard-interrupt preemption, but it may also reduce lock-wait delays.
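If the kernel were built with CONFIG_LOCK_STAT, contention on xtime_lock could be
checked roughly along these lines (a sketch of the generic lock_stat procedure,
not something verified on this kernel):

echo 0 > /proc/lock_stat          # reset the lock statistics
sleep 60                          # let the idle system run for a while
grep -A 4 xtime /proc/lock_stat   # contention/hold-time figures for xtime_lock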
=Comment: #4=================================================
John G. Stultz <jstultz.com> - 2008-03-18 22:44 EDT

patch to move calc_load and update_wall_time out of softirq-timer context

Here's the patch I'm testing. Right off, it doesn't seem to make much difference
either way, so any effect is subtle at best.
=Comment: #5=================================================
John G. Stultz <jstultz.com> - 2008-03-18 22:46 EDT
Although, it should be noted that the patch did reduce the cputime of
softirq-timer/0. Now all CPUs have approximately the same cputime (4 minutes over
1.25 hours), whereas before softirq-timer/0 usually grew much faster.

Comment 1 IBM Bug Proxy 2008-03-19 17:49:57 UTC
Created attachment 298553 [details]
patch to move calc_load and update_wall_time out of softirq-timer context

Comment 2 IBM Bug Proxy 2008-03-27 20:24:59 UTC
------- Comment From jstultz.com 2008-03-27 16:22 EDT-------
I enabled timer_stats and looked at what a basically idle system was doing over
10 seconds. I've removed all entries that triggered fewer than 10 times.

Timer Stats Version: v0.2
Sample period: 10.001 s
10002,    30 softirq-timer/2  tick_setup_sched_timer (tick_sched_timer)
10002,    42 softirq-timer/3  tick_setup_sched_timer (tick_sched_timer)
10001,     6 softirq-timer/0  tick_setup_sched_timer (tick_sched_timer)
10001,    18 softirq-timer/1  tick_setup_sched_timer (tick_sched_timer)
10,  2970 ntpd             do_setitimer (it_real_fn)
10,    63 ftraced          schedule_timeout (process_timeout)
20,     1 swapper          clocksource_register (clocksource_watchdog)
100,  1091 kpktgend_0       schedule_timeout (process_timeout)
100,  1092 kpktgend_1       schedule_timeout (process_timeout)
100,  1093 kpktgend_2       schedule_timeout (process_timeout)
100,  1094 kpktgend_3       schedule_timeout (process_timeout)
100,  3154 yum-updatesd     schedule_timeout (process_timeout)
10,  2902 automount        futex_wait (hrtimer_wakeup)
10,     1 swapper          ip_vs_control_init (delayed_work_timer_fn)
10D,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
10D,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
10D,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
10D,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
10,  2970 ntpd             do_adjtimex (sync_cmos_clock)
10,  2899 pcscd            do_nanosleep (hrtimer_wakeup)
10,  2643 ip               tg3_open (tg3_timer)
40687 total events, 4068.293 events/sec

So you can see the softirq-timers are triggering every HZ timer tick. However, it
seems this isn't quite the data I was hoping to collect, as the softirq-timers
should be expiring timers, not running them.

Apparently the timer_stats info only captures hrtimers? So this method won't
help us investigate how softirq-timers are being utilized.
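For reference, the collection above follows the usual timer_stats procedure,
assuming the kernel is built with CONFIG_TIMER_STATS (a sketch of the standard
steps, not necessarily the exact commands used here):

echo 1 > /proc/timer_stats    # start collecting
sleep 10                      # sample period
echo 0 > /proc/timer_stats    # stop collecting
cat /proc/timer_stats         # dump the per-timer statistics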

Comment 3 Luis Claudio R. Goncalves 2008-03-27 20:56:27 UTC
John, you could reboot your system with 'nohz=on'. That way you would have fewer
calls into the softirq-timer threads (one per second, plus timer expirations).

update_wall_time() and update_vsyscall() are called by tick_sched_timer() and
they may take some time to finish. Calling them 1000 times a second can eat
quite a few CPU cycles. But I confess the CPU usage you have observed surprised me.
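A minimal sketch of how the parameter might be added on a RHEL5-era box (the
grub.conf path and entry below are illustrative assumptions, not taken from this
system):

# append nohz=on to the kernel line of the running entry in /boot/grub/grub.conf, e.g.:
#   kernel /vmlinuz-<version> ro root=... nohz=on
# then reboot and confirm the parameter took effect:
grep nohz /proc/cmdline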

Comment 4 IBM Bug Proxy 2008-03-28 01:40:50 UTC
------- Comment From jstultz.com 2008-03-27 21:34 EDT-------
nohz=on seems to have minimal effect. The cputime grows at close to the same
rate (maybe a touch slower, though).

However, looking at the numbers more closely, I'm apparently off by an order of
magnitude: bsdtime is in *seconds*, not minutes. So maybe this isn't as urgent;
however, the cputime being high for the softirq timers and not for other
frequently run kernel daemons (posix-cpu-timers, softirq-rcu, etc.) is still a
bit mysterious.
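As a cross-check, the raw per-thread CPU time can also be read straight from
/proc in clock ticks instead of going through the ps bsdtime formatting; a rough
sketch, assuming the softirq-timer/0 thread can be located by name:

TID=$(pgrep softirq-timer/0 | head -n1)   # thread ID of softirq-timer/0
awk -v hz=$(getconf CLK_TCK) '{ print "utime+stime =", ($14 + $15) / hz, "seconds" }' /proc/$TID/stat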

Comment 5 IBM Bug Proxy 2008-05-01 23:24:33 UTC
------- Comment From jstultz.com 2008-05-01 19:17 EDT-------
So I roughly bisected this down: the behavior changed between 2.6.21.4-rt10
and 2.6.21.4-rt11, which is when the CFS scheduler was merged.

Comment 6 Clark Williams 2008-07-02 14:37:49 UTC
Is this still an issue?

Comment 7 IBM Bug Proxy 2008-08-03 04:35:02 UTC
Created attachment 313278 [details]
patch to move calc_load and update_wall_time out of softirq-timer context

Comment 8 IBM Bug Proxy 2008-09-05 07:00:30 UTC
On the 2.6.24.7-74 kernel, I see:
ps -eLo  bsdtime,rtprio,comm| sort | tail
0:27      - setroubleshootd
1:05      - pcscd
1:10     30 sirq-sched/0
1:10     30 sirq-sched/2
1:11     30 sirq-sched/3
3:01     30 sirq-timer/1
3:17     30 sirq-timer/2
3:27     30 sirq-timer/0
3:32     30 sirq-timer/3
uptime
12:02:29 up 3 days, 14:36,  1 user,  load average: 0.05, 0.02, 0.00

Comment 9 Luis Claudio R. Goncalves 2008-10-23 10:58:40 UTC
Are you seeing this issue nowadays?

Comment 10 IBM Bug Proxy 2009-03-12 12:30:42 UTC
(In reply to comment #19)
> Are you seeing this issue nowadays?
>

Yes, this seems to be still present:

[root@elm3c32 ~]# ps -eLo  bsdtime,rtprio,comm| sort |grep sirq-timer
0:03     70 sirq-timer/0
0:03     70 sirq-timer/1
0:03     70 sirq-timer/3
0:05     70 sirq-timer/2
[root@elm3c32 ~]# uptime
08:29:11 up  1:24,  1 user,  load average: 0.00, 0.08, 0.17
[root@elm3c32 ~]# uname -a
Linux elm3c32 2.6.24.7-107.el5rt #1 SMP PREEMPT RT Tue Mar 3 10:32:18 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

Comment 11 IBM Bug Proxy 2009-06-18 12:00:59 UTC
------- Comment From gowrishankar.m.com 2009-06-18 07:52 EDT-------
From 2.6.24.7-111.el5rt :

[root@elm9m99 ~]# ps -eLo  bsdtime,rtprio,comm| sort |grep sirq-timer
0:19     70 sirq-timer/3
0:20     70 sirq-timer/1
0:21     70 sirq-timer/0
0:23     70 sirq-timer/2

[root@elm9m99 ~]# uptime
07:44:49 up  7:45,  1 user,  load average: 0.00, 0.00, 0.00

Comment 14 Clark Williams 2009-09-22 21:38:00 UTC
Doesn't look like this is still an issue with 2.6.31. Can we close this?

Comment 15 IBM Bug Proxy 2009-09-23 18:01:49 UTC
------- Comment From sripathik.com 2009-09-23 13:52 EDT-------
Checked on the 2.6.31-rt10.18.el5rt kernel. The problem indeed looks to be solved:
# uptime
13:50:39 up  6:43,  2 users,  load average: 0.01, 0.37, 0.39

# ps -eLo  bsdtime,rtprio,comm| sort |grep sirq-timer
0:00     70 sirq-timer/0
0:00     70 sirq-timer/1
0:00     70 sirq-timer/2
0:00     70 sirq-timer/3
0:00     70 sirq-timer/4
0:00     70 sirq-timer/5
0:00     70 sirq-timer/6
0:00     70 sirq-timer/7

Hence I suppose we can close this. I am closing this bug on our side.

Thanks, Clark.