Bug 822877

Summary: Kernel reports high load during idle, from 3.3.5-2 on
Product: [Fedora] Fedora Reporter: Bert DeKnuydt <Bert.Deknuydt>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 17CC: covex, gansalmon, hansvon, itamar, jonathan, kernel-maint, lacombar, madhu.chinakonda, ngaywood, webmaster
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-27 11:25:18 EDT Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Description Flags
load change with 3.4.2 none

Description Bert DeKnuydt 2012-05-18 07:30:00 EDT
Description of problem:

Load, as reported from e.g. 'uptime', on a completely idle machine
is significantly higher then it was on 3.3.4 and earlier.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Boot 3.3.5-2 or 3.3.6-3
2. Leave the machine to become totally idle
3. Monitor the load with e.g. 'uptime'
Actual results:

On one machine, with 3.3.6-3, after xx minutes idle-ness:

 13:11:30 up 10 min,  1 user,  load average: 0.16, 0.17, 0.15
 13:16:30 up 15 min,  1 user,  load average: 0.19, 0.17, 0.14

Load remains at around 0.20. The exact number it settles on 
differs from machine to machine, and seems to reach 0.80 on 
certain machines.

Expected results:

Same machine, with 3.3.4-3, after xx minutes of idle-ness:

 12:53:59 up 10 min,  1 user,  load average: 0.05, 0.14, 0.10
 12:59:00 up 15 min,  1 user,  load average: 0.00, 0.05, 0.08

Load further drops and settles at 0.00 to 0.05 for this machine.

Additional info:

Top, dstat and powertop do not show anything obvious to me.

Remembers me of the load mis-calculation of a year or so ago.

'load' is of course only an arbitrary number, but it's pretty important
as unexpected values disturb e.g. condor's job scheduling algorithms.

I see that ArchLinux has similar problems:
Comment 1 hansvon 2012-05-27 04:44:27 EDT
Does also affect F17 and 3.3.7:

1st machine: MacBookAir4,2 (Intel Sandybridge Core i7)
$ uname -a; uptime
Linux karei 3.3.7-1.fc17.x86_64 #1 SMP Mon May 21 22:32:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 10:26:49 up  1:00,  4 users,  load average: 0.67, 0.68, 0.57

2nd machine: Desktop SNB Core i3
$ uname -a;uptime Linux buta 3.3.7-1.fc17.x86_64 #1 SMP Mon May 21 22:32:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 10:40:26 up 11 min,  1 user,  load average: 0.44, 0.32, 0.24

Doesn't seem to affect my 3rd machine (Sony laptop with Core i3 Arrandale)
$ uname -a;uptime 
Linux hebi 3.3.7-1.fc17.x86_64 #1 SMP Mon May 21 22:32:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 10:42:00 up 9 min,  2 users,  load average: 0.02, 0.27, 0.28

Nothing seems to stand out in "perf top" or "powertop"...
Comment 2 Adam Pribyl 2012-06-07 14:10:45 EDT
Same here, on several machines with F16. The 3.3.4 shows in init 3 idle load 0. With 3.3.7 I see constant load 0.5, in Gnome above 1.0. This happens on both Intel and AMD cpus I have here. I am not sure this is just load shown incorrectly. IMHO there is a performance penalty too, but I did no objective measurement.
Comment 3 Arnaud Lacombe 2012-06-11 10:56:19 EDT
same here, here is the bug report I was about to fill:

Description of problem:
Kernel on a mostly idle box shows a very high load:

A Fedora 15 box ( showing the problem: 
top - 10:47:11 up 24 min,  1 user,  load average: 0.95, 0.99, 0.89
Tasks: 239 total,   1 running, 238 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.0%us,  0.3%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu1  :  0.3%us,  0.7%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

A Fedora 16 box (3.3.1) *not* showing the problem:
top - 10:47:28 up  8:44,  2 users,  load average: 0.00, 0.01, 0.05
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
Cpu0  :  2.1%us,  0.9%sy,  0.0%ni, 95.8%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  2.1%us,  0.9%sy,  0.0%ni, 95.9%id,  1.1%wa,  0.0%hi,  0.0%si,  0.0%st

Version-Release number of selected component (if applicable):

Known bad:
on Fedora 15:
on Fedora 16: 3.3.5-2.fc16.x86_64

Known good:
on Fedora 16: 3.3.1-3.fc16.x86_64

How reproducible:

Steps to Reproduce:
1. boot
2. start a virtual console
3. run top
Actual results: load near 1.00

Expected results: load near 0.00

Additional info:
The Fedora 16 box do not show the problem on 3.3.1-3.fc16.x86_64 but do show it starting 3.3.5-2.fc16.x86_64 and up.

This directly impact my multimedia experience as youtube videos fail to play fluidly and I am not even able to correctly watch webm encoded video from mplayer... Switching to 3.3.1 fixes the issue.
Comment 4 hansvon 2012-06-11 11:07:32 EDT
I've also tried:
  kernel-3.4.1-1.fc17.x86_64 (from koji)
  kernel-3.4.2-1.fc17.x86_64 (from koji)
and I still get an unexpectly high load.
Comment 5 Arnaud Lacombe 2012-06-11 13:33:43 EDT
ok, I moved back my Fedora 15 machine back to After 25min of normal desktop usage (ie. thunderbird, chrome, couple of virtual terminal, plus .flv audio decoding with mplayer), here is top output:

top - 13:32:24 up 21 min,  1 user,  load average: 0.15, 0.14, 0.27
Tasks: 246 total,   1 running, 245 sleeping,   0 stopped,   0 zombie
Cpu0  :  7.0%us,  1.7%sy,  0.0%ni, 89.7%id,  0.7%wa,  0.7%hi,  0.3%si,  0.0%st
Cpu1  :  1.7%us,  1.7%sy,  0.0%ni, 95.3%id,  1.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  :  2.7%us,  1.0%sy,  0.0%ni, 95.7%id,  0.3%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu3  :  0.7%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st

while the machine here is actually more used than previously :/
Comment 6 Arnaud Lacombe 2012-06-13 12:09:43 EDT

It would seem that 3.3.1-3.fc16.x86_64 from F16 is also affected. I am unable to currently flawlessly play youtube's video at standard resolution. Load average is ~0.5 while having nothing more than XFCE a couple of terminal and Chromium opened (with this single tab opened)

top currently report:

top - 11:57:11 up 2 days,  9:53,  2 users,  load average: 0.41, 0.50, 0.48
Tasks: 253 total,   1 running, 252 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.7%us,  0.7%sy,  0.0%ni, 97.4%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  2.6%us,  1.0%sy,  0.0%ni, 93.4%id,  3.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3049184k total,  1451524k used,  1597660k free,    48036k buffers
Swap:  3145724k total,   254716k used,  2891008k free,   923840k cached

However, `perf top' might have a little bit more insight:

Events: 87K cycles                                                                                                                                                                                                                             
 41.29%  [kernel]                       [k] native_read_tsc
 18.09%  [kernel]                       [k] delay_tsc
  6.58%  [kernel]                       [k] ioread32
  3.52%  [unknown]                      [.] 0x7f8268248dc0
  1.86%  [kernel]                       [k] read_hpet

Events: 101K cycles                                                                                                                                                                                                                            
 20.33%  [kernel]                            [k] native_read_tsc
  8.62%  [kernel]                            [k] delay_tsc
  6.94%  [unknown]                           [.] 0x7f8268248dc0
  3.38%  [kernel]                            [k] read_hpet
  2.63%  [kernel]                            [k] ioread32

Events: 118K cycles                                                                                                                                                                                                                            
 15.01%  [kernel]                            [k] native_read_tsc
  6.87%  [kernel]                            [k] delay_tsc
  4.85%  [unknown]                           [.] 0x7f8268248dc0
  3.97%  libpixman-1.so.0.24.4               [.] 0x5602d
  3.19%  [kernel]                            [k] read_hpet

Events: 174K cycles                                                                                                                                                                                                                            
 22.23%  [unknown]                           [.] 0x7f8268248dc0
 12.62%  [kernel]                            [k] native_read_tsc
  5.50%  [kernel]                            [k] delay_tsc
  3.19%  [kernel]                            [k] read_hpet
  2.48%  chromium-browser                    [.] 0x4c4b05

This might be a wild guess, but activity in native_read_tsc()/delay_tsc would seem to match to higher load while idle.
Comment 7 Adam Pribyl 2012-06-14 15:25:03 EDT
It looks like this really affects only 64b systems as I tried 3.3.7 on i686 and it seems OK.

From kernel-devel:

Michal Schmidt:

If 3.3.4 works and 3.3.5 is broken, perhaps someone who's seeing the bug 
can bisect it. There are not many patches there.

Maybe it's this patch that went into 3.3.5:

commit 6b7b95e774e2e2b32631511ad7d4c2256f1b3162
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Mar 1 15:04:46 2012 +0100

     sched: Fix nohz load accounting -- again!

This seems to affect only the calculation, but IMHO we see also a performance hit.
Comment 8 Adam Pribyl 2012-06-23 15:50:05 EDT
With 3.4.2-1.fc16.i686 I see this on i686 too. The load is constantly around 1.
Comment 9 Arnaud Lacombe 2012-06-23 15:58:48 EDT
@adam: out of curiosity, what does `perf top' report on the box while idle and showing the issue ?

FWIW, here is my output on 3.3.1, while *not* experiencing the issue:

Events: 72K cycles                                                                                                                                                                                               
 17.87%  libspeexdsp.so.1.5.0           [.] 0x96e8                                                                              
 11.93%  libflashplayer.so              [.] 0x33bb54
  7.27%  [kernel]                       [k] read_hpet
  2.55%  [kernel]                       [k] _raw_spin_lock_irqsave
  2.46%  chromium-browser               [.] 0x1461d45
  0.31%  [kernel]                       [k] native_read_tsc
Comment 10 Adam Pribyl 2012-06-24 08:11:50 EDT
Created attachment 593993 [details]
load change with 3.4.2

This is a server, there is still running something mainly DVB streaming, but in general I do not see anything suspicious, except the load changed with the new kernel.

 17.08%  [kernel]                                               [k] acpi_idle_do_entry
  3.95%  getstream                                              [.] send_ts_packet
  3.86%  [kernel]                                               [k] memcpy
  3.80%  [kernel]                                               [k] __copy_to_user_ll
  2.53%  [kernel]                                               [k] _raw_spin_lock_irqsave
  2.17%  [kernel]                                               [k] _raw_spin_unlock_irqrestore
  1.98%  [kernel]                                               [k] __wake_up_common
  1.83%  [saa7134]                                              [k] saa7134_irq
  1.56%  [dvb_core]                                             [k] dvb_dmx_swfilter_packet
  1.55%  libevent-2.0.so.5.0.1.#prelink#.I54qsK (deleted)       [.] 0x0000ce4a
  1.43%  [dvb_core]                                             [k] dvb_ringbuffer_write
  1.32%  [kernel]                                               [k] __schedule
  1.24%  [kernel]                                               [k] find_busiest_group
  1.20%  getstream                                              [.] dvr_read
  1.09%  [dvb_core]                                             [k] dvb_dmxdev_ts_callback
  0.93%  [kernel]                                               [k] ep_poll_callback
  0.92%  [kernel]                                               [k] update_curr
  0.80%  [kernel]                                               [k] menu_select
  0.79%  [kernel]                                               [k] delay_tsc
  0.79%  [kernel]                                               [k] timerqueue_add
  0.77%  [kernel]                                               [k] do_timer
  0.76%  [kernel]                                               [k] __hrtimer_start_range_ns
  0.71%  [kernel]                                               [k] update_cfs_shares
  0.70%  [kernel]                                               [k] update_cfs_load
  0.65%  [dvb_core]                                             [k] dvb_dmxdev_buffer_write
  0.63%  [kernel]                                               [k] irqtime_account_process_tick
  0.63%  [kernel]                                               [k] native_sched_clock
  0.61%  [kernel]                                               [k] rb_erase
  0.60%  [kernel]                                               [k] try_to_wake_up
  0.60%  [kernel]                                               [k] select_task_rq_fair
  0.59%  [kernel]                                               [k] usb_hcd_irq
  0.58%  [kernel]                                               [k] dequeue_entity
  0.56%  [kernel]                                               [k] sysenter_past_esp
  0.55%  [kernel]                                               [k] read_tsc
  0.54%  libslang.so.2.2.4                                      [.] 0x000669f0
  0.53%  [kernel]                                               [k] cpuacct_charge
  0.50%  libc-2.14.90.so (deleted)                              [.] 0x000e34b4
  0.50%  [kernel]                                               [k] effective_load
Comment 11 Adam Pribyl 2012-06-24 14:14:32 EDT
Here is another computer - laptop with Intel Atom Z520 (i686), same problem after update to 3.4.2 - I'd love to know if there is anybody with a computer where the problem does not exibit with 3.4.2.

 21.29%  [kernel]                     [k] delay_tsc
  2.76%  [kernel]                     [k] read_hpet
  2.72%  libslang.so.2.2.4            [.] 0x0006695c
  2.60%  perf                         [.] 0x000468e0
  2.26%  libslang.so.2.2.4            [.] SLtt_smart_puts
  1.87%  [kernel]                     [k] ioread32
  1.65%  libslang.so.2.2.4            [.] SLsmg_write_chars
  1.49%  [kernel]                     [k] do_timer
  1.33%  libdbus-1.so.3.5.6           [.] 0x000253ae
  1.25%  dbus-daemon                  [.] 0x000353b0
  1.16%  [kernel]                     [k] __schedule
  1.12%  [kernel]                     [k] menu_select
  0.92%  [kernel]                     [k] tick_nohz_idle_exit
  0.88%  [kernel]                     [k] __switch_to
  0.85%  [kernel]                     [k] find_busiest_group
  0.71%  [kernel]                     [k] get_next_timer_interrupt
  0.67%  [kernel]                     [k] tg_load_down
  0.67%  [kernel]                     [k] copy_user_highpage
  0.66%  [kernel]                     [k] _raw_spin_lock_irqsave
  0.66%  [kernel]                     [k] tick_do_update_jiffies64
  0.63%  perf                         [.] rb_next
  0.58%  [kernel]                     [k] rcu_note_context_switch
  0.57%  [kernel]                     [k] hrtimer_interrupt
  0.55%  [kernel]                     [k] sched_clock_local
  0.54%  libc-2.14.90.so              [.] _int_malloc
  0.53%  perf                         [.] dso__find_symbol
  0.53%  [kernel]                     [k] ktime_get
  0.51%  [kernel]                     [k] tick_nohz_stop_sched_tick
Comment 12 Adam Pribyl 2012-06-25 03:44:26 EDT
Sorry for the noise -forget about the Atom, this one is OK. I probably saw what is not there. The load drops to 0 at the Atom.
Comment 13 Bert DeKnuydt 2012-07-24 05:38:03 EDT
It seems like kernel 3.4.6-2.fc17.x86_64 solves this problem, while 
kernel-3.4.5-2.fc17.x86_64 still shows excessive load.