Bug 822877

Summary: Kernel reports high load during idle, from 3.3.5-2 on
Product: [Fedora] Fedora Reporter: Bert DeKnuydt <Bert.Deknuydt>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 17CC: covex, gansalmon, hansvon, itamar, jonathan, kernel-maint, lacombar, madhu.chinakonda, ngaywood, webmaster
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-27 15:25:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
load change with 3.4.2 none

Description Bert DeKnuydt 2012-05-18 11:30:00 UTC
Description of problem:

Load, as reported from e.g. 'uptime', on a completely idle machine
is significantly higher then it was on 3.3.4 and earlier.

Version-Release number of selected component (if applicable):

kernel-3.3.5-2

How reproducible:

Always

Steps to Reproduce:
1. Boot 3.3.5-2 or 3.3.6-3
2. Leave the machine to become totally idle
3. Monitor the load with e.g. 'uptime'
  
Actual results:

On one machine, with 3.3.6-3, after xx minutes idle-ness:

 13:11:30 up 10 min,  1 user,  load average: 0.16, 0.17, 0.15
 13:16:30 up 15 min,  1 user,  load average: 0.19, 0.17, 0.14

Load remains at around 0.20. The exact number it settles on 
differs from machine to machine, and seems to reach 0.80 on 
certain machines.

Expected results:

Same machine, with 3.3.4-3, after xx minutes of idle-ness:

 12:53:59 up 10 min,  1 user,  load average: 0.05, 0.14, 0.10
 12:59:00 up 15 min,  1 user,  load average: 0.00, 0.05, 0.08

Load further drops and settles at 0.00 to 0.05 for this machine.

Additional info:

Top, dstat and powertop do not show anything obvious to me.

Remembers me of the load mis-calculation of a year or so ago.

'load' is of course only an arbitrary number, but it's pretty important
as unexpected values disturb e.g. condor's job scheduling algorithms.

I see that ArchLinux has similar problems:
https://bbs.archlinux.org/viewtopic.php?pid=1103189

Comment 1 hansvon 2012-05-27 08:44:27 UTC
Does also affect F17 and 3.3.7:

1st machine: MacBookAir4,2 (Intel Sandybridge Core i7)
$ uname -a; uptime
Linux karei 3.3.7-1.fc17.x86_64 #1 SMP Mon May 21 22:32:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 10:26:49 up  1:00,  4 users,  load average: 0.67, 0.68, 0.57


2nd machine: Desktop SNB Core i3
$ uname -a;uptime Linux buta 3.3.7-1.fc17.x86_64 #1 SMP Mon May 21 22:32:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 10:40:26 up 11 min,  1 user,  load average: 0.44, 0.32, 0.24


Doesn't seem to affect my 3rd machine (Sony laptop with Core i3 Arrandale)
$ uname -a;uptime 
Linux hebi 3.3.7-1.fc17.x86_64 #1 SMP Mon May 21 22:32:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 10:42:00 up 9 min,  2 users,  load average: 0.02, 0.27, 0.28


Nothing seems to stand out in "perf top" or "powertop"...

Comment 2 Adam Pribyl 2012-06-07 18:10:45 UTC
Same here, on several machines with F16. The 3.3.4 shows in init 3 idle load 0. With 3.3.7 I see constant load 0.5, in Gnome above 1.0. This happens on both Intel and AMD cpus I have here. I am not sure this is just load shown incorrectly. IMHO there is a performance penalty too, but I did no objective measurement.

Comment 3 Arnaud Lacombe 2012-06-11 14:56:19 UTC
same here, here is the bug report I was about to fill:

Description of problem:
Kernel on a mostly idle box shows a very high load:

A Fedora 15 box (2.6.43.5) showing the problem: 
top - 10:47:11 up 24 min,  1 user,  load average: 0.95, 0.99, 0.89
Tasks: 239 total,   1 running, 238 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.0%us,  0.3%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu1  :  0.3%us,  0.7%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

A Fedora 16 box (3.3.1) *not* showing the problem:
top - 10:47:28 up  8:44,  2 users,  load average: 0.00, 0.01, 0.05
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
Cpu0  :  2.1%us,  0.9%sy,  0.0%ni, 95.8%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  2.1%us,  0.9%sy,  0.0%ni, 95.9%id,  1.1%wa,  0.0%hi,  0.0%si,  0.0%st


Version-Release number of selected component (if applicable):

Known bad:
on Fedora 15: 2.6.43.5-2.fc15.x86_64
on Fedora 16: 3.3.5-2.fc16.x86_64

Known good:
on Fedora 16: 3.3.1-3.fc16.x86_64

How reproducible:
always

Steps to Reproduce:
1. boot
2. start a virtual console
3. run top
  
Actual results: load near 1.00

Expected results: load near 0.00

Additional info:
The Fedora 16 box do not show the problem on 3.3.1-3.fc16.x86_64 but do show it starting 3.3.5-2.fc16.x86_64 and up.

This directly impact my multimedia experience as youtube videos fail to play fluidly and I am not even able to correctly watch webm encoded video from mplayer... Switching to 3.3.1 fixes the issue.

Comment 4 hansvon 2012-06-11 15:07:32 UTC
I've also tried:
  kernel-3.4.0-1.fc17.x86_64
  kernel-3.4.1-1.fc17.x86_64 (from koji)
  kernel-3.4.2-1.fc17.x86_64 (from koji)
and I still get an unexpectly high load.

Comment 5 Arnaud Lacombe 2012-06-11 17:33:43 UTC
ok, I moved back my Fedora 15 machine back to 2.6.43.2-6.fc15.x86_64. After 25min of normal desktop usage (ie. thunderbird, chrome, couple of virtual terminal, plus .flv audio decoding with mplayer), here is top output:

top - 13:32:24 up 21 min,  1 user,  load average: 0.15, 0.14, 0.27
Tasks: 246 total,   1 running, 245 sleeping,   0 stopped,   0 zombie
Cpu0  :  7.0%us,  1.7%sy,  0.0%ni, 89.7%id,  0.7%wa,  0.7%hi,  0.3%si,  0.0%st
Cpu1  :  1.7%us,  1.7%sy,  0.0%ni, 95.3%id,  1.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  :  2.7%us,  1.0%sy,  0.0%ni, 95.7%id,  0.3%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu3  :  0.7%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st

while the machine here is actually more used than previously :/

Comment 6 Arnaud Lacombe 2012-06-13 16:09:43 UTC
Hi,

It would seem that 3.3.1-3.fc16.x86_64 from F16 is also affected. I am unable to currently flawlessly play youtube's video at standard resolution. Load average is ~0.5 while having nothing more than XFCE a couple of terminal and Chromium opened (with this single tab opened)

top currently report:

top - 11:57:11 up 2 days,  9:53,  2 users,  load average: 0.41, 0.50, 0.48
Tasks: 253 total,   1 running, 252 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.7%us,  0.7%sy,  0.0%ni, 97.4%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  2.6%us,  1.0%sy,  0.0%ni, 93.4%id,  3.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3049184k total,  1451524k used,  1597660k free,    48036k buffers
Swap:  3145724k total,   254716k used,  2891008k free,   923840k cached

However, `perf top' might have a little bit more insight:

...
Events: 87K cycles                                                                                                                                                                                                                             
 41.29%  [kernel]                       [k] native_read_tsc
 18.09%  [kernel]                       [k] delay_tsc
  6.58%  [kernel]                       [k] ioread32
  3.52%  [unknown]                      [.] 0x7f8268248dc0
  1.86%  [kernel]                       [k] read_hpet

...
Events: 101K cycles                                                                                                                                                                                                                            
 20.33%  [kernel]                            [k] native_read_tsc
  8.62%  [kernel]                            [k] delay_tsc
  6.94%  [unknown]                           [.] 0x7f8268248dc0
  3.38%  [kernel]                            [k] read_hpet
  2.63%  [kernel]                            [k] ioread32

...
Events: 118K cycles                                                                                                                                                                                                                            
 15.01%  [kernel]                            [k] native_read_tsc
  6.87%  [kernel]                            [k] delay_tsc
  4.85%  [unknown]                           [.] 0x7f8268248dc0
  3.97%  libpixman-1.so.0.24.4               [.] 0x5602d
  3.19%  [kernel]                            [k] read_hpet

...
Events: 174K cycles                                                                                                                                                                                                                            
 22.23%  [unknown]                           [.] 0x7f8268248dc0
 12.62%  [kernel]                            [k] native_read_tsc
  5.50%  [kernel]                            [k] delay_tsc
  3.19%  [kernel]                            [k] read_hpet
  2.48%  chromium-browser                    [.] 0x4c4b05


This might be a wild guess, but activity in native_read_tsc()/delay_tsc would seem to match to higher load while idle.

Comment 7 Adam Pribyl 2012-06-14 19:25:03 UTC
It looks like this really affects only 64b systems as I tried 3.3.7 on i686 and it seems OK.

From kernel-devel:

Michal Schmidt:

If 3.3.4 works and 3.3.5 is broken, perhaps someone who's seeing the bug 
can bisect it. There are not many patches there.

Maybe it's this patch that went into 3.3.5:

commit 6b7b95e774e2e2b32631511ad7d4c2256f1b3162
Author: Peter Zijlstra <peterz>
Date:   Thu Mar 1 15:04:46 2012 +0100

     sched: Fix nohz load accounting -- again!

----
This seems to affect only the calculation, but IMHO we see also a performance hit.

Comment 8 Adam Pribyl 2012-06-23 19:50:05 UTC
With 3.4.2-1.fc16.i686 I see this on i686 too. The load is constantly around 1.

Comment 9 Arnaud Lacombe 2012-06-23 19:58:48 UTC
@adam: out of curiosity, what does `perf top' report on the box while idle and showing the issue ?

FWIW, here is my output on 3.3.1, while *not* experiencing the issue:

Events: 72K cycles                                                                                                                                                                                               
 17.87%  libspeexdsp.so.1.5.0           [.] 0x96e8                                                                              
 11.93%  libflashplayer.so              [.] 0x33bb54
  7.27%  [kernel]                       [k] read_hpet
  2.55%  [kernel]                       [k] _raw_spin_lock_irqsave
  2.46%  chromium-browser               [.] 0x1461d45
[...]
  0.31%  [kernel]                       [k] native_read_tsc

Comment 10 Adam Pribyl 2012-06-24 12:11:50 UTC
Created attachment 593993 [details]
load change with 3.4.2

This is a server, there is still running something mainly DVB streaming, but in general I do not see anything suspicious, except the load changed with the new kernel.

 17.08%  [kernel]                                               [k] acpi_idle_do_entry
  3.95%  getstream                                              [.] send_ts_packet
  3.86%  [kernel]                                               [k] memcpy
  3.80%  [kernel]                                               [k] __copy_to_user_ll
  2.53%  [kernel]                                               [k] _raw_spin_lock_irqsave
  2.17%  [kernel]                                               [k] _raw_spin_unlock_irqrestore
  1.98%  [kernel]                                               [k] __wake_up_common
  1.83%  [saa7134]                                              [k] saa7134_irq
  1.56%  [dvb_core]                                             [k] dvb_dmx_swfilter_packet
  1.55%  libevent-2.0.so.5.0.1.#prelink#.I54qsK (deleted)       [.] 0x0000ce4a
  1.43%  [dvb_core]                                             [k] dvb_ringbuffer_write
  1.32%  [kernel]                                               [k] __schedule
  1.24%  [kernel]                                               [k] find_busiest_group
  1.20%  getstream                                              [.] dvr_read
  1.09%  [dvb_core]                                             [k] dvb_dmxdev_ts_callback
  0.93%  [kernel]                                               [k] ep_poll_callback
  0.92%  [kernel]                                               [k] update_curr
  0.80%  [kernel]                                               [k] menu_select
  0.79%  [kernel]                                               [k] delay_tsc
  0.79%  [kernel]                                               [k] timerqueue_add
  0.77%  [kernel]                                               [k] do_timer
  0.76%  [kernel]                                               [k] __hrtimer_start_range_ns
  0.71%  [kernel]                                               [k] update_cfs_shares
  0.70%  [kernel]                                               [k] update_cfs_load
  0.65%  [dvb_core]                                             [k] dvb_dmxdev_buffer_write
  0.63%  [kernel]                                               [k] irqtime_account_process_tick
  0.63%  [kernel]                                               [k] native_sched_clock
  0.61%  [kernel]                                               [k] rb_erase
  0.60%  [kernel]                                               [k] try_to_wake_up
  0.60%  [kernel]                                               [k] select_task_rq_fair
  0.59%  [kernel]                                               [k] usb_hcd_irq
  0.58%  [kernel]                                               [k] dequeue_entity
  0.56%  [kernel]                                               [k] sysenter_past_esp
  0.55%  [kernel]                                               [k] read_tsc
  0.54%  libslang.so.2.2.4                                      [.] 0x000669f0
  0.53%  [kernel]                                               [k] cpuacct_charge
  0.50%  libc-2.14.90.so (deleted)                              [.] 0x000e34b4
  0.50%  [kernel]                                               [k] effective_load

Comment 11 Adam Pribyl 2012-06-24 18:14:32 UTC
Here is another computer - laptop with Intel Atom Z520 (i686), same problem after update to 3.4.2 - I'd love to know if there is anybody with a computer where the problem does not exibit with 3.4.2.

 21.29%  [kernel]                     [k] delay_tsc
  2.76%  [kernel]                     [k] read_hpet
  2.72%  libslang.so.2.2.4            [.] 0x0006695c
  2.60%  perf                         [.] 0x000468e0
  2.26%  libslang.so.2.2.4            [.] SLtt_smart_puts
  1.87%  [kernel]                     [k] ioread32
  1.65%  libslang.so.2.2.4            [.] SLsmg_write_chars
  1.49%  [kernel]                     [k] do_timer
  1.33%  libdbus-1.so.3.5.6           [.] 0x000253ae
  1.25%  dbus-daemon                  [.] 0x000353b0
  1.16%  [kernel]                     [k] __schedule
  1.12%  [kernel]                     [k] menu_select
  0.92%  [kernel]                     [k] tick_nohz_idle_exit
  0.88%  [kernel]                     [k] __switch_to
  0.85%  [kernel]                     [k] find_busiest_group
  0.71%  [kernel]                     [k] get_next_timer_interrupt
  0.67%  [kernel]                     [k] tg_load_down
  0.67%  [kernel]                     [k] copy_user_highpage
  0.66%  [kernel]                     [k] _raw_spin_lock_irqsave
  0.66%  [kernel]                     [k] tick_do_update_jiffies64
  0.63%  perf                         [.] rb_next
  0.58%  [kernel]                     [k] rcu_note_context_switch
  0.57%  [kernel]                     [k] hrtimer_interrupt
  0.55%  [kernel]                     [k] sched_clock_local
  0.54%  libc-2.14.90.so              [.] _int_malloc
  0.53%  perf                         [.] dso__find_symbol
  0.53%  [kernel]                     [k] ktime_get
  0.51%  [kernel]                     [k] tick_nohz_stop_sched_tick

Comment 12 Adam Pribyl 2012-06-25 07:44:26 UTC
Sorry for the noise -forget about the Atom, this one is OK. I probably saw what is not there. The load drops to 0 at the Atom.

Comment 13 Bert DeKnuydt 2012-07-24 09:38:03 UTC
It seems like kernel 3.4.6-2.fc17.x86_64 solves this problem, while 
kernel-3.4.5-2.fc17.x86_64 still shows excessive load.