Bug 650934 - Idle System has high load without visible cause
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Duplicates: 635062
Reported: 2010-11-08 12:29 UTC by Dennis Jacobfeuerborn
Modified: 2010-12-22 19:53 UTC (History)

Fixed In Version: kernel-2.6.35.10-72.fc14
Doc Type: Bug Fix
Last Closed: 2010-12-09 16:10:33 UTC
Type: ---

Description Dennis Jacobfeuerborn 2010-11-08 12:29:02 UTC
I just installed Fedora 14 and I noticed that I cannot get my system load below the 0.50-1.00 range. This is on an idle system right after login and before starting any applications. I also checked "top" which shows the cpu as 99% idle.
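For context on the mismatch described above: on Linux the load average counts tasks that are runnable or in uninterruptible sleep, so it measures something different from the CPU idle percentage that "top" shows. A minimal Python sketch for reading the raw values from /proc/loadavg follows. The function names are illustrative and not part of this report.

```python
from pathlib import Path


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg.

    The file holds five fields: the 1, 5 and 15 minute load averages,
    "runnable/total" scheduling entities, and the most recent PID.
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total_tasks": int(total),
        "last_pid": int(fields[4]),
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the live values (Linux only)."""
    return parse_loadavg(Path(path).read_text())
```

Calling read_loadavg() a few times on an idle desktop shows whether "load1" stays in the 0.50-1.00 range while "runnable" is mostly 1 (the reading process itself).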

I followed an issue with similar symptoms here:
https://bugzilla.kernel.org/show_bug.cgi?id=16525
and other people seem to have similar problems in this thread:
http://lists.fedoraproject.org/pipermail/test/2010-October/094683.html

This is what powertop says on my machine:

Top causes for wakeups:
  56.7% (892.1)   [Rescheduling interrupts] <kernel IPI>
  14.2% (223.5)   [kernel scheduler] Load balancing tick
   9.3% (145.7)   firefox-bin
   5.0% ( 79.0)   USB device  2-3 : Microsoft Basic Optical Mouse v2.0  (Microsoft )
   4.2% ( 66.2)   [ohci_hcd:usb2, hda_intel] <interrupt>
   3.5% ( 55.2)   [radeon] <interrupt>
   2.1% ( 33.3)   java
   1.0% ( 15.8)   thunderbird-bin
   0.7% ( 11.7)   skype
   0.7% ( 11.4)   firefox
   0.6% (  9.9)   Xorg
   0.5% (  7.4)   [kernel core] hrtimer_start (tick_sched_timer)
   0.3% (  4.7)   [TLB shootdowns] <kernel IPI>

When I play a youtube video the profile looks more like the one in the bug above:

Top causes for wakeups:
  36.2% (760.7)   [kernel scheduler] Load balancing tick
  34.6% (727.9)   [Rescheduling interrupts] <kernel IPI>
   7.4% (154.8)   [eth0] <interrupt>
   7.2% (152.1)   firefox-bin
   4.7% ( 99.2)   pulseaudio
   3.6% ( 75.0)   plugin-containe
   1.6% ( 33.3)   java
   1.0% ( 21.3)   firefox
   0.8% ( 15.9)   thunderbird-bin
   0.6% ( 13.5)   [ICE1712] <interrupt>
   0.5% ( 10.4)   skype
   0.3% (  6.9)   [sata_nv] <interrupt>
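To compare two such profiles without reading them side by side, the listings can be parsed into numbers. The sketch below is one way to do it in Python. It assumes the value in parentheses is the per-second wakeup rate, and the helper names are hypothetical. The sample strings are the first two rows of each profile above.

```python
import re

# Matches rows such as "   5.0% ( 79.0)   USB device ..."
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s*\(\s*(\d+(?:\.\d+)?)\)\s+(.*\S)\s*$")


def parse_wakeups(text):
    """Return {cause: rate} for every row of a powertop wakeup listing."""
    rates = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # header and blank lines do not match and are skipped
            rates[match.group(3)] = float(match.group(2))
    return rates


def compare(before, after):
    """Return {cause: after - before}, treating a missing cause as 0."""
    names = sorted(set(before) | set(after))
    return {name: after.get(name, 0.0) - before.get(name, 0.0) for name in names}


IDLE_PROFILE = """Top causes for wakeups:
  56.7% (892.1)   [Rescheduling interrupts] <kernel IPI>
  14.2% (223.5)   [kernel scheduler] Load balancing tick
"""

VIDEO_PROFILE = """Top causes for wakeups:
  36.2% (760.7)   [kernel scheduler] Load balancing tick
  34.6% (727.9)   [Rescheduling interrupts] <kernel IPI>
"""

if __name__ == "__main__":
    delta = compare(parse_wakeups(IDLE_PROFILE), parse_wakeups(VIDEO_PROFILE))
    for name, change in delta.items():
        print(f"{change:+9.1f}  {name}")
```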

I mention this because I filed the following bug about performance issues compared to my old Fedora 11 system:
https://bugzilla.redhat.com/show_bug.cgi?id=553059

That bug contains some sysprof profiles for comparison and the peculiar bit there is the excessive number of calls to "raw_local_irq_restore" compared to Fedora 11. This seems to fit the high "rescheduling interrupts" bit in the powertop profiles.

Is there a way to dynamically influence the interrupt and/or load balancing with kernel parameters (like e.g. nohz, noapic, etc.)? If so what would be the recommended settings to get to the bottom of this?
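Independently of powertop, the rescheduling IPI count can be read directly from /proc/interrupts, where x86 kernels list it in the row labelled "RES". The Python sketch below computes a rate from two snapshots of that file. The function names are hypothetical, and the sample snapshots are made-up numbers for illustration only, not measurements from this machine.

```python
def rescheduling_counts(text):
    """Return the per-CPU counts from the RES row of /proc/interrupts."""
    for line in text.splitlines():
        label, _, rest = line.strip().partition(":")
        if label == "RES":
            counts = []
            for token in rest.split():
                if not token.isdigit():
                    break  # reached the trailing description text
                counts.append(int(token))
            return counts
    return []


def rate_per_second(first_text, second_text, seconds):
    """Rescheduling interrupts per second between two snapshots."""
    before = sum(rescheduling_counts(first_text))
    after = sum(rescheduling_counts(second_text))
    return (after - before) / seconds


# Illustrative snapshots taken 10 seconds apart (invented values).
SAMPLE_T0 = """\
           CPU0       CPU1
  0:        127          0   IO-APIC-edge      timer
RES:      10000      12000   Rescheduling interrupts
"""

SAMPLE_T1 = """\
           CPU0       CPU1
  0:        127          0   IO-APIC-edge      timer
RES:      14000      16000   Rescheduling interrupts
"""

if __name__ == "__main__":
    print(rate_per_second(SAMPLE_T0, SAMPLE_T1, 10.0))
```

On a live system the two snapshots would come from reading /proc/interrupts twice with a sleep in between.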

Comment 1 Fabian A. Scherschel 2010-11-08 13:08:45 UTC
Seems to me this is the same issue as: https://bugzilla.redhat.com/show_bug.cgi?id=635813

Comment 2 Dennis Jacobfeuerborn 2010-11-08 14:42:51 UTC
(In reply to comment #1)
> Seems to me this is the same issue as:
> https://bugzilla.redhat.com/show_bug.cgi?id=635813

The profiles in that bug don't show the high impact of rescheduling interrupts so this might be a separate issue.

Comment 3 Fabian A. Scherschel 2010-11-08 14:46:33 UTC
Are you sure? Looks to me like the rescheduling is waking up the CPU a lot which causes the high load.

Comment 4 Dennis Jacobfeuerborn 2010-11-08 14:52:54 UTC
(In reply to comment #3)
> Are you sure? Looks to me like the rescheduling is waking up the CPU a lot
> which causes the high load.

That certainly is part of the problem but if you compare the profiles then "[Rescheduling interrupts] <kernel IPI>" only shows up significantly in mine but not the others.

I'm not enough of a kernel guru to determine if these are related though.

Comment 5 Fabian A. Scherschel 2010-11-08 14:56:08 UTC
Oh, true. Never mind, ignore me.

Comment 6 Dennis Jacobfeuerborn 2010-11-08 15:20:05 UTC
Ok, so after booting with "nohz=off" my profile now looks like this:

Top causes for wakeups:
  73.3% (2002.0)   [kernel scheduler] Load balancing tick
  16.0% (437.7)   [Rescheduling interrupts] <kernel IPI>
   4.0% (110.6)   plugin-containe
   3.6% ( 99.4)   pulseaudio
   0.7% ( 19.2)   firefox
   0.6% ( 17.0)   thunderbird-bin
   0.5% ( 13.5)   [ICE1712] <interrupt>
   0.4% ( 10.6)   skype
   0.2% (  4.6)   [eth0] <interrupt>
   0.1% (  2.4)   [sata_nv] <interrupt>

This looks much more like the profiles in the other bug.

The net result is that the load now goes toward 0.0 after idling on the desktop for a few minutes. Not sure what to make of that though.
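When repeating this test, it helps to confirm that the running kernel really was booted with the parameter, which /proc/cmdline shows. A small Python sketch follows. The helper name and the sample command line are hypothetical, and the simple whitespace split does not handle quoted values.

```python
def kernel_param(cmdline, name):
    """Look up a boot parameter in a kernel command line string.

    Returns the value for "name=value", True for a bare "name" flag,
    and None when the parameter is absent.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else True
    return None


def running_kernel_param(name, path="/proc/cmdline"):
    """Same lookup against the command line of the running kernel."""
    with open(path) as handle:
        return kernel_param(handle.read(), name)


# Hypothetical command line, for illustration only.
SAMPLE_CMDLINE = "ro root=/dev/sda1 rhgb quiet nohz=off"

if __name__ == "__main__":
    print(kernel_param(SAMPLE_CMDLINE, "nohz"))
```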

Comment 7 Andreas Fleig 2010-11-09 14:35:51 UTC
Ubuntu 10.04 had the same issue with Kernel 2.6.32:
http://ubuntuforums.org/showthread.php?t=1471010
https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910

The launchpad bug seems to be specific to EC2, but actually I noticed high system loads on every 10.04 desktop machine I encountered.

Comment 8 Matthew Miller 2010-11-10 18:44:46 UTC
*** Bug 635062 has been marked as a duplicate of this bug. ***

Comment 9 Hamidou Dia 2010-11-16 00:35:34 UTC
Hi,

There seems to be an ongoing discussion on the kernel mailing list about ~high~ load (0.60 with 2.6.36) while the CPU is idle. The issue apparently started with 2.6.35.

http://lkml.indiana.edu/hypermail/linux/kernel/1010.3/00310.html

Regards

Hamidou Dia

Comment 10 Kyle McMartin 2010-11-28 02:05:23 UTC
http://kyle.fedorapeople.org/kernel/2.6.35.9-62.bz635813.fc14/x86_64/

Try this build, please?

Comment 11 Michael Cronenworth 2010-11-28 03:59:43 UTC
(In reply to comment #10)
> http://kyle.fedorapeople.org/kernel/2.6.35.9-62.bz635813.fc14/x86_64/
> 
> Try this build, please?

This build is noticeably better. What a nice Thanksgiving treat. Hope you had a good Thanksgiving and weren't busy on this kernel bug.

Typing an e-mail or leaving the system at idle now results in desired load averages (0.10 or lower). Load is also correct when running a demanding application and it settles back down after stopping it. Good work!

It's unfortunate Linus was more concerned with how his Adobe Flash was working compared to the load average counters.

Comment 12 Orion Poplawski 2010-11-29 17:47:39 UTC
Is this an x86_64 issue only, or is there an i686 kernel I can test?  I think I see this on my P4 box.

Comment 13 Kyle McMartin 2010-11-29 18:24:42 UTC
No, shouldn't be. I just can't be bothered building 32-bit images constantly, since it triples the amount of time it takes me.

Comment 14 Kyle McMartin 2010-11-30 17:14:27 UTC
http://kyle.fedorapeople.org/kernel/2.6.35.9-62.bz650934.fc14/

peterz sent out a new patch this morning, so please try this.

I included i686 images this time.

NOTE: it will fix only the high running idle load average, and NOT any bugs about high cpu usage.

Comment 15 Michael Cronenworth 2010-11-30 17:35:01 UTC
(In reply to comment #14)
> http://kyle.fedorapeople.org/kernel/2.6.35.9-62.bz650934.fc14/

Did you also capture the -devel package? I need it for my... special modules.

Comment 16 Orion Poplawski 2010-11-30 18:19:27 UTC
Looks good to me - got to below .1 quickly after boot with nothing running.  Now at ~.4 with firefox running at 50% cpu.

Comment 17 Kyle McMartin 2010-11-30 19:57:32 UTC
no, sorry, i threw it out already. the src.rpm is there though.

Comment 18 Michael Cronenworth 2010-11-30 21:53:25 UTC
My compiled build from comment 14 seems to be acting the same as the kernel from comment 10. As noted by some of the folks in the LKML thread, it takes time (~5 minutes) of zero activity to see low idle load being reported for the last minute load counter. I would expect the last minute counter to reflect the last minute, but this is much better than the vanilla .35 kernel.

Comment 19 Mihai Harpau 2010-12-01 20:48:55 UTC
The kernel from comment 14 runs very well (a little better than the kernel from comment 10) and is much better than the last official kernel (kernel-2.6.35.6-48.fc14).

Comment 20 Dennis Jacobfeuerborn 2010-12-02 00:39:51 UTC
After booting to the desktop with the kernel from comment 14 and waiting for a few minutes the load goes all the way down to 0.00 for me. Looks good.

I'm wondering though why it takes several minutes to do so. Given that once the desktop is loaded the system activity stays constant and the load value is defined as the average over one minute I would expect for the load to reach the minimum value pretty much exactly one minute after the system activity dies down.
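A possible explanation for the delay, not confirmed anywhere in this thread: the Linux "one minute" load average is generally described as an exponentially damped average updated roughly every 5 seconds, not a plain mean over the last 60 seconds. Under that model an old load of 1.0 only decays to about 0.37 after one minute. The Python sketch below is a floating-point idealization of that update rule (the kernel uses fixed-point arithmetic), and the function names are hypothetical.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between load-average updates
WINDOW = 60.0          # time constant of the "one minute" average
DECAY = math.exp(-SAMPLE_INTERVAL / WINDOW)


def step(load, active):
    """One update: blend the old average with the current active task count."""
    return load * DECAY + active * (1.0 - DECAY)


def load_after(start_load, seconds, active=0.0):
    """Load average after `seconds` with a constant number of active tasks."""
    load = start_load
    for _ in range(int(seconds // SAMPLE_INTERVAL)):
        load = step(load, active)
    return load


def seconds_until_below(start_load, threshold, active=0.0, max_steps=10000):
    """Seconds until the average drops below `threshold`, or None if never."""
    load = start_load
    for steps in range(1, max_steps + 1):
        load = step(load, active)
        if load < threshold:
            return steps * SAMPLE_INTERVAL
    return None


if __name__ == "__main__":
    print(round(load_after(1.0, 60), 2))
    # 0.005 is the point below which a two-decimal display rounds to 0.00
    print(seconds_until_below(1.0, 0.005))
```

In this model a starting load of 1.0 needs a little over five minutes of complete idleness before it displays as 0.00, which is in line with the ~5 minutes reported in comment 18.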

Comment 21 Jan ONDREJ 2010-12-09 08:53:02 UTC
Why was this bug closed as "RAWHIDE", even though it was filed for Fedora 14?
This still persists in Fedora 14 stable, reopening.

Comment 22 Matthew Miller 2010-12-09 13:37:54 UTC
(In reply to comment #21)
> Why was this bug closed as "RAWHIDE", even though it was filed for Fedora 14?
> This still persists in Fedora 14 stable, reopening.

"Fixed in rawhide" is sometimes (often!) the best fix for bugs filed against a stable release, particularly when a bug isn't critical and the fix itself may cause disruption for other people.

I'm not commenting on whether that's the case with this particular bug, but just noting that it's a reasonable action in general.

Comment 23 Kyle McMartin 2010-12-09 16:10:33 UTC
Because I fixed it there too.

Comment 24 Matthew Miller 2010-12-09 16:16:00 UTC
(In reply to comment #23)
> Because I fixed it there too.

Well then. :)

Comment 25 Michael Cronenworth 2010-12-09 16:31:40 UTC
(In reply to comment #21)
> Why was this bug closed as "RAWHIDE", even though it was filed for Fedora 14?
> This still persists in Fedora 14 stable, reopening.

Fixed in F14[1], too. Meaning: Wait for the next F14 kernel. ;)

Thanks, Kyle!

[1] http://pkgs.fedoraproject.org/gitweb/?p=kernel.git;a=commit;h=bed92c4e508998dbcbf358183f61892363277e15

Comment 26 Jan ONDREJ 2010-12-09 19:18:46 UTC
Please, next time leave this open and let bodhi close it. Then it will not be closed before the fix is pushed to stable. Thank you.

Comment 27 Kyle McMartin 2010-12-09 19:44:39 UTC
No, because then bodhi will close things which are not appropriate. There's no way to flag them individually, so I have to do it by hand when fixes get committed.

Comment 28 Fedora Update System 2010-12-17 15:10:18 UTC
kernel-2.6.35.10-68.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.10-68.fc14

Comment 29 Fedora Update System 2010-12-19 23:56:44 UTC
kernel-2.6.35.10-69.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.10-69.fc14

Comment 30 Fedora Update System 2010-12-21 13:55:21 UTC
kernel-2.6.35.10-72.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.10-72.fc14

Comment 31 Fedora Update System 2010-12-22 19:51:59 UTC
kernel-2.6.35.10-72.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

