638912 – Kworker/0 Task over 85%

Summary: Kworker/0 Task over 85%

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	John Feeney
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-09-30 11:06 UTC by Stephen
Modified:	2013-01-10 08:13 UTC
CC List:	15 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-03-08 17:17:43 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
dmesg (81.78 KB, application/octet-stream) 2010-11-24 18:44 UTC, Sander D
PowerTOP output (6.62 KB, text/plain) 2010-11-26 18:13 UTC, Sander D
Output of: grep -r . /sys/firmware/acpi/interrupts/ (3.91 KB, application/octet-stream) 2010-12-06 17:20 UTC, Sander D
Output of `lspci -vvvxxxx` on 2.6.36.1-10.fc15.x86_64 (199.86 KB, text/plain) 2010-12-06 18:58 UTC, Sander D
Output of `acpidump` on 2.6.36.1-10.fc15.x86_64 (567.74 KB, text/plain) 2010-12-06 18:59 UTC, Sander D
Output of `lspci -t` on 2.6.36.1-11.fc15.x86_64 (669 bytes, text/plain) 2010-12-06 21:28 UTC, Sander D
lspci -vvvxxxx with pcie_ports=compat (199.71 KB, text/plain) 2010-12-06 22:25 UTC, Sander D
lspci -vvvxxxx without pcie_ports=compat (199.86 KB, text/plain) 2010-12-06 22:25 UTC, Sander D
Diagnostic patch (377 bytes, patch) 2010-12-06 22:56 UTC, Matthew Garrett
Set PCI _OSC to 0 if we don't gain full control (509 bytes, patch) 2010-12-08 21:21 UTC, Matthew Garrett
Set _OSC supported field correctly (2.56 KB, patch) 2010-12-10 18:02 UTC, Matthew Garrett
debug output from normal boot and with pcie_ports=compat (10.83 KB, application/x-bzip) 2011-07-03 04:35 UTC, Jesse Hutton
dmesg from booting with and without pcie_ports=compat (29.65 KB, application/x-bzip) 2011-07-03 04:47 UTC, Jesse Hutton

Description Stephen 2010-09-30 11:06:49 UTC

Description of problem:

Running top shows a task called kworker/0:3 using between 85% and 95% CPU. The usage builds up over time, starting at around 10%. At this level, my BOINC client will not run.

Version-Release number of selected component (if applicable):
Kernel version 2.6.36-0.28.rc6.git0.fc15.x86_64

How reproducible:

Every time

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Sander D 2010-11-14 22:32:13 UTC

This is still happening with the latest 2.6.36-1.fc15.x86_64 kernel on my Rawhide installation. I've got 2 cores with hyper-threading, and one is constantly running around 100% because of kworker/0:1.
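For anyone who wants the per-thread figure without watching `top`, the sketch below samples `/proc/<pid>/stat` for every `kworker*` thread twice, one second apart, and converts the growth of `utime` + `stime` (fields 14 and 15, in clock ticks, per proc(5)) into a percentage of one CPU. It is a rough Python illustration, not something posted in this bug; the one-second interval is an arbitrary choice, and because of `time.sleep()` the interval is only approximate.

```python
import os
import time


def parse_stat(line):
    """Return (comm, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # comm (field 2) is wrapped in parentheses and may contain spaces or ')',
    # so split around the LAST ')' instead of splitting the whole line on spaces.
    start = line.index("(")
    end = line.rindex(")")
    comm = line[start + 1:end]
    rest = line[end + 1:].split()  # rest[0] is field 3 (state)
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return comm, utime + stime


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck):
    """CPU used during the interval, as a percentage of one CPU."""
    return 100.0 * (ticks_after - ticks_before) / (clk_tck * interval_s)


def kworker_ticks():
    """Map pid -> (comm, ticks) for every kernel thread named kworker*."""
    result = {}
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/stat") as f:
                comm, ticks = parse_stat(f.read())
        except OSError:
            continue  # the thread exited while we were scanning
        if comm.startswith("kworker"):
            result[pid] = (comm, ticks)
    return result


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        interval = 1.0  # approximate: sleep() may overshoot slightly
        clk_tck = os.sysconf("SC_CLK_TCK")
        before = kworker_ticks()
        time.sleep(interval)
        after = kworker_ticks()
        for pid, (comm, ticks) in sorted(after.items()):
            if pid in before:
                pct = cpu_percent(before[pid][1], ticks, interval, clk_tck)
                print(f"{comm:16s} pid {pid:>6s} {pct:5.1f}%")
    else:
        print("this sketch needs a Linux /proc filesystem")
```

A thread that spikes only briefly will be averaged out over the interval, so a longer interval trades responsiveness for a steadier reading.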

Comment 2 Kyle McMartin 2010-11-24 18:25:22 UTC

Hi, could you please try to reproduce with the 2.6.37-rc3 kernel from here:
http://repos.fedorapeople.org/repos/kyle/kernel/fedora-kernel.repo
(It'll be going into rawhide real soon but not immediately.)

regards, Kyle

Comment 3 Kyle McMartin 2010-11-24 18:41:53 UTC

Can you attach your dmesg?

Comment 4 Sander D 2010-11-24 18:44:12 UTC

Created attachment 462727
dmesg

Here's my dmesg, running with 2.6.37-0.1.rc2.git0.fc15.x86_64 where I have the same problem.

Comment 5 Kyle McMartin 2010-11-24 19:43:20 UTC

Thanks, it's hard to debug without knowing what's in the worker thread, and the workqueue tracer seems to be disabled in 2.6.36 and higher (I'll poke that later.) I've been told powertop2 can tell you, if you build it from:
git clone git://git.kernel.org/pub/scm/status/powertop/powertop.git

I'll work on a package for it if I can't find one (I don't believe the old one is sufficient.)

Thanks, Kyle

Comment 6 Kyle McMartin 2010-11-25 04:31:27 UTC

I tried that out, and it doesn't seem to work (displaying symbols) on x86_64... Arjan said he'd look into it. Will update you (but we really need to see what's on that workqueue that's sucking cpu.)

Comment 7 Kyle McMartin 2010-11-26 16:48:56 UTC

OK, powertop has allegedly been fixed.

Comment 8 Kyle McMartin 2010-11-26 17:35:51 UTC

http://kyle.fedorapeople.org/powertop-1.13-1.20101126.1.fc15.x86_64.rpm

Try that, and figure out which kWorker thread is the culprit?

Comment 9 Sander D 2010-11-26 18:13:44 UTC

Created attachment 463111
PowerTOP output

Thanks for the package! I'm not sure what displays the kWorker threads, but I've attached the content of the Overview and the Device stats tabs.

I'm still running 2.6.37-0.1.rc2.git0.fc15.x86_64 by the way, not -rc3. 'yum update' still doesn't pick that up automatically. Should I try downloading and installing the RPMs?

Comment 10 Kyle McMartin 2010-11-26 18:32:44 UTC

Yeah, sorry, I changed the version numbering. If you could force remove that and upgrade you'll get the new stuff.

Blargh, nothing there looks like it would account for 85%.

Comment 11 Kyle McMartin 2010-11-26 18:53:14 UTC

Looks like USB might be the problem. Can you try this:
http://kyle.fedorapeople.org/kernel/2.6.36.1-9.bz645968.1.fc15/

Which is built without the USB autosuspend stuff, and let me know if it helps?

Comment 12 Sander D 2010-11-26 19:10:56 UTC

I'm running that kernel right now, and kWorker is still very busy. Now it's around 94% even, instead of 85%.

When I just start PowerTOP, acpi_os_execute_deferred takes a lot of kWork time (867ms). But afterwards it seems normal, around 50ms. So I guess that's just PowerTOP querying ACPI or something.

Comment 13 Kyle McMartin 2010-11-26 19:19:59 UTC

Yeah, quite probably.

I'll ask mjg59 to take a look on Monday, I don't really know how to debug this, nothing jumps out as suspicious in your trace. (Nothing is running for particularly large fractions of a second...)

Comment 14 Kyle McMartin 2010-11-29 14:12:05 UTC

Can you

watch /sys/firmware/acpi/interrupts/gpe_all

and let me know if it is rising particularly quickly at runtime?

Thanks, Kyle

Comment 15 Dave Jones 2010-12-01 02:56:45 UTC

I'm curious what the output of the following shows ..

perf record -a -e workqueue:workqueue_execute_start sleep 1

wait a second, and then

perf trace 

that might show something interesting.

Comment 16 Sander D 2010-12-02 22:50:48 UTC

I've broken my Rawhide install (will look at that this weekend), and am running an Ubuntu install right now with 2.6.37-7. The same problem occurs. Ubuntu doesn't seem to support Kyle's command in comment 14.

(In reply to comment #15)
> I'm curious what the output of the following shows ..
> 
> perf record -a -e workqueue:workqueue_execute_start sleep 1
> 
> wait a second, and then
> 
> perf trace 
> 
> that might show something interesting.

It gives 4733 occurrences of: "workqueue_execute_start: work struct 0xffff880036e4eb90: function acpi_os_execute_deferred" (with different work struct addresses) and 151 occurrences of the same string but with do_dbs_timer. The ACPI one seems a bit much?
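The 4733-versus-151 tally can be produced mechanically by counting the `function` field of each `workqueue_execute_start` event. A minimal Python sketch, assuming the `perf trace` output was first saved to a file and that each event line ends in `function <name>` as quoted above (the file name in the usage comment is only an example; later perf releases call this subcommand `perf script`):

```python
import re
import sys
from collections import Counter

# Matches the tail of a workqueue_execute_start event, e.g.
# "... workqueue_execute_start: work struct 0xffff880036e4eb90: function acpi_os_execute_deferred"
FUNC_RE = re.compile(r"workqueue_execute_start:.*\bfunction\s+(\S+)")


def count_work_functions(lines):
    """Count how often each work function was started in the trace."""
    counts = Counter()
    for line in lines:
        m = FUNC_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage: perf trace > trace.txt; python3 count_work.py trace.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for func, n in count_work_functions(f).most_common():
                print(f"{n:8d}  {func}")
```

Counting by function rather than by work struct address groups all the `acpi_os_execute_deferred` items together, which is what points at ACPI here.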

Comment 17 Sander D 2010-12-05 22:28:24 UTC

Same story in Rawhide with kernel 2.6.36.1-10.fc15.x86_64. A lot of acpi_os_execute_deferred events in the 'perf trace' output.

I can't watch the gpe_all thing even when running as root. The error I get is:

  sh: /sys/firmware/acpi/interrupts/gpe_all: Permission denied

It probably has nothing to do with it, but 'ls -l' gives:

  -rw-r--r--. 1 root root 4096 Dec  5 23:20 /sys/firmware/acpi/interrupts/gpe_all

Comment 18 Matthew Garrett 2010-12-05 22:38:49 UTC

Try the following:

watch -n 0.1 cat /sys/firmware/acpi/interrupts/gpe_all

and see if the number there is constantly increasing. If so, please attach the output of 

grep -r . /sys/firmware/acpi/interrupts/
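The same check can be scripted: read the counter twice and divide the difference by the elapsed time. Reading the file directly also sidesteps the "Permission denied" from comment 17, which comes from `watch` trying to execute the sysfs file as a command when `cat` is left out. A Python sketch, with a one-second interval chosen arbitrarily:

```python
import os
import time

GPE_ALL = "/sys/firmware/acpi/interrupts/gpe_all"


def parse_count(text):
    """Return the leading counter from the contents of an ACPI interrupt file."""
    # The file begins with the count; per-GPE files append status words
    # such as "enabled", so only the first token is used.
    return int(text.split()[0])


def read_count(path=GPE_ALL):
    with open(path) as f:
        return parse_count(f.read())


def rate_per_second(count_before, count_after, interval_s):
    """Average number of events per second between two samples."""
    return (count_after - count_before) / interval_s


if __name__ == "__main__":
    if not os.path.exists(GPE_ALL):
        print(GPE_ALL, "not found; is ACPI enabled on this machine?")
    else:
        interval = 1.0
        before = read_count()
        time.sleep(interval)
        after = read_count()
        print(f"gpe_all rose by {rate_per_second(before, after, interval):.0f} per second")
```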

Comment 19 Sander D 2010-12-06 17:20:02 UTC

Created attachment 465026 [details]
Output of: grep -r . /sys/firmware/acpi/interrupts/

The number in gpe_all is indeed constantly increasing, by a few thousand per second. Attached is the output of the grep command.

Comment 20 Matthew Garrett 2010-12-06 17:32:08 UTC

Ok, looks like gpe 1, which should only be relevant for hotplugging. Could you attach the output of lspci -vvvxxxx (run as root) and then install the pmtools package, run acpidump (again as root) and attach the output of that as well?
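Spotting GPE 1 in the attachment amounts to finding the per-GPE counter that dwarfs the rest. A Python sketch that ranks the counters in saved `grep -r` output; it assumes lines of the form `<path>/gpeXX:   <count>  <status>` and deliberately skips aggregate counters such as `gpe_all` and `sci`:

```python
import re
import sys

# Per-GPE lines of `grep -r . /sys/firmware/acpi/interrupts/` look like
# "/sys/firmware/acpi/interrupts/gpe01:   123456  enabled".
# Aggregate counters (gpe_all, sci, ff_*, ...) do not match this pattern.
GPE_LINE = re.compile(r"/(gpe[0-9A-Fa-f]{2,}):\s*(\d+)")


def gpe_counts(lines):
    """Map GPE name -> interrupt count."""
    counts = {}
    for line in lines:
        m = GPE_LINE.search(line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


def busiest(counts, n=3):
    """The n GPEs with the highest counts, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Usage: python3 busiest_gpe.py saved-grep-output.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for name, count in busiest(gpe_counts(f)):
                print(f"{name}: {count}")
```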

Comment 21 Sander D 2010-12-06 18:58:36 UTC

Created attachment 465040
Output of `lspci -vvvxxxx` on 2.6.36.1-10.fc15.x86_64

Comment 22 Sander D 2010-12-06 18:59:19 UTC

Created attachment 465041
Output of `acpidump` on 2.6.36.1-10.fc15.x86_64

Comment 23 Matthew Garrett 2010-12-06 19:43:35 UTC

Hm. Has this always been a problem, or did it appear with more recent kernels? It's correct for us to have gpe 1 enabled, so the fact that it's firing all the time is obviously a problem...

Comment 24 Sander D 2010-12-06 20:16:17 UTC

It's not a problem on Ubuntu with Linux 2.6.35-23-generic: no high kWorker CPU usage and no constant increase of gpe_all. Neither was it a problem on Fedora 14 if I recall correctly. Both distributions show the problem with 2.6.36 or 2.6.37, so I guess it appears only with recent kernels.

Comment 25 Matthew Garrett 2010-12-06 20:46:34 UTC

Hm. One of your ports has the hotplug flag set. Can you run

lspci -t

and attach the output?
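In `lspci -vvv` output, a PCIe port that advertises hot-plug support shows `HotPlug+` on its `SltCap` (Slot Capabilities) line. A rough Python sketch that lists such ports from saved output; it depends on lspci's human-readable layout (unindented device header lines, indented capability lines), which is not a stable interface:

```python
import re
import sys

# Device header lines start at column 0 with an optional domain and a
# bus:device.function address, e.g. "00:1c.0 PCI bridge: ..."
HEADER_RE = re.compile(r"^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7] ")


def hotplug_capable_ports(lines):
    """Return the header lines of devices whose SltCap advertises HotPlug+."""
    ports = []
    current = None
    for line in lines:
        if HEADER_RE.match(line):
            current = line.strip()
        elif current and "SltCap:" in line and "HotPlug+" in line:
            ports.append(current)
    return ports


if __name__ == "__main__":
    # Usage: lspci -vvv > lspci.txt; python3 hotplug_ports.py lspci.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for port in hotplug_capable_ports(f):
                print(port)
```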

Comment 26 Matthew Garrett 2010-12-06 21:03:01 UTC

Also, if you boot with

pcie_ports=compat

as a kernel argument, does the problem vanish?
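To confirm that a parameter such as `pcie_ports=compat` actually reached the running kernel, `/proc/cmdline` can be checked. A minimal Python sketch; it splits on whitespace only, so it assumes the parameter is not part of a quoted value:

```python
import os


def booted_with(cmdline, param):
    """True if param (e.g. "pcie_ports=compat") is a whole token of cmdline."""
    # Plain whitespace split; quoted values containing spaces are not handled.
    return param in cmdline.split()


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            cmdline = f.read()
        print("pcie_ports=compat active:", booted_with(cmdline, "pcie_ports=compat"))
```

Matching whole tokens avoids false positives from substrings of other parameters.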

Comment 27 Matthew Garrett 2010-12-06 21:07:32 UTC

And, finally, if you boot without that option (ie, just a normal boot), what does

ls -l /sys/bus/pci/slots/module

look like?

Comment 28 Sander D 2010-12-06 21:28:42 UTC

Created attachment 465071
Output of `lspci -t` on 2.6.36.1-11.fc15.x86_64

Running without the extra boot option on the 2.6.36.1-11.fc15.x86_64 kernel. It's a newer version than before, but I still have the same problem.

The path you specified doesn't seem to exist on my pc, but this seems to come close:

[root@elitebook ~]# ls -l /sys/bus/pci/slots/module
ls: cannot access /sys/bus/pci/slots/module: No such file or directory
[root@elitebook ~]# ls -l /sys/bus/pci/slots/1/module/
total 0
drwxr-xr-x. 2 root root 0 Dec  6 22:24 parameters
[root@elitebook ~]# ls -l /sys/bus/pci/slots/1/module/parameters/
total 0
-rw-r--r--. 1 root root 4096 Dec  6 22:24 debug

Comment 29 Sander D 2010-12-06 21:33:34 UTC

When I boot 2.6.36.1-11.fc15.x86_64 with pcie_ports=compat, the problem disappears. No high kWorker CPU usage and no constant increase of gpe_all.

Comment 30 Matthew Garrett 2010-12-06 21:39:43 UTC

Whoops, slight mistake there on my part.

ls -ld /sys/bus/pci/slots/1/module

Make sure you don't include an extra / at the end, or it'll give you a different answer! Can you do this both with and without the pcie_ports=compat option?
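The trailing slash matters because `ls -l path/` follows the symlink and lists the directory it points to, while `ls -ld path` describes the link itself. `os.readlink()` gives the same answer as the latter. A Python sketch that reports, for each slot, the name of the module its `module` link points to (`acpiphp` in comment 31); reading that name as the hotplug driver that registered the slot is an interpretation, not something stated in this bug:

```python
import glob
import os


def slot_drivers(slots_dir="/sys/bus/pci/slots"):
    """Map PCI slot name -> name of the module linked from <slot>/module."""
    drivers = {}
    for link in sorted(glob.glob(os.path.join(slots_dir, "*", "module"))):
        if os.path.islink(link):
            slot = os.path.basename(os.path.dirname(link))
            # readlink() returns the link target itself, e.g.
            # "../../../../module/acpiphp"; unlike `ls -l .../module/`,
            # it does not descend into the directory the link points at.
            drivers[slot] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__":
    for slot, module in slot_drivers().items():
        print(f"slot {slot}: {module}")
```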

Comment 31 Sander D 2010-12-06 21:57:32 UTC

With the pcie_ports=compat option:

# ls -ld /sys/bus/pci/slots/1/module
lrwxrwxrwx. 1 root root 0 Dec  6 22:47 /sys/bus/pci/slots/1/module -> ../../../../module/acpiphp

Without the option:

# ls -ld /sys/bus/pci/slots/1/module
lrwxrwxrwx. 1 root root 0 Dec  6 22:51 /sys/bus/pci/slots/1/module -> ../../../../module/acpiphp

Comment 32 Matthew Garrett 2010-12-06 22:10:30 UTC

Ok, that's not quite what I was expecting. Can you do

lspci -vvvxxxx

as root for both the working (pcie_ports=compat) and non-working (no argument) states and attach the output?

Comment 33 Sander D 2010-12-06 22:25:19 UTC

Created attachment 465091
lspci -vvvxxxx with pcie_ports=compat

(Still running the same 2.6.36.1-11.fc15.x86_64 kernel.)

Comment 34 Sander D 2010-12-06 22:25:45 UTC

Created attachment 465092
lspci -vvvxxxx without pcie_ports=compat

Comment 35 Matthew Garrett 2010-12-06 22:56:13 UTC

Is there any chance you could build a kernel with the patch I'm about to attach, boot it (without pcie_ports=) and attach dmesg? The difference between the two configurations could explain what you're seeing, but for the life of me I'm unclear on why it ends up being configured that way.

Comment 36 Matthew Garrett 2010-12-06 22:56:35 UTC

Created attachment 465100
Diagnostic patch

Comment 37 Sander D 2010-12-06 23:10:59 UTC

I should be able to, though I haven't compiled a kernel in a few years, so it will take a little longer. I'll try in a few days and report back to this bug. Thanks so far!

Comment 38 Matthew Garrett 2010-12-07 21:35:56 UTC

Actually, ignore that - I think I've got a better idea as to what's happening now. I'll try to come up with a patch for you.

Comment 39 Matthew Garrett 2010-12-08 21:21:56 UTC

Created attachment 467586
Set PCI _OSC to 0 if we don't gain full control

Can you give this one a go?

Comment 40 Matthew Garrett 2010-12-09 00:06:23 UTC

Actually, sorry again - I don't think that one's right either. I'm having trouble figuring out exactly what's going wrong here. Still working on it...

Comment 41 Matthew Garrett 2010-12-09 03:03:37 UTC

Ah, no, figured it out. I'll have a patch tomorrow.

Comment 42 Matthew Garrett 2010-12-10 18:02:23 UTC

Created attachment 468029
Set _OSC supported field correctly

I think this should fix it for you.

Comment 43 Kyle McMartin 2010-12-10 18:32:13 UTC

http://kyle.fedorapeople.org/kernel/2.6.37-0.rc5.git2.1.bz638912.fc15/

Please try that for a kernel with Matthew's fix included.

Comment 44 Sander D 2010-12-10 22:32:15 UTC

This patch does seem to fix it. There's now only a brief rise in kworker CPU usage (around 0.3%) a few times per minute, and the gpe_all counter seems to take more than a minute to increase. This seems good to me. Thanks a lot! Also thanks to Kyle for the compiled kernel.

Comment 45 Kyle McMartin 2010-12-10 22:53:00 UTC

Great, thanks, I shoved it into rawhide and it'll be in the next build.

Comment 46 Sander D 2011-02-28 11:40:41 UTC

With recent kernels this specific problem does seem to be gone, so maybe this bug should be marked as fixed (which I cannot do).

However, in Ubuntu 10.10 on the same PC with the unsupported ('mainline') kernel 2.6.38-020638rc6-generic, the problem seems to return after resuming from suspend. There is a highly active kworker thread and /sys/firmware/acpi/interrupts/gpe_all is increasing very quickly.

I guess it has something to do with the same problem, but if preferred I could file a new bug (on bugzilla.kernel.org?). Unfortunately I currently don't have a Fedora installation available, but I will be able to install it next week.

Comment 47 Sander D 2011-03-05 18:03:51 UTC

In Rawhide I have the same suspend problem as mentioned in my previous comment:

I'm running kernel 2.6.38-0.rc6.git0.1.fc16.x86_64 and after resuming from suspend, kworker/0:1 constantly uses 65-70% CPU. gpe_all increases quickly. I don't have this problem after a reboot.

Comment 48 Chuck Ebbert 2011-03-08 17:17:43 UTC

(In reply to comment #47)
> I'm running kernel 2.6.38-0.rc6.git0.1.fc16.x86_64 and after resuming from
> suspend, kworker/0:1 constantly uses 65-70% CPU. gpe_all increases quickly. I
> don't have this problem after a reboot.

Please file a new bug for that problem.

Comment 49 Jesse Hutton 2011-07-03 04:35:56 UTC

Created attachment 511022
debug output from normal boot and with pcie_ports=compat

With F15 up to date as of today (kernel 2.6.38.8-32.fc15.x86_64), I'm seeing the same behaviour as the original reporter: constant kworker CPU usage of around 80%.

watch -n 0.1 cat /sys/firmware/acpi/interrupts/gpe_all shows interrupts climbing at around 10000/sec. As suggested in the comments, booting with pcie_ports=compat does eliminate the high CPU usage and crazy interrupt rate.

Comment 50 Jesse Hutton 2011-07-03 04:47:50 UTC

Created attachment 511023
dmesg from booting with and without pcie_ports=compat

Comment 51 Jesse Hutton 2011-07-03 04:55:16 UTC

Actually, it looks like pcie_ports=compat doesn't work all the time. I'm experiencing the same problem now even after having booted with that parameter (same boot from the included dmesgs above).

Should I open a new bug?

Comment 52 Ernesto Manríquez 2011-08-02 03:42:11 UTC

I have this with kernel 2.6.40-4.fc15, and pcie_ports=compat ameliorates the situation. Is it microcode-related? Can we reasonably expect a fix? I didn't have this issue until some updates were installed.

Aug 01 14:22:41 Updated: 1:cups-libs-1.4.8-1.fc15.x86_64
Aug 01 14:22:45 Updated: yum-3.2.29-8.fc15.noarch
Aug 01 14:22:48 Updated: 1:dbus-libs-1.4.6-5.fc15.x86_64
Aug 01 14:22:52 Updated: 1:dbus-1.4.6-5.fc15.x86_64
Aug 01 14:23:00 Updated: libcap-2.22-1.fc15.x86_64
Aug 01 14:23:01 Updated: mesa-libGL-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:01 Updated: 32:bind-license-9.8.0-8.P4.fc15.noarch
Aug 01 14:23:02 Updated: mesa-dri-filesystem-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:05 Updated: mesa-dri-llvmcore-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:07 Updated: 32:bind-libs-9.8.0-8.P4.fc15.x86_64
Aug 01 14:23:08 Updated: mesa-libGLU-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:11 Updated: mesa-libGL-devel-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:16 Updated: gtk3-3.0.12-1.fc15.x86_64
Aug 01 14:23:17 Updated: pinentry-0.8.1-4.fc15.x86_64
Aug 01 14:23:18 Updated: lilypond-fonts-common-2.14.2-1.fc15.x86_64
Aug 01 14:23:20 Updated: system-config-printer-libs-1.3.5-1.fc15.x86_64
Aug 01 14:23:22 Updated: system-config-printer-udev-1.3.5-1.fc15.x86_64
Aug 01 14:23:25 Updated: lilypond-emmentaler-fonts-2.14.2-1.fc15.x86_64
Aug 01 14:23:25 Updated: pinentry-qt-0.8.1-4.fc15.x86_64
Aug 01 14:23:31 Updated: gtk3-devel-3.0.12-1.fc15.x86_64
Aug 01 14:23:32 Updated: mesa-libGLU-devel-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:33 Updated: 32:bind-utils-9.8.0-8.P4.fc15.x86_64
Aug 01 14:23:37 Updated: mesa-dri-drivers-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:38 Updated: 32:bind-libs-lite-9.8.0-8.P4.fc15.x86_64
Aug 01 14:23:48 Updated: 1:cups-1.4.8-1.fc15.x86_64
Aug 01 14:23:48 Updated: 1:dbus-x11-1.4.6-5.fc15.x86_64
Aug 01 14:23:49 Updated: 1:dbus-devel-1.4.6-5.fc15.x86_64
Aug 01 14:23:51 Updated: 1:wpa_supplicant-0.7.3-9.fc15.x86_64
Aug 01 14:23:52 Updated: yum-plugin-fastestmirror-1.1.30-3.fc15.noarch
Aug 01 14:23:53 Updated: yum-utils-1.1.30-3.fc15.noarch
Aug 01 14:23:54 Updated: yum-plugin-auto-update-debug-info-1.1.30-3.fc15.noarch
Aug 01 14:23:55 Updated: createrepo-0.9.9-4.fc15.noarch
Aug 01 14:23:57 Updated: 1:cups-devel-1.4.8-1.fc15.x86_64
Aug 01 14:23:59 Updated: gdb-7.3-41.fc15.x86_64
Aug 01 14:24:06 Updated: bluedevil-1.1.1-1.fc15.x86_64
Aug 01 14:24:09 Updated: xorg-x11-fonts-ISO8859-1-100dpi-7.5-4.fc15.noarch
Aug 01 14:24:12 Updated: cyrus-sasl-debuginfo-2.1.23-18.fc15.x86_64
Aug 01 14:24:13 Updated: p7zip-9.20.1-2.fc15.x86_64
Aug 01 14:24:16 Updated: 1:dbus-debuginfo-1.4.6-5.fc15.x86_64
Aug 01 14:24:18 Updated: perf-2.6.40-4.fc15.x86_64
Aug 01 14:24:59 Installed: kernel-devel-2.6.40-4.fc15.x86_64
Aug 01 14:25:02 Updated: libjingle-0.5.8-1.fc15.x86_64
Aug 01 14:25:06 Updated: lohit-telugu-fonts-2.4.5-14.fc15.noarch
Aug 01 14:25:15 Updated: kernel-headers-2.6.40-4.fc15.x86_64
Aug 01 14:25:46 Updated: mesa-debuginfo-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:25:47 Updated: libsoup-2.34.3-1.fc15.x86_64
Aug 01 14:25:49 Updated: 2:tar-1.26-1.fc15.x86_64
Aug 01 14:25:50 Updated: p7zip-plugins-9.20.1-2.fc15.x86_64
Aug 01 14:25:51 Updated: cifs-utils-5.0-2.fc15.x86_64
Aug 01 14:25:52 Updated: mesa-libGL-7.11-0.18.20110730.0.fc15.i686
Aug 01 14:25:52 Updated: mesa-libGLU-7.11-0.18.20110730.0.fc15.i686
Aug 01 14:25:53 Updated: 1:dbus-libs-1.4.6-5.fc15.i686
Aug 01 14:25:54 Updated: 1:cups-libs-1.4.8-1.fc15.i686
Aug 01 14:26:08 Installed: kernel-2.6.40-4.fc15.x86_64

Since then, even booting with kernel-2.6.38 gave this bug, unless I booted with the option above.

Comment 53 Ernesto Manríquez 2011-08-02 05:13:25 UTC

More on this: this error receded when I removed the b43 module and installed broadcom-wl for my BCM4312 WiFi card. Perhaps the kworker thread was polling Bluetooth, since no Bluetooth was detected here with the newer kernels.

Comment 54 Walter Neumann 2011-08-15 04:05:52 UTC

I have same problem with kernel 2.6.40-4.fc15. Can the bug be reopened, or should I file a separate bug?

Comment 55 Daniel 2011-09-18 22:42:02 UTC

I got the same problem running 2.6.40.4-5.fc15.x86_64.

Disabling usb autosuspend in powertop fixed it.
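PowerTOP's USB autosuspend tunable appears to work through each device's runtime-PM attribute `power/control` under `/sys/bus/usb/devices`: `auto` permits autosuspend and `on` keeps the device awake. A Python sketch, assuming that layout, which lists devices currently set to `auto` and can switch one to `on` (writing requires root):

```python
import glob
import os


def autosuspend_devices(usb_root="/sys/bus/usb/devices"):
    """Device directories whose power/control is "auto" (autosuspend allowed)."""
    devices = []
    for control in sorted(glob.glob(os.path.join(usb_root, "*", "power", "control"))):
        try:
            with open(control) as f:
                mode = f.read().strip()
        except OSError:
            continue  # unreadable or vanished device; skip it
        if mode == "auto":
            devices.append(os.path.dirname(os.path.dirname(control)))
    return devices


def disable_autosuspend(device_dir):
    """Write "on" to power/control so the device stays powered (needs root)."""
    with open(os.path.join(device_dir, "power", "control"), "w") as f:
        f.write("on")


if __name__ == "__main__":
    # Only lists candidates; call disable_autosuspend() on a path to change it.
    for dev in autosuspend_devices():
        print(dev)
```

The setting made this way does not survive a reboot; making it persistent (for example via a udev rule) is outside what this bug covers.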
