Description of problem:
Running top shows a task called kworker/0:3 using between 85% and 95% CPU. The usage builds up over time, starting at around 10%. While it runs at this level, my BOINC client will not run.
Version-Release number of selected component (if applicable):
Kernel version 2.6.36-0.28.rc6.git0.fc15.x86_64
Steps to Reproduce:
This is still happening with the latest 2.6.36-1.fc15.x86_64 kernel on my Rawhide installation. I've got 2 cores with hyper-threading, and one is constantly running around 100% because of kworker/0:1.
Hi, could you please try to reproduce with the 2.6.37-rc3 kernel from here:
(It'll be going into rawhide real soon but not immediately.)
Can you attach your dmesg?
Created attachment 462727 [details]
Here's my dmesg, running with 2.6.37-0.1.rc2.git0.fc15.x86_64 where I have the same problem.
Thanks, it's hard to debug without knowing what's in the worker thread, and the workqueue tracer seems to be disabled in 2.6.36 and higher (I'll poke that later.) I've been told powertop2 can tell you, if you build it from:
git clone git://git.kernel.org/pub/scm/status/powertop/powertop.git
I'll work on a package for it if I can't find one (I don't believe the old one is sufficient.)
I tried that out, and it doesn't seem to work (displaying symbols) on x86_64... Arjan said he'd look into it. Will update you (but we really need to see what's on that workqueue that's sucking cpu.)
OK, powertop has allegedly been fixed.
Could you try that and figure out which kworker thread is the culprit?
Created attachment 463111 [details]
Thanks for the package! I'm not sure what displays the kWorker threads, but I've attached the content of the Overview and the Device stats tabs.
I'm still running 2.6.37-0.1.rc2.git0.fc15.x86_64 by the way, not -rc3. 'yum update' still doesn't automatically pick that. Should I try downloading and installing the RPMs?
Yeah, sorry, I changed the version numbering. If you could force remove that and upgrade you'll get the new stuff.
Blargh, nothing there looks like it would account for 85%.
Looks like USB might be the problem.. Can you try this:
Which is built without the USB autosuspend stuff, and let me know if it helps?
I'm running that kernel right now, and kWorker is still very busy. Now it's around 94% even, instead of 85%.
When I just start PowerTOP, acpi_os_execute_deferred takes a lot of kWork time (867ms). But afterwards it seems normal, around 50ms. So I guess that's just PowerTOP querying ACPI or something.
Yeah, quite probably.
I'll ask mjg59 to take a look on Monday, I don't really know how to debug this, nothing jumps out as suspicious in your trace. (Nothing is running for particularly large fractions of a second...)
and let me know if it is rising particularly quickly at runtime?
I'm curious what the output of the following shows ..
perf record -a -e workqueue:workqueue_execute_start sleep 1
wait a second, and then
perf trace
that might show something interesting.
I've broken my Rawhide install (will look at that this weekend), and am running an Ubuntu install right now with 2.6.37-7. The same problem occurs. Ubuntu doesn't seem to support Kyle's command in comment 14.
(In reply to comment #15)
> I'm curious what the output of the following shows ..
> perf record -a -e workqueue:workqueue_execute_start sleep 1
> wait a second, and then
> perf trace
> that might show something interesting.
It gives 4733 occurrences of: "workqueue_execute_start: work struct 0xffff880036e4eb90: function acpi_os_execute_deferred" (with different struct addresses) and 151 occurrences of the same string but with do_dbs_timer. The ACPI thing seems a bit much?
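For anyone repeating this, the per-function counts can be tallied rather than read by eye. A minimal sketch, assuming each event line contains the word "function" followed by the handler name, as in the line quoted above. The helper name count_wq_functions is made up for illustration, and on newer perf versions the dump subcommand is, as far as I know, `perf script` rather than `perf trace`:

```shell
# Tally workqueue handler names from perf's text dump read on stdin.
# Assumption: every event line has the word "function" followed by the
# handler name (e.g. "... function acpi_os_execute_deferred").
count_wq_functions() {
    awk '{
        # Stop at NF-1 so that $(i + 1) always exists.
        for (i = 1; i < NF; i++)
            if ($i == "function") count[$(i + 1)]++
    }
    END { for (f in count) print count[f], f }' | sort -k1,1nr -k2,2
}

# Hypothetical usage, after the perf record step:
#   perf trace | count_wq_functions
```

The busiest handler ends up on the first line, which makes a runaway like acpi_os_execute_deferred easy to spot.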
Same story in Rawhide with kernel 126.96.36.199-10.fc15.x86_64. A lot of acpi_os_execute_deferred events in the 'perf trace' output.
I can't watch the gpe_all thing even when running as root. The error I get is:
sh: /sys/firmware/acpi/interrupts/gpe_all: Permission denied
It probably has nothing to do with it, but 'ls -l' gives:
-rw-r--r--. 1 root root 4096 Dec 5 23:20 /sys/firmware/acpi/interrupts/gpe_all
Try the following:
watch -n 0.1 cat /sys/firmware/acpi/interrupts/gpe_all
and see if the number there is constantly increasing. If so, please attach the output of
grep -r . /sys/firmware/acpi/interrupts/
Created attachment 465026 [details]
Output of: grep -r . /sys/firmware/acpi/interrupts/
The number in gpe_all is constantly increasing indeed, with a few thousand per second. Attached is the output of the grep command.
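To put a number on "a few thousand per second", the counter can be sampled twice and the difference divided by the interval. A rough sketch, assuming the count is the first whitespace-separated field in the file; the function names are illustrative only:

```shell
# Print the first field (the event count) of a counter file.
counter_value() {
    awk 'NR == 1 { print $1 }' "$1"
}

# Integer events per second between two samples taken $3 seconds apart.
per_second() {
    echo $(( ($2 - $1) / $3 ))
}

# Sample a counter file twice, $2 seconds apart (default 1), and print the rate.
gpe_rate() {
    file=${1:-/sys/firmware/acpi/interrupts/gpe_all}
    secs=${2:-1}
    before=$(counter_value "$file")
    sleep "$secs"
    after=$(counter_value "$file")
    per_second "$before" "$after" "$secs"
}

# Hypothetical usage (as root if the file is not readable otherwise):
#   gpe_rate /sys/firmware/acpi/interrupts/gpe_all 5
```

A longer interval smooths out bursts; a healthy system should print a value at or near zero.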
Ok, looks like gpe 1, which should only be relevant for hotplugging. Could you attach the output of lspci -vvvxxxx (run as root) and then install the pmtools package, run acpidump (again as root) and attach the output of that as well?
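Picking the offending GPE out of the grep output can also be done mechanically. A small sketch, assuming the `path:   count   status` line layout that `grep -r .` prints for these files, and that individual GPE files are named gpe followed by hex digits; busiest_gpes is a made-up helper name:

```shell
# Rank individual GPE counters from
#   grep -r . /sys/firmware/acpi/interrupts/
# read on stdin. Totals such as gpe_all and sci are skipped because
# only names of the form gpe<hex digits> match.
busiest_gpes() {
    awk -F: '
        $1 ~ /\/gpe[0-9A-Fa-f]+$/ {
            split($2, fields, " ")      # fields[1] is the count
            n = split($1, parts, "/")   # parts[n] is the file name
            print fields[1], parts[n]
        }' | sort -k1,1nr | head -n "${1:-5}"
}

# Hypothetical usage:
#   grep -r . /sys/firmware/acpi/interrupts/ | busiest_gpes 3
```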
Created attachment 465040 [details]
Output of `lspci -vvvxxxx` on 188.8.131.52-10.fc15.x86_64
Created attachment 465041 [details]
Output of `acpidump` on 184.108.40.206-10.fc15.x86_64
Hm. Has this always been a problem, or did it appear with more recent kernels? It's correct for us to have gpe 1 enabled, so the fact that it's firing all the time is obviously a problem...
It's not a problem on Ubuntu with Linux 2.6.35-23-generic: no high kWorker CPU usage and no constant increase of gpe_all. Neither was it a problem on Fedora 14 if I recall correctly. Both distributions show the problem with 2.6.36 or 2.6.37, so I guess it appears only with recent kernels.
Hm. One of your ports has the hotplug flag set. Can you run
lspci -t
and attach the output?
Also, if you boot with
pcie_ports=compat
as a kernel argument, does the problem vanish?
And, finally, if you boot without that option (i.e., just a normal boot), what does
ls -l /sys/bus/pci/slots/module
show?
Created attachment 465071 [details]
Output of `lspci -t` on 220.127.116.11-11.fc15.x86_64
Running without the extra boot option on the 18.104.22.168-11.fc15.x86_64 kernel. It's a newer version than before, but I still have the same problem.
The path you specified doesn't seem to exist on my pc, but this seems to come close:
[root@elitebook ~]# ls -l /sys/bus/pci/slots/module
ls: cannot access /sys/bus/pci/slots/module: No such file or directory
[root@elitebook ~]# ls -l /sys/bus/pci/slots/1/module/
drwxr-xr-x. 2 root root 0 Dec 6 22:24 parameters
[root@elitebook ~]# ls -l /sys/bus/pci/slots/1/module/parameters/
-rw-r--r--. 1 root root 4096 Dec 6 22:24 debug
When I boot 22.214.171.124-11.fc15.x86_64 with pcie_ports=compat, the problem disappears. No high kWorker CPU usage and no constant increase of gpe_all.
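When comparing the two states it helps to confirm which one the running kernel is actually in. A minimal sketch that checks the kernel command line for an exact argument; has_kernel_arg is a hypothetical helper, and the optional second parameter exists only so the check can be pointed at a saved copy of the command line:

```shell
# Succeed if the command line in $2 (default /proc/cmdline) contains
# the exact argument $1 as a whole word.
has_kernel_arg() {
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Hypothetical usage:
#   if has_kernel_arg pcie_ports=compat; then
#       echo "booted with pcie_ports=compat"
#   else
#       echo "normal boot"
#   fi
```

Matching whole words avoids a false positive from a different value of the same option.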
Whoops, slight mistake there on my part.
ls -ld /sys/bus/pci/slots/1/module
Make sure you don't include an extra / at the end, or it'll give you a different answer! Can you do this both with and without the pcie_ports=compat option?
With the pcie_ports=compat option:
# ls -ld /sys/bus/pci/slots/1/module
lrwxrwxrwx. 1 root root 0 Dec 6 22:47 /sys/bus/pci/slots/1/module -> ../../../../module/acpiphp
Without the option:
# ls -ld /sys/bus/pci/slots/1/module
lrwxrwxrwx. 1 root root 0 Dec 6 22:51 /sys/bus/pci/slots/1/module -> ../../../../module/acpiphp
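The trailing-slash pitfall can be sidestepped by reading the symlink directly instead of listing it. A small sketch using readlink; slot_module is an invented name, and the slot path in the usage comment is simply the one from this thread:

```shell
# Print the name of the module that a sysfs "module" symlink points at.
# A trailing slash is stripped first so the link itself is read rather
# than the directory it resolves to.
slot_module() {
    link=${1%/}
    target=$(readlink "$link") || return 1
    basename "$target"
}

# Hypothetical usage:
#   slot_module /sys/bus/pci/slots/1/module
```

For the output above this would print acpiphp in both boot configurations.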
Ok, that's not quite what I was expecting. Can you do
lspci -vvvxxxx
as root for both the working (pcie_ports=compat) and non-working (no argument) states and attach the output?
Created attachment 465091 [details]
lspci -vvvxxxx with pcie_ports=compat
(Still running the same 126.96.36.199-11.fc15.x86_64 kernel.)
Created attachment 465092 [details]
lspci -vvvxxxx without pcie_ports=compat
Is there any chance you could build a kernel with the patch I'm about to attach, boot it (without pcie_ports=) and attach dmesg? The difference between the two configurations could explain what you're seeing, but for the life of me I'm unclear on why it ends up being configured that way.
Created attachment 465100 [details]
I should be able to, though I haven't compiled a kernel for a few years, so it will take a little longer. I'll try in a few days and report to this bug again. Thanks so far!
Actually, ignore that - I think I've got a better idea as to what's happening now. I'll try to come up with a patch for you.
Created attachment 467586 [details]
Set PCI _OSC to 0 if we don't gain full control
Can you give this one a go?
Actually, sorry again - I don't think that one's right either. I'm having trouble figuring out exactly what's going wrong here. Still working on it...
Ah, no, figured it out. I'll have a patch tomorrow.
Created attachment 468029 [details]
Set _OSC supported field correctly
I think this should fix it for you.
Please try that for a kernel with Matthew's fix included.
This patch does seem to fix it. kworker CPU usage now only rises briefly (to around 0.3%) a few times per minute, and it takes more than a minute for the gpe_all count to increase. This seems good to me. Thanks a lot! Also thanks to Kyle for the compiled kernel.
Great, thanks, I shoved it into rawhide and it'll be in the next build.
With recent kernels this specific problem does seem to have gone, so maybe this bug should be marked as fixed (which I cannot do).
However, in Ubuntu 10.10 on the same pc with the unsupported ('mainline') kernel 2.6.38-020638rc6-generic, the problem seems to return after resuming from suspend. There is a highly active kworker thread and /sys/firmware/acpi/interrupts/gpe_all is increasing very quickly.
I guess it has something to do with the same problem, but if preferred I could create a new bug (on bugzilla.kernel.org?). I currently don't have a Fedora installation available unfortunately, but will be able to install it next week.
In Rawhide I have the same suspend problem as mentioned in my previous comment:
I'm running kernel 2.6.38-0.rc6.git0.1.fc16.x86_64 and after resuming from suspend, kworker/0:1 constantly uses 65-70% CPU. gpe_all increases quickly. I don't have this problem after a reboot.
(In reply to comment #47)
> I'm running kernel 2.6.38-0.rc6.git0.1.fc16.x86_64 and after resuming from
> suspend, kworker/0:1 constantly uses 65-70% CPU. gpe_all increases quickly. I
> don't have this problem after a reboot.
Please file a new bug for that problem.
Created attachment 511022 [details]
debug output from normal boot and with pcie_ports=compat
With F15 up-to-date as of today (kernel 188.8.131.52-32.fc15.x86_64), I'm seeing the same behaviour as the original reporter: kworker cpu usage around 80% constant.
watch -n 0.1 cat /sys/firmware/acpi/interrupts/gpe_all shows interrupts climbing at around 10000/sec. As suggested in the comments, booting with pcie_ports=compat does eliminate the high CPU usage and the excessive interrupt rate.
Created attachment 511023 [details]
dmesg from booting with and without pcie_ports=compat
Actually, it looks like pcie_ports=compat doesn't work all the time. I'm experiencing the same problem now even after having booted with that parameter (same boot from the included dmesgs above).
Should I open a new bug?
I have this with kernel 2.6.40-4.fc15, and pcie_ports=compat ameliorates the situation. Is it microcode-related? Can we reasonably expect a fix? I didn't have this issue until some updates were installed.
Aug 01 14:22:41 Updated: 1:cups-libs-1.4.8-1.fc15.x86_64
Aug 01 14:22:45 Updated: yum-3.2.29-8.fc15.noarch
Aug 01 14:22:48 Updated: 1:dbus-libs-1.4.6-5.fc15.x86_64
Aug 01 14:22:52 Updated: 1:dbus-1.4.6-5.fc15.x86_64
Aug 01 14:23:00 Updated: libcap-2.22-1.fc15.x86_64
Aug 01 14:23:01 Updated: mesa-libGL-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:01 Updated: 32:bind-license-9.8.0-8.P4.fc15.noarch
Aug 01 14:23:02 Updated: mesa-dri-filesystem-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:05 Updated: mesa-dri-llvmcore-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:07 Updated: 32:bind-libs-9.8.0-8.P4.fc15.x86_64
Aug 01 14:23:08 Updated: mesa-libGLU-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:11 Updated: mesa-libGL-devel-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:16 Updated: gtk3-3.0.12-1.fc15.x86_64
Aug 01 14:23:17 Updated: pinentry-0.8.1-4.fc15.x86_64
Aug 01 14:23:18 Updated: lilypond-fonts-common-2.14.2-1.fc15.x86_64
Aug 01 14:23:20 Updated: system-config-printer-libs-1.3.5-1.fc15.x86_64
Aug 01 14:23:22 Updated: system-config-printer-udev-1.3.5-1.fc15.x86_64
Aug 01 14:23:25 Updated: lilypond-emmentaler-fonts-2.14.2-1.fc15.x86_64
Aug 01 14:23:25 Updated: pinentry-qt-0.8.1-4.fc15.x86_64
Aug 01 14:23:31 Updated: gtk3-devel-3.0.12-1.fc15.x86_64
Aug 01 14:23:32 Updated: mesa-libGLU-devel-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:33 Updated: 32:bind-utils-9.8.0-8.P4.fc15.x86_64
Aug 01 14:23:37 Updated: mesa-dri-drivers-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:23:38 Updated: 32:bind-libs-lite-9.8.0-8.P4.fc15.x86_64
Aug 01 14:23:48 Updated: 1:cups-1.4.8-1.fc15.x86_64
Aug 01 14:23:48 Updated: 1:dbus-x11-1.4.6-5.fc15.x86_64
Aug 01 14:23:49 Updated: 1:dbus-devel-1.4.6-5.fc15.x86_64
Aug 01 14:23:51 Updated: 1:wpa_supplicant-0.7.3-9.fc15.x86_64
Aug 01 14:23:52 Updated: yum-plugin-fastestmirror-1.1.30-3.fc15.noarch
Aug 01 14:23:53 Updated: yum-utils-1.1.30-3.fc15.noarch
Aug 01 14:23:54 Updated: yum-plugin-auto-update-debug-info-1.1.30-3.fc15.noarch
Aug 01 14:23:55 Updated: createrepo-0.9.9-4.fc15.noarch
Aug 01 14:23:57 Updated: 1:cups-devel-1.4.8-1.fc15.x86_64
Aug 01 14:23:59 Updated: gdb-7.3-41.fc15.x86_64
Aug 01 14:24:06 Updated: bluedevil-1.1.1-1.fc15.x86_64
Aug 01 14:24:09 Updated: xorg-x11-fonts-ISO8859-1-100dpi-7.5-4.fc15.noarch
Aug 01 14:24:12 Updated: cyrus-sasl-debuginfo-2.1.23-18.fc15.x86_64
Aug 01 14:24:13 Updated: p7zip-9.20.1-2.fc15.x86_64
Aug 01 14:24:16 Updated: 1:dbus-debuginfo-1.4.6-5.fc15.x86_64
Aug 01 14:24:18 Updated: perf-2.6.40-4.fc15.x86_64
Aug 01 14:24:59 Installed: kernel-devel-2.6.40-4.fc15.x86_64
Aug 01 14:25:02 Updated: libjingle-0.5.8-1.fc15.x86_64
Aug 01 14:25:06 Updated: lohit-telugu-fonts-2.4.5-14.fc15.noarch
Aug 01 14:25:15 Updated: kernel-headers-2.6.40-4.fc15.x86_64
Aug 01 14:25:46 Updated: mesa-debuginfo-7.11-0.18.20110730.0.fc15.x86_64
Aug 01 14:25:47 Updated: libsoup-2.34.3-1.fc15.x86_64
Aug 01 14:25:49 Updated: 2:tar-1.26-1.fc15.x86_64
Aug 01 14:25:50 Updated: p7zip-plugins-9.20.1-2.fc15.x86_64
Aug 01 14:25:51 Updated: cifs-utils-5.0-2.fc15.x86_64
Aug 01 14:25:52 Updated: mesa-libGL-7.11-0.18.20110730.0.fc15.i686
Aug 01 14:25:52 Updated: mesa-libGLU-7.11-0.18.20110730.0.fc15.i686
Aug 01 14:25:53 Updated: 1:dbus-libs-1.4.6-5.fc15.i686
Aug 01 14:25:54 Updated: 1:cups-libs-1.4.8-1.fc15.i686
Aug 01 14:26:08 Installed: kernel-2.6.40-4.fc15.x86_64
Since then, even booting with kernel-2.6.38 gave this bug, unless I booted with the option above.
More on this: this error receded when I deleted the b43 module and installed broadcom-wl, with a BCM4312 WiFi card. Perhaps the kworker thread was polling bluetooth, since there was no bluetooth detected here with the newer kernels.
I have the same problem with kernel 2.6.40-4.fc15. Can the bug be reopened, or should I file a separate bug?
I got the same problem running
Disabling usb autosuspend in powertop fixed it.