Red Hat Bugzilla – Full Text Bug Listing
|Summary:||kswapd0 using 100% CPU|
|Product:||[Fedora] Fedora||Reporter:||Pádraig Brady <p>|
|Component:||kernel||Assignee:||Kernel Maintainer List <kernel-maint>|
|Status:||CLOSED UPSTREAM||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||15||CC:||a.j.delaney, bloch, dan.doel, den.mail, gansalmon, hulin.thibaud, itamar, jcmj, jjardon, jonathan, kernel-maint, llevet, madhu.chinakonda, Magnumgr, nux, pbrady, redhat_bugzilla, zanetu|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2011-08-28 20:47:54 EDT||Type:||---|
Description Pádraig Brady 2011-06-09 05:50:41 EDT
I can reliably get kswapd0 to spin by just copying data around my hard disk. It will spin until I remove the files I was working on from the buffer cache (by deleting or unmounting the partition). Unmounting the partition is the most reliable way to stop kswapd0 from spinning, and does so immediately. The system is stable for compiles etc, but once big files start moving around, boom!

I'm using the latest F15 kernel (22.214.171.124-31.fc15.x86_64) on sandy bridge (dual core i3). I can reliably reproduce it by using dvdauthor to generate large files. The trigger size seems to be about 700MB. Just did it there again and kswapd is in this state:

$ ps -C kswapd0 -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
1 R     0    35     2  4  80   0 -     0 ?      ?        00:18:54 kswapd0

Unmount the partition and we get back to:

$ ps -C kswapd0 -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
1 S     0    35     2  4  80   0 -     0 kswapd ?        00:20:02 kswapd0

$ dmesg | grep -Fi swap
[ 6.562859] Adding 1507324k swap on /dev/sdb3. Priority:0 extents:1 across:1507324k SS

Swap is on a very fast SSD in case that's of relevance.

$ free -m
             total       used       free     shared    buffers     cached
Mem:          2860       1148       1712          0         63        578
-/+ buffers/cache:        506       2354
Swap:         1471          0       1471

This makes the system unusable for me. Dangerous too! Yesterday evening I was in a rush and closed the lid to suspend, but didn't notice kswapd was spinning away affecting the suspend process. I removed a _very_ hot laptop from my bag 90 mins later.

Note old bugs 649694 and 555633 seem related (both on x86_64).

Sometimes there's an oops which I presume is related.

BUG: soft lockup - CPU#0 stuck for 67s! [kswapd0:35]
Modules linked in: ip6table_filter ip6_tables tcp_lp fuse 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 snd_hda_codec_hdmi snd_hda_codec_idt arc4 dell_wmi snd_hda_intel sparse_keymap snd_hda_codec btusb snd_hwdep iwlagn snd_seq uvcvideo videodev dell_laptop snd_seq_device v4l2_compat_ioctl32 dcdbas microcode bluetooth iwlcore mac80211 snd_pcm xhci_hcd iTCO_wdt i2c_i801 iTCO_vendor_support cfg80211 rfkill r8169 mii snd_timer wmi snd soundcore snd_page_alloc ipv6 usb_storage uas i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: ip6_tables]
CPU 0
Modules linked in: ip6table_filter ip6_tables tcp_lp fuse 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 snd_hda_codec_hdmi snd_hda_codec_idt arc4 dell_wmi snd_hda_intel sparse_keymap snd_hda_codec btusb snd_hwdep iwlagn snd_seq uvcvideo videodev dell_laptop snd_seq_device v4l2_compat_ioctl32 dcdbas microcode bluetooth iwlcore mac80211 snd_pcm xhci_hcd iTCO_wdt i2c_i801 iTCO_vendor_support cfg80211 rfkill r8169 mii snd_timer wmi snd soundcore snd_page_alloc ipv6 usb_storage uas i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: ip6_tables]
Pid: 35, comm: kswapd0 Not tainted 126.96.36.199-27.fc15.x86_64 #1 Dell Inc. Inspiron N5110/034W60
RIP: 0010:[<ffffffff810e45c6>] [<ffffffff810e45c6>] shrink_slab+0x86/0x166
RSP: 0018:ffff8800b2961da0 EFLAGS: 00000206
RAX: 0000000000000000 RBX: ffff8800b2961de0 RCX: 0000000000000002
RDX: 0000000000041a00 RSI: 0000000000000000 RDI: ffffffff81a3d980
RBP: ffff8800b2961de0 R08: 0000000000000004 R09: 0000000000000bc7
R10: 0000000000000002 R11: 0000000000000420 R12: ffffffff8100a58e
R13: ffff8800b2961e58 R14: 0000000000000000 R15: 0000000000024181
FS: 0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f7382ef0000 CR3: 0000000001a03000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 35, threadinfo ffff8800b2960000, task ffff8800b6185c80)
Stack:
 000000000000003d 0000000000000080 ffff880000000000 ffff8801005ec000
 ffff8801005ec000 0000000000000002 0000000000000000 000000000000000c
 ffff8800b2961ee0 ffffffff810e71d0 ffff88001ef8f2f0 ffff8800b6185c80
Call Trace:
 [<ffffffff810e71d0>] kswapd+0x517/0x77c
 [<ffffffff810e6cb9>] ? kswapd+0x0/0x77c
 [<ffffffff8106eb53>] kthread+0x84/0x8c
 [<ffffffff8100a9e4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106eacf>] ? kthread+0x0/0x8c
 [<ffffffff8100a9e0>] ? kernel_thread_helper+0x0/0x10
Code: 83 eb 10 e9 ce 00 00 00 44 89 f2 31 f6 48 89 df ff 13 48 63 4b 08 4c 63 e8 48 8b 45 c8 31 d2 48 f7 f1 31 d2 49 0f af c5 49 f7 f7
Comment 1 Pádraig Brady 2011-06-12 19:31:11 EDT
Actually, to reproduce reliably I need to cache a file over 2G. It often happens for files over 1.7G. As noted above, my swap is 1.5G (none used).

A simple reproducer to get kswapd0 spinning is to just do the following on any of my ext4 file systems (I've not tried other types):

$ dd bs=1M count=2000 if=/dev/zero of=file.spin

To get kswapd0 to stop spinning, uncache the file, the simplest method being to:

rm file.spin
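After running the reproducer, the scheduler state of kswapd0 can be sampled a few times to see whether it stays runnable (`R`, as in the `ps` output in the description) or returns to sleeping (`S`). The following is only a sketch: the function names `count_running` and `sample_kswapd` are made up for illustration, and a handful of `R` samples does not by itself prove a livelock.

```shell
#!/bin/sh
# Sketch only: helper names and defaults are assumptions, not from the report.

# Count how many state letters read from stdin (one per line) are "R" (running).
count_running() {
    grep -c '^[[:space:]]*R' || true   # grep -c prints 0 but exits 1 on no match
}

# Print the state letter of kswapd0 several times.
# $1 = number of samples, $2 = delay between samples in seconds
sample_kswapd() {
    i=0
    while [ "$i" -lt "$1" ]; do
        ps -C kswapd0 -o state=
        sleep "$2"
        i=$((i + 1))
    done
}
```

For example, `sample_kswapd 10 1 | count_running` prints how many of ten one-second samples found kswapd0 runnable.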
Comment 2 Dan Doel 2011-06-20 14:45:22 EDT
I encountered this same problem twice consecutively copying a 700MB file to a USB flash drive. Once was using nautilus, once was using cp in the console.

I also had it occur seemingly at random this morning. I wasn't copying any files, so the only disk activity should have been, for instance, IRC logging. No significant disk usage that I could think of, but I noticed that the computer was running very hot, and kswapd0 was pegged at 100%.

Kernel: 2.6.38-32.fc15.x86_64
Machine: Sandy Bridge i5, 2GB RAM

free -m
             total       used       free     shared    buffers     cached
Mem:          1829       1403        425          0          7        816
-/+ buffers/cache:        579       1250
Swap:        10239          0      10239

My swap is also on an SSD, and I have swappiness set very low (1).
Comment 3 Pádraig Brady 2011-06-20 19:43:35 EDT
Note if you're not sure which file is cached, you can get kswapd0 out of the loop by doing:

echo 1 > /proc/sys/vm/drop_caches

Hmm so you have sandy bridge too. I wonder is it some SNB specific kernel locking issue?

Aha! Googling for:
http://www.google.ie/search?q=sandy+bridge+kernel+locking
lists:
http://www.gossamer-threads.com/lists/linux/kernel/1378998
which leads to the 2 patches at:
http://marc.info/?l=linux-mm&m=130503811704830&w=2

I'll try these in the morning.
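For reference, the value written to drop_caches selects what is dropped: according to the kernel's vm sysctl documentation, 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. Only clean cache is dropped, so running `sync` first lets more be released. A minimal sketch, assuming root privileges; the function names `drop_caches_value` and `drop_caches` are hypothetical:

```shell
#!/bin/sh
# Sketch only: function names are assumptions; writing to drop_caches needs root.

# Map a readable target name to the number written to /proc/sys/vm/drop_caches.
drop_caches_value() {
    case "$1" in
        pagecache) echo 1 ;;   # page cache only
        slab)      echo 2 ;;   # reclaimable slab (dentries and inodes)
        all)       echo 3 ;;   # both of the above
        *)         echo "usage: drop_caches pagecache|slab|all" >&2; return 1 ;;
    esac
}

# Flush dirty pages, then ask the kernel to drop the selected clean caches.
drop_caches() {
    dc_value=$(drop_caches_value "${1:-pagecache}") || return 1
    sync
    echo "$dc_value" > /proc/sys/vm/drop_caches
}
```

Calling `drop_caches pagecache` as root is equivalent to the `echo 1` command above, preceded by a `sync`. Note that `sudo echo 1 > /proc/sys/vm/drop_caches` does not work from an unprivileged shell, because the redirection is performed by that shell; `echo 1 | sudo tee /proc/sys/vm/drop_caches` is a common alternative.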
Comment 4 Pádraig Brady 2011-06-21 05:42:27 EDT
Mel Gorman's 2 patches referenced above did _not_ fix the issue for me. I think it made it a little more difficult to reproduce, in that I have to dd a little more data as described above to trigger the livelock.
Comment 5 Michael 2011-06-21 06:03:56 EDT
I have the same symptoms, with kswapd0 at 100% CPU usage, on my PC (CPU AMD 5600x2). With the latest kernel (188.8.131.52-32.fc15.x86_64) the problem still appears, but not so often.
Comment 6 Pádraig Brady 2011-06-21 06:32:12 EDT
So comment #5 is not sandy bridge. So you have swap on a fast SSD? Here is my mem hierarchy:

L1 cache   64K/core    64GB/s
L2 cache   256K/core   32GB/s
L3 cache   3M          24GB/s
RAM        3G          14GB/s
SSD        120G        270MB/s (would do 500MB/s if it didn't saturate the SATA II)
HD         320G        82MB/s

What's the output of: free -m
Comment 7 Pádraig Brady 2011-06-21 12:23:14 EDT
FYI I've reported this upstream: http://marc.info/?t=130865025500001&r=1&w=2 No fixes yet.
Comment 8 Michael 2011-06-24 08:41:09 EDT
I do not know if it is related, but my PC also has 3GB of RAM, and I have noticed a few times that when the problem occurs, running "yum update" from the terminal returns my system to normal behavior.
Comment 9 Pádraig Brady 2011-06-24 09:15:23 EDT
I guess `yum update` was using a lot of RAM and pushing the offending data out of the page cache. Anyway, the good news is that there is a fix, for me at least: http://marc.info/?l=linux-mm&m=130891589306063&w=2
Comment 10 Pádraig Brady 2011-06-24 11:32:53 EDT
And Mel said that he'll push to 2.6.38-stable so hopefully fedora will get this automatically with the next 2.6.38... merge
Comment 11 Michael 2011-07-07 11:47:30 EDT
As I saw in yesterday's new version of the kernel (kernel-184.108.40.206-35), the patch to correct the problem with kswapd0 has not been added. I hope it will be added soon.
Comment 12 Dan Doel 2011-07-07 12:32:26 EDT
This isn't a serious suggestion, but... I actually haven't encountered this bug since I upgraded my laptop to 8 GB of memory. Previously I had 2, which was closer to that of the initial reporter. So, having a lot more cache space than is required for the file you're moving seems to alleviate this problem, and you could possibly 'fix' it for yourself by buying a large quantity of memory. :) I don't have any 7 GB files to copy around, but it wouldn't surprise me if it took files of around that size to reliably trigger the bug here now.
Comment 13 Pádraig Brady 2011-07-08 05:07:40 EDT
As suggested in comment #1, bump the value in that dd command to test larger values.

As for when this might appear, the flow is:
mel gorman -> andrew morton -> linus (mainline) -> gregkh (stable)
Andrew hasn't pushed these changes yet.
Comment 14 Pádraig Brady 2011-07-12 04:31:27 EDT
Created attachment 512367
fix for 220.127.116.11

2.6.38-stable is unfortunately no longer maintained. The fix for this will be in the next 2.6.39-stable series. If we don't want to wait for that, I've attached the backport from Mel for 18.104.22.168
Comment 15 Michael 2011-07-13 10:25:43 EDT
So there is no hope to be included official to kernel 2.6.38.x... Fedora 15 it gonna officially support the kernel 2.6.39.x ? Or how i can use the attachment to patch my kernel 2.6.38.x ?
Comment 16 Pádraig Brady 2011-07-13 10:47:22 EDT
(In reply to comment #15)
> So there is no hope to be included official to kernel 2.6.38.x...

no

> Fedora 15 it gonna officially support the kernel 2.6.39.x ?

probably

> Or how i can use the attachment to patch my kernel 2.6.38.x ?

yumdownloader --source kernel
rpm -ivh kernel*.src.rpm
# extract patches to ~/rpmbuild/SOURCES and update SPECS/kernel-2.6.spec
rpmbuild -ba kernel-2.6.spec

Note the above rebuilds all modules which takes a while.
Personally I just tweaked the Makefile to add the appropriate extraversion
and ran make bzImage
Comment 17 Michael 2011-07-13 11:12:23 EDT
(In reply to comment #16)
> (In reply to comment #15)
> > So there is no hope to be included official to kernel 2.6.38.x...
>
> no
>
> > Fedora 15 it gonna officially support the kernel 2.6.39.x ?
>
> probably
>
> > Or how i can use the attachment to patch my kernel 2.6.38.x ?
>
> yumdownloader --source kernel
> rpm -ivh kernel*.src.rpm
> # extract patches to ~/rpmbuild/SOURCES and update SPECS/kernel-2.6.spec

So I copy the attached file to ~/rpmbuild/SOURCES?

> rpmbuild -ba kernel-2.6.spec
>
> Note the above rebuilds all modules which takes a while.
> Personally I just tweaked the Makefile to add the appropriate extraversion
> and ran make bzImage

Sorry, but I have not dealt with kernel compilation before! Does this procedure patch the existing kernel or build a new one?
Comment 18 Pádraig Brady 2011-07-19 08:22:34 EDT
If back-porting the above patch set, this one is needed too: http://marc.info/?l=linux-mm&m=131105937331301&q=raw
Comment 19 Michael 2011-08-19 05:42:07 EDT
I think the problem is solved with kernel 2.6.40.x. So far, my PC with Fedora has been working without problems!
Comment 20 Pádraig Brady 2011-08-28 20:47:54 EDT
2.6.40 (3.0) includes the fix and works for me. closing...
Comment 21 jeff 2011-10-19 16:23:12 EDT
Still happening with 22.214.171.124 in a 1GB VBox guest. With firefox and thunderbird open with many tabs, it takes a day to show up... just trying to minimize the open windows causes it.
Comment 22 jeff 2011-12-30 11:40:09 EST
This still is happening in latest Kernel 3.1.6-1.fc16.x86_64 #1 SMP

free -m
             total       used       free     shared    buffers     cached
Mem:          2003       1941         62          0          0         98
-/+ buffers/cache:       1842        160
Swap:         2271       1658        613

ps -C kswapd0 -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
1 D     0    21     2  0  80   0 -     0 conges ?        00:01:08 kswapd0
Comment 23 Thibaud 2012-02-28 03:25:36 EST
It affects me too, with a 64-bit OS, but it's not specific to Red Hat:

$ uname -a
Linux hulin-Latitude-E5520 3.0.0-16-generic #29-Ubuntu SMP Tue Feb 14 12:48:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/484045

Is it specific to OSes operating in 64-bit mode?
Comment 24 jeff 2012-02-28 14:20:11 EST
Seems better on this kernel.. Linux one4.biz 3.2.7-1.fc16.x86_64 #1 SMP Tue Feb 21 01:40:47 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Comment 25 Pádraig Brady 2013-03-19 12:57:32 EDT
While the orig issue reported is improved, Thunderbird using 1957m virt, 722m RSS on my 3G system with kernel 126.96.36.199-5.fc15.x86_64 is very regularly stalling for me. Also looks like this or a closely related problem is still an issue upstream: https://lkml.org/lkml/2013/3/8/435
Comment 26 Pádraig Brady 2013-03-19 14:03:55 EDT
I just noticed a very recent restructuring of kswapd which may address some of these issues: https://lkml.org/lkml/2013/3/17/50 At least it confirms the issue is still present.
Comment 27 Nux 2013-09-28 20:19:31 EDT
Still happening on 3.12.0-0.rc2.el6.elrepo.x86_64 .. Dropping the caches "fixes" it temporarily.
Comment 28 Aidan Delaney 2014-01-31 15:54:28 EST
I'm seeing this on Fedora 20 3.12.8-300.fc20.x86_64

As reported, dropping caches

echo 1 > /proc/sys/vm/drop_caches

is a temporary fix.
Comment 29 Dag 2014-05-21 09:29:27 EDT
I'm seeing it happening on Fedora 20 3.13.10-200.fc20.x86_64

I tried:

echo 1 > /proc/sys/vm/drop_caches

It seems to "calm down" kswapd0 for some time, but it reappears, taking up 100% CPU, slowing everything down, generating heat, and making the fans noisy because they spin at full speed. So it seems this "fix" has no effect anymore.

My system does not have ANY swap at all, and this kswapd0 thing happens when free RAM is low. Closing some RAM-consuming applications makes kswapd0 stop consuming CPU.

Regards, Daggett
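To check the correlation described here between low free RAM and kswapd0 activity, the two can be logged side by side. This is a rough sketch under assumptions: the function names `meminfo_kb` and `log_sample` and the log format are invented for illustration, and it relies only on the standard `MemFree` and `Cached` fields of /proc/meminfo plus the `state` and `time` columns of `ps`.

```shell
#!/bin/sh
# Sketch only: function names and the log format are assumptions.

# Print the value in kB of one /proc/meminfo field.
# $1 = field name without the colon (e.g. MemFree)
# $2 = file to read (defaults to /proc/meminfo; overridable for testing)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print one line: timestamp, free and cached memory, kswapd0 state and CPU time.
log_sample() {
    printf '%s MemFree=%skB Cached=%skB kswapd0=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(meminfo_kb MemFree)" \
        "$(meminfo_kb Cached)" \
        "$(ps -C kswapd0 -o state= -o time= | tr -s ' ')"
}
```

Running `while :; do log_sample; sleep 5; done >> kswapd.log` in the background would show whether the cumulative CPU time of kswapd0 climbs fastest while MemFree is at its lowest.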