Bug 712019
| Field | Value |
|---|---|
| Summary | kswapd0 using 100% CPU |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 15 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED UPSTREAM |
| Severity | high |
| Priority | unspecified |
| Reporter | Pádraig Brady <p> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | a.j.delaney, bloch, dan.doel, den.mail, gansalmon, hulin.thibaud, itamar, jcmj, jjardon, jonathan, kernel-maint, llevet, madhu.chinakonda, Magnumgr, nux, pbrady, redhat_bugzilla, zanetu, zhangzhaoming |
| Doc Type | Bug Fix |
| Last Closed | 2011-08-29 00:47:54 UTC |

Description
Pádraig Brady
2011-06-09 09:50:41 UTC
Actually to reproduce reliably, I need to cache a file over 2G. It often happens for files over 1.7G. As noted above, my swap is 1.5G (none used).

A simple reproducer to get kswapd0 spinning is to just do the following on any of my ext4 file systems (I've not tried other types):

    $ dd bs=1M count=2000 if=/dev/zero of=file.spin

To get kswapd0 to stop spinning, uncache the file, the simplest method being to:

    rm file.spin

I encountered this same problem twice consecutively copying a 700MB file to a USB flash drive. Once was using nautilus, once was using cp in the console.

I also had it occur seemingly at random this morning. I wasn't copying any files, so the only disk activity should have been, for instance, IRC logging. No significant disk usage that I could think of, but I noticed that the computer was running very hot, and kswapd0 was pegged at 100%.

Kernel: 2.6.38-32.fc15.x86_64
Machine: Sandy Bridge i5, 2GB RAM

    free -m
                 total       used       free     shared    buffers     cached
    Mem:          1829       1403        425          0          7        816
    -/+ buffers/cache:        579       1250
    Swap:        10239          0      10239

My swap is also on an SSD, and I have swappiness set very low (1).

Note if you're not sure which file is cached, you can get kswapd0 out of the loop by doing:

    echo 1 > /proc/sys/vm/drop_caches

Hmm so you have sandy bridge too. I wonder is it some SNB specific kernel locking issue?

Aha! Googling for:
http://www.google.ie/search?q=sandy+bridge+kernel+locking
lists:
http://www.gossamer-threads.com/lists/linux/kernel/1378998
which leads to the 2 patches at:
http://marc.info/?l=linux-mm&m=130503811704830&w=2
I'll try these in the morning.

Mel Gorman's 2 patches referenced above did _not_ fix the issue for me. I think they made it a little more difficult to reproduce, in that I have to dd a little more data as described above to trigger the livelock.

I have the same symptoms with kswapd0 and 100% CPU usage on my PC (CPU AMD 5600x2). With the latest kernel (2.6.38.8-32.fc15.x86_64) the problem still appears, but not so often.

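For anyone comparing numbers across machines, the `free -m` output quoted earlier in this thread can be read programmatically. The following is only a minimal sketch: it assumes the older procps layout that includes the `-/+ buffers/cache` row (newer versions of `free` print different columns), and the function name `parse_free` is a hypothetical helper, not part of any tool mentioned here.

```python
def parse_free(text):
    """Parse legacy `free -m` output into {row label: [values in MB]}."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the column header line has no colon, skip it
        rows[label.strip()] = [int(v) for v in rest.split()]
    return rows


# Output as posted in this thread (2GB Sandy Bridge machine).
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:          1829       1403        425          0          7        816
-/+ buffers/cache:        579       1250
Swap:        10239          0      10239
"""

if __name__ == "__main__":
    rows = parse_free(SAMPLE)
    total, used, free, shared, buffers, cached = rows["Mem"]
    swap_used = rows["Swap"][1]
    print(f"page cache: {cached} MB of {total} MB RAM, swap used: {swap_used} MB")
```

The point of looking at the `cached` and swap columns together is that the reports above show kswapd0 spinning while the page cache is large and swap is untouched.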
So comment #5 is not sandy bridge. So you have swap on a fast SSD? Here is my mem hierarchy:

    L1 cache   64K/core   64GB/s
    L2 cache   256K/core  32GB/s
    L3 cache   3M         24GB/s
    RAM        3G         14GB/s
    SSD        120G       270MB/s (would do 500MB/s if it didn't saturate the SATA II)
    HD         320G       82MB/s

What's the output of: free -m

FYI I've reported this upstream:
http://marc.info/?t=130865025500001&r=1&w=2
No fixes yet.

I do not know if it is related, but my PC also has 3GB of RAM, and I noticed a few times that when the problem occurs, if I run "yum update" from the terminal, my system returns to normal behavior.

I guess `yum update` was using a lot of RAM and pushing the offending data out of the page cache. Anyway the good news is that there is a fix, for me at least:
http://marc.info/?l=linux-mm&m=130891589306063&w=2
And Mel said that he'll push to 2.6.38-stable so hopefully fedora will get this automatically with the next 2.6.38... merge

As I saw in yesterday's new version of the kernel (kernel-2.6.38.8-35), the patch to correct the problem with kswapd0 has not been added. I hope it will be added soon.

This isn't a serious suggestion, but... I actually haven't encountered this bug since I upgraded my laptop to 8 GB of memory. Previously I had 2, which was closer to that of the initial reporter. So, having a lot more cache space than is required for the file you're moving seems to alleviate this problem, and you could possibly 'fix' it for yourself by buying a large quantity of memory. :) I don't have any 7 GB files to copy around, but it wouldn't surprise me if it took files of around that size to reliably trigger the bug here now.

As suggested in comment #1, bump the value in that dd command to test larger values.

As for when this might appear, the flow is:
mel gorman -> andrew morton -> linus (mainline) -> gregkh (stable)
Andrew hasn't pushed these changes yet.

Created attachment 512367
fix for 2.6.38.8
2.6.38-stable is unfortunately no longer maintained.
The fix for this will be in the next 2.6.39-stable series.
If we don't want to wait for that, I've attached the backport from Mel for 2.6.38.8.

So there is no hope of this being officially included in kernel 2.6.38.x... Is Fedora 15 going to officially support kernel 2.6.39.x? Or how can I use the attachment to patch my kernel 2.6.38.x?

(In reply to comment #15)
> So there is no hope of this being officially included in kernel 2.6.38.x...

no

> Is Fedora 15 going to officially support kernel 2.6.39.x?

probably

> Or how can I use the attachment to patch my kernel 2.6.38.x?

    yumdownloader --source kernel
    rpm -ivh kernel*.src.rpm
    # extract patches to ~/rpmbuild/SOURCES and update SPECS/kernel-2.6.spec
    rpmbuild -ba kernel-2.6.spec

Note the above rebuilds all modules which takes a while. Personally I just tweaked the Makefile to add the appropriate extraversion and ran make bzImage

(In reply to comment #16)
> (In reply to comment #15)
> > So there is no hope of this being officially included in kernel 2.6.38.x...
>
> no
>
> > Is Fedora 15 going to officially support kernel 2.6.39.x?
>
> probably
>
> > Or how can I use the attachment to patch my kernel 2.6.38.x?
>
> yumdownloader --source kernel
> rpm -ivh kernel*.src.rpm
> # extract patches to ~/rpmbuild/SOURCES and update SPECS/kernel-2.6.spec

Should I copy the file in the attachment to ~/rpmbuild/SOURCES?

> rpmbuild -ba kernel-2.6.spec
>
> Note the above rebuilds all modules which takes a while.
> Personally I just tweaked the Makefile to add the appropriate extraversion
> and ran make bzImage

Sorry, but I have not dealt with compiling a kernel before! Does this procedure patch the existing kernel or rebuild a new one?

If back-porting the above patch set, this one is needed too:
http://marc.info/?l=linux-mm&m=131105937331301&q=raw

I think the problem is solved with kernel 2.6.40.x. Until now, my PC with Fedora works without problems!

2.6.40 (3.0) includes the fix and works for me. closing...

Still happening in 2.6.40.6, 1gb VBox guest, firefox and thunderbird open with many tabs. Takes a day to show up... just trying to minimize the open windows causes it.

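The Makefile tweak mentioned in the rebuild instructions above (adding an extraversion before running make bzImage) could be scripted roughly as follows. This is a sketch under assumptions: it expects the top-level kernel Makefile to contain a line beginning with `EXTRAVERSION =`, and both the helper name `set_extraversion` and the `-kswapdfix` suffix are hypothetical, chosen only for illustration.

```python
import re


def set_extraversion(makefile_text, value):
    """Return makefile_text with the first EXTRAVERSION assignment set to value."""
    new_text, count = re.subn(
        r"^EXTRAVERSION[ \t]*=.*$",
        # A function replacement avoids backslash interpretation in value.
        lambda match: f"EXTRAVERSION = {value}",
        makefile_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no EXTRAVERSION line found")
    return new_text


# Illustrative excerpt of a version header, not copied from a real tree.
SAMPLE_MAKEFILE = """\
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 38
EXTRAVERSION = .8
NAME = example
"""

if __name__ == "__main__":
    # Hypothetical suffix so the patched build is distinguishable in `uname -r`.
    print(set_extraversion(SAMPLE_MAKEFILE, ".8-kswapdfix"))
```

Giving the patched build its own version string keeps its modules directory separate from the stock Fedora kernel's.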
This is still happening in the latest kernel, 3.1.6-1.fc16.x86_64 #1 SMP

    free -m
                 total       used       free     shared    buffers     cached
    Mem:          2003       1941         62          0          0         98
    -/+ buffers/cache:       1842        160
    Swap:         2271       1658        613

    ps -C kswapd0 -l
    F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
    1 D     0    21     2  0  80   0 -     0 conges ?        00:01:08 kswapd0

It affects me with a 64-bit OS too, but it's not specific to redhat:

    $ uname -a
    Linux hulin-Latitude-E5520 3.0.0-16-generic #29-Ubuntu SMP Tue Feb 14 12:48:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/484045
Is it specific to OSes operating in 64-bit mode?

Seems better on this kernel..

    Linux one4.biz 3.2.7-1.fc16.x86_64 #1 SMP Tue Feb 21 01:40:47 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

While the orig issue reported is improved, Thunderbird using 1957m virt, 722m RSS on my 3G system with kernel 2.6.40.4-5.fc15.x86_64 is very regularly stalling for me.

Also looks like this or a closely related problem is still an issue upstream: https://lkml.org/lkml/2013/3/8/435

I just noticed very recent restructuring of kswapd which may address some of these issues: https://lkml.org/lkml/2013/3/17/50
At least it confirms the issue is still present

Still happening on 3.12.0-0.rc2.el6.elrepo.x86_64 .. Dropping the caches "fixes" it temporarily.

I'm seeing this on Fedora 20 3.12.8-300.fc20.x86_64
As reported, dropping caches

    echo 1 > /proc/sys/vm/drop_caches

is a temporary fix.

I'm seeing it happening on Fedora 20 3.13.10-200.fc20.x86_64
I tried:

    echo 1 > /proc/sys/vm/drop_caches

It seems to "calm down" kswapd0 for some time, but it reappears, taking up 100% CPU, slowing everything down, generating heat, and making the fans noisy because they spin at full speed. So it seems this "fix" has no effect anymore.
My system does not have ANY swap at all. And this kswapd0 thing is happening when free RAM is low. Closing some RAM-consuming applications makes kswapd0 stop consuming CPU.
Regards, Daggett

I'm seeing it happening on CentOS Linux release 7.2.1511 (Core)

    # uname -a
    Linux nas 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

My system does not have ANY swap at all. And this kswapd0 thing is happening when free RAM is low.
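
Several reports in this thread rely on spotting kswapd0 at 100% CPU in top or ps. A rough way to measure it directly is to sample the task's CPU ticks from /proc twice and compare. The sketch below assumes a Linux /proc layout with `/proc/<pid>/comm` and `/proc/<pid>/stat`; the helper names are hypothetical, and `SAMPLE_STAT` is a made-up line for illustration (its 6800 ticks would correspond to the 00:01:08 shown by ps above only if the clock tick rate is 100 Hz).

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is parenthesised and may contain spaces, so split
    # on the last ')' before tokenising the remaining fields.
    fields = stat_line.rpartition(")")[2].split()
    # fields[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, hz):
    """CPU use of one task over an interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (hz * interval_s)


def find_pid(name):
    """Return the first PID whose /proc/<pid>/comm equals name, else None."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    return int(entry)
        except OSError:
            continue  # the process exited while we were scanning
    return None


def read_ticks(pid):
    """Read the current CPU tick total for a PID."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


# Made-up stat line for illustration only (PID 21, parent 2, state D).
SAMPLE_STAT = "21 (kswapd0) D 2 0 0 0 -1 2129984 0 0 0 0 0 6800 0 0 20 0 1 0 5 0 0"

if __name__ == "__main__":
    pid = find_pid("kswapd0")
    if pid is None:
        print("kswapd0 not found")
    else:
        hz = os.sysconf("SC_CLK_TCK")
        before = read_ticks(pid)
        time.sleep(1.0)
        after = read_ticks(pid)
        print(f"kswapd0 (pid {pid}): {cpu_percent(before, after, 1.0, hz):.0f}% CPU")
```

Running this before and after dropping caches, or before and after closing a large application, would give a number to attach to a report instead of an impression from top.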