From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041008 Firefox/0.10.1

Description of problem:
When running an I/O-intensive process, kswapd starts to constantly use a lot of CPU (50-60% according to top), causing all other processes to grind to a halt.

Version-Release number of selected component (if applicable):
2.6.8-1.603

How reproducible:
Always

Steps to Reproduce:
1. Run gdm -> gnome (minimal setup)
2. Open terminal
3. Use cedega to start BfVietnam.exe (or basically any other process that eats RAM like it was candy)

Actual Results:
Processes other than kswapd die or freeze. Unless the offending I/O-intensive process is killed completely, the system eventually stops responding and a hard reboot is required.

Expected Results:
kswapd should complete whatever it was doing and let the rest of the system continue as normal.

Additional info:
riel> it's a known bug from the upstream kernel, fixed in akpm's latest pile of patches
Tested using kernel-2.6.8-1.607 now. As far as I can tell this is supposed to be based on http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.9-rc4 and the patches to kswapd included there. Still experiencing the same behaviour. System dies.
Bug #126234 looks just like this bug.
I'm having problems which might be related to this bug. I mentioned this first in bug #131251 comment #34:

--clip--
Now that I finally got my memory sticks I tried upgrading the memory to 4GB. And boy did I open another can of worms by doing that. After booting up I ran "free" and saw that I had 4GB of usable memory in Linux. I loaded the [680MB] SQL dump file with nano and did the usual text search trick. It worked fine, but that's probably because the system had enough RAM this time (one text editor instance uses only about 1GB of memory). I loaded another copy in another session, and it loaded nicely as well. I started loading the third file when the system seemed to get stuck: nano just stopped at "Reading File". On another channel I tried running "free" to see if the system was already swapping at this point (it shouldn't have been). Surprise, surprise: the "free" command itself got stuck. It refused to output anything and also refused to return to the command prompt. Running "ls" and "vmstat 2" on other channels had the same result: nothing was printed on the screen and those commands didn't return to the command prompt. The editors on channels 1 and 2 were responsive and worked until I tried to move to the next page. This move made both of them get stuck. This happened with 2.6.9 and elevator=as.
--clip--

Another experience, this time with 2.6.9-bk6 with the grab_swap_token function call commented out to work around the OOM issue in bug #131251. I started "vmstat 2" on one channel, and managed to load 3 instances of nano on three different channels. The fourth got stuck in the Reading File phase.
Meanwhile the vmstat command on another channel kept on running with the following information:

     r  b   swpd     free   buff   cache  si  so  bi  bo         in    cs  us   sy   id   wa
     0  6  23324     5780   1112  290616   0   0   0   0       1001   4-6   0    0    0  100

I left the computer running for a few hours, during which the free and cache values changed only very slightly. I had an open bash prompt on one channel and tried running "free" on it, but it got stuck as in my previous experiment. The nano sessions also got stuck once I tried moving to the next page of text.

To make the FC developers happy I tried the Fedora Core version of the kernel, build 640. This time I had "top" running on one channel and "vmstat" on another. The vmstat values before trying to start nano:

     r  b   swpd     free   buff   cache  si  so  bi  bo         in    cs  us   sy   id   wa
     0  0      0  3906088  14708   44488   0   0   0   0  1001-1003  8-14   0    0  100    0

Three instances of nano loaded fine; the fourth stopped again in the Reading File phase. vmstat kept running, with these values:

     r  b   swpd     free   buff   cache  si  so  bi  bo         in    cs  us   sy   id   wa
     1  2  23360     6956  12548  679088   0   0   0   0       1001   6-9   0  100    0    0

Notice that the 100 value was now in the System (sy) column, unlike in 2.6.9-bk6 where the 100 value was in the Wait (wa) column. I don't know if this was just a coincidence or repeatable behaviour. The "top" session which I had running on another channel got stuck and the screen was no longer updated. On that frozen screen I can see that kswapd0 uses 66.9% of CPU (state: R) and nano uses the remaining 33.0% (state: D).

I'm running the latest development branch of Fedora on an Abit AV8 motherboard, with an AMD Athlon 64 3500+ and 4GB of memory. No X is running on this server. The hard disk is a parallel ATA Seagate 120GB.
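To make the three vmstat readings quoted above easier to compare, here is a minimal Python sketch that parses them from the "name value" form in which they were reported and prints which CPU column dominates. The helper names (`parse_sample`, `describe`) and the `SAMPLES` labels are made up for illustration; the numbers are copied from this comment, and ranges such as "4-6" are kept as text rather than guessed at.

```python
# Readings exactly as reported in the comment, keyed by a descriptive label.
SAMPLES = {
    "2.6.9-bk6, hung": "r 0 b 6 swpd 23324 free 5780 buff 1112 cache 290616 "
                       "si 0 so 0 bi 0 bo 0 in 1001 cs 4-6 us 0 sy 0 id 0 wa 100",
    "FC 640, idle":    "r 0 b 0 swpd 0 free 3906088 buff 14708 cache 44488 "
                       "si 0 so 0 bi 0 bo 0 in 1001-1003 cs 8-14 us 0 sy 0 id 100 wa 0",
    "FC 640, hung":    "r 1 b 2 swpd 23360 free 6956 buff 12548 cache 679088 "
                       "si 0 so 0 bi 0 bo 0 in 1001 cs 6-9 us 0 sy 100 id 0 wa 0",
}


def parse_sample(text):
    """Turn 'r 0 b 6 swpd 23324 ...' into a dict; ranges like '4-6' stay strings."""
    tokens = text.split()
    sample = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        sample[name] = int(value) if value.isdigit() else value
    return sample


def describe(sample):
    """Summarise where the CPU time went and whether any I/O was visible."""
    quiet = all(sample[k] == 0 for k in ("si", "so", "bi", "bo"))
    io = "no swap or block I/O" if quiet else "I/O in progress"
    busiest = max(("us", "sy", "id", "wa"), key=lambda k: sample[k])
    return f"{busiest}={sample[busiest]}%, {io}, {sample['b']} blocked"


for label, text in SAMPLES.items():
    print(f"{label}: {describe(parse_sample(text))}")
```

The point of the comparison is that both hung readings show zero swap and block I/O, while the CPU time sits in wa on 2.6.9-bk6 and in sy on the Fedora build.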
Also testing .640 here. I'm having mixed results, and will report back when I've done more extensive testing...
I tried to see whether I could reproduce the problem on a vanilla 2.6.10-rc1 kernel. I wasn't able to lock up the system, but there's still something weird going on in kswapd. I managed to load 4 copies of my text file, and after that "top" shows that kswapd0 is using 99% of CPU, although nothing visible seems to be happening. According to vmstat, no swapping is currently being done (si=so=0). During the loading of those files kswapd seemed to be acting sanely, but once the kernel had to touch swap kswapd went crazy. Currently there's 33156 KB of used swap space, 7152 KB of free memory, 2564 KB of buffers and 97080 KB of cache. The system worked fine at this point, even though kswapd was consuming lots of CPU time.

Addendum: When I tried copying that 680MB test file to another name for further testing, the system locked up solidly as in my previous experiments. The cp process never finished. The frozen "top" screen shows that kswapd is using 99% of CPU. vmstat keeps running; this time the values are 33156 KB for swap, 89852 KB free, 3324 KB buffers, 12804 KB cache. The editor on the first channel froze when I tried to move to the next page. I tried exiting the remaining 3 nano text editors, but they froze too. At this point the vmstat command that was still running on another channel displayed 6 processes in uninterruptible sleep (vmstat column "b").

Is there something useful that I could test?
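One thing that might be worth capturing is how kswapd0's state and CPU share change over time, from a process that is already running before the lockup (newly started commands hang, but an already-running vmstat kept going). The following is only a rough Python sketch, assuming a Linux /proc filesystem with the field layout documented in proc(5), where utime and stime are fields 14 and 15 of /proc/<pid>/stat. The function names (`parse_stat`, `find_pid`, `cpu_percent`, `watch`) are made up for illustration, and nothing here has been tried on the kernels discussed in this bug.

```python
import os
import time


def parse_stat(line):
    """Return (comm, state, utime, stime) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces, so split around it.
    head, _, rest = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = rest.split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return comm, fields[0], int(fields[11]), int(fields[12])


def find_pid(name="kswapd0"):
    """Scan /proc for the first process whose comm matches name."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                if parse_stat(f.read())[0] == name:
                    return int(entry)
        except OSError:
            continue  # process exited while we were scanning
    return None


def cpu_percent(before, after, interval, hz):
    """CPU share between two parse_stat() results taken interval seconds apart."""
    ticks = (after[2] + after[3]) - (before[2] + before[3])
    return 100.0 * ticks / (hz * interval)


def watch(pid, interval=2.0, count=30):
    """Print state and CPU share of pid every interval seconds, count times."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second

    def read():
        with open(f"/proc/{pid}/stat") as f:
            return parse_stat(f.read())

    before = read()
    for _ in range(count):
        time.sleep(interval)
        after = read()
        share = cpu_percent(before, after, interval, hz)
        print(f"{after[0]} state={after[1]} cpu={share:.1f}%", flush=True)
        before = after
```

Calling `watch(find_pid())` on a spare channel before loading the files would leave a trail of readings on screen. Whether the reads from /proc keep working once the system locks up is an open question, so this may stop updating in the same way "top" did.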
Still the same with .640 and .643: kswapd goes crazy as soon as swap is touched. It settles (i.e. drops to about 6% CPU usage) when switching to a text terminal, and goes back up to eating CPU (~60% constant) when switching back to X. I don't know if there's any point in this, but I'm going to test with the initial fc3t2 kernel for reference.
I failed to mention I am no longer able to kill the system this way. Kswapd now seems to only affect the performance of said I/O-intensive task...
Thanks, I can confirm it's sorted in the .649 linux-2.6.9-vm-tame-oomkiller.patch