Red Hat Bugzilla – Bug 135312
kswapd has trouble with high I/O
Last modified: 2015-01-04 17:10:35 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041008
Description of problem:
When running an I/O-intensive process, kswapd starts to constantly use
a lot of CPU (50-60% according to top), causing all other processes to
grind to a halt.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run gdm -> gnome (minimal setup)
2. Open terminal
3. Use cedega to start BfVietnam.exe
(Or basically any other process that eats RAM like it was candy)
Actual Results: Processes other than kswapd die/freeze.
Unless the offending I/O-intensive process is killed, the
system will eventually stop responding and a hard reboot is required.
Expected Results: kswapd should complete whatever it was doing and
let the rest of the system continue as normal.
riel> it's a known bug from the upstream kernel, fixed in akpm's
latest pile of patches
Tested using kernel-2.6.8-1.607 now. As far as I can tell this is
supposed to be based on
and the patches to kswapd included there. Still experiencing the same
behaviour. System dies.
Bug #126234 looks just like this bug.
I'm having problems which might be related to this bug. I mentioned
this first in bug #131251 comment #34:
Now that I finally got my memory sticks I tried upgrading the memory
to 4GB. And boy did I open another can of worms by doing that. After
booting up I ran "free" and saw that I had 4GB of usable memory in
Linux. I loaded the [680MB] SQL dump file with nano and did the usual
text search trick. It worked fine, but that's probably because the
system had enough RAM this time (one text editor instance uses only
about 1GB of memory). I loaded another copy in another session, it
loaded nicely as well. I started loading the third file when the
system seemed to get stuck, nano just stopped at "Reading File". On
another channel I tried running "free" to see if the system was
already swapping at this point (it shouldn't have been). Surprise surprise, the
"free" command itself got stuck, it refused to output anything and
also refused to return back to command prompt. Running "ls" and
"vmstat 2" on other channels resulted in the same, nothing was printed
on the screen and those commands didn't return to the command prompt.
The editors on channels 1 and 2 were responsive and worked, until I
tried to move to the next page. This move made both of them get stuck.
This happened with 2.6.9 and elevator=as.
Another experience, this time with 2.6.9-bk6 with the grab_swap_token
function call commented out to work around the OOM issue in bug
#131251. I started "vmstat 2" on one channel, and managed to load 3
instances of nano on three different channels. The fourth got stuck in
Reading File phase. Meanwhile the vmstat command on another channel
kept on running with the following information:
I left the computer running for a few hours, during which the free and
cache values changed only very slightly. I had an open bash prompt on one
channel and tried running "free" on it, but it got stuck as in my
previous experiment. The nano sessions also got stuck once I tried moving
to the next page of text.
To make the FC developers happy I tried the Fedora Core version of the
kernel, build 640. This time I had "top" running on one channel and
"vmstat" on another. The vmstat values before trying to start nano:
Three instances of nano loaded fine, the fourth stopped again in the
Reading File phase. vmstat kept running, with these values:
Notice that the 100 value was now indeed in the System (sy) column,
unlike in 2.6.9-bk6 where the 100 value was in the Wait (wa) column. I
don't know if this was just a coincidence or repeatable behaviour. The
"top" session which I had running on another channel got stuck; the
screen no longer got updated. On that frozen screen I can see that
kswapd0 uses 66.9% of CPU (state: R) and nano uses the remaining 33.0%.
I'm running the latest development branch of Fedora on an Abit AV8
motherboard, with AMD Athlon 64 3500+ and 4GB of memory. No X running
on this server. The hard disk is a parallel ATA Seagate 120GB.
Also testing .640 here. I'm having mixed results, and will report back
when I've done more extensive testing...
I tried to reproduce the problem on the vanilla 2.6.10-rc1 kernel.
I wasn't able to lock up the system, but there's still something weird
going on in kswapd. I managed to load 4 copies of my text file and
after that "top" shows that kswapd0 is using 99% of cpu, although
nothing visible seems to be happening. According to vmstat, no
swapping is currently being done (si=so=0).
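The "kswapd busy while si=so=0" observation can be cross-checked from userspace by sampling kswapd0's CPU time alongside the kernel's cumulative swap counters. A sketch (assumes pgrep is installed and a kswapd0 thread exists; /proc field positions are from the proc(5) layout):

```shell
#!/bin/sh
# Sample kswapd0's accumulated CPU time together with swap-in/out counters.
# If the ticks climb while pswpin/pswpout stay flat, kswapd is spinning
# without actually swapping -- matching the vmstat observation above.
pid=$(pgrep -x kswapd0 | head -n 1)
[ -n "$pid" ] || { echo "no kswapd0 thread found"; exit 1; }
for i in 1 2 3; do
    # Fields 14 and 15 of /proc/PID/stat are utime and stime, in clock ticks.
    ticks=$(awk '{print $14 + $15}' "/proc/$pid/stat")
    # pswpin/pswpout in /proc/vmstat are cumulative pages swapped in/out.
    swap=$(awk '/^pswp/ {printf "%s=%s ", $1, $2}' /proc/vmstat)
    echo "kswapd0 cpu ticks: $ticks  $swap"
    sleep 2
done
```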
During the loading of those files kswapd seemed to be acting sanely,
but once the kernel had to touch swap kswapd went crazy. Currently
there's 33156 KB of used swap space, 7152 KB of free memory, 2564 KB
of buffers and 97080 KB of cache. The system worked fine at this point
of time, even though kswapd was consuming lots of CPU time.
Addendum: When I tried copying that 680MB test file to another name
for further testing, the system locked up solidly as in my previous
experiments. The cp process never finished. The frozen "top" screen
tells that kswapd is using 99% of cpu. vmstat keeps running, this time
the values are 33156 KB for swap, 89852 KB free, 3324 KB buffers,
12804 KB cache. The editor on the first channel froze when I tried to
move to the next page. I tried exiting the remaining 3 nano text
editors, but they froze too. At this point the vmstat command that was
still running on another channel displayed 6 processes being in
uninterruptible sleep (vmstat column "b").
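Which processes make up that "b" count can be listed directly from /proc: a task in uninterruptible sleep shows state "D" in its status file. A sketch (on a box this wedged, even the loop below could block if it touches a locked inode, so treat it as best-effort):

```shell
#!/bin/sh
# List processes in uninterruptible sleep ("D" state) -- these are what
# vmstat counts in its "b" column.
for d in /proc/[0-9]*; do
    state=$(awk '/^State:/ {print $2}' "$d/status" 2>/dev/null)
    if [ "$state" = "D" ]; then
        name=$(awk '/^Name:/ {print $2}' "$d/status" 2>/dev/null)
        echo "blocked: pid ${d#/proc/} ($name)"
    fi
done
```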
Is there something useful that I could test?
Still, with .640 and .643 - kswapd goes crazy as soon as swap is
touched. It settles (i.e. down to about 6% CPU usage) when tabbing to a
text terminal, and goes back up to eating CPU (~60% constant) when tabbing
back to X.
Don't know if there's any point in this, but I'm going to test with
the initial fc3t2 kernel for reference.
I failed to mention that I am no longer able to kill the system this way.
kswapd now seems to only affect the performance of said I/O-intensive
process.
Thanks, I can confirm it's sorted in the .649