Under high load, top/ps/procinfo stop updating, while other applications
seem to keep running (slowly, of course). top can lock up and stop updating
for two to three minutes at a time! I would hope that psutils would be no
less responsive under load, as that's basically the only way to rescue a
box that is thrashing to death.
We (Red Hat) should really try to fix this before next release.
A fix for this is in Linus's kernel 2.4.3-pre8.
Unfortunately, this fix has caused major instabilities.
We're hoping to get a stable fix as soon as possible.
I don't think 2.4.3 fixed it, as 2.4.5-0.2.9 is no better than the 7.1-release.
I've attached a Python script to help demonstrate the problem. Run this script
in one terminal and wait for it to stop spawning threads; after it's done, run
ps aux in another terminal. Alternatively, if you have *a lot* of patience, try
starting top in another terminal.
It's not nearly so bad under normal circumstances, but in any case ps and top
aren't very useful under high load. FreeBSD does quite well on this test
compared to Linux 2.4.
Created attachment 20803
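The attached script is not reproduced in this thread. As a rough idea of what
such a test looks like, here is a minimal sketch written for current Python 3.
It is an assumption about the general approach, not the original attachment:
the function name spawn_idle_threads and the thread count are made up for
illustration. It starts threads that do nothing but block, and stops spawning
when the system refuses to create more.

```python
import threading


def spawn_idle_threads(count):
    """Start up to `count` threads that block until released.

    Returns (threads, release). Hypothetical helper, not the original
    attachment: each thread just waits on the `release` event.
    """
    release = threading.Event()
    threads = []
    for _ in range(count):
        t = threading.Thread(target=release.wait, daemon=True)
        try:
            t.start()
        except RuntimeError:
            # The system refused to create another thread; stop spawning.
            break
        threads.append(t)
    return threads, release
```

To try it, call spawn_idle_threads(1000) from an interactive session, leave the
session open, and run ps aux in another terminal. Calling release.set()
afterwards lets the threads exit.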
It turns out this problem is caused by VMA fragmentation in heavily threaded
processes. Watching ps output, there's a marked slowdown as it prints stats
for threads. glibc issue? Kernel issue? Who knows. For now, don't run lots
of threads on Linux ...
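One way to watch the mapping count grow is to count lines in /proc/<pid>/maps,
since the kernel lists one mapping (VMA) per line there. The sketch below
assumes a Linux /proc filesystem; the helper names count_vmas and
read_vma_count are made up for illustration, and nothing here says this is how
the fragmentation was originally diagnosed.

```python
def count_vmas(maps_text):
    """Count mappings in the text of a /proc/<pid>/maps file.

    Each non-empty line describes one mapping.
    """
    return sum(1 for line in maps_text.splitlines() if line.strip())


def read_vma_count(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and return its mapping count."""
    with open(f"/proc/{pid}/maps") as f:
        return count_vmas(f.read())
```

Comparing read_vma_count(pid) for the test process before and after it spawns
its threads gives a rough picture of how many mappings the threads add.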