I have been seeing some problems running NFS benchmarks
at very high loads and was wondering if somebody could give
me some pointers to where the problem lies.
The system is running a 2.4.0 kernel on a Red Hat 6.2 distribution.
What appears to be happening is that the system has
enough free memory (~2K pages, or 8 MB) that the swap,
reclaim and buffer daemons are satisfied that they
do not need to free more. In other words, free memory
is above the watermarks, so the daemons have
gone quiet. Unfortunately, 2K pages is less than 1% of
the system's memory (1 GB RAM == 262144 4K pages), so
the odds of obtaining 2 physically consecutive pages can be quite
low, especially if approximately 300 processes are
attempting to fork(2) at the same time.
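The numbers in the paragraph above can be checked with a deliberately simplified model. It assumes each 4K page is free independently with probability free/total, which real free lists need not match. Under that assumption, 2048 free pages is 8 MB, or about 0.78% of 262144 pages, and only about 8 of the 131072 aligned page pairs would be expected to be entirely free. The helper name is hypothetical.

```python
# Toy model (an assumption, not kernel behaviour): each 4K page is free
# independently with probability free_pages / total_pages.
PAGE_SIZE = 4096  # bytes per page on i386


def free_memory_stats(total_pages: int, free_pages: int) -> dict:
    """Free memory, free percentage and expected free order-1 blocks."""
    free_fraction = free_pages / total_pages
    # Order-1 blocks are aligned page pairs (2k, 2k + 1).
    aligned_pairs = total_pages // 2
    # Both pages of a pair are free with probability free_fraction ** 2.
    expected_free_pairs = aligned_pairs * free_fraction ** 2
    return {
        "free_bytes": free_pages * PAGE_SIZE,
        "free_percent": 100 * free_fraction,
        "expected_free_order1_blocks": expected_free_pairs,
    }


stats = free_memory_stats(total_pages=262144, free_pages=2048)
print(stats["free_bytes"] // (1024 * 1024), "MB free")         # 8 MB free
print(f"{stats['free_percent']:.2f}% of RAM")                  # 0.78% of RAM
print(stats["expected_free_order1_blocks"], "free 8K blocks")  # 8.0 free 8K blocks
```

If free pages really were scattered this evenly, a few hundred concurrent fork(2) calls that each need an 8K block would exhaust the handful of free pairs almost immediately. The real layout depends on the workload, so treat the figure as an illustration only.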
The Linux page allocation routine, __alloc_pages(),
will attempt to clean and free pages when trying to
satisfy a request for more than one page. Unfortunately,
it frees arbitrary individual pages rather than
trying to free a contiguous block large enough to meet the request.
In other words, the VM subsystem should free an aligned 8K block
(two adjacent 4K pages) to satisfy an 8K request, as opposed to 2
independent 4K pages.
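The following toy model makes the distinction concrete. It compares freeing an arbitrary in-use page with freeing the buddy of a page that is already free. It is not kernel code: the page array, the helper names and the assumption that every in-use page can be reclaimed are all hypothetical. In a buddy scheme the order-0 buddy of page i is page i ^ 1, so freeing it always completes an aligned 8K block.

```python
import random


def is_block_free(free, start, order):
    """True if the aligned block of 2**order pages at `start` is entirely free."""
    size = 1 << order
    return (start % size == 0 and start + size <= len(free)
            and all(free[start:start + size]))


def count_free_blocks(free, order):
    """Number of fully free aligned blocks of the given order."""
    size = 1 << order
    return sum(is_block_free(free, s, order) for s in range(0, len(free), size))


def reclaim_random(free, n, rng):
    """Free n randomly chosen in-use pages (assumes all are reclaimable)."""
    used = [i for i, is_free in enumerate(free) if not is_free]
    for i in rng.sample(used, min(n, len(used))):
        free[i] = True


def reclaim_buddy(free):
    """Free the buddy of some already-free page; True if a page was freed."""
    for i, is_free in enumerate(free):
        buddy = i ^ 1  # the other page of the aligned pair
        if is_free and buddy < len(free) and not free[buddy]:
            free[buddy] = True  # assumes the buddy page is reclaimable
            return True
    return False


if __name__ == "__main__":
    # 16 pages, only pages 0, 4, 8 and 12 free: 25% free, no free 8K block.
    pages = [i % 4 == 0 for i in range(16)]
    print("order-1 blocks before:", count_free_blocks(pages, 1))  # 0

    random_case = list(pages)
    reclaim_random(random_case, 1, random.Random(0))
    print("after freeing a random page:", count_free_blocks(random_case, 1))

    buddy_case = list(pages)
    reclaim_buddy(buddy_case)
    print("after freeing a buddy page:", count_free_blocks(buddy_case, 1))  # 1
```

In this 16-page example, freeing one random in-use page produces an 8K block only if it happens to pick page 1, 5, 9 or 13 (4 of the 12 candidates). Freeing a buddy does so every time.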
Please retry with our 2.4.2-0.1.49 kernel, which is available in rawhide.
There are several (e.g. dozens of) VM-related bugs fixed in it.
Actually, I have tried it on the following releases with the same results:
Red Hat 6.2 and Red Hat 7.0, with kernel versions 2.4.0 through 2.4.3-pre5.
Where can I obtain a copy of 2.4.2-0.1.49 and is it part
of a distribution?
It appears that this problem was introduced in the open source
2.4.0 kernel. I have seen a comment by Alan Cox stating
that "The 2.4 VM is currently too broken to survive high I/O
benchmark tests without going silly".
Has Red Hat done anything to the VM subsystem to
alleviate the "1-order allocation failed" problem?
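In the kernel's "N-order allocation failed" messages, the order is the base-2 logarithm of the number of physically contiguous pages requested. An order-0 request is a single 4K page. An order-1 request is two adjacent pages (8K), which matches the 2 consecutive pages per fork(2) mentioned earlier. The hypothetical helper below is just that arithmetic, assuming the 4K i386 page size.

```python
PAGE_SIZE = 4096  # assumed i386 page size in bytes


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size of a buddy-allocator block of the given order (2**order pages)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return page_size << order


for order in range(3):
    print(f"order-{order}: {1 << order} page(s), {order_to_bytes(order) // 1024}K")
# order-0: 1 page(s), 4K
# order-1: 2 page(s), 8K
# order-2: 4 page(s), 16K
```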
The Rawhide part of our FTP site has the 2.4.2-XXX kernel RPMs. Yes, we have
patches in our kernel to improve the VM performance/behaviour; however, there is
still room for improvement.
I have been unable to locate version 2.4.2-0.1.49 on Red Hat's FTP site.
Could you please provide me with a pointer?
Now that 7.1 has been released, you should get its kernel, 2.4.2-2.
It seems that my originally reported problem still exists in the 7.1 kernel (2.4.2-2)
and will probably persist until the Linux community fixes it.
Thanks for the help.
Kernel 2.4.2-2 should be mostly OK. Having said that, we fixed an algorithm bug
in the VM last week, and a kernel with that fix is on its way to rawhide
(it is going through basic QA and such).
Do you know of any "benchmark"-like program that we can use to reproduce
the problem? (e.g. a more-or-less standalone program that we can add to our
Basically, any fork bomb will trigger the problem:
create a process that recursively spawns copies of itself.
Under kernel 2.2 the system slows down considerably, but under
kernel 2.4 the system hangs. By "hang" I mean that you can't log in
and kill the process. Creating a login process requires 2 consecutive pages,
and since no consecutive pages are free you can't log in, so the system
appears to be hung.
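A real fork bomb is unbounded and will take down the machine it runs on. The hypothetical helper below is a bounded stand-in for experimenting. It forks a fixed number of children that each stay alive briefly, never fork themselves, and are then reaped, and it counts how many fork(2) calls fail. Whether a given n_children reproduces the hang described above depends on the machine's memory. That threshold is an assumption to tune, not a known value.

```python
import os
import time
from typing import Tuple


def fork_burst(n_children: int, hold_seconds: float = 0.1) -> Tuple[int, int]:
    """Fork n_children at once, keep them alive briefly, then reap them.

    Returns (forked, failed). Children never fork, so the process count is
    capped at n_children + 1, unlike a real fork bomb.
    """
    pids, failed = [], 0
    for _ in range(n_children):
        try:
            pid = os.fork()
        except OSError:  # e.g. EAGAIN or ENOMEM when fork(2) cannot allocate
            failed += 1
            continue
        if pid == 0:
            time.sleep(hold_seconds)  # child: hold its resources briefly
            os._exit(0)               # exit without running parent cleanup
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)  # reap every child that was created
    return len(pids), failed
```

For example, fork_burst(300) roughly mirrors the ~300 simultaneous forks in the original report. On a healthy system it should return (300, 0).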
I'm running 7.1 with kernel 2.4.2-2, using:
./tiotest -f 500 -b 8192 -d /diskb -t 10
I get the same errors.
Is there a fix yet?
May 17 10:46:45 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:47:48 localhost last message repeated 777 times
May 17 10:48:52 localhost last message repeated 34 times
May 17 10:50:33 localhost last message repeated 67 times
May 17 10:52:22 localhost last message repeated 34 times
May 17 10:53:46 localhost last message repeated 33 times
May 17 10:55:18 localhost last message repeated 67 times
May 17 10:57:53 localhost last message repeated 46 times
May 17 10:57:55 localhost last message repeated 12 times
May 17 10:58:16 localhost kernel: failed.
May 17 10:58:16 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:58:49 localhost last message repeated 324 times
May 17 11:00:28 localhost last message repeated 586 times
May 17 11:02:04 localhost last message repeated 67 times
May 17 11:03:50 localhost last message repeated 66 times
May 17 11:04:49 localhost last message repeated 276 times
May 17 11:05:15 localhost last message repeated 154 times
May 17 11:06:50 localhost last message repeated 390 times
tiotest is available at http://www.iki.fi/miku/tiotest
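syslogd collapses identical consecutive messages into "last message repeated N times", so the log excerpt above understates at a glance how often the warning fired. The following is a small sketch for counting the real occurrences. It assumes the BSD-syslog prefix shown in the excerpt (month, day, time, host), and the function name is hypothetical.

```python
import re
from collections import Counter

# Prefix "May 17 10:46:45 localhost " followed by the message body.
LINE_RE = re.compile(r"^\w{3} +\d+ [\d:]+ \S+ (.*)$")
REPEAT_RE = re.compile(r"^last message repeated (\d+) times$")


def count_messages(lines):
    """Count messages, expanding syslogd's "last message repeated N times"."""
    counts = Counter()
    last = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a recognisable syslog prefix
        body = m.group(1)
        rep = REPEAT_RE.match(body)
        if rep:
            if last is not None:
                counts[last] += int(rep.group(1))  # N extra copies of last
        else:
            last = body
            counts[body] += 1
    return counts
```

Fed the excerpt above, this counts 2935 occurrences of the 0-order warning between 10:46 and 11:06, plus one stray "kernel: failed." line.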
Those are warnings, not errors. Also, could you please try kernel 2.4.3-5,
available from rawhide? We are continuously improving the VM and would be
very interested in feedback from others.
I use tiobench myself on a regular basis and don't see this message.