Bug 35032

Summary: __alloc_pages: 1-order allocation failed.
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <edmilhomme>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 7.1
CC: dnielsen, edmilhomme
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2003-06-06 00:14:33 UTC

Description Need Real Name 2001-04-06 15:31:51 UTC
I have been seeing some problems when running NFS benchmarks
at very high loads and was wondering if somebody could give
me some pointers to where the problem lies.
The system is a 2.4.0 kernel on a Red Hat 6.2 distribution.

What appears to be happening is that the system has
enough free memory (~2K pages, or 8 MB) that the swap,
reclaim and buffer daemons are satisfied that they
do not need to free more.  In other words, system memory
has reached the watermarks at which the daemons
quiesce.  Unfortunately, 2K pages is less than 1% of
the system's memory (1 GB RAM == 262144 4K pages), so
the odds of obtaining 2 consecutive pages can be quite
low, especially if approximately 300 processes are
attempting to fork(2) at the same time.
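
The arithmetic above can be checked with a short sketch. The uniform, independent scattering of free pages is an assumption made purely for illustration (real fragmentation is not uniform), and the variable names are hypothetical.

```python
# Figures taken from the report: 1 GB of RAM, 4K pages, ~2K free pages.
PAGE_SIZE = 4096
ram_bytes = 1 << 30
total_pages = ram_bytes // PAGE_SIZE          # 262144 pages
free_pages = 2 * 1024                         # "~2K pages"
free_bytes = free_pages * PAGE_SIZE           # 8 MB
free_fraction = free_pages / total_pages      # under 1% of memory

# ASSUMPTION (illustrative only): each page is free independently with
# probability free_fraction. An order-1 request needs an aligned pair
# of pages (2k, 2k+1) that are both free.
aligned_pairs = total_pages // 2
p_pair_free = free_fraction ** 2
expected_free_pairs = aligned_pairs * p_pair_free

print(f"total pages:         {total_pages}")
print(f"free memory:         {free_bytes // (1024 * 1024)} MB")
print(f"free fraction:       {free_fraction:.4%}")
print(f"expected free pairs: {expected_free_pairs:.1f}")
```

Under that simplified model only a handful of aligned free pairs would be expected across the whole machine, which is consistent with the low odds described when several hundred processes fork at once.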

The Linux page allocation routine, __alloc_pages(),
will attempt to clean and free pages when trying to
satisfy a request for more than one page.  Unfortunately,
it frees arbitrary, unrelated pages rather than
trying to free a contiguous region large enough to meet the request.
In other words, the VM subsystem should free a contiguous 8K
region in order to satisfy an 8K request, as opposed to 2
independent 4K pages.
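
The difference between the two reclaim strategies can be sketched with a toy model. This is not the kernel's buddy allocator: the page map, the fragmentation pattern, and all function names below are hypothetical and chosen only to illustrate the argument.

```python
import random

def has_order1_block(free):
    """True if some aligned pair (2k, 2k+1) is entirely free."""
    return any(free[i] and free[i + 1] for i in range(0, len(free), 2))

def make_fragmented(n_pages=4096, stride=128):
    """Toy memory map: under 1% of pages free, none of them adjacent."""
    return [i % stride == 0 for i in range(n_pages)]

def free_random_pages(free, count, rng):
    """Untargeted reclaim: free `count` arbitrary in-use pages."""
    used = [i for i, is_free in enumerate(free) if not is_free]
    for i in rng.sample(used, count):
        free[i] = True

def free_buddy_of_free_page(free):
    """Targeted reclaim: free the buddy (index ^ 1) of an already-free page."""
    for i, is_free in enumerate(free):
        if is_free and not free[i ^ 1]:
            free[i ^ 1] = True
            return True
    return False

def random_reclaim_success_rate(trials, rng):
    """Fraction of trials in which freeing 2 random pages yields a free pair."""
    hits = 0
    for _ in range(trials):
        free = make_fragmented()
        free_random_pages(free, 2, rng)
        if has_order1_block(free):
            hits += 1
    return hits / trials

pages = make_fragmented()
before = has_order1_block(pages)               # no order-1 block available
targeted_ok = free_buddy_of_free_page(pages) and has_order1_block(pages)
rate = random_reclaim_success_rate(200, random.Random(0))

print("order-1 block before reclaim:", before)
print("order-1 block after freeing one buddy page:", targeted_ok)
print(f"success rate when freeing 2 random pages: {rate:.1%}")
```

In this model, freeing a single well-chosen page always produces a usable pair, while freeing two arbitrary pages rarely does.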

Comment 1 Arjan van de Ven 2001-04-06 15:38:41 UTC
Please retry with our 2.4.2-0.1.49 kernel, which is available in rawhide.
It contains fixes for dozens of VM-related bugs.

Comment 2 Need Real Name 2001-04-06 15:44:37 UTC
Actually, I have tried it on the following releases with the same results:

    Red Hat 6.2 and Red Hat 7.0 with kernel versions 2.4.0 through 2.4.3-pre5




Comment 3 Need Real Name 2001-04-12 14:47:40 UTC
Where can I obtain a copy of 2.4.2-0.1.49, and is it part
of a distribution?
It appears that this problem was introduced in the open source
2.4.0 kernel.  I have seen a comment by Alan Cox stating
that "The 2.4 VM is currently too broken to survive high I/O
benchmark tests without going silly".
Has Red Hat done anything to the VM subsystem to
alleviate the "1-order allocation failed" problem?

thanks
Ed


Comment 4 Arjan van de Ven 2001-04-12 14:52:47 UTC
The Rawhide part of our FTP site has the 2.4.2-XXX kernel RPMs. Yes, we have
patches in our kernel to improve VM performance and behaviour; however, there is
still room for improvement.

Comment 5 Need Real Name 2001-04-19 15:10:06 UTC
Hi,

I have been unable to locate version 2.4.2-0.1.49 on Red Hat's FTP site.
Could you please provide me with a pointer?


thanks
Ed


Comment 6 Arjan van de Ven 2001-04-19 15:57:58 UTC
Now that 7.1 has been released, you should use its kernel, 2.4.2-2.


Comment 7 Need Real Name 2001-05-04 14:54:50 UTC
It seems that the problem I originally reported still exists in the 7.1 kernel, 2.4.2-2,
and will probably exist until the Linux community fixes it.
 
Thanks for the help.  


Comment 8 Arjan van de Ven 2001-05-05 11:42:46 UTC
Kernel 2.4.2-2 should be mostly OK. Having said that, we fixed an algorithmic bug
in the VM last week, and a kernel with that fix is on its way to rawhide
(pending basic QA and such).

Do you know of any "benchmark"-like program that we can use to reproduce
the problem (e.g. a more-or-less standalone program that we can add to our
test suites)?

Comment 9 Need Real Name 2001-05-07 15:26:16 UTC
Basically, any fork bomb will trigger the problem:
a process that recursively spawns copies of itself.

Under kernel version 2.2 the system would slow way down, but under
kernel version 2.4 the system hangs.  By "hang" I mean that you can't log in
and kill the process.  Creating a login process requires 2 consecutive pages,
and since there are no consecutive pages, you can't log in; hence the system
appears to be hung.


Comment 10 Need Real Name 2001-05-17 21:11:25 UTC
I'm running 7.1 w/2.4.2-2.
Running:
./tiotest -f 500 -b 8192 -d /diskb -t 10
I get the same errors.
Is there a fix yet?

/var/log/messages
**************************
May 17 10:46:45 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:47:48 localhost last message repeated 777 times
May 17 10:48:52 localhost last message repeated 34 times
May 17 10:50:33 localhost last message repeated 67 times
May 17 10:52:22 localhost last message repeated 34 times
May 17 10:53:46 localhost last message repeated 33 times
May 17 10:55:18 localhost last message repeated 67 times
May 17 10:57:53 localhost last message repeated 46 times
May 17 10:57:55 localhost last message repeated 12 times
May 17 10:58:16 localhost kernel: failed.
May 17 10:58:16 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:58:49 localhost last message repeated 324 times
May 17 11:00:28 localhost last message repeated 586 times
May 17 11:02:04 localhost last message repeated 67 times
May 17 11:03:50 localhost last message repeated 66 times
May 17 11:04:49 localhost last message repeated 276 times
May 17 11:05:15 localhost last message repeated 154 times
May 17 11:06:50 localhost last message repeated 390 times
*******************************

tiotest available @ http://www.iki.fi/miku/tiotest
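
Because syslog collapses duplicates into "last message repeated N times" lines, the real number of warnings is easy to underestimate. A minimal sketch for tallying them follows, using six lines copied from the excerpt above; the function name and the parsing approach are assumptions for illustration, not part of any existing tool.

```python
import re

# Six lines copied verbatim from the /var/log/messages excerpt.
SAMPLE = """\
May 17 10:46:45 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:47:48 localhost last message repeated 777 times
May 17 10:48:52 localhost last message repeated 34 times
May 17 10:58:16 localhost kernel: failed.
May 17 10:58:16 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:58:49 localhost last message repeated 324 times
"""

REPEAT = re.compile(r"last message repeated (\d+) times")

def count_message(lines, needle):
    """Count occurrences of `needle`, expanding syslog repeat markers."""
    total = 0
    last_matched = False  # did the most recent real message contain needle?
    for line in lines:
        repeat = REPEAT.search(line)
        if repeat:
            # A repeat marker refers to the message logged just before it.
            if last_matched:
                total += int(repeat.group(1))
            continue
        last_matched = needle in line
        if last_matched:
            total += 1
    return total

warnings = count_message(SAMPLE.splitlines(), "allocation failed")
print("allocation warnings in sample:", warnings)
```

The same function can be pointed at the full log file by passing an open file object instead of the sample lines.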

Comment 11 Arjan van de Ven 2001-05-18 07:39:54 UTC
dnielsen.com:
Those are warnings, not errors. Also, could you please try kernel 2.4.3-5,
available from rawhide? We are continuously improving the VM and would be
very interested in feedback from others.

I use tiobench myself on a regular basis and don't see this message.