Bug 35032

Summary:	__alloc_pages: 1-order allocation failed.
Product:	[Retired] Red Hat Linux	Reporter:	Need Real Name <edmilhomme>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Brock Organ <borgan>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.1	CC:	dnielsen, edmilhomme
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2003-06-06 00:14:33 UTC	Type:	---

Description Need Real Name 2001-04-06 15:31:51 UTC

I have been seeing some problems when running NFS benchmarks
at very high loads and was wondering if somebody could give
me some pointers to where the problem lies.
The system runs a 2.4.0 kernel on a Red Hat 6.2 distribution.

What appears to be happening is that the system has
enough free memory (~2K pages or 8 MB) that the swap,
reclaim and buffer daemons are satisfied that they
do not need to free more.  In other words system memory
has reached the watermarks such that the daemons have
quiesced.  Unfortunately 2K pages is less than 1% of
the system's memory (1 GB RAM == 262144 4K pages) so
the odds of obtaining 2 consecutive pages can be quite
low, especially if approximately 300 processes are
attempting to fork(2) at the same time.
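
The arithmetic above can be checked with a short sketch. It assumes, purely for illustration, that the ~2K free pages are scattered uniformly at random across physical memory, which a real kernel does not guarantee. All names below are hypothetical and are not kernel identifiers.

```python
# Back-of-the-envelope check of the figures in the report.
# ASSUMPTION: free pages are scattered uniformly at random; the real
# allocator coalesces buddies, so treat this only as a rough estimate.

PAGE_SIZE = 4096                      # 4K pages on i386
TOTAL_PAGES = (1 << 30) // PAGE_SIZE  # 1 GB RAM -> 262144 pages
FREE_PAGES = 2048                     # "~2K pages" left free


def free_fraction(free, total):
    """Fraction of memory that is free."""
    return free / total


def expected_free_pairs(free, total):
    """Expected number of aligned 2-page (order-1) blocks that are
    entirely free when `free` pages are chosen at random from `total`."""
    aligned_pairs = total // 2
    # Probability that both pages of one aligned pair are free.
    p_both = (free / total) * ((free - 1) / (total - 1))
    return aligned_pairs * p_both


print(f"free memory: {FREE_PAGES * PAGE_SIZE // (1 << 20)} MB")
print(f"free fraction: {free_fraction(FREE_PAGES, TOTAL_PAGES):.2%}")
print(f"expected order-1 blocks: {expected_free_pairs(FREE_PAGES, TOTAL_PAGES):.1f}")
```

Under this assumption only about eight free order-1 blocks would be expected at any moment, far fewer than the roughly 300 processes forking at once.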

The Linux page allocation routine, __alloc_pages(),
will attempt to clean and free pages when trying to
satisfy a request for more than one page.  Unfortunately
it will only free arbitrary, unrelated pages, rather than
trying to free a contiguous block large enough to meet the
request.  In other words the VM subsystem should free an 8K
contiguous block in order to satisfy an 8K request, as
opposed to 2 independent 4K pages.
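
A toy model may make the point concrete. It treats an order-n request as needing 2**n contiguous, naturally aligned free pages, and shows that freeing two unrelated pages does not satisfy an order-1 request. This is a simplified sketch, not the kernel's implementation; has_free_block is a hypothetical helper.

```python
# Toy model of page "orders": an order-n request needs 2**n contiguous,
# naturally aligned free pages. This is NOT the kernel's implementation.

def has_free_block(free_pages, order):
    """Return True if the set of free page numbers contains an aligned
    run of 2**order contiguous pages."""
    size = 1 << order
    for page in free_pages:
        if page % size == 0 and all(page + i in free_pages for i in range(size)):
            return True
    return False


# Freeing two unrelated 4K pages does not help an order-1 (8K) request.
scattered = {3, 8}
print(has_free_block(scattered, 0))  # True
print(has_free_block(scattered, 1))  # False

# Freeing an aligned, adjacent pair does.
adjacent = {4, 5}
print(has_free_block(adjacent, 1))   # True
```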

Comment 1 Arjan van de Ven 2001-04-06 15:38:41 UTC

Please retry with our 2.4.2-0.1.49 kernel, which is available in rawhide.
It fixes dozens of VM-related bugs.

Comment 2 Need Real Name 2001-04-06 15:44:37 UTC

Actually, I have tried it on the following releases with the same results:

    Red Hat 6.2 and Red Hat 7.0 with kernel versions 2.4.0 -> 2.4.3-pre5

Comment 3 Need Real Name 2001-04-12 14:47:40 UTC

Where can I obtain a copy of 2.4.2-0.1.49, and is it part
of a distribution?
It appears that this problem was introduced in open source
kernel 2.4.0.  I have seen a comment by Alan Cox stating
that "The 2.4 VM is currently too broken to survive high I/O
benchmark tests without going silly".
Has Red Hat done anything to the VM subsystem to
alleviate the "1-order allocation failed" problem?

thanks
Ed

Comment 4 Arjan van de Ven 2001-04-12 14:52:47 UTC

The Rawhide part of our ftp site has the 2.4.2-XXX kernel RPMs. Yes, we have
patches in our kernel to improve the VM performance and behaviour; however,
there is still room for improvement.

Comment 5 Need Real Name 2001-04-19 15:10:06 UTC

Hi,

I have been unable to locate version 2.4.2-0.1.49 on Red Hat's ftp site.
Could you please provide me with a pointer?


thanks
Ed

Comment 6 Arjan van de Ven 2001-04-19 15:57:58 UTC

As 7.1 is now released, you should get the kernel that ships with it, 2.4.2-2.

Comment 7 Need Real Name 2001-05-04 14:54:50 UTC

It seems that my original reported problem still exists in 7.1 kernel 2.4.2-2
and will probably exist until the Linux community fixes the problem.
 
Thanks for the help.

Comment 8 Arjan van de Ven 2001-05-05 11:42:46 UTC

Kernel 2.4.2-2 should be mostly OK. Having said that, we fixed an algorithmic
bug in the VM last week, and a kernel with that fix is on its way to rawhide
(it is going through basic QA and such).

Do you know of any "benchmark"-like program that we can use to reproduce
the problem? (E.g. a more-or-less standalone program that we can add to our
test suites.)

Comment 9 Need Real Name 2001-05-07 15:26:16 UTC

Basically, any fork bomb process will create the problem.
Create a process that recursively spawns copies of itself.

Under kernel version 2.2 the system would slow way down, but under
kernel version 2.4 the system hangs.  By hang I mean that you can't log in
and kill the process.  To create a login process, you need 2 consecutive pages,
and since there are no consecutive pages you can't log in; hence the system
appears to be hung.

Comment 10 Need Real Name 2001-05-17 21:11:25 UTC

I'm running 7.1 w/2.4.2-2.
Running:
./tiotest -f 500 -b 8192 -d /diskb -t 10
I get the same errors.
Is there a fix yet?

/var/log/messages
**************************
May 17 10:46:45 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:47:48 localhost last message repeated 777 times
May 17 10:48:52 localhost last message repeated 34 times
May 17 10:50:33 localhost last message repeated 67 times
May 17 10:52:22 localhost last message repeated 34 times
May 17 10:53:46 localhost last message repeated 33 times
May 17 10:55:18 localhost last message repeated 67 times
May 17 10:57:53 localhost last message repeated 46 times
May 17 10:57:55 localhost last message repeated 12 times
May 17 10:58:16 localhost kernel: failed.
May 17 10:58:16 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:58:49 localhost last message repeated 324 times
May 17 11:00:28 localhost last message repeated 586 times
May 17 11:02:04 localhost last message repeated 67 times
May 17 11:03:50 localhost last message repeated 66 times
May 17 11:04:49 localhost last message repeated 276 times
May 17 11:05:15 localhost last message repeated 154 times
May 17 11:06:50 localhost last message repeated 390 times
*******************************

tiotest available @ http://www.iki.fi/miku/tiotest
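
To get a sense of the total number of failures in a log like the one above, the "last message repeated N times" lines that syslogd emits can be folded back into the count. The following is a minimal sketch; the sample is the first three lines of the log above, and count_failures is a hypothetical helper, not part of any tool mentioned here.

```python
import re

# Sample lines copied from the log above (shortened to three entries).
SAMPLE = """\
May 17 10:46:45 localhost kernel: __alloc_pages: 0-order allocation failed.
May 17 10:47:48 localhost last message repeated 777 times
May 17 10:48:52 localhost last message repeated 34 times
"""

FAIL_RE = re.compile(r"__alloc_pages: (\d+)-order allocation failed")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_failures(log_text):
    """Total allocation-failure messages, including syslog repeats."""
    total = 0
    last_was_failure = False
    for line in log_text.splitlines():
        repeat = REPEAT_RE.search(line)
        if repeat:
            # A repeat line refers to the most recent distinct message.
            if last_was_failure:
                total += int(repeat.group(1))
            continue
        last_was_failure = FAIL_RE.search(line) is not None
        if last_was_failure:
            total += 1
    return total


print(count_failures(SAMPLE))  # 812
```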

Comment 11 Arjan van de Ven 2001-05-18 07:39:54 UTC

dnielsen.com:
Those are warnings, not errors. Also, could you please try kernel 2.4.3-5,
available from rawhide? We are continuously improving the VM, and would be
very interested in feedback from others.

I use tiobench myself on a regular basis and don't see this message.