Red Hat Bugzilla – Bug 29927
[vm balance] Memory errors with Cerberus with > 1 GB of RAM
Last modified: 2007-04-18 12:31:49 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; DigExt)
Within the first hour of starting Cerberus (both cts 1.2.1 and 1.2.15) on
systems running RC1, the kernel reports:
"__alloc_pages: 0-order allocation failed"
Happens only on systems with > 1 GB of RAM
This has occurred on:
system   RAM     swap    kernel
pe1300   1GB     512MB   smp
pe1550   1GB     512MB   smp
pe1400   2GB     4GB     smp/up
pe4400   3.5GB   512MB   smp
pe6350   1GB     512MB   smp
A pe2400 w/ 512MB RAM and 70MB of swap space (created with auto-
partitioning) has worked without error.
All systems have been running off onboard SCSI or a 39160 SCSI card.
During the error on the machine w/ 2GB RAM and 4GB swap, top reports all but
3-5MB of RAM in use, and only 2GB of the 4GB swap in use.
After receiving this error, rebooting the machine will hang at "Turning
Steps to Reproduce:
1. Install > 1 GB of RAM
2. Run VA Linux Cerberus Test (1.2.1 or 1.2.15)
3. The test fails with the error below
Actual Results: "__alloc_pages: 0-order allocation failed" message
Expected Results: No errors
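To confirm a reproduction, it can help to count how often the message appears in the kernel log. The following is only a rough sketch: the function name and the sample log lines (timestamps, host name) are made up for illustration, and only the message text itself is taken from this report.

```python
import re

# Pattern for the message quoted in this report; the allocation order is captured.
ALLOC_FAIL = re.compile(r"__alloc_pages: (\d+)-order allocation failed")


def count_alloc_failures(log_text):
    """Return a dict mapping allocation order -> number of failure messages."""
    counts = {}
    for line in log_text.splitlines():
        match = ALLOC_FAIL.search(line)
        if match:
            order = int(match.group(1))
            counts[order] = counts.get(order, 0) + 1
    return counts


# Hypothetical log excerpt: the syslog prefix is an assumption, not copied
# from an affected machine.
sample = (
    "Feb 27 10:15:01 testbox kernel: __alloc_pages: 0-order allocation failed\n"
    "Feb 27 10:15:02 testbox kernel: some unrelated message\n"
    "Feb 27 10:15:09 testbox kernel: __alloc_pages: 0-order allocation failed\n"
)

if __name__ == "__main__":
    print(count_alloc_failures(sample))
```

On a test machine the text could come from the output of dmesg or from the syslog file, wherever kernel messages are written on that install.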
Known problem, reproducible here. I'm already investigating.
I am not sure if this is related to this bug, but I have a Supermicro 370DL3 mb
with 1 gig of memory. After install, a look at gtop (or top) shows that the
system is only recognizing *64* megs. This mb has the Serverworks ServerSet III
LE chipset, not sure if that has anything to do with it. Am curious to see if
the fix for this works for me, though it appears that my error is with detection
at the install phase.
Detection problems are entirely separate from the performance problems we are
seeing: please open a separate bugzilla report for that.
The worst of the VM performance problems of this nature are fixed in CVS, but
are not enabled on all builds as we are still chasing other VM problems. Expect
a VM balancing test kernel build tonight or tomorrow for beta testers to try.
"0-order allocation failed" is informative, but the system doesn't fail. Could
we change the debugging level of this message? We're concerned that customers
will see the word "failed" and call for support.
We will soften the level of the warning.
Changed severity from "error condition" to "informational, non significant".
Will show up in syslog, not on the screen.