Configuring a large number of hugepages with the "hugepages=" kernel boot parameter causes the kernel to fail to boot with Out of Memory errors. On a 64GB system, adding "hugepages=28160" to the kernel boot line (a request for 55GB of hugepages) makes the boot fail, even though the remaining 9GB should be plenty for the kernel to boot. The same 55GB allocation done interactively from the command line with "echo 28160 > /proc/sys/vm/nr_hugepages" works without problems. However, the same command fails when it is added to rc.local. It appears that during the boot process, memory in a 64GB x86_64 environment is configured differently than it is once the system is fully booted. The hugepage allocation problem looks NUMA-related: disabling NUMA with "numa=off" on the kernel boot line gets rid of the problem, but performance suffers.
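For reference, the sizing above works out as follows. This is a minimal sketch that assumes 2MB (2048 kB) hugepages, the x86_64 default. Check the "Hugepagesize:" line in /proc/meminfo to confirm. The names hugepages_to_mb and remaining_mb are hypothetical helpers, not part of any RHEL tool.

```shell
#!/bin/sh
# Sketch: how much memory a "hugepages=N" request reserves, and what is left.
# Assumes 2048 kB hugepages unless a size in kB is passed explicitly.

# Print the memory (in MB) reserved by COUNT hugepages of SIZE_KB each.
hugepages_to_mb() {
    count=$1
    size_kb=${2:-2048}
    echo $(( count * size_kb / 1024 ))
}

# Print the memory (in MB) left over after reserving COUNT hugepages
# on a machine with TOTAL_GB of installed RAM.
remaining_mb() {
    total_gb=$1
    count=$2
    size_kb=${3:-2048}
    echo $(( total_gb * 1024 - $(hugepages_to_mb "$count" "$size_kb") ))
}

echo "reserved:  $(hugepages_to_mb 28160) MB"   # 56320 MB = 55 GB
echo "remaining: $(remaining_mb 64 28160) MB"   # 9216 MB = 9 GB
```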
BTW, this is RHEL 3 Update 3.
This suggests that the pre-allocation of hugepages should also be made NUMA-aware. Will investigate.
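One way to picture a NUMA-aware pre-allocation is to spread the boot-time request evenly across nodes and check each node's share against that node's memory. The sketch below is hypothetical. The helper names, the round-robin split, and the assumed layout of two nodes with 32GB each are for illustration only. This is not the fix that was eventually committed.

```shell
#!/bin/sh
# Hypothetical sketch: split a hugepage request evenly across NUMA nodes.
# Prints one "node N: PAGES" line per node; the first (TOTAL % NODES) nodes
# get one extra page so that the shares add up to the requested total.
split_hugepages() {
    total=$1
    nodes=$2
    base=$(( total / nodes ))
    extra=$(( total % nodes ))
    node=0
    while [ "$node" -lt "$nodes" ]; do
        pages=$base
        if [ "$node" -lt "$extra" ]; then
            pages=$(( pages + 1 ))
        fi
        echo "node $node: $pages"
        node=$(( node + 1 ))
    done
}

# Succeed if PAGES hugepages of 2 MB each fit in NODE_GB of node memory.
# (Ignores memory the kernel itself needs on that node.)
fits_on_node() {
    pages=$1
    node_gb=$2
    [ $(( pages * 2 )) -le $(( node_gb * 1024 )) ]
}

split_hugepages 28160 2    # node 0: 14080, node 1: 14080
fits_on_node 28160 32 || echo "55 GB does not fit on a single 32 GB node"
fits_on_node 14080 32 && echo "a 27.5 GB share fits on each 32 GB node"
```

Under the assumed layout, 55GB cannot come from any single node. An even 27.5GB share, however, fits on each node with about 4.5GB to spare.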
*** Bug 130489 has been marked as a duplicate of this bug. ***
It has been decided that x86_64 RHEL3 kernels should continue to enable NUMA by default. However, if an OOM kill occurs on a NUMA system, the kernel will print an extra message suggesting that the "numa=off" boot option might be a good way to work around the issue. The exact message is: "OOM kill occurred on an x86_64 NUMA system! The numa=off boot option might help avoid this." This change was committed to the RHEL3 U5 patch pool on 9-Feb-2005 (in kernel version 2.4.21-27.12.EL).
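To check whether a system has hit this case, you can search the kernel log for the message quoted above. This is a minimal sketch. has_numa_oom_hint is a hypothetical helper, and it matches only the first sentence in case the message is split across log lines.

```shell
#!/bin/sh
# Sketch: look for the RHEL3 NUMA OOM hint in kernel log text read from stdin.
# Returns success (0) if the hint is present, failure otherwise.
NUMA_OOM_HINT='OOM kill occurred on an x86_64 NUMA system!'

has_numa_oom_hint() {
    # -F: match the text literally; -q: report only via the exit status
    grep -F -q "$NUMA_OOM_HINT"
}

# Typical use on a live system (with syslog running, kernel messages
# usually end up in /var/log/messages as well):
#   dmesg | has_numa_oom_hint && echo "consider booting with numa=off"
```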
A fix for this problem was committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.12.EL). To enable an improved NUMA-friendly page allocation policy, please set /proc/sys/vm/numa_memory_allocator to 1 with the "sysctl" command (or put "vm.numa_memory_allocator = 1" in /etc/sysctl.conf).
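On the U7 kernel, the setting can be applied immediately with "sysctl -w" and made persistent through /etc/sysctl.conf. This sketch assumes a kernel that provides vm.numa_memory_allocator (2.4.21-37.12.EL or later, per this report). persist_numa_allocator is a hypothetical helper that appends the line only once, so it is safe to rerun.

```shell
#!/bin/sh
# Sketch: make the NUMA-friendly allocator setting persistent.
# Assumption: running a RHEL3 kernel that provides vm.numa_memory_allocator
# (available starting with 2.4.21-37.12.EL, per this report).
persist_numa_allocator() {
    conf=${1:-/etc/sysctl.conf}
    # Append the setting only if no vm.numa_memory_allocator line exists yet.
    if ! grep -q '^[[:space:]]*vm\.numa_memory_allocator' "$conf" 2>/dev/null; then
        echo 'vm.numa_memory_allocator = 1' >> "$conf"
    fi
}

# On the target system, as root:
#   sysctl -w vm.numa_memory_allocator=1   # takes effect immediately
#   persist_numa_allocator                 # /etc/sysctl.conf is applied at boot by the RHEL init scripts
```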
An advisory has been issued that should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0144.html