Red Hat Bugzilla – Bug 127896
Using hugetlb causes massive slowdowns with ramfs
Last modified: 2007-11-30 17:07:02 EST
Description of problem:
On a system with 8GB of RAM I'm allocating 2.5GB for hugepages. When
I subsequently try to create a large file in /dev/shm (mounted as
ramfs), the file will grow to 1.6-2GB relatively quickly (within 10-
20 seconds), but after the file reaches that size the growth rate
slows to 20MB/min (!?!).
Version-Release number of selected component (if applicable):

Steps to Reproduce:
1. echo 2560 > /proc/sys/vm/hugetlb_pool
2. mount -t ramfs none /dev/shm
3. cat /dev/zero > /dev/shm/grassgrowsfasterthanthis

Actual results:
Sure is slow.

Expected results:
Wish it were faster.
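For anyone wanting to quantify the slowdown rather than eyeball the file size, a rough sketch follows; the target path and the 64MB write size are illustrative, not from the original report (point TARGET at /dev/shm to exercise the ramfs case):

```shell
# Measure sustained write throughput to a mount point. TARGET defaults to
# a temp directory; the 64MB size is arbitrary.
TARGET="${TARGET:-$(mktemp -d)}/growtest"

start=$(date +%s)
dd if=/dev/zero of="$TARGET" bs=1M count=64 2>/dev/null
end=$(date +%s)

size=$(stat -c %s "$TARGET")
elapsed=$(( end - start ))
if [ "$elapsed" -eq 0 ]; then elapsed=1; fi   # avoid divide-by-zero on fast writes
echo "wrote $size bytes in ${elapsed}s (~$(( size / elapsed / 1048576 )) MB/s)"
rm -f "$TARGET"
```

Run once before allocating hugepages and once after; the drop from hundreds of MB/s to the reported 20MB/min would show up immediately.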
Note that this problem does *not* occur if /dev/shm is mounted as
tmpfs rather than ramfs (with everything else the same as I've
described it above). It's just a problem with ramfs+hugetlb.
Also, this is not just a theoretical situation; it's part of an
Oracle configuration in which the shared portions of the SGA will be
allocated out of hugepages, and the buffer cache will be allocated
out of /dev/shm (which Oracle recommends mounting as ramfs, so that
it's locked into memory). The main workaround I've found so far is
just to avoid hugetlb--which is also good because hugetlb can cause
extreme system instability with large Oracle SGA configurations (I'm
about to file a bug for that as well).
Looking at page_alloc.c, it appears that there could be a problem when
the sum of hugetlbfs + ramfs pages is larger than what fits in the
first zone that memory is allocated from. In your case, the RHEL3
kernel tries to fit just over 4GB of data into a 4GB highmem zone and
has trouble fitting things.
I'll try to come up with an experimental patch to alleviate this.
John, I think the slowdown you are seeing is caused by a combination
of the hugemem kernel, allocating 2.5GB for hugetlbfs, and the fact
that ramfs sets the GFP_WIRED flag in the inode->i_mapping->gfp_mask.
This causes the system to attempt to reclaim highmem pages for ramfs
because you have overcommitted highmem between the hugetlb pages and
the ramfs file.
The first thing to do is get me "Alt-SysRq M" output when you notice
the ramfs allocation slowdown. Next, please try running the smp
kernel instead of the hugemem kernel. Why are you running the hugemem
kernel on an 8GB system in the first place? Have you seen other lowmem
issues with the standard smp kernel? Finally, I am re-evaluating
whether GFP_WIRED should only be set in the ramfs inode for the
smp kernel, since lowmem exhaustion is not nearly as much of an issue
for the hugemem kernel as it is for the smp kernel.
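For reference, the counters behind that request can also be pulled from a shell; a minimal sketch (the sysrq-trigger write needs root, so it is shown only as a comment):

```shell
# Snapshot the /proc/meminfo counters relevant to the lowmem/highmem
# discussion. On a 64-bit kernel the High*/Low* lines simply won't appear.
meminfo=$(grep -E '^(MemTotal|MemFree|HighTotal|HighFree|LowTotal|LowFree)' /proc/meminfo)
echo "$meminfo"

# Alt-SysRq-M can be triggered without a console keyboard (requires root);
# the zone-by-zone report then shows up in the kernel log:
#   echo m > /proc/sysrq-trigger && dmesg | tail -40
```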
Have you tried reproducing this yourself? The method I mentioned is
pretty straightforward (though the values might require tweaking
depending on your memory configuration), and so I was intending that
y'all at Red Hat could test this yourselves. I don't have a system
that's readily available for such testing anymore.
You may be right about highmem being overcommitted: the system
reports a HighTotal of 4.5GB (LowTotal=3.4GB), so 2.5GB for hugetlb
plus the 1.6GB file in ramfs is close to that. That does suggest how
you could test it on a machine with a different memory size, I
suppose. I'm deeply dismayed to learn that the old lowmem/highmem
distinction is still around and that we can't just treat 8GB of
memory as 8GB of memory. Is there any documentation that indicates
the actual limitations on the use of hugetlb, ramfs/shmfs, etc in
terms of highmem/lowmem and all other relevant factors? I get the
feeling that we're one of the first sites even trying this kind of
configuration, and we're having to make our way through the dark.
We're using the hugemem kernel mainly to get the 4/4 memory split, to
allow as much memory as possible for the portions of the Oracle SGA
that have to reside within process memory (i.e., those portions that
can't go into shmfs/ramfs and be accessed indirectly). I've yet to
find a thorough, detailed explanation of the differences between the
various kernel choices, though.
John, yes, I did test this myself and saw quite a bit of variation in
the slowdown and system responsiveness. Anyway, as far as the "old
lowmem/highmem distinction" is concerned: yes, we still have it, and
no, you can't treat 8GB as 8GB until you start using a 64-bit computer.
The original design of the Linux kernel was to map the shared kernel
address space, including physical memory, into the upper 1GB of the 4GB
user address space. Once we started supporting more than 1GB of
physical memory, it couldn't all be mapped into that 1GB shared kernel
address space, hence the lowmem/highmem distinction. With the advent
of more than 4GB of physical memory in a 32-bit system (PAE), we created
a separate kernel address space (the hugemem kernel). However, since we
support more than 4GB of physical memory, it still cannot all be
mapped into the kernel address space at the same time, so we still have
a lowmem/highmem distinction even in the hugemem kernel when you have
more than 4GB of memory. You will continue to have this distinction
until you switch to 64-bit hardware (EM64T, AMD64, IPF, etc.).
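To put rough numbers on the splits described above, here is a small sketch; the ~128MB vmalloc reserve is an assumption about the default i686 layout, not a figure taken from this bug:

```shell
# Approximate permanently-mapped "lowmem" under each address-space split.
split=$(awk 'BEGIN {
    mb = 1024 * 1024
    # 3/1 split (smp): a 1GB kernel window, minus ~128MB reserved for vmalloc.
    smp_lowmem     = 1024 * mb - 128 * mb
    # 4/4 split (hugemem): a separate 4GB kernel address space, same reserve.
    hugemem_lowmem = 4096 * mb - 128 * mb
    printf "smp lowmem     ~ %d MB\n", smp_lowmem / mb
    printf "hugemem lowmem ~ %d MB\n", hugemem_lowmem / mb
}')
echo "$split"
```

The reporter's LowTotal of 3.4GB on the hugemem kernel is plausibly in that ballpark once the mem_map overhead for 8GB of RAM is subtracted from the 4GB kernel space.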
In the meantime, I think this problem has been fixed by falling back
to the lowmem zone when allocating wired memory (ramfs and hugepages)
and highmem is more than 90% wired. The test kernel with this fix
is located here:
Thanks, Larry Woodman
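As a sanity check, the 90%-wired condition described above can be evaluated against the numbers John reported (2560MB of hugepages, HighTotal of 4.5GB); the exact ramfs figure at the moment the slowdown hits is an assumption:

```shell
# Does wired memory exceed 90% of highmem? Figures from this report, in MB.
check=$(awk 'BEGIN {
    hugetlb_mb   = 2560    # echo 2560 > /proc/sys/vm/hugetlb_pool
    ramfs_mb     = 1638    # ~1.6GB ramfs file when the growth rate collapses
    hightotal_mb = 4608    # reported HighTotal of 4.5GB
    pct = 100 * (hugetlb_mb + ramfs_mb) / hightotal_mb
    printf "%.0f%% of highmem is wired: ", pct
    if (pct > 90) print "fall back to the lowmem zone"
    else          print "keep allocating from highmem"
}')
echo "$check"
```

With those figures the threshold is crossed right around the file size at which John saw the growth rate collapse, which is consistent with the proposed fix.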
The URL you specified is giving a 404 error.
Thanks for the (apparent) fix. I'd ask again, though: is there some
document--anything at all--that details all of the constraints on or
considerations involved in actually using the various large memory
features of RHEL3? We've run into nothing but problems in attempting
to do so (e.g. bug 127897, among other issues we haven't reported),
even though the configuration we're using is well within the apparent limits.
John, I just re-copied the kernel, so it should work now (sorry about
the 404 error; it was a disk quota limits problem).
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.4.EL).
An errata has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.