From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1
Description of problem:
The system is a 64GB DL585 running dual-core Opterons and RHEL4 U1. The DL585 is set for non-interleave memory. About 62GB of memory is allocated to hugepages. When DB2 starts, it allocates about 9GB of hugepages without a problem. However, once a query is started, system time climbs to about 88% and the number of free hugepages drops steadily until none are left, at which point the kernel panics and the machine reboots. When the same query is run without hugepages it consumes only about 24GB of memory, so DB2 should not be asking for all of the hugepage space. Free non-hugepage memory stays around 1.2GB.
This 64GB DL585/DB2 setup had no problem using hugepages when running RHEL4 (no update) on single-core Opterons, so I don't suspect a problem on the application side.
The panic does not seem to happen when the DL585 is booted with memory interleave enabled. However, the IO is very erratic and exhibits a performance degradation compared with hugepages being disabled.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Bring up DB2 with hugepages allocated and enabled in DB2.
2. Kick off a long running query.
Actual Results: The system kernel panics
Expected Results: Even if the hugepages are depleted, a user application shouldn't be able to crash the kernel.
We have captured a netdump from the crash and it will be added to this bugzilla as an attachment.
We are trying to run a RHEL4 U1 test on single-core Opterons to see whether the issue is generic to U1 or related to running dual-core. I suspect it is related to running dual-core.
Created attachment 117192
net dump log from kernel panic
As mentioned previously, when the system is running in interleave mode (no cpu
local memory) the kernel doesn't panic. However, the hugepage allocations do
behave strangely. When the query is started the number of hugepages free as
reported by /proc/meminfo drops down to around 0, then bounces back up, then
drops back down, etc. I counted at least two instances where the free hugepages
bounced back up, but there could have been more. It eventually stops at a
constant non-zero value.
When the system is running in non-interleave mode (cpu local memory), the
initial drop to 0 of hugepages free is where the kernel panics. There is no
bounce back up.
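
The free-hugepage behaviour described above can be logged rather than watched by eye. The following is only a minimal sketch, not something that was run on the affected system: it assumes /proc/meminfo exposes the HugePages_Total, HugePages_Free and Hugepagesize fields, and the function names (parse_hugepages, free_hugepage_gib, watch) are hypothetical helpers made up for this illustration.

```python
import re
import time


def parse_hugepages(meminfo_text):
    """Extract the HugePages_* counters and Hugepagesize (kB) from meminfo text."""
    stats = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def free_hugepage_gib(stats):
    """Convert the free hugepage count to GiB using the reported page size."""
    return stats["HugePages_Free"] * stats["Hugepagesize"] / (1024 * 1024)


def watch(path="/proc/meminfo", interval=1.0, samples=10):
    """Print a timestamped total/free hugepage sample every `interval` seconds."""
    for _ in range(samples):
        with open(path) as f:
            stats = parse_hugepages(f.read())
        print(time.strftime("%H:%M:%S"),
              stats.get("HugePages_Total"),
              stats.get("HugePages_Free"))
        time.sleep(interval)
```

Calling watch() in a second terminal while the query starts would record each drop and bounce with a timestamp. In the non-interleave case, the last sample printed before the panic would show how close to 0 the free count got.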
We have now had similar panics happen when the system is configured with
hardware memory interleave on. So, contrary to what I previously reported,
that setting does not alleviate the problem.
The panic has also now occurred just bringing up DB2 before any queries have
As this appears to be a fairly urgent support request, please either contact Red
Hat's Technical Support line at 888-GO-REDHAT or file a web ticket at
http://www.redhat.com/apps/support/. Bugzilla is not an official support
channel, has no response guarantees, and may not route your issue to the correct
area to assist you. Using the official support channels above will guarantee
that your issue is handled appropriately and routed to the individual or group
which can best assist you with this issue and will also allow Red Hat to track
the issue, ensuring that any applicable bug fix is included in all releases and
is not dropped from a future update or major release.
Andy, can you see if this still crashes with the RHEL4-U2 beta kernel? There
were several dual-core bugfixes in that update and it seems to be central to
Thanks, Larry Woodman
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/
If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.