From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
The system is a 64GB DL585 running dual-core Opterons and RHEL4 U1. The DL585 is set for non-interleaved memory. About 62GB of memory is allocated to hugepages. When DB2 starts, it allocates about 9GB of hugepages without a problem. However, once a query is started, the system goes to about 88% system time and the free hugepage count drops continually until all the hugepages are gone, at which point the kernel panics and the machine reboots. If the query is run without hugepages it consumes only about 24GB of memory, so DB2 should not be asking for all of the hugepage space. The free non-hugepage memory stays around 1.2GB.

This 64GB DL585/DB2 setup had no problem using hugepages when running RHEL4 (no update) on single-core Opterons, so I do not suspect a problem on the application side.

The panic does not seem to happen when the DL585 is booted with memory interleave enabled. However, the IO is very erratic and performance is degraded compared with running with hugepages disabled.

Version-Release number of selected component (if applicable):
kernel-2.6.9-11.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Bring up DB2 with hugepages allocated and enabled in DB2.
2. Kick off a long-running query.

Actual Results: The system kernel panics.

Expected Results: Even if the hugepages are depleted, a user application should not be able to crash the kernel.

Additional info:
We have captured a netdump from the crash and will attach it to this bugzilla. We are trying to run a RHEL4 U1 test on single-core Opterons to see whether the issue is generic to U1 or related to running dual-core. I suspect it is related to running dual-core.
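For context, the report does not say how the ~62GB hugepage pool was reserved; the fragment below is a sketch of the usual mechanism on a RHEL4-era kernel (the page count assumes the default 2MB x86-64 hugepage size, so 62GB ≈ 31744 pages; the numbers are illustrative, not taken from this report):

```shell
# Illustrative only: reserving ~62GB of 2MB hugepages.
# At boot, via the kernel command line:
#   hugepages=31744
# Or at runtime, via the sysctl interface (subject to fragmentation):
echo 31744 > /proc/sys/vm/nr_hugepages
# Verify the reservation:
grep -i huge /proc/meminfo
```

Reserving at boot is generally more reliable, since a long-running system may be too fragmented to satisfy a large runtime request.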
Created attachment 117192 [details] net dump log from kernel panic
As mentioned previously, when the system is running in interleave mode (no cpu local memory) the kernel doesn't panic. However, the hugepage allocations do behave strangely. When the query is started the number of hugepages free as reported by /proc/meminfo drops down to around 0, then bounces back up, then drops back down, etc. I counted at least two instances where the free hugepages bounced back up, but there could have been more. It eventually stops at a constant non-zero value. When the system is running in non-interleave mode (cpu local memory), the initial drop to 0 of hugepages free is where the kernel panics. There is no bounce back up.
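The bouncing behavior above was observed by watching /proc/meminfo. A minimal sketch of how the HugePages_* counters can be parsed out of that file's text (the field names follow the standard meminfo format; the sample counts are illustrative, not from this report):

```python
import re

def parse_hugepage_counters(meminfo_text):
    """Extract the HugePages_* counters from /proc/meminfo-style text."""
    counters = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"(HugePages_\w+):\s+(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters

# Demo on a meminfo excerpt (counts are made up for illustration):
sample = (
    "HugePages_Total: 31744\n"
    "HugePages_Free:  12000\n"
    "Hugepagesize:     2048 kB\n"
)
print(parse_hugepage_counters(sample))
# → {'HugePages_Total': 31744, 'HugePages_Free': 12000}
```

On a live system you would read `open("/proc/meminfo").read()` in a polling loop and print `HugePages_Free` each interval to capture the drop-and-bounce pattern described above.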
We have now seen similar panics when the system is configured with hardware memory interleave on, so that setting does not alleviate the problem as I previously reported. The panic has also now occurred while just bringing up DB2, before any queries were started.
As this appears to be a fairly urgent support request, please either contact Red Hat's Technical Support line at 888-GO-REDHAT or file a web ticket at http://www.redhat.com/apps/support/. Bugzilla is not an official support channel, has no response guarantees, and may not route your issue to the correct area to assist you. Using the official support channels above will guarantee that your issue is handled appropriately and routed to the individual or group best able to assist you. It will also allow Red Hat to track the issue, ensuring that any applicable bug fix is included in all releases and is not dropped from a future update or major release.
Andy, can you see if this still crashes with the RHEL4-U2 beta kernel? There were several dual-core bugfixes in that update, and dual-core operation seems central to this bug. Thanks, Larry Woodman
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.