Bug 164385 - Kernel panics when DB2 accesses hugepages in Update 1
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
medium
high
Assignee: Larry Woodman
QA Contact: Brian Brock
Reported: 2005-07-27 15:45 UTC by Andrew Bond
Modified: 2012-06-20 13:28 UTC

Doc Type: Bug Fix
Last Closed: 2012-06-20 13:28:16 UTC


Attachments
net dump log from kernel panic (20.87 KB, application/x-gzip)
2005-07-27 15:52 UTC, Andrew Bond

Description Andrew Bond 2005-07-27 15:45:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
The system is a 64GB DL585 running dual-core Opterons and RHEL4 U1.  The DL585 is set for non-interleaved memory.  About 62GB of memory is allocated to hugepages.  When DB2 starts, it allocates about 9GB of hugepages without a problem.  However, once a query is started, system time rises to about 88% and the number of free hugepages drops continually until none are left; the kernel then panics and the machine reboots.  When the query is run without hugepages it consumes only about 24GB of memory, so DB2 should not be asking for all of the hugepage space.  Free non-hugepage memory stays around 1.2GB.
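To put the sizes above in hugepage units, here is a small worked calculation.  It assumes the x86_64 default hugepage size of 2 MB (Hugepagesize: 2048 kB in /proc/meminfo) and treats "GB" in this report as GiB; neither assumption is stated in the report.  The helper name `gib_to_hugepages` is hypothetical.

```python
HUGEPAGE_KB = 2048  # assumption: x86_64 default hugepage size (2 MB)


def gib_to_hugepages(gib: float, hugepage_kb: int = HUGEPAGE_KB) -> int:
    """Number of whole hugepages that fit in `gib` GiB (hypothetical helper)."""
    return int(gib * 1024 * 1024) // hugepage_kb


pool = gib_to_hugepages(62)      # hugepage pool reserved on the 64GB box
db2_start = gib_to_hugepages(9)  # allocated by DB2 at startup
no_huge = gib_to_hugepages(24)   # query footprint when run without hugepages

print(pool, db2_start, no_huge)  # prints "31744 4608 12288"
```

Under these assumptions the pool holds roughly 2.5 times the pages the query's non-hugepage footprint would need, which is why exhausting it suggests something other than genuine demand.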

This 64GB DL585/DB2 setup had no problem using hugepages when running RHEL4 (no update) on single-core Opterons, so I don't suspect a problem on the application side.

The panic does not seem to happen when the DL585 is booted with memory interleaving enabled.  However, I/O is then very erratic and performance is degraded compared with running with hugepages disabled.

Version-Release number of selected component (if applicable):
kernel-2.6.9-11-ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Bring up DB2 with hugepages allocated and enabled in DB2.
2. Kick off a long running query.

Actual Results:  The system kernel panics

Expected Results:  Even if the hugepages are depleted, a user application shouldn't be able to crash the kernel.

Additional info:

We have captured a netdump from the crash and it will be added to this bugzilla as an attachment.

We are trying to run a RHEL4 U1 test on single-core Opterons to see whether the issue is generic to U1 or related to running dual-core.  I suspect it is related to running dual-core.

Comment 1 Andrew Bond 2005-07-27 15:52:29 UTC
Created attachment 117192 [details]
net dump log from kernel panic

Comment 2 Andrew Bond 2005-07-27 16:09:44 UTC
As mentioned previously, when the system is running in interleave mode (no cpu
local memory) the kernel doesn't panic.  However, the hugepage allocations do
behave strangely.  When the query is started the number of hugepages free as
reported by /proc/meminfo drops down to around 0, then bounces back up, then
drops back down, etc.  I counted at least two instances where the free hugepages
bounced back up, but there could have been more.  It eventually stops at a
constant non-zero value.

When the system is running in non-interleave mode (cpu local memory), the
kernel panics at the initial drop of free hugepages to 0.  There is no
bounce back up.
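The free-hugepage behaviour described in this comment (HugePages_Free in /proc/meminfo dropping to about 0, rebounding, then settling) could be recorded with a small poller instead of by eye.  This is a hypothetical sketch, not a tool attached to this bug: `parse_meminfo`, `sample_hugepages_free` and `count_rebounds` are illustrative names, and it assumes the kernel reports the `HugePages_Free` field in /proc/meminfo, as 2.6 kernels with hugetlbfs support do.

```python
import time
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Lines look like "HugePages_Free:    1234" or "MemFree:  1200000 kB";
    any unit suffix is dropped and only the leading number is kept.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def sample_hugepages_free(path: str = "/proc/meminfo",
                          interval: float = 1.0,
                          count: int = 60) -> List[int]:
    """Read HugePages_Free `count` times, `interval` seconds apart (-1 if absent)."""
    samples: List[int] = []
    for i in range(count):
        with open(path) as f:
            samples.append(parse_meminfo(f.read()).get("HugePages_Free", -1))
        if i + 1 < count:
            time.sleep(interval)
    return samples


def count_rebounds(samples: List[int]) -> int:
    """Count how many times the free count rises again after falling."""
    rebounds = 0
    falling = False
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            falling = True
        elif cur > prev and falling:
            rebounds += 1
            falling = False
    return rebounds
```

In interleave mode a trace like the one described would give a rebound count of two or more; in non-interleave mode the samples would stop at the panic with no rebound.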

Comment 3 Andrew Bond 2005-07-28 20:30:55 UTC
We have now had similar panics happen when the system is configured with
hardware memory interleave on.  So that setting does not alleviate the problem
as I previously reported.

The panic has also now occurred just bringing up DB2 before any queries have
been started.

Comment 4 Suzanne Hillman 2005-08-01 19:09:15 UTC
As this appears to be a fairly urgent support request, please either contact Red
Hat's Technical Support line at 888-GO-REDHAT or file a web ticket at
http://www.redhat.com/apps/support/.  Bugzilla is not an official support
channel, has no response guarantees, and may not route your issue to the correct
area to assist you.  Using the official support channels above will guarantee
that your issue is handled appropriately and routed to the individual or group
which can best assist you with this issue and will also allow Red Hat to track
the issue, ensuring that any applicable bug fix is included in all releases and
is not dropped from a future update or major release.

Comment 5 Larry Woodman 2005-08-16 20:11:21 UTC
Andy, can you see if this still crashes with the RHEL4-U2 beta kernel?  There
were several dual-core bug fixes in that update, and dual core seems to be
central to this bug.

Thanks, Larry Woodman


Comment 7 Jiri Pallich 2012-06-20 13:28:16 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.

