127897 – Having memlock set too low can hang a system using hugetlb/tmpfs for a large Oracle SGA

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Larry Woodman
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-07-15 01:16 UTC by John Caruso
Modified:	2007-11-30 22:07 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 19:22:40 UTC
Target Upstream Version:
Embargoed:

Description John Caruso 2004-07-15 01:16:31 UTC

Description of problem:
On a RHEL3 system with the hugemem kernel and running Oracle 9i 
(9.2.0.5), we're attempting to set up a database with a ~6GB SGA, 
comprising about 2.1GB of shared pool (allocated from hugepages) and 
4GB of database cache (allocated from /dev/shm, which is mounted as 
tmpfs).

The database starts successfully and users can make connections 
successfully via SQL*Net.  However, if a user signs on to the system 
in question and tries to run sqlplus directly and their memlock 
(ulimit -l) setting is "too low", the entire system will hang as soon 
as they enter a password.  At this point no further logins are 
possible, even on the console.  I put "too low" in quotes because 
it's not clear how high the value has to be; in my testing values 
down to 1000000 were ok, but 500000 caused the system to hang.  
Setting it to 4 (the default) hangs it every time.
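
As a rough illustration of the check implied above, here is a minimal Python sketch that reads a process's memlock limit and compares it with the lowest value that worked in my testing. The threshold constant and the helper names are illustrative assumptions, not a documented safe minimum; getrlimit reports bytes, while "ulimit -l" reports kilobytes.

```python
import resource

# Assumption: 1000000 KB is only the lowest value observed to work in the
# report; the true minimum is unknown.
OBSERVED_OK_KB = 1_000_000


def memlock_kb(limit_bytes):
    """Convert an RLIMIT_MEMLOCK value (bytes) to the KB units of `ulimit -l`."""
    if limit_bytes == resource.RLIM_INFINITY:
        return None  # unlimited
    return limit_bytes // 1024


def memlock_looks_safe(limit_bytes, threshold_kb=OBSERVED_OK_KB):
    """Return True if the limit is unlimited or at least threshold_kb."""
    kb = memlock_kb(limit_bytes)
    return kb is None or kb >= threshold_kb


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    kb = memlock_kb(soft)
    print("memlock soft limit (KB):", "unlimited" if kb is None else kb)
    print("at or above observed-ok value:", memlock_looks_safe(soft))
```

Running this as the affected user before starting sqlplus would show whether the session is below the values that hung the system here.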

We discovered this bug because of bug 113335 (apparently ignored 
since it was filed), which causes a local user's memlock settings as 
specified in limits.conf to be ignored when they sign on via ssh.
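
Since the configured and the effective limits can differ, it may help to see what limits.conf actually requests for a user. The following is a minimal Python sketch, assuming the usual four-column "domain type item value" layout; it handles only exact user names and the "*" wildcard (not @group entries), and the sample entries and the "oracle" user name are hypothetical.

```python
def configured_memlock(conf_text, user):
    """Return the memlock values limits.conf-style text requests for user.

    The result maps 'soft' and/or 'hard' to the value string as written
    (for example '4' or 'unlimited').
    """
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) != 4:
            continue
        domain, kind, item, value = fields
        if item != "memlock" or domain not in (user, "*"):
            continue
        # A type of '-' sets both the soft and the hard limit.
        kinds = ("soft", "hard") if kind == "-" else (kind,)
        for k in kinds:
            # A user-specific entry takes priority over a wildcard entry.
            if domain == user or k not in found:
                found[k] = value
    return found


# Hypothetical sample; the values are for illustration only.
SAMPLE = """\
# <domain> <type> <item> <value>
*       soft  memlock  4
oracle  soft  memlock  3145728
oracle  hard  memlock  3145728
"""

if __name__ == "__main__":
    print(configured_memlock(SAMPLE, "oracle"))
```

Comparing this result with the output of "ulimit -l" in an ssh session would show whether the configured value is being applied.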

Triggering this bug also requires setting 
"use_indirect_data_buffers = true" in the init.ora file for the 
database (which causes Oracle to use /dev/shm rather than process 
memory for that portion of the SGA).
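
To check whether a given database is exposed to this condition, one could scan its init.ora for the parameter. This is a minimal Python sketch that assumes simple "name = value" lines with "#" comments; the helper names and the sample file contents are hypothetical, and real init.ora files may use constructs this does not handle.

```python
def init_ora_params(text):
    """Parse simple 'name = value' lines into a dict keyed by lowercase name."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip().lower()] = value.strip()
    return params


def uses_indirect_buffers(text):
    """Return True if use_indirect_data_buffers is set to true."""
    value = init_ora_params(text).get("use_indirect_data_buffers", "")
    return value.lower() == "true"


# Hypothetical sample; only the parameter named in this report matters.
SAMPLE_INIT_ORA = """\
# example parameter file
use_indirect_data_buffers = true
"""

if __name__ == "__main__":
    print(uses_indirect_buffers(SAMPLE_INIT_ORA))
```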


Version-Release number of selected component (if applicable):
kernel-hugemem-2.4.21-15.0.3.EL


How reproducible:
Every time with memlock set to 4 (the default); see above for the 
other values tested.


Steps to Reproduce:
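1. Start an Oracle 9i database with a ~6GB SGA that uses hugetlb and 
   /dev/shm, with "use_indirect_data_buffers = true" in init.ora.
2. Sign on to the system as a user whose memlock (ulimit -l) setting 
   is too low, e.g. 4 (the default).
3. Run sqlplus directly and enter a password.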

  
Actual results:
System hangs.


Expected results:
System operates normally, or fails gracefully.


Additional info:
I should add that if the shared portion of the SGA is bumped up to 
about 2.5GB and hugetlb is in use, it's not even necessary to use 
sqlplus to hang the system--the system will hang as soon as the 
database is started.  So among its other problems, hugetlb causes 
erratic behavior as process memory usage approaches the 2.7GB 
boundary.  This seems like a distinct bug to me (though it may be 
related), but I won't be filing it separately.
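
To confirm whether hugetlb is in use and how large the pool is, the hugepage counters can be read from /proc/meminfo. The following is a minimal Python sketch; it assumes the kernel exposes the HugePages_Total, HugePages_Free and Hugepagesize fields there, and the sample figures are hypothetical rather than taken from the affected machine.

```python
HUGE_KEYS = ("HugePages_Total", "HugePages_Free", "Hugepagesize")


def hugepage_info(meminfo_text):
    """Extract the hugepage counters from /proc/meminfo-style text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and key.strip() in HUGE_KEYS:
            info[key.strip()] = int(parts[0])
    return info


def hugepage_pool_mb(info):
    """Total hugepage pool size in MB (Hugepagesize is reported in kB)."""
    return info.get("HugePages_Total", 0) * info.get("Hugepagesize", 0) // 1024


# Hypothetical sample values for illustration.
SAMPLE_MEMINFO = """\
MemTotal:      8000000 kB
HugePages_Total:  1100
HugePages_Free:    100
Hugepagesize:     2048 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_MEMINFO  # fall back to the sample on non-Linux hosts
    info = hugepage_info(text)
    print(info, "pool MB:", hugepage_pool_mb(info))
```

A pool size of zero would indicate that hugetlb is not configured, which is the state the workaround below relies on.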

In addition to being a serious stability issue for Oracle 9i 
installations using large SGAs, this bug also allows any local user 
of such a system to hang that system, so it's potentially a serious 
denial of service attack as well.  However, I think it's much more 
likely to happen through happenstance than it is to be used as an 
attack vector.

The workaround so far is just to disable hugetlb.  Given the issues 
with hugetlb noted in bug 127896, I have to say that it appears that 
the hugetlb implementation in RHEL3 is very unstable and should be 
avoided like the plague.

Comment 1 Rik van Riel 2004-07-15 02:45:46 UTC

How much physical memory does your server have?

Comment 2 John Caruso 2004-07-15 03:32:24 UTC

Sorry: 8GB.

Comment 3 Larry Woodman 2004-07-28 17:47:39 UTC

Martin, can you try to reproduce this bug in our lab as soon as you get a
chance?

Thanks, Larry

Comment 4 RHEL Program Management 2007-10-19 19:22:40 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
