Bug 128729

Summary: Large java processes core dump on hugemem kernel
Product: Red Hat Enterprise Linux 3 Reporter: Richard Homolka <richard.homolka>
Component: kernel Assignee: Dave Anderson <anderson>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 3.0 CC: kevins, mingo, petrides, riel
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2004-11-08 19:25:00 UTC Type: ---

Description Richard Homolka 2004-07-28 19:04:35 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1)
Gecko/20040707

Description of problem:
We have large WebLogic 8.1sp2 and 8.1sp3 processes that occasionally
but regularly crash and core dump on the hugemem kernel.  This happens
under Sun JVMs 1.4.2_04 and 1.4.2_05, and under JRockit 8.2.  BEA tech
support has found a similar case and referred us to the "[PATCH] bogus
sigaltstack calls by rt_sigreturn" patch recently accepted into 2.6.7.
The customer in that case applied this patch (modified slightly because
the code in the 2.4 AS 3.0 kernel differs from stock 2.6), and the
problems seem to have gone away (no further tech support requests to
BEA).

Version-Release number of selected component (if applicable):
kernel-2.4.21-15.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Run large Java apps on the hugemem kernel with large amounts of RAM (we
have 12 GB in affected systems)
2. Wait

Actual Results:  After a while, the apps core dump with large call
stacks.

Expected Results:  The apps should run without issue.

Additional info:

Happens on both 12 GB and 36 GB machines.

Comment 1 Kevin Stussman 2004-08-02 19:41:23 UTC
I'd like to add myself to this bug. Our WebLogic server crashes "every so often" for no
apparent reason, leaving a large core file.

OS : 2.4.21-15.ELhugemem
MemTotal:      5931956 kB
GDB : 
Core was generated by `/usr/local/bea/jdk141_05/bin/java'.
Program terminated with signal 11, Segmentation fault.
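
Since this thread turns on which kernel flavor a machine is running, here
is a small sketch in C that checks the running kernel's release string
for the "hugemem" suffix seen above.  The function names are hypothetical
helpers for illustration, and matching on the substring "hugemem" is an
assumption based on the release string quoted in this comment.

```c
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

/* Returns 1 if the release string names a hugemem kernel, else 0. */
int is_hugemem_release(const char *release)
{
    return release != NULL && strstr(release, "hugemem") != NULL;
}

/* Returns 1 or 0 for the running kernel, or -1 if uname() fails. */
int running_kernel_is_hugemem(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return is_hugemem_release(u.release);
}
```

For example, is_hugemem_release("2.4.21-15.ELhugemem") returns 1, while a
release string without the suffix returns 0.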

Comment 2 Kevin Stussman 2004-08-02 21:02:07 UTC
FYI, here is the patch for those interested.

http://linux.bkbits.net:8080/linux-2.6/gnupatch@40b035f1haADUZ5Ujxb0PPoxPYHX_g

We will try to move off of the hugemem kernel before trying this patch. (fortunately that 
is an option for us right now)

Comment 3 Kevin Stussman 2004-08-06 16:14:54 UTC
After reading the patch and this bug, I felt pretty sure that our problem was related to
this hugemem kernel problem, but after switching to the smp kernel the problem is still
happening. The only noticeable differences are that the crash now produces this
message:

# HotSpot Virtual Machine Error, Internal Error
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Client VM (1.4.1_05-b01 mixed mode)
#
# Error ID: 43113F32554E54494D45110E4350500305
#
# Problematic Thread: prio=1 tid=0x0x83f6028 nid=0x7a0 runnable
#

and the crashes now happen at a regular time (around midnight) instead of at random 
times.

Comment 4 Richard Homolka 2004-11-08 18:51:19 UTC
This seems resolved.  There is a fix for the AS hugemem kernel as of
kernel update 18 (the beta for U3).  The U3 kernel has solved our problems.

Comment 5 Ernie Petrides 2004-11-08 19:25:00 UTC
Closing per above comment (seems to be dup of bug 123253).

*** This bug has been marked as a duplicate of 123253 ***