Bug 163699

Summary: [RHEL3] JVM Crash
Product: Red Hat Enterprise Linux 3
Reporter: Issue Tracker <tao>
Component: kernel
Assignee: Ingo Molnar <mingo>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.0
CC: lwoodman, petrides, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-08-23 16:53:12 UTC

Description Issue Tracker 2005-07-20 14:31:06 UTC
Escalated to Bugzilla from IssueTracker

Comment 4 Ernie Petrides 2005-07-22 01:54:42 UTC
Please report the results you get with the RHEL3 U5 kernel (2.4.21-32.EL)
or later.  Thanks.


Comment 6 Johnray Fuller 2005-07-22 21:23:41 UTC
(In reply to comment #4)
> Please report the results you get with the RHEL3 U5 kernel (2.4.21-32.EL)
> or later.  Thanks.
> 

Ernie, I have tested on both the -15 and the -32.0.1 kernels, with the same results:

(gdb) info reg
eax            0x0      0
ecx            0x2387   9095
edx            0x6      6
ebx            0x2387   9095
esp            0xbfffd160       0xbfffd160
ebp            0xbfffd170       0xbfffd170
esi            0x2387   9095
edi            0xb75a6a98       -1218811240
eip            0xb7499c0f       0xb7499c0f
eflags         0x206    518
cs             0x23     35
ss             0x2b     43
ds             0xc03f002b       -1069613013
es             0x2b     43
fs             0xfff7   65527
gs             0x33     51
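
For readers of the dump above: gdb prints each register twice, once as raw hex and once in a "natural" form, which for several of these registers is the same 32 bits read as a signed integer. The following is a minimal sketch of that conversion, not part of the original report; the helper name `to_signed32` is made up for illustration, and the values are copied from the dump.

```python
def to_signed32(raw: int) -> int:
    """Interpret a raw 32-bit value as a two's-complement signed integer."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw

# Raw values copied from the register dump in this comment.
registers = [("ecx", 0x2387), ("edi", 0xb75a6a98), ("ds", 0xc03f002b)]

for name, raw in registers:
    print(f"{name:<4} {raw:#x} {to_signed32(raw)}")
```

The signed values computed this way match the second column of the dump, so the negative number shown for ds is just the signed reading of 0xc03f002b.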


Comment 8 Ernie Petrides 2005-07-22 22:17:25 UTC
Thanks for the feedback, Johnray.  I do recall a potentially relevant
data corruptor fix that we released in 2.4.21-32.0.1.EL, but since you
still encountered the failures with that kernel, this must be the result
of some other problem.

Moving to ASSIGNED state for Ingo to work on.


Comment 12 Stan Cox 2005-08-10 21:22:00 UTC
On RHEL3 I was able to reproduce the suspect value reported by gdb as follows
(silly.c and Test.java are attached to issue 74303):
/usr/lib/jvm/java-1.4.2-ibm-1.4.2.1/bin/javac Test.java
gcc silly.c -o silly
/usr/lib/jvm/java-1.4.2-ibm-1.4.2.1/bin/java Test 1
kill -6 N # N is the PID of the java process started above
gdb -c core.N /usr/lib/jvm/java-1.4.2-ibm-1.4.2.1/bin/java
(gdb) info reg ds
ds             0xc03f002b       -1069613013

This worked okay on RHEL4.
My suspicion was that the value for ds was incorrect in the corefile, since ds
is a 16-bit register, but that the cause of the problem was elsewhere.
However, his gdb session is puzzling (assuming it is as reported):
0xb6f488a3 <copy+19>: mov    (%eax),%eax // eax SHOULD BE 1 (ds based load)
0xb6f488a5 <copy+21>: mov    %eax,0xffffffc0(%ebp) // NO!!! eax contains 0x33343d6e
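
The observation that ds is a 16-bit register can be checked with a little arithmetic. The sketch below is illustrative only and not part of the original analysis; the helper name `split_selector` is hypothetical. It splits the reported ds value into its low and high 16 bits, and, purely as a side observation rather than a diagnosis, shows the bytes of the unexpected eax value in little-endian order.

```python
import struct

def split_selector(raw: int):
    """Split a 32-bit register slot into (low 16 bits, high 16 bits)."""
    return raw & 0xFFFF, (raw >> 16) & 0xFFFF

# ds value as reported by gdb from the core file.
low, high = split_selector(0xc03f002b)
print(f"ds low 16 bits:  {low:#x}")
print(f"ds high 16 bits: {high:#x}")

# Unexpected eax value from the disassembly comment, laid out as it
# would sit in memory on a little-endian x86 machine.
eax_bytes = struct.pack("<I", 0x33343d6e)
print(eax_bytes)
```

The low 16 bits come out as 0x2b, the same selector value gdb reports for ss and es, which is consistent with only the upper half of the stored ds value being suspect.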

Comment 13 Stan Cox 2005-08-11 18:23:26 UTC
The corefile results that Johnray and I noticed were on an R3U2 system.  I just
tried on R3U4 and it worked okay.

Comment 17 Ernie Petrides 2005-08-22 19:43:00 UTC
Regarding comment #6, which indicates that failures have been reproduced
on the latest RHEL3 kernel (2.4.21-32.0.1.EL, released 3 months ago),
I'd like to reiterate that Larry fixed a pte_clear() race in that kernel
that was causing very similar symptoms.

Could someone please reconfirm that the 2.4.21-32.0.1.EL kernel still
exhibits the problem reported in this bugzilla?

Thanks in advance.  -ernie


Comment 21 Ernie Petrides 2005-08-23 16:53:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-472.html


*** This bug has been marked as a duplicate of 141394 ***