Bug 592433 - #1966794 Java synchronization code locks on RHEL5.4
Summary: #1966794 Java synchronization code locks on RHEL5.4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: java-1.6.0-openjdk
Version: 5.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andrew Haley
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-14 20:28 UTC by Alan Matsuoka
Modified: 2018-10-27 12:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-13 16:44:32 UTC
Target Upstream Version:
Embargoed:


Attachments
hung-jstack-voltdb.txt (14.03 KB, text/plain), 2010-05-14 20:30 UTC, Alan Matsuoka
LBDLockPatternTest.java (1.66 KB, text/x-java), 2010-05-14 20:30 UTC, Alan Matsuoka
lbd_lock_test.zip (8.80 KB, application/zip), 2010-05-14 20:31 UTC, Alan Matsuoka
sosreport-client-SR1966794.tar.bz2 (755.20 KB, application/x-bzip2), 2010-05-14 20:31 UTC, Alan Matsuoka

Description Alan Matsuoka 2010-05-14 20:28:42 UTC
Description of problem:
The client's Java software, running on RHEL 5.3/5.4 (x64) on their Dell R610 (2x Xeon 5500), hangs intermittently. They have not been able to reproduce the issue on distributions with newer kernels such as Fedora 10/11 or Ubuntu 9.04 Server. It does reproduce on CentOS 5.3/5.4.
The client has provided test code that sometimes reproduces the deadlock. It usually takes about 5 minutes to hang, or does not hang at all.
The client has provided jstack reports showing the deadlocked case, along with the dmesg output. The client has installed the OpenJDK JVM 'java-1.6.0-openjdk-1.6.0.0-1.2.b09.el5-x86_64' and has also provided a core dump of the hung process.
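
The jstack reports mentioned above are taken with an external tool. As a rough in-process equivalent, the sketch below uses the standard java.lang.management API to check for lock cycles and dump all thread stacks. The class name ThreadDumpSketch is hypothetical and is not part of the attached test code. Note that a hang caused by a missed wakeup rather than a lock cycle would not necessarily be reported as a deadlock by this API, so the full dump is printed regardless.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of LBDLockPatternTest.java.
class ThreadDumpSketch {

    /** Returns a deadlock count line followed by a dump of all live threads. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        // Both calls return null when no deadlock cycle is found.
        long[] deadlocked = synchronizers
                ? mx.findDeadlockedThreads()          // monitors and j.u.c. locks
                : mx.findMonitorDeadlockedThreads();  // object monitors only

        StringBuilder sb = new StringBuilder();
        sb.append("deadlocked threads: ")
          .append(deadlocked == null ? 0 : deadlocked.length)
          .append('\n');

        // ThreadInfo.toString() prints only the top frames of each stack;
        // use getStackTrace() if the full trace is needed.
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            sb.append(info.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling ThreadDumpSketch.dump() from a watchdog thread when the application stops making progress gives output comparable in spirit to jstack, without needing to attach to the process.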

How reproducible:
Not always

Steps to Reproduce:
Compile the attached Java code:
$ javac -g LBDLockPatternTest.java
Run it:
$ java LBDLockPatternTest
If it has not hung within 10 minutes, kill the process and run it again.
When it does hang, it stops printing to the screen.
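
The "watch for output to stop, otherwise kill and rerun" step above can be automated with a small progress watchdog. The following is a minimal sketch under stated assumptions: the class StallWatchdogSketch, the producer/consumer workload over a LinkedBlockingDeque, and the 30 second stall threshold are all hypothetical and are not taken from the attached LBDLockPatternTest.java, whose contents are not shown in this report.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not the attached test program.
class StallWatchdogSketch {

    /**
     * Polls the progress counter for up to runMillis. Returns true as soon as
     * the counter has not changed for stallMillis (a suspected hang), or
     * false if the run ends without such a stall.
     */
    static boolean stalled(AtomicLong progress, long runMillis, long stallMillis) {
        long start = System.currentTimeMillis();
        long lastValue = progress.get();
        long lastChange = start;
        while (System.currentTimeMillis() - start < runMillis) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // the watchdog itself was cancelled
            }
            long now = System.currentTimeMillis();
            long value = progress.get();
            if (value != lastValue) {
                lastValue = value;
                lastChange = now;
            } else if (now - lastChange >= stallMillis) {
                return true;
            }
        }
        return false;
    }

    /** Starts an assumed workload: one producer and one consumer on a bounded deque. */
    static AtomicLong startWorkload() {
        final LinkedBlockingDeque<Integer> deque = new LinkedBlockingDeque<Integer>(16);
        final AtomicLong consumed = new AtomicLong();
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        deque.putLast(1);
                    }
                } catch (InterruptedException e) {
                    // exit quietly
                }
            }
        });
        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        deque.takeFirst();
                        consumed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    // exit quietly
                }
            }
        });
        // Daemon threads so a hung workload does not keep the JVM alive.
        producer.setDaemon(true);
        consumer.setDaemon(true);
        producer.start();
        consumer.start();
        return consumed;
    }

    public static void main(String[] args) {
        // 10 minutes matches the rerun interval in the steps above;
        // the 30 second stall threshold is an arbitrary assumption.
        boolean hung = stalled(startWorkload(), 10 * 60 * 1000L, 30 * 1000L);
        System.out.println(hung ? "suspected hang" : "no hang observed");
    }
}
```

Because the workload threads are daemons, the JVM exits once main returns, so the program can be rerun in a loop until a suspected hang is reported.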

Hi,
  I tried to reproduce it on our internal server ibm-x3650m2-1.gsslab.rdu.redhat.com, which runs RHEL 5.4 on 16x Xeon processors. The code hangs about 1 run in 10. I have taken an sosreport and two crash dumps of the hung process on that system, and I installed the glibc-debuginfo and java-1.6.0-openjdk-debuginfo packages to get more debug information. The process did not hang on a RHEL 5.4 server with an Athlon X2 in 20-30 attempts.

  The backtrace shows that the running thread is stuck at:
#0  0x000000368720ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b0cfa6a5c35 in Monitor::wait (this=0x21799c0, no_safepoint_check=false, timeout=0,as_suspend_equivalent=false)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/os/linux/vm/os_linux.hpp:279
#2  0x00002b0cfa7fb53b in VMThread::execute (op=0x41ad7870)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:587
#3  0x00002b0cfa6d0475 in ParallelScavengeHeap::mem_allocate (this=0x2181db0, size=4, is_noref=<value optimized out>, is_tlab=false, gc_overhead_limit_was_exceeded=0x41ad7947)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp:443
#4  0x00002b0cfa4f3d0f in instanceKlass::allocate_instance (this=<value optimized out>, __the_thread__=0x217c400)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:124
#5  0x00002b0cfa724009 in OptoRuntime::new_instance_C (klass=0x2aaaae8c9a88, thread=<value optimized out>)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/opto/runtime.cpp:173
#6  0x00002aaaab58a307 in ?? ()
#7  0x000000000217c400 in ?? ()
#8  0x00002b0cfa4f3b4e in instanceKlass::allocate_instance (this=<value optimized out>, __the_thread__=0x2aaab39f0198)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:216
#9  0x00002aaaae8c9a88 in ?? ()
#10 0x0000000000000000 in ?? ()

The client's sosreport is attached. The client's core dump is on ftp://dropbox.redhat.com at /incoming/core-SR1966794-client.tar.gz
The core dump and sosreport from ibm-x3650m2-1.gsslab.rdu.redhat.com are on ftp://dropbox.redhat.com at /incoming/core-SR1966794-ibm-x3650m2-1.tar.gz and /incoming/sosreport-ibm-x3650m2-1-SR1966794.tar.bz2

-----------------------------------------------------------------------------------------------------------------
The client has talked with other vendors.

The Sun/Oracle JDK engineers have confirmed this is a bug in the JDK:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=2185608
It has been fixed in the recent Official Oracle/Sun Java JDK 1.6u18:
http://java.sun.com/javase/6/webnotes/6u18.html   defect 6822370

The client verified that the fix is NOT in the latest OpenJDK release and is not yet on the trunk of the OpenJDK source. People on the OpenJDK mailing list said it will make it into b19, whenever that comes out.

The client is now able to run their software on RHEL5 with the latest release of the official JDK, but not with OpenJDK.

Comment 1 Alan Matsuoka 2010-05-14 20:30:19 UTC
Created attachment 414160
hung-jstack-voltdb.txt

Comment 2 Alan Matsuoka 2010-05-14 20:30:47 UTC
Created attachment 414161
LBDLockPatternTest.java

Comment 3 Alan Matsuoka 2010-05-14 20:31:14 UTC
Created attachment 414162
lbd_lock_test.zip

Comment 4 Alan Matsuoka 2010-05-14 20:31:55 UTC
Created attachment 414163
sosreport-client-SR1966794.tar.bz2

Comment 6 Andrew Haley 2010-07-13 16:44:32 UTC
1:1.6.0-1.9.b16 has been pushed to RHEL 5.3, 5.4, and 5.5, and already has this bug fixed.  Please reopen if there is still a problem after updating.

