Bug 592433

Summary: #1966794 Java synchronization code locks on RHEL5.4
Product: Red Hat Enterprise Linux 5
Component: java-1.6.0-openjdk
Version: 5.4
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Alan Matsuoka <alanm>
Assignee: Andrew Haley <aph>
QA Contact: BaseOS QE - Apps <qe-baseos-apps>
CC: aph, dbhole, jwest, tao
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-07-13 16:44:32 UTC
Attachments:
Description                           Flags
hung-jstack-voltdb.txt                none
LBDLockPatternTest.java               none
lbd_lock_test.zip                     none
sosreport-client-SR1966794.tar.bz2    none

Description Alan Matsuoka 2010-05-14 20:28:42 UTC
Description of problem:
The client's Java software, running on RHEL 5.3/5.4 (x64) on their Dell R610 (2x Xeon 5500), sometimes hangs. They have not been able to reproduce the issue on distributions with newer kernels such as Fedora 10/11 or Ubuntu 9.04 Server. It does reproduce on CentOS 5.3/5.4.
The client has supplied test code that sometimes reproduces the deadlock. It usually hangs within about 5 minutes, if it hangs at all.
The client has provided jstack reports showing the deadlocked case, the dmesg output, and a core dump of the hung process. The installed JVM is OpenJDK, package 'java-1.6.0-openjdk-1.6.0.0-1.2.b09.el5-x86_64'.

How reproducible:
Not always

Steps to Reproduce:
Compile the attached Java code:
$ javac -g LBDLockPatternTest.java
Run it:
$ java LBDLockPatternTest
If it has not hung within 10 minutes, kill the process and run it again.
When it hangs, it stops printing to the screen.
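
The manual retry loop above could be automated with a small harness along the following lines. This is only a sketch, not part of the client's test case: the class name HangWatchdog, the 60-second "no output" threshold, and the cap of 10 attempts are assumptions chosen for illustration, and the harness itself needs Java 8 or later even though the JVM under test is 1.6. It treats a run as a suspected hang when the child stops printing, and leaves that process alive so that jstack or gcore can still be run against it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class HangWatchdog {

    /** True when the last output is older than the allowed silence. */
    public static boolean isStalled(long lastOutputMillis, long nowMillis, long limitMillis) {
        return nowMillis - lastOutputMillis > limitMillis;
    }

    /**
     * Runs the command once. Returns true if output stalled (suspected hang;
     * the child is left running for inspection). Returns false if the child
     * exited by itself or was killed after runLimitMillis without stalling.
     */
    static boolean runOnce(long runLimitMillis, long stallLimitMillis, String... command)
            throws IOException, InterruptedException {
        final Process child = new ProcessBuilder(command).redirectErrorStream(true).start();
        final AtomicLong lastOutput = new AtomicLong(System.currentTimeMillis());

        // Drain the child's output and remember when the last line arrived.
        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(child.getInputStream()))) {
                while (in.readLine() != null) {
                    lastOutput.set(System.currentTimeMillis());
                }
            } catch (IOException ignored) {
                // The stream closes when the child is destroyed.
            }
        });
        reader.setDaemon(true);
        reader.start();

        long start = System.currentTimeMillis();
        while (!child.waitFor(1, TimeUnit.SECONDS)) {
            long now = System.currentTimeMillis();
            if (isStalled(lastOutput.get(), now, stallLimitMillis)) {
                return true; // keep the hung process alive for jstack / gcore
            }
            if (now - start > runLimitMillis) {
                child.destroyForcibly(); // not hung in time: kill and retry
                child.waitFor();
                return false;
            }
        }
        return false; // child exited by itself
    }

    public static void main(String[] args) throws Exception {
        long runLimit = TimeUnit.MINUTES.toMillis(10);  // from the steps above
        long stallLimit = TimeUnit.SECONDS.toMillis(60); // assumed threshold
        for (int attempt = 1; attempt <= 10; attempt++) {
            if (runOnce(runLimit, stallLimit, "java", "LBDLockPatternTest")) {
                System.out.println("attempt " + attempt + ": output stalled, suspected hang");
                return;
            }
            System.out.println("attempt " + attempt + ": no hang observed");
        }
    }
}
```

Run it from the directory containing the compiled LBDLockPatternTest class. The stall threshold should be longer than the longest normal gap between lines printed by the test, which is not stated in this report.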

Hi,
  I tried to reproduce it on our internal server ibm-x3650m2-1.gsslab.rdu.redhat.com, running RHEL 5.4 on a 16x Xeon system. The code hangs in about 1 run out of 10. I have taken an sosreport and two crash dumps of the hung process on that system. I installed the glibc-debuginfo and java-1.6.0-openjdk-debuginfo packages to get more debug information. The process did not hang on a RHEL 5.4 server with an X2 Athlon in 20-30 attempts.

  The backtrace shows the running thread stuck at:
#0  0x000000368720ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b0cfa6a5c35 in Monitor::wait (this=0x21799c0, no_safepoint_check=false, timeout=0,as_suspend_equivalent=false)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/os/linux/vm/os_linux.hpp:279
#2  0x00002b0cfa7fb53b in VMThread::execute (op=0x41ad7870)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:587
#3  0x00002b0cfa6d0475 in ParallelScavengeHeap::mem_allocate (this=0x2181db0, size=4, is_noref=<value optimized out>, is_tlab=false, gc_overhead_limit_was_exceeded=0x41ad7947)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp:443
#4  0x00002b0cfa4f3d0f in instanceKlass::allocate_instance (this=<value optimized out>, __the_thread__=0x217c400)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:124
#5  0x00002b0cfa724009 in OptoRuntime::new_instance_C (klass=0x2aaaae8c9a88, thread=<value optimized out>)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/opto/runtime.cpp:173
#6  0x00002aaaab58a307 in ?? ()
#7  0x000000000217c400 in ?? ()
#8  0x00002b0cfa4f3b4e in instanceKlass::allocate_instance (this=<value optimized out>, __the_thread__=0x2aaab39f0198)
   at /usr/src/debug/icedtea6-1.2/openjdk/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:216
#9  0x00002aaaae8c9a88 in ?? ()
#10 0x0000000000000000 in ?? ()

The client's sosreport is attached. The client's core dump is on ftp://dropbox.redhat.com at /incoming/core-SR1966794-client.tar.gz
The core dump and sosreport from ibm-x3650m2-1.gsslab.rdu.redhat.com are on ftp://dropbox.redhat.com at /incoming/core-SR1966794-ibm-x3650m2-1.tar.gz and /incoming/sosreport-ibm-x3650m2-1-SR1966794.tar.bz2

-----------------------------------------------------------------------------------------------------------------
The client has talked with other vendors.

The Sun/Oracle JDK engineers have confirmed this is a bug in the JDK:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=2185608
It has been fixed in the recent official Oracle/Sun JDK 1.6u18:
http://java.sun.com/javase/6/webnotes/6u18.html (defect 6822370)

The client verified that the fix is NOT in the latest OpenJDK release and is not yet on the trunk of the OpenJDK source. People on the OpenJDK mailing list said it will make it into b19, whenever that comes out.

The client can now run their software on RHEL5 with the latest release of the official JDK, but not with OpenJDK.

Comment 1 Alan Matsuoka 2010-05-14 20:30:19 UTC
Created attachment 414160
hung-jstack-voltdb.txt

Comment 2 Alan Matsuoka 2010-05-14 20:30:47 UTC
Created attachment 414161
LBDLockPatternTest.java

Comment 3 Alan Matsuoka 2010-05-14 20:31:14 UTC
Created attachment 414162
lbd_lock_test.zip

Comment 4 Alan Matsuoka 2010-05-14 20:31:55 UTC
Created attachment 414163
sosreport-client-SR1966794.tar.bz2

Comment 6 Andrew Haley 2010-07-13 16:44:32 UTC
1:1.6.0-1.9.b16 has been pushed to RHEL 5.3, 5.4, and 5.5, and already has this bug fixed.  Please reopen if there is still a problem after updating.