Description of problem:
We are have developed a Java application server. We have noticed that on Linux
OS our application hangs for few seconds (~12 seconds ). After this short
hang period, application continue to work as usual.
Our application is a multi threaded server application. We noticed that
randomly the whole java application hangs, i.e. all threads including GC
thread for several seconds.
We have got GC logging enabled but there is no evidence of GC even taking that
The same application works fine on Solaris and Windows machines.
Will you please suggest if there are any known issues or any tools available
to trace down the cause of this hangs issue.
We have tried running gdb and catching bt output below:
0x0097b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#0 0x0097b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00be9b16 in pthread_cond_wait@@GLIBC_2.3.2 ()
#2 0x00bea15e in pthread_cond_wait@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#3 0xb7a3b8de in os::Linux::safe_cond_wait ()
#4 0xb7a235d4 in Monitor::wait ()
#5 0xb7adf0fd in Threads::destroy_vm ()
#6 0xb78a2abe in jni_DestroyJavaVM ()
#7 0x0804960d in ?? ()
#8 0xb7bb13c0 in jni_NativeInterface ()
#9 0x0805d424 in ?? ()
#10 0x8d107b02 in ?? ()
#11 0x0805d414 in ?? ()
#12 0xbfffd794 in ?? ()
#13 0x00000fff in ?? ()
#14 0x08059008 in ?? ()
#15 0x00000000 in ?? ()
We have tried running application with
We found that with, LD_ASSUME_KERNEL=2.4.1 the hang issue occur less
frequently but eventually we hit this problem.
We have noticed following links mentioning an issue, with Java application
hangs on some Linux kernels.
Will you please let us know,
- if it is a known issue
- and and if it has been fixed then in which release?
- is there a work around?
Please reply as soon as possible, we provide a product wiith high
availability, which means applition not responding for more >10 seconds is not
expectable to our customers.
Version-Release number of selected component (if applicable):
Linux 2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 18:00:32 EDT 2006 i686 i686 i386
With our SIP A/S application it can be reproduced in ~1 Hrs, with following GC
/usr/jdk1.5.0_07/bin/java -server -XX:NewSize=32m -XX:MaxNewSize=32m -
XX:+DisableExplicitGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -
Xloggc:log_gc.log -Xms512m -Xmx512m
Steps to Reproduce:
At the moment I do not have a small test application to help you with
replicating this issue. We have noticed this issue while testing our
please let us know, if I can provide any further detail on this issue, we
might also be able to provide you access to our application.
Java process hangs for ~12 seconds
Java application should not hang for no aparent reason.
Linux gbcdff1273m.eu.ubiquity.net 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17
EDT 2006 i686 i686 i386 GNU/Linux
Red Hat Enterprise Linux ES release 4 (Nahant Update 3)
I would also like to mention that we mintored GC logs closely and have noticed
for the pause duration (~12 sec), there was no GC activity being logged.
Whereas with the GC setting we were using we were expecting some GC activity
going almost every second.
Below is the extract from GC log for the duration we noticed pause, the
complete GC log is attached:
33290.753: [GC 33290.753: [ParNew: 32619K->0K(32704K), 0.0559830 secs] 305364K-
>275688K(524224K), 0.0564280 secs]
33302.175: [GC 33302.176: [ParNew: 32640K->0K(32704K), 0.0793950 secs] 308328K-
>278783K(524224K), 0.0804620 secs]
Created attachment 149556 [details]
GC log outout for the test
The attache logs contains GC output, you will see that GC activity is being
logged almost every second, but between time stamp 3290.753 and 33302.175 there
is nothing logged, which make us think that even GC thread is not being
executed when application Hang.
Created attachment 149678 [details]
I also run strace, the total output is > 1 GB.
However attached is the output from when my script identified that java process
is not responding for last 4 seconds.
I used following command from the script:
strace -p$PID -f -ttt -o strace.log
I have no experience in parsing strace file, but I am concern that following
line apparse to be the last line for process 23491.
The java command was running with 23490 PID, and looking at other lines for
process 23490, I assume that 23491 is immediate child thread of process 23490.
In the whole 1G output, I do not see “futex resumed” call since following
If required, I can provide you access to FTP server to obtain the give you
remaining output of strace command.
23491 1173433909.823682 futex(0x900f9c0c, FUTEX_WAIT, 1, NULL <unfinished ...>
As we continue to test our application on diferrent RedHat versions, we
noticed that our RHEL 4 test machine was using Using Update 3.
Whenn we upgraded our box for Update 4, we noticed that test is runnign with
no hang issue for last three days..
We are running other tests in parllel to verify our results, but will you
please advise us on changes done in Update 4, what exactly do you think should
have helped us in the issue we were seeing.
As I mentioned earlier, we have noticed that the test we run on RHEL AS4
Update 4 is not showing this issue.
As we are working closely with our client, we need to understand the actual
cause and we need to be sure that on what specific platform issue does not
From experience of other Java/Redhat users encountering same problem, would
you recommend RHEL AS4 Update 4 or any other LINUX installation as a confirm
PLEASE COMMENT ASAP.
(In reply to comment #5)
> As I mentioned earlier, we have noticed that the test we run on RHEL AS4
> Update 4 is not showing this issue.
> As we are working closely with our client, we need to understand the actual
> cause and we need to be sure that on what specific platform issue does not
> From experience of other Java/Redhat users encountering same problem, would
> you recommend RHEL AS4 Update 4 or any other LINUX installation as a confirm
Yes, use the latest java-1.5.0-ibm packages from RHN.
(In reply to comment #6)
> (In reply to comment #5)
> > From experience of other Java/Redhat users encountering same problem, would
> > you recommend RHEL AS4 Update 4 or any other LINUX installation as a confirm
> > solution?
> Yes, use the latest java-1.5.0-ibm packages from RHN.
And ensure that your system is up-to-date. The fact that this was fixed by
updating from RHEL-4.3 to RHEL-4.4 suggests that this was not caused by the JVM
alone, but by some interaction between it and the underlying system.
I'd suggest re-testing your application against RHEL-4.5 Beta, and re-opening
this bug if the problem re-appears.
For the recommandation to use java-1.5.0-ibm from the RHN; is there a known
problem (or previous bug) with the Sun JVM?
For the recommendation to use java-1.5.0-ibm from the RHN; is there a known
problem (or previous bug) with the Sun JVM?
I apologise for the incorrect component associated with this bug, at the time
I raised that defect, I did not know what component to select, and bugzilla
didn't allow me to raise a defect without component. Hence I end up selecting
We are not using IBM java, we are using SUN JDK (build 1.5.0_07-b03).
(In reply to comment #9)
> Hi Thomas,
> For the recommendation to use java-1.5.0-ibm from the RHN; is there a known
> problem (or previous bug) with the Sun JVM?
I'm not aware of one. We currently only ship IBM and BEA's JVMs on the RHEL
Unfortunately, we do not have a support agreement in place with Sun for their
JVM. This makes it distinctly challenging to resolve issues like this one.
We do have support agreements in place with IBM and BEA for their JVMs. This
gives us a mechanism for addressing problems and for escalating problems if they
prove to be in the JVM.
We would really need a reproducer to determine what the underlying problem is.
Since one of our recommendations is to see if the problem exists on the latest
shipping version of RHEL, it sounds like you are already making progress.
Is there an issue with using RHEL 4.4? Based on your problem description, it
sounds like stress testing on RHEL 4.4 would be the logical next step.
Thanks for your recommendation,
We did see failures with RHEL 4.4 on multiprocessor machines. However it seems
to be more stable on single processor with HT enabled.
While trying to narrow down the cause of the issue I have filter the
strace.log for all messages for "mprotect" call. I noticed following block of
messages, where an mprotect call is being made at '1173434782.998688' but not
ressumed until at least 1173434791.370265, this is an ~8 seconds gape. Would
you think that cause java hang issue?
Attached is the file containing complete file filtering mprotact calls from
the strace output.
23494 1173434782.927869 <... mprotect resumed> ) = 0
23494 1173434782.998688 mprotect(0xb7f84000, 4096, PROT_READ) = 0
23494 1173434785.680847 mprotect(0xb7f83000, 4096, PROT_READ) = 0
23494 1173434785.680966 mprotect(0xb7f83000, 4096,
PROT_READ|PROT_WRITE|PROT_EXEC) = 0
23494 1173434785.681088 mprotect(0xb7f84000, 4096, PROT_NONE) = 0
23494 1173434785.736099 mprotect(0xb7f84000, 4096, PROT_READ) = 0
23494 1173434788.649178 mprotect(0xb7f83000, 4096, PROT_READ) = 0
23494 1173434788.649297 mprotect(0xb7f83000, 4096,
PROT_READ|PROT_WRITE|PROT_EXEC) = 0
23494 1173434788.649423 mprotect(0xb7f84000, 4096, PROT_NONE) = 0
23494 1173434788.706982 mprotect(0xb7f84000, 4096, PROT_READ) = 0
23494 1173434791.370106 mprotect(0xb7f83000, 4096, PROT_READ <unfinished ...>
23494 1173434791.370265 <... mprotect resumed> ) = 0
Created attachment 151261 [details]
This file contains filter for mprotect calls from complete strace output.