Bug 231276
| Summary: | Java application Hangs for few seconds (~12 sec) |
|---|---|
| Product: | Red Hat Enterprise Linux 4 |
| Reporter: | Neeta Gupta <ngupta> |
| Component: | java-1.5.0-ibm |
| Assignee: | Thomas Fitzsimmons <fitzsim> |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | urgent |
| Priority: | medium |
| Version: | 4.0 |
| CC: | markwiz, soreilly |
| Hardware: | i386 |
| OS: | Linux |
| Fixed In Version: | 1.5.0.3-1jpp.3.el4 |
| Doc Type: | Bug Fix |
| Last Closed: | 2007-03-15 20:53:13 UTC |
Description
Neeta Gupta
2007-03-07 09:59:19 UTC
I would also like to mention that we monitored the GC logs closely and noticed that during the pause (~12 sec) no GC activity was logged, whereas with the GC settings we were using we expected some GC activity almost every second. Below is the extract from the GC log covering the pause; the complete GC log is attached:

33290.753: [GC 33290.753: [ParNew: 32619K->0K(32704K), 0.0559830 secs] 305364K->275688K(524224K), 0.0564280 secs]
33302.175: [GC 33302.176: [ParNew: 32640K->0K(32704K), 0.0793950 secs] 308328K->278783K(524224K), 0.0804620 secs]

Created attachment 149556 [details]
GC log output for the test
The attached log contains the GC output. You will see that GC activity is
logged almost every second, but between time stamps 33290.753 and 33302.175
nothing is logged, which makes us think that even the GC thread is not being
executed when the application hangs.
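A gap like the one in the GC extract above can be located mechanically instead of by eye. The following is a minimal Python sketch, assuming the log lines begin with the JVM uptime stamp followed by `[GC` or `[Full GC`, as in the extract; the function name `find_gc_gaps` and the 5-second threshold are illustrative choices, not part of any JVM tooling.

```python
import re

# Leading uptime stamp of a GC log line, e.g. "33290.753: [GC ..."
# (format assumed from the extract quoted in this bug).
GC_STAMP = re.compile(r"^(\d+\.\d+): \[(?:Full )?GC")


def find_gc_gaps(lines, threshold_secs=5.0):
    """Return (previous_stamp, next_stamp, gap) for every gap above the threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = GC_STAMP.match(line.strip())
        if not match:
            continue  # skip lines that do not start a GC record
        stamp = float(match.group(1))
        if previous is not None and stamp - previous > threshold_secs:
            gaps.append((previous, stamp, stamp - previous))
        previous = stamp
    return gaps


# The two lines from the extract in this bug.
sample = [
    "33290.753: [GC 33290.753: [ParNew: 32619K->0K(32704K), 0.0559830 secs] "
    "305364K->275688K(524224K), 0.0564280 secs]",
    "33302.175: [GC 33302.176: [ParNew: 32640K->0K(32704K), 0.0793950 secs] "
    "308328K->278783K(524224K), 0.0804620 secs]",
]

for start, end, gap in find_gc_gaps(sample):
    print(f"no GC logged between {start:.3f} and {end:.3f} ({gap:.3f} s)")
```

To scan the full attachment, pass an open file object instead of `sample`; the function only iterates over lines.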
Created attachment 149678 [details]
strace output
I also ran strace; the total output is > 1 GB.
However, attached is the output from the point when my script identified that
the java process had not responded for the last 4 seconds.
I used the following command from the script:
strace -p$PID -f -ttt -o strace.log
I have no experience in parsing strace output, but I am concerned that the
line below appears to be the last line for process 23491.
The java command was running with PID 23490, and looking at the other lines for
process 23490, I assume that 23491 is an immediate child thread of process 23490.
In the whole 1 GB of output, I do not see a “futex resumed” call after the
line below was printed.
If required, I can give you access to an FTP server to obtain the
remaining output of the strace command.
23491 1173433909.823682 futex(0x900f9c0c, FUTEX_WAIT, 1, NULL <unfinished ...>
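Searching a > 1 GB trace by hand for a missing “resumed” line is error-prone, so the check described above can be scripted. The sketch below assumes the `strace -f -ttt -o` line layout seen in this bug (`PID TIMESTAMP call(... <unfinished ...>` and `PID TIMESTAMP <... call resumed>`); the function name `pending_calls` is hypothetical. Note that a `FUTEX_WAIT` with a `NULL` timeout blocks until the thread is woken, so an unresumed call may simply be a thread waiting and is not by itself proof of a hang.

```python
import re

# "PID TIMESTAMP syscall(args <unfinished ...>"
UNFINISHED = re.compile(r"^(\d+)\s+(\d+\.\d+)\s+(\w+)\(.*<unfinished \.\.\.>\s*$")
# "PID TIMESTAMP <... syscall resumed> ..."
RESUMED = re.compile(r"^(\d+)\s+(\d+\.\d+)\s+<\.\.\. (\w+) resumed>")


def pending_calls(lines):
    """Map each thread id to the (timestamp, syscall) it started but never resumed."""
    pending = {}
    for line in lines:
        match = UNFINISHED.match(line)
        if match:
            pid, stamp, syscall = match.groups()
            pending[pid] = (float(stamp), syscall)
            continue
        match = RESUMED.match(line)
        if match:
            # The thread came back from its interrupted call.
            pending.pop(match.group(1), None)
    return pending


# The futex line quoted above, plus an unfinished/resumed mprotect pair
# taken from a later comment in this bug.
sample = [
    "23491 1173433909.823682 futex(0x900f9c0c, FUTEX_WAIT, 1, NULL <unfinished ...>",
    "23494 1173434791.370106 mprotect(0xb7f83000, 4096, PROT_READ <unfinished ...>",
    "23494 1173434791.370265 <... mprotect resumed> ) = 0",
]

for pid, (stamp, syscall) in sorted(pending_calls(sample).items()):
    print(f"thread {pid}: {syscall} started at {stamp:.6f} was never resumed")
```

Because the function streams over its input, it can be given an open file object for the full `strace.log` without loading the file into memory.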
As we continued to test our application on different Red Hat versions, we noticed that our RHEL 4 test machine was running Update 3. When we upgraded the box to Update 4, the test ran with no hang for the last three days. We are running other tests in parallel to verify our results, but could you please advise us on the changes made in Update 4 and which of them you think helped with the issue we were seeing?

Hi,

As I mentioned earlier, we have noticed that the test we run on RHEL AS4 Update 4 is not showing this issue.

As we are working closely with our client, we need to understand the actual cause, and we need to be sure on which specific platform the issue does not exist.

From the experience of other Java/Red Hat users encountering the same problem, would you recommend RHEL AS4 Update 4 or any other Linux installation as a confirmed solution?

PLEASE COMMENT ASAP.

Thanks,
Neeta

(In reply to comment #5)
> Hi,
>
> As I mentioned earlier, we have noticed that the test we run on RHEL AS4
> Update 4 is not showing this issue.
>
>
> As we are working closely with our client, we need to understand the actual
> cause and we need to be sure that on what specific platform issue does not
> exist.
>
>
> From experience of other Java/Redhat users encountering same problem, would
> you recommend RHEL AS4 Update 4 or any other LINUX installation as a confirm
> solution?

Yes, use the latest java-1.5.0-ibm packages from RHN.

(In reply to comment #6)
> (In reply to comment #5)
> > From experience of other Java/Redhat users encountering same problem, would
> > you recommend RHEL AS4 Update 4 or any other LINUX installation as a confirm
> > solution?
>
> Yes, use the latest java-1.5.0-ibm packages from RHN.
>

And ensure that your system is up-to-date. The fact that this was fixed by updating from RHEL-4.3 to RHEL-4.4 suggests that this was not caused by the JVM alone, but by some interaction between it and the underlying system.
I'd suggest re-testing your application against RHEL-4.5 Beta, and re-opening this bug if the problem re-appears.

Hi Thomas,

For the recommendation to use java-1.5.0-ibm from the RHN; is there a known problem (or previous bug) with the Sun JVM?

Cheers,
Shawn

I apologise for the incorrect component associated with this bug. At the time I raised the defect, I did not know which component to select, and Bugzilla didn't allow me to raise a defect without a component, so I ended up selecting IBM Java. We are not using IBM Java; we are using the Sun JDK (build 1.5.0_07-b03).

(In reply to comment #9)
> Hi Thomas,
>
> For the recommendation to use java-1.5.0-ibm from the RHN; is there a known
> problem (or previous bug) with the Sun JVM?

I'm not aware of one. We currently only ship IBM and BEA's JVMs on the RHEL supplementary discs.

Neeta,

Unfortunately, we do not have a support agreement in place with Sun for their JVM. This makes it distinctly challenging to resolve issues like this one. We do have support agreements in place with IBM and BEA for their JVMs. This gives us a mechanism for addressing problems and for escalating problems if they prove to be in the JVM.

We would really need a reproducer to determine what the underlying problem is. Since one of our recommendations is to see if the problem exists on the latest shipping version of RHEL, it sounds like you are already making progress. Is there an issue with using RHEL 4.4? Based on your problem description, it sounds like stress testing on RHEL 4.4 would be the logical next step.

Hi,

Thanks for your recommendation. We did see failures with RHEL 4.4 on multiprocessor machines. However, it seems to be more stable on a single processor with HT enabled. While trying to narrow down the cause of the issue, I filtered strace.log for all messages for the "mprotect" call.
I noticed the following block of messages, where an mprotect call is made at 1173434782.998688 but not resumed until at least 1173434791.370265; this is a gap of ~8 seconds. Do you think that could cause the Java hang? Attached is the file containing all mprotect calls filtered from the complete strace output.

23494 1173434782.927869 <... mprotect resumed> ) = 0
23494 1173434782.998688 mprotect(0xb7f84000, 4096, PROT_READ) = 0
23494 1173434785.680847 mprotect(0xb7f83000, 4096, PROT_READ) = 0
23494 1173434785.680966 mprotect(0xb7f83000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
23494 1173434785.681088 mprotect(0xb7f84000, 4096, PROT_NONE) = 0
23494 1173434785.736099 mprotect(0xb7f84000, 4096, PROT_READ) = 0
23494 1173434788.649178 mprotect(0xb7f83000, 4096, PROT_READ) = 0
23494 1173434788.649297 mprotect(0xb7f83000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
23494 1173434788.649423 mprotect(0xb7f84000, 4096, PROT_NONE) = 0
23494 1173434788.706982 mprotect(0xb7f84000, 4096, PROT_READ) = 0
23494 1173434791.370106 mprotect(0xb7f83000, 4096, PROT_READ <unfinished ...>
23494 1173434791.370265 <... mprotect resumed> ) = 0

Regards,
Neeta

Created attachment 151261 [details]
This file contains filter for mprotect calls from complete strace output.
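The timing question raised for the mprotect block can be checked by computing the interval between consecutive trace lines of the same thread. The Python sketch below does that for the lines quoted in this bug; `gaps_per_thread` and the 1-second threshold are illustrative assumptions. Because the extract was filtered down to mprotect only, an interval here means "no mprotect call from this thread", which is not necessarily the same as the thread being stalled.

```python
import re

# "PID TIMESTAMP rest-of-line", as written by strace -f -ttt -o
LINE = re.compile(r"^(\d+)\s+(\d+\.\d+)\s+(.*)$")


def gaps_per_thread(lines, threshold_secs=1.0):
    """Return (pid, previous_stamp, stamp, gap) where a thread was silent too long."""
    last_seen = {}
    gaps = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        pid, stamp = match.group(1), float(match.group(2))
        if pid in last_seen and stamp - last_seen[pid] > threshold_secs:
            gaps.append((pid, last_seen[pid], stamp, stamp - last_seen[pid]))
        last_seen[pid] = stamp
    return gaps


# The mprotect block quoted in this bug.
sample = [
    "23494 1173434782.927869 <... mprotect resumed> ) = 0",
    "23494 1173434782.998688 mprotect(0xb7f84000, 4096, PROT_READ) = 0",
    "23494 1173434785.680847 mprotect(0xb7f83000, 4096, PROT_READ) = 0",
    "23494 1173434785.680966 mprotect(0xb7f83000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0",
    "23494 1173434785.681088 mprotect(0xb7f84000, 4096, PROT_NONE) = 0",
    "23494 1173434785.736099 mprotect(0xb7f84000, 4096, PROT_READ) = 0",
    "23494 1173434788.649178 mprotect(0xb7f83000, 4096, PROT_READ) = 0",
    "23494 1173434788.649297 mprotect(0xb7f83000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0",
    "23494 1173434788.649423 mprotect(0xb7f84000, 4096, PROT_NONE) = 0",
    "23494 1173434788.706982 mprotect(0xb7f84000, 4096, PROT_READ) = 0",
    "23494 1173434791.370106 mprotect(0xb7f83000, 4096, PROT_READ <unfinished ...>",
    "23494 1173434791.370265 <... mprotect resumed> ) = 0",
]

for pid, start, end, gap in gaps_per_thread(sample):
    print(f"thread {pid}: {gap:.3f} s between {start:.6f} and {end:.6f}")
```

On this extract the sketch reports three intervals of roughly 2.7 to 2.9 seconds each within the ~8 second window, so running it over the unfiltered trace would be needed to tell whether the thread was idle or blocked during those intervals.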