Bug 446606 - java hang after OutOfMemoryError
Summary: java hang after OutOfMemoryError
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: beta
Hardware: x86_64
OS: All
Priority: low
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Red Hat Real Time Maintenance
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-05-15 10:40 UTC by IBM Bug Proxy
Modified: 2008-10-02 00:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-06-02 20:31:42 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 44798 0 None None None Never

Description IBM Bug Proxy 2008-05-15 10:40:35 UTC
=Comment: #0=================================================
Paul A. Clarke <pacman.com> - 2008-05-14 13:33 EDT
Problem description:
Running an internal Java testcase resulted in an OutOfMemoryError and an
apparent testcase hang.

If this is not an installation problem,
       Describe any custom patches installed.

       Provide output from "uname -a", if possible:
Linux elm3b99.beaverton.ibm.com 2.6.24.7-52ibmrt2.3 #1 SMP PREEMPT RT Mon May 12
20:23:45 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Hardware Environment
    Machine type (p650, x235, SF2, etc.): x3550
    Cpu type (Power4, Power5, IA-64, etc.): x86_64

Please provide access information for the machine if it is available.
ABAT

Is this reproducible?  unknown at this time (haven't tried)
    Describe the steps:
real-time release-testing scripts

Is the system (not just the application) hung? no
=Comment: #1=================================================
Paul A. Clarke <pacman.com> - 2008-05-14 13:40 EDT
# strace -p28886
Process 28886 attached - interrupt to quit
[ Process PID=28886 runs in 32 bit mode. ]
futex(0x805cd40, FUTEX_WAIT, 149, NULL
=Comment: #3=================================================
John G. Stultz <jstultz.com> - 2008-05-14 13:43 EDT
For context to the RH guys, the kernel is your -52 with the fastgup patches
removed (they also cause java hangs, but that's documented in a different bug).

The -47 kernel with the fastgup patches removed did not have this issue.

Comment 1 IBM Bug Proxy 2008-05-15 11:17:19 UTC
------- Comment From ankigarg.com 2008-05-15 07:09 EDT-------
Hey Paul, I could not reproduce this with the latest -54 MRG kernel. Could you
please confirm?

Comment 2 Clark Williams 2008-05-21 00:53:38 UTC
I suspect the fast_gup stuff was the culprit here. Close it?

Comment 3 IBM Bug Proxy 2008-05-21 05:24:30 UTC
------- Comment From sripathi.com 2008-05-21 01:22 EDT-------
(In reply to comment #12)
> ------- Comment From williams 2008-05-20 20:53 EST-------
> I suspect the fast_gup stuff was the culprit here. Close it?

Clark, Paul has in fact seen this on the -54 kernel on a particular hardware type.
Please give us a bit more time to analyze and confirm.

Comment 4 IBM Bug Proxy 2008-05-21 23:56:52 UTC
------- Comment From jstultz.com 2008-05-21 19:53 EDT-------
Does this issue happen if you run the following after bootup?

echo 1 | sudo tee /proc/sys/kernel/rwlock_reader_limit

Comment 5 IBM Bug Proxy 2008-05-22 14:16:27 UTC
------- Comment From pacman.com 2008-05-22 10:08 EDT-------
(In reply to comment #15)
> Does this issue happen if you run the following after bootup?
>
>         echo 1 | sudo tee /proc/sys/kernel/rwlock_reader_limit
>

I pulled the -57 build (our "alpha14") and ran overnight last night.  The Java
OOM still occurs, even with the above setting, which I verified was still active
this morning, via "cat /proc/sys/kernel/rwlock_reader_limit".

I haven't tried many platforms, but have mostly been running on an x3550.  Now
wondering if there is something specific to the platform?

Comment 6 Clark Williams 2008-05-22 20:58:05 UTC
I know that we have an x3550 in the lab up in Westford, but I don't think it's
allocated to our test system (RHTS). Probably wouldn't see anything with our
current tests anyway, since they're not java-oriented. 

That's something we should work on in the future (working together to get a set
of RT Java smoke tests).


Comment 7 IBM Bug Proxy 2008-05-30 19:56:59 UTC
------- Comment From jstultz.com 2008-05-30 15:54 EDT-------
Can we get this retested with alpha16 or alpha17 (once it's built)?

If it still persists, this may need a prio bump or [focus].

Comment 8 IBM Bug Proxy 2008-05-30 20:16:39 UTC
------- Comment From pacman.com 2008-05-30 16:15 EDT-------
-------------------------------
Garbage Collection Impact
-------------------------------

Measures impact of Garbage Collection on RealtimeThread not accessing the heap.
- RT Scheduling latency (maximum) : 54.00 us
- RT Execution duration (maximum) : 1.107 ms
...GC...
- RT Scheduling latency (maximum) : 1.154 ms
- RT Execution time (maximum) : 2.837 ms

Measures impact of Garbage Collection on NoHeapRealtimeThread.
- NHRT Scheduling latency (maximum) : 116.0 us
- NHRT Execution time (maximum) : 1.223 ms
...GC...
JVMDUMP006I Processing Dump Event "systhrow", detail
"java/lang/OutOfMemoryError" - Please Wait.
JVMDUMP007I JVM Requesting Snap Dump using
'/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/Snap0001.20080530.155059.7171.trc'
JVMDUMP010I Snap Dump written to
/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/Snap0001.20080530.155059.7171.trc
JVMDUMP007I JVM Requesting Heap Dump using
'/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/heapdump.20080530.155059.7171.phd'
JVMDUMP010I Heap Dump written to
/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/heapdump.20080530.155059.7171.phd
JVMDUMP007I JVM Requesting Java Dump using
'/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/javacore.20080530.155059.7171.txt'
JVMDUMP010I Java Dump written to
/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/javacore.20080530.155059.7171.txt
JVMDUMP013I Processed Dump Event "systhrow", detail "java/lang/OutOfMemoryError".
java.lang.OutOfMemoryError
at javolution.util.FastMap.<init>(Unknown Source)
at com.raytheon.calibrate.GCImpact.run(Unknown Source)
at com.raytheon.calibrate.Main.run(Unknown Source)
at javax.realtime.RealtimeThread.runImpl(Unknown Source)
-------------------
Network Performance
-------------------

Warning: No server address specified (-DserverIp) - Start server on local host
JVMDUMP006I Processing Dump Event "systhrow", detail
"java/lang/OutOfMemoryError" - Please Wait.
JVMDUMP007I JVM Requesting Snap Dump using
'/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/Snap0002.20080530.161316.7171.trc'
JVMDUMP010I Snap Dump written to
/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/Snap0002.20080530.161316.7171.trc
JVMDUMP007I JVM Requesting Heap Dump using
'/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/heapdump.20080530.161316.7171.phd'

[1]+  Stopped                 ./release-testing.sh R2 alpha16
[rtuser@elm3b99 linux-rt-tests]$ uname -r
2.6.24.7-60ibmrt2.4

Comment 9 IBM Bug Proxy 2008-05-30 20:24:32 UTC
------- Comment From pacman.com 2008-05-30 16:21 EDT-------
after ctrl-z...

$ cat /proc/meminfo
MemTotal:      4022840 kB
MemFree:        403776 kB
Buffers:        142852 kB
Cached:        1891500 kB
SwapCached:          0 kB
Active:        1538856 kB
Inactive:      1631168 kB
SwapTotal:     8008392 kB
SwapFree:      8008392 kB
Dirty:               8 kB
Writeback:           0 kB
AnonPages:     1135756 kB
Mapped:          14556 kB
Slab:           403776 kB
SReclaimable:   373272 kB
SUnreclaim:      30504 kB
PageTables:       5364 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  10019812 kB
Committed_AS:  1380428 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     45964 kB
VmallocChunk: 34359691823 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB
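
Reading the snapshot above: a minimal Java sketch, assuming that MemFree +
Buffers + Cached is an acceptable rough estimate of the memory the kernel could
still hand out (that is an assumption, not something stated in this report).
The class and method names (MeminfoSummary, parse, roughlyAvailableKb,
swapUsedKb) are hypothetical. Applied to the values above it reports roughly
2.4 GB available and no swap in use, which would be consistent with the
OutOfMemoryError coming from a limit inside the JVM rather than from
system-wide memory exhaustion, although the snapshot alone does not prove that.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for summarising a /proc/meminfo snapshot. */
final class MeminfoSummary {
    // A few of the values posted in the comment above (kB).
    static final String SAMPLE =
            "MemTotal:      4022840 kB\n"
          + "MemFree:        403776 kB\n"
          + "Buffers:        142852 kB\n"
          + "Cached:        1891500 kB\n"
          + "SwapTotal:     8008392 kB\n"
          + "SwapFree:      8008392 kB\n";

    /** Parses "Key:   12345 kB" lines into a map of key to numeric value. */
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String line : meminfo.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Key: value" line
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts[0].isEmpty()) {
                continue; // key without a value
            }
            out.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
        }
        return out;
    }

    /** Assumption: free + buffers + page cache approximates usable memory. */
    static long roughlyAvailableKb(Map<String, Long> m) {
        return m.getOrDefault("MemFree", 0L)
             + m.getOrDefault("Buffers", 0L)
             + m.getOrDefault("Cached", 0L);
    }

    /** Swap currently in use, in kB. */
    static long swapUsedKb(Map<String, Long> m) {
        return m.getOrDefault("SwapTotal", 0L) - m.getOrDefault("SwapFree", 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> m = parse(SAMPLE);
        System.out.println("roughly available kB: " + roughlyAvailableKb(m));
        System.out.println("swap used kB: " + swapUsedKb(m));
    }
}
```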

Comment 10 Clark Williams 2008-06-02 20:31:42 UTC
closing per IBM

Comment 11 IBM Bug Proxy 2008-06-10 12:09:30 UTC
------- Comment From alan_stevens.com 2008-06-10 08:00 EDT-------
I've asked the JVM GC team to take a look in case they can help.

Comment 12 IBM Bug Proxy 2008-06-10 14:25:31 UTC
------- Comment From Charlie_Gracie.com 2008-06-10 10:20 EDT-------
Hi.
Can you do another run with these extra options:
-verbose:gc -Xgc:verboseExtensions -XXgc:perfTraceLog=gci.trace

Once this run has completed can you provide the following files:
gci.trace
javacore*.txt
Snap.*.trc

Thanks
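
For gathering the requested files after a run, a minimal Java sketch follows.
It assumes all three kinds of file end up in one directory, and it uses the
glob Snap*.trc rather than Snap.*.trc because the snap dumps earlier in this
report are named like Snap0001.20080530.155059.7171.trc. The class
DiagnosticFileCollector and its methods are hypothetical, and demo() only
creates empty placeholder files in a temporary directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that lists the diagnostic files asked for above. */
final class DiagnosticFileCollector {
    // Glob alternatives: GC trace, javacore dumps, snap dumps.
    private static final String GLOB = "{gci.trace,javacore*.txt,Snap*.trc}";

    /** Returns the sorted names of matching files directly inside dir. */
    static List<String> collect(Path dir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, GLOB)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    /** Builds a throwaway directory with placeholder files and collects from it. */
    static List<String> demo() {
        String[] placeholders = {
            "gci.trace",
            "javacore.20080530.155059.7171.txt",
            "Snap0001.20080530.155059.7171.trc",
            "heapdump.20080530.155059.7171.phd" // not requested, so not matched
        };
        try {
            Path dir = Files.createTempDirectory("oom-run");
            for (String name : placeholders) {
                Files.createFile(dir.resolve(name));
            }
            return collect(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```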

Comment 13 IBM Bug Proxy 2008-06-12 16:40:37 UTC
------- Comment From dvhltc.com 2008-06-12 12:32 EDT-------
Per JTC call, dropping this to P3.  Not a top priority hardware platform, so
while we want to work through it, the failures on the blades will take priority.

Comment 14 IBM Bug Proxy 2008-06-12 17:09:21 UTC
------- Comment From Sean_Foley.com 2008-06-12 13:00 EDT-------
I have seen hangs occur after an OOM (OutOfMemoryError) before.  The issue
arises because the OOM can occur at any time whatsoever.  If the OOM occurs
inside a thread holding a particular lock, then without releasing the lock the
thread is immediately redirected to writing diagnostic dump files.  During the
course of the dumps, a separate thread triggered to write the dump files may
attempt to acquire the same lock.  The original thread waits for the dumping
thread to complete, causing deadlock.  Obtaining native (not Java) stack traces
of live threads can confirm this scenario.

However, it is the OOM that is the underlying problem.
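
The lock scenario described above can be sketched in plain Java. This is an
illustration only, not the JVM's dump code: the lock, the thread name and the
class OomDumpDeadlockSketch are hypothetical, and a timed tryLock stands in for
the unbounded wait so that the example terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration of the "lock held across a dump" hang. */
final class OomDumpDeadlockSketch {
    // Stand-in for whatever lock the failing thread happens to hold.
    private static final ReentrantLock sharedLock = new ReentrantLock();

    /**
     * The calling thread plays the thread that hit the OOM: it holds the lock
     * and waits for a dump thread. Returns true only if the dump thread
     * managed to take the same lock within waitMillis.
     */
    static boolean dumpThreadGotLock(long waitMillis) {
        final boolean[] acquired = {false};
        sharedLock.lock(); // the failing thread already owns the lock
        try {
            Thread dumper = new Thread(() -> {
                try {
                    // In the real hang this would be an unbounded lock().
                    if (sharedLock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                        try {
                            acquired[0] = true;
                        } finally {
                            sharedLock.unlock();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "dump-writer");
            dumper.start();
            try {
                // The owner waits for the dump to finish while still holding
                // the lock the dump thread needs.
                dumper.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } finally {
            sharedLock.unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("dump thread acquired lock: " + dumpThreadGotLock(200));
    }
}
```

With an unbounded lock() in place of the timed tryLock, neither thread could
make progress, which matches the hang pattern described. A Java-level sketch
like this does not show where the real lock lives, so native stack traces would
still be needed to confirm it.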

Comment 15 IBM Bug Proxy 2008-06-23 15:24:42 UTC
------- Comment From Charlie_Gracie.com 2008-06-23 11:22 EDT-------
Sorry, the -Xgc:verboseExtensions option does not exist in the V1 product.  The
rest of the options are still correct.

(In reply to comment #31)
> Hi.
> Can you do another run with these extra options:
> -verbose:gc -Xgc:verboseExtensions -XXgc:perfTraceLog=gci.trace
>
> Once this run has completed can you provide the following files:
> gci.trace
> javacore*.txt
> Snap.*.trc
>
> Thanks

Comment 16 IBM Bug Proxy 2008-10-02 00:46:04 UTC
We've not been able to reproduce this, mostly because we have not had cycles to spend on unsupported machines, so we're rejecting it. It can be reopened if needed.

