=Comment: #0================================================= Paul A. Clarke <pacman.com> - 2008-05-14 13:33 EDT
Problem description:
Running an internal Java testcase resulted in an OutOfMemoryError and an apparent testcase hang.

If this is not an installation problem, describe any custom patches installed.

Provide output from "uname -a", if possible:
Linux elm3b99.beaverton.ibm.com 2.6.24.7-52ibmrt2.3 #1 SMP PREEMPT RT Mon May 12 20:23:45 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Hardware Environment
    Machine type (p650, x235, SF2, etc.): x3550
    Cpu type (Power4, Power5, IA-64, etc.): x86_64

Please provide access information for the machine if it is available.
ABAT

Is this reproducible? Unknown at this time (haven't tried).
    Describe the steps: real-time release-testing scripts

Is the system (not just the application) hung? No.

=Comment: #1================================================= Paul A. Clarke <pacman.com> - 2008-05-14 13:40 EDT
# strace -p28886
Process 28886 attached - interrupt to quit
[ Process PID=28886 runs in 32 bit mode. ]
futex(0x805cd40, FUTEX_WAIT, 149, NULL

=Comment: #3================================================= John G. Stultz <jstultz.com> - 2008-05-14 13:43 EDT
For context to the RH guys, the kernel is your -52 with the fastgup patches removed (they also cause Java hangs, but that's documented in a different bug). The -47 kernel with the fastgup patches removed did not have this issue.
------- Comment From ankigarg.com 2008-05-15 07:09 EDT-------
Hey Paul, I could not reproduce this with the latest -54 MRG kernel. Could you please confirm?
I suspect the fast_gup stuff was the culprit here. Close it?
------- Comment From sripathi.com 2008-05-21 01:22 EDT-------
(In reply to comment #12)
> ------- Comment From williams 2008-05-20 20:53 EST-------
> I suspect the fast_gup stuff was the culprit here. Close it?

Clark, Paul has in fact seen this on the -54 kernel on a particular hardware type. Please give us a bit more time to analyze and confirm.
------- Comment From jstultz.com 2008-05-21 19:53 EDT------- Does this issue happen if you run the following after bootup? sudo echo 1 > /proc/sys/kernel/rwlock_reader_limit
------- Comment From pacman.com 2008-05-22 10:08 EDT-------
(In reply to comment #15)
> Does this issue happen if you run the following after bootup?
>
> sudo echo 1 > /proc/sys/kernel/rwlock_reader_limit

I pulled the -57 build (our "alpha14") and ran it overnight. The Java OOM still occurs, even with the above setting, which I verified was still active this morning via "cat /proc/sys/kernel/rwlock_reader_limit". I haven't tried many platforms; I have mostly been running on an x3550. Now I'm wondering whether there is something specific to the platform.
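For scripted test runs, the same check done above with `cat` could be folded into the harness. A minimal Java sketch, assuming the tunable is exposed as a plain integer file at the path named above (it exists only on kernels that carry this -rt tunable); the class and method names are hypothetical. It only reads the value; setting it still requires root.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;

public class SysctlCheck {
    // Tunable discussed in this bug; absent on kernels that do not provide it.
    static final Path RWLOCK_READER_LIMIT = Path.of("/proc/sys/kernel/rwlock_reader_limit");

    /** Reads an integer sysctl-style file; empty if missing, unreadable, or not an integer. */
    public static OptionalInt readIntSetting(Path path) {
        try {
            // /proc files report size 0, but readString reads until EOF, so this is safe.
            return OptionalInt.of(Integer.parseInt(Files.readString(path).trim()));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt limit = readIntSetting(RWLOCK_READER_LIMIT);
        if (limit.isEmpty()) {
            System.out.println("rwlock_reader_limit not available on this kernel");
        } else {
            System.out.println("rwlock_reader_limit = " + limit.getAsInt());
        }
    }
}
```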
I know that we have an x3550 in the lab up in Westford, but I don't think it's allocated to our test system (RHTS). We probably wouldn't see anything with our current tests anyway, since they're not Java-oriented. That's something we should work on in the future (working together to get a set of RT Java smoke tests).
------- Comment From jstultz.com 2008-05-30 15:54 EDT-------
Can we get this retested with alpha16 or alpha17 (once it's built)? If it still persists, this may need a priority bump or [focus].
------- Comment From pacman.com 2008-05-30 16:15 EDT-------
-------------------------------
Garbage Collection Impact
-------------------------------
Measures impact of Garbage Collection on RealtimeThread not accessing the heap.
- RT Scheduling latency (maximum) : 54.00 us
- RT Execution duration (maximum) : 1.107 ms
...GC...
- RT Scheduling latency (maximum) : 1.154 ms
- RT Execution time (maximum) : 2.837 ms
Measures impact of Garbage Collection on NoHeapRealtimeThread.
- NHRT Scheduling latency (maximum) : 116.0 us
- NHRT Execution time (maximum) : 1.223 ms
...GC...
JVMDUMP006I Processing Dump Event "systhrow", detail "java/lang/OutOfMemoryError" - Please Wait.
JVMDUMP007I JVM Requesting Snap Dump using '/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/Snap0001.20080530.155059.7171.trc'
JVMDUMP010I Snap Dump written to /home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/Snap0001.20080530.155059.7171.trc
JVMDUMP007I JVM Requesting Heap Dump using '/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/heapdump.20080530.155059.7171.phd'
JVMDUMP010I Heap Dump written to /home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/heapdump.20080530.155059.7171.phd
JVMDUMP007I JVM Requesting Java Dump using '/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/javacore.20080530.155059.7171.txt'
JVMDUMP010I Java Dump written to /home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/javacore.20080530.155059.7171.txt
JVMDUMP013I Processed Dump Event "systhrow", detail "java/lang/OutOfMemoryError".
java.lang.OutOfMemoryError
    at javolution.util.FastMap.<init>(Unknown Source)
    at com.raytheon.calibrate.GCImpact.run(Unknown Source)
    at com.raytheon.calibrate.Main.run(Unknown Source)
    at javax.realtime.RealtimeThread.runImpl(Unknown Source)
-------------------
Network Performance
-------------------
Warning: No server address specified (-DserverIp) - Start server on local host
JVMDUMP006I Processing Dump Event "systhrow", detail "java/lang/OutOfMemoryError" - Please Wait.
JVMDUMP007I JVM Requesting Snap Dump using '/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/Snap0002.20080530.161316.7171.trc'
JVMDUMP010I Snap Dump written to /home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/Snap0002.20080530.161316.7171.trc
JVMDUMP007I JVM Requesting Heap Dump using '/home/rtuser/linux-rt-tests/internal/func/calibrate/calibrate-v1.6.0/heapdump.20080530.161316.7171.phd'
[1]+ Stopped ./release-testing.sh R2 alpha16
[rtuser@elm3b99 linux-rt-tests]$ uname -r
2.6.24.7-60ibmrt2.4
------- Comment From pacman.com 2008-05-30 16:21 EDT-------
After Ctrl-Z...
$ cat /proc/meminfo
MemTotal: 4022840 kB
MemFree: 403776 kB
Buffers: 142852 kB
Cached: 1891500 kB
SwapCached: 0 kB
Active: 1538856 kB
Inactive: 1631168 kB
SwapTotal: 8008392 kB
SwapFree: 8008392 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 1135756 kB
Mapped: 14556 kB
Slab: 403776 kB
SReclaimable: 373272 kB
SUnreclaim: 30504 kB
PageTables: 5364 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 10019812 kB
Committed_AS: 1380428 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 45964 kB
VmallocChunk: 34359691823 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
closing per IBM
------- Comment From alan_stevens.com 2008-06-10 08:00 EDT------- I've asked the JVM GC team to take a look in case they can help.
------- Comment From Charlie_Gracie.com 2008-06-10 10:20 EDT-------
Hi.
Can you do another run with these extra options:
-verbose:gc -Xgc:verboseExtensions -XXgc:perfTraceLog=gci.trace

Once this run has completed can you provide the following files:
gci.trace
javacore*.txt
Snap.*.trc

Thanks
------- Comment From dvhltc.com 2008-06-12 12:32 EDT-------
Per JTC call, dropping this to P3. This is not a top-priority hardware platform, so while we want to work through it, the failures on the blades will take priority.
------- Comment From Sean_Foley.com 2008-06-12 13:00 EDT-------
I have seen hangs occur after an OOM (OutOfMemoryError) before. The problem arises because an OOM can be thrown at any point whatsoever. If the OOM occurs in a thread that holds a particular lock, the thread is immediately redirected to writing diagnostic dump files without releasing that lock. During the dump, a separate thread triggered to write the dump files may try to acquire the same lock, while the original thread waits for the dumping thread to complete, causing a deadlock. Obtaining native (not Java) stack traces of the live threads can confirm this scenario. However, it is the OOM that is the underlying problem.
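The lock-ordering hang described above can be illustrated at the Java level. A minimal sketch, assuming a single `ReentrantLock` stands in for whatever JVM-internal lock is involved (the actual lock is not identified in this bug) and a plain thread stands in for the dump writer; all names are hypothetical. A timed `join` replaces the indefinite wait so the hang can be observed instead of blocking forever. This does not replace the native stack traces mentioned above, which are what would confirm the real scenario.

```java
import java.util.concurrent.locks.ReentrantLock;

public class OomDumpDeadlockSketch {
    // Hypothetical stand-in for a JVM-internal lock held when the OOM is thrown.
    private static final ReentrantLock INTERNAL_LOCK = new ReentrantLock();

    /**
     * Simulates the failing thread handing off to a dump thread and waiting for it.
     * Returns true if the dump thread is still blocked after the timeout (a hang).
     */
    public static boolean dumpHangs(boolean holdLockDuringDump, long timeoutMillis) {
        Thread dumper = new Thread(() -> {
            INTERNAL_LOCK.lock();   // the dump path needs the same lock
            INTERNAL_LOCK.unlock();
        }, "dump-thread");
        dumper.setDaemon(true);     // never keeps the JVM alive

        INTERNAL_LOCK.lock();       // the failing thread is inside a locked region
        try {
            if (!holdLockDuringDump) {
                INTERNAL_LOCK.unlock(); // hypothetical fix: drop the lock before dumping
            }
            dumper.start();
            dumper.join(timeoutMillis); // the real failing thread would wait indefinitely
            return dumper.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            if (INTERNAL_LOCK.isHeldByCurrentThread()) {
                INTERNAL_LOCK.unlock(); // lets a blocked dump thread finish
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("lock held during dump, hang: " + dumpHangs(true, 500));
        System.out.println("lock released first, hang: " + dumpHangs(false, 5000));
    }
}
```

The first call should report a hang and the second should not, which mirrors the point that the deadlock is a side effect of where the OOM strikes, while the OOM itself remains the root cause.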
------- Comment From Charlie_Gracie.com 2008-06-23 11:22 EDT-------
Sorry, the -Xgc:verboseExtensions option does not exist in the V1 product. The rest of the options are still correct.

(In reply to comment #31)
> Hi.
> Can you do another run with these extra options:
> -verbose:gc -Xgc:verboseExtensions -XXgc:perfTraceLog=gci.trace
>
> Once this run has completed can you provide the following files:
> gci.trace
> javacore*.txt
> Snap.*.trc
>
> Thanks
We've not been able to reproduce this, mostly because we don't have cycles to spend on unsupported machines. So we're rejecting this. It can be reopened if needed.