Bug 179228 - kernels > 2.6.15-1.1826.2.10_FC5 cause java/mono apps to freeze during GC
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Ingo Molnar
QA Contact: Brian Brock
Duplicates: 177592 177820 179002 179304 179978
Blocks: FC5Blocker 173278 178493 179811 180637 180926

Reported: 2006-01-28 12:01 EST by Anthony Green
Modified: 2007-11-30 17:11 EST
CC: 22 users

Doc Type: Bug Fix
Last Closed: 2006-02-13 17:25:49 EST


Attachments
stack traces from gdb (14.46 KB, text/plain)
2006-01-28 12:39 EST, Anthony Green

Description Anthony Green 2006-01-28 12:01:09 EST

Description of problem:
Eclipse, RSSOwl, Azureus, chainsaw, etc., all run fine on 2.6.15-1.1826.2.10_FC5 with java-1.4.2-gcj-compat, but they all freeze during startup on newer kernels.

It looks like this is happening during GC.  I gathered some stack traces from an eclipse process and I'll upload them in a minute.

AG



How reproducible:
Always

Steps to Reproduce:
1. Run eclipse

Comment 1 Anthony Green 2006-01-28 12:39:32 EST
Created attachment 123840
stack traces from gdb
Comment 2 Dave Jones 2006-02-03 15:28:53 EST
*** Bug 179002 has been marked as a duplicate of this bug. ***
Comment 3 Jonathan Berry 2006-02-04 16:04:02 EST
I am seeing this as well.  It has bitten me most when trying to update.
gcj-dbtool gets stuck and yum sits there waiting on it.  Running
"for((;;)); do killall gcj-dbtool; sleep 1; done" allows yum to get through
updating, but I'm sure my java stuff is a mess.  I think I'm also seeing this
affect mono apps, like beagle.  Running beagle-search just sits there.
Attaching to it with gdb shows:
0x00002ba878af615d in sem_wait () from /lib64/libpthread.so.0
(gdb) info threads
  3 Thread 1073822048 (LWP 3674)  0x00002ba878af7461 in __nanosleep_nocancel () from /lib64/libpthread.so.0
  2 Thread 1075988832 (LWP 3675)  0x00002ba878af46f7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  1 Thread 48002585361616 (LWP 3673)  0x00002ba878af615d in sem_wait () from /lib64/libpthread.so.0

Let me know what I can do to help with debugging this problem.

Jonathan
Comment 4 Anthony Green 2006-02-04 16:23:02 EST
(In reply to comment #3)
> I think I'm also seeing this affect mono apps,
> like beagle.

This makes sense.  gcj and mono use the same GC implementation (the
Boehm-Demers-Weiser conservative collector), and the problem shows up when the
collector tries to stop all threads so it can take care of business.
Comment 5 Andrew Cagney 2006-02-08 12:31:12 EST
Changing arch to all: a remarkably similar wedge occurs on i386 during garbage collection.
Comment 6 Bryce McKinlay 2006-02-08 14:37:36 EST
Actually I'm pretty sure this is x86_64 arch specific. Something is wrong with
signals/sigsuspend. Removing the following patches from kernel-2.6.15-1.1914_FC5
fixes it for me:

Patch206: linux-2.6-x86_64-tif-restore-sigmask.patch
Patch207: linux-2.6-x86_64-generic-sigsuspend.patch 
Patch208: linux-2.6-x86_64-add-ppoll-pselect.patch 
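Judging by their names and the LWN link in comment 14, these three patches bring x86_64 the TIF_RESTORE_SIGMASK mechanism, a generic sigsuspend built on it, and the pselect()/ppoll() system calls, all of which swap in a temporary signal mask for the duration of a wait. The sketch below is not the kernel change; it is a user-space illustration, assuming a POSIX system, of the guarantee those calls are supposed to provide: a signal that arrives before the wait begins must still wake it. The function name pselect_catches_early_signal is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal;

static void on_signal(int sig) { (void)sig; got_signal = 1; }

/* Returns 1 if a signal raised *before* the wait still woke pselect(),
   0 if it did not, and -1 if setup failed. */
int pselect_catches_early_signal(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    sigset_t block, orig;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &orig) != 0) return -1;  /* SIGUSR1 now stays pending */

    got_signal = 0;
    raise(SIGUSR1);                  /* the "early" signal: sent before we start waiting */

    sigset_t wait_mask = orig;
    sigdelset(&wait_mask, SIGUSR1);
    struct timespec timeout = { .tv_sec = 5, .tv_nsec = 0 };

    /* pselect() installs wait_mask and sleeps as one step, so the pending
       SIGUSR1 is delivered at once instead of slipping into a gap between
       a separate unblock call and the wait. */
    int rc = pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
    int err = errno;
    sigprocmask(SIG_SETMASK, &orig, NULL);   /* restore the caller's mask */
    return (rc == -1 && err == EINTR && got_signal) ? 1 : 0;
}
```

The non-atomic alternative, calling sigprocmask() to unblock and then select(), would leave a window in which the signal is handled before select() starts, so select() would sleep for the full timeout. Keeping the old mask around and restoring it only after the handler has run is the bookkeeping that TIF_RESTORE_SIGMASK handles inside the kernel.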
Comment 7 Adam Jocksch 2006-02-08 15:04:59 EST
Just replicated on an i386 machine using Frysk, with kernel
2.6.15-1.1826.2.10_FC5 #1 Wed Jan 11 18:12:42 EST 2006 i686 i686 i386 GNU/Linux.
Comment 8 Erwin Rol 2006-02-09 07:48:28 EST
(In reply to comment #6)
> Actually I'm pretty sure this is x86_64 arch specific. Something is wrong with
> signals/sigsuspend. Removing the following patches from kernel-2.6.15-1.1914_FC5
> fixes it for me:
> 
> Patch206: linux-2.6-x86_64-tif-restore-sigmask.patch
> Patch207: linux-2.6-x86_64-generic-sigsuspend.patch 
> Patch208: linux-2.6-x86_64-add-ppoll-pselect.patch 

What did those patches try to fix? I mean, could they be temporarily disabled
in the next rawhide kernels?



Comment 9 Christopher Aillon 2006-02-09 11:11:02 EST
*** Bug 177820 has been marked as a duplicate of this bug. ***
Comment 10 Christopher Aillon 2006-02-09 11:12:13 EST
*** Bug 179304 has been marked as a duplicate of this bug. ***
Comment 11 Christopher Aillon 2006-02-09 11:13:54 EST
*** Bug 177592 has been marked as a duplicate of this bug. ***
Comment 12 Christopher Aillon 2006-02-09 11:15:50 EST
*** Bug 177703 has been marked as a duplicate of this bug. ***
Comment 13 Christopher Aillon 2006-02-09 11:20:04 EST
*** Bug 180551 has been marked as a duplicate of this bug. ***
Comment 14 David Woodhouse 2006-02-10 06:46:54 EST
There seem to be conflicting reports about whether this is x86_64 only or not.
Can someone confirm whether it does happen on i386 and ppc?

For explanation of these patches, see http://lwn.net/Articles/164892/

The PowerPC code path is the one I handled myself; i386 was done by dhowells and
then x86_64 by Andi Kleen.

I was just able to 'yum update' from a three-day-old rawhide to current, on
ppc64 (with 1.1909 kernel). That updated a few Java packages, including eclipse,
and will definitely have involved running gcj-dbtool. There were no problems --
should I infer from this that PPC isn't affected, or is there a better
reproducer I should be trying? A smaller test case would definitely be good.
Comment 15 David Woodhouse 2006-02-10 06:50:58 EST
Could this be related to bug #180567?
Comment 16 Andrew Haley 2006-02-10 06:56:44 EST
There's no reason to believe the frysk bug and the x86_64 java bug are in any way
connected.  Let's keep them separate.
Comment 17 Jakub Jelinek 2006-02-10 11:01:58 EST
*** Bug 180926 has been marked as a duplicate of this bug. ***
Comment 18 Bryce McKinlay 2006-02-10 19:05:08 EST
(In reply to comment #14)
> There seem to be conflicting reports about whether this is x86_64 only or not.
> Can someone confirm whether it does happen on i386 and ppc?

This does not affect i386. The i386/frysk hang is a separate issue. So far I have been unable to install FC5
on a PPC machine to test there.

Yes, I think this is very likely the same as bug #180567.
Comment 19 David Woodhouse 2006-02-10 19:07:32 EST
It's fine on PowerPC -- but tell me separately about the problems you had
installing FC5. I did a rawhide install to a PPC64 machine only a couple of days
ago, and it was mostly OK.
Comment 20 Dan Siemon 2006-02-13 16:58:48 EST
After applying the updates today (Feb 13 2006), Mono apps work again. Kernel
version is now 2.6.15-1.1939_FC5 x86_64.
Comment 21 Christopher Aillon 2006-02-13 17:05:45 EST
Yeah, Dave removed the patches noted in comment 6.
Comment 22 Bryce McKinlay 2006-02-13 17:25:49 EST
I can confirm that this is fixed in 2.6.15-1.1939_FC5. I'm closing this one: the
underlying kernel problem is also being tracked in bug #180567.
Comment 23 Christopher Aillon 2006-02-21 14:40:17 EST
*** Bug 179978 has been marked as a duplicate of this bug. ***
