Bug 121902 - [FIXED] java hangs i686 kernel (eventually), but not i586
Summary: [FIXED] java hangs i686 kernel (eventually), but not i586
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: FC2Update
 
Reported: 2004-04-28 23:37 UTC by Keith Irwin
Modified: 2007-11-30 22:10 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-06-09 09:59:38 UTC
Type: ---
Embargoed:


Attachments
dmesg output (14.93 KB, text/plain)
2004-04-29 16:06 UTC, Keith Irwin
Snapshot of slabtop output. (3.95 KB, text/plain)
2004-04-29 16:14 UTC, Keith Irwin
Slabtop output with java running (jboss). (3.88 KB, text/plain)
2004-04-29 16:18 UTC, Keith Irwin
Slabtop output after key lock up (in X). (3.87 KB, text/plain)
2004-04-29 16:22 UTC, Keith Irwin
syslog (6.22 KB, text/plain)
2004-09-11 19:31 UTC, Jim Redman

Description Keith Irwin 2004-04-28 23:37:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040124 Epiphany/1.0.8

Description of problem:
When I run some "big" java thing, I get a keyboard lock-up of some
sort.  For instance, when I deploy a "war" web app to jboss running as
my local user on Fedora Core 2 test 3, the keyboard seems to lock up. 

Either:

1. I can't type anything in any window, no matter what.  (But mouse
clicks work.)

2. If I happened to be typing something, like the letter "r" when this
event happens, the "r" keeps repeating in each window that has focus
as I click around.

If I log out of gnome, I never get back to a prompt.  I can then ssh
in and reboot.  When I ssh in, I see an X process, some assorted
"bash" or "tail" processes, a javac process (sometimes), a [java]
process, etc.  I can't kill -9 them.

However, when I reboot and run jboss, deploy, etc, WITHOUT ever going
in to X, none of this happens.

Also, when I try compiling java code for a project WITHOUT being
in X, I'm successful.  When I start X, open a gnome-terminal window
and start a compile (in this case, a clean, so nothing was even being
compiled, I don't think, so it might not even be a compile at all), I
get to the point where the keyboard locks up and/or repeats a keypress
over and over.

If I don't do anything with java (while in X), or if I just start up
jboss but don't deploy anything, things seem to work fine.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. log in, startx.

2. Install Sun's java sdk 1.4.2_04.  Tarball version works fine.

3. Compile a large app, or run a large app (not sure how you can
easily do this). Compiling a simple "hello world" works fine.

Actual Results:  keyboard locks up, (see above)

Expected Results:  Java runs, X doesn't get wonky.

Additional info:

Using an HP workstation XW8000, 2 procs with hyperthreading enabled
thus simulating 4 procs.  Stock nvidia driver supplied with Fedora (I
installed no drivers myself).

Comment 1 Keith Irwin 2004-04-29 01:59:56 UTC
I take at least some of this back.

Playing around with java (compiling a big project here or there,
running jboss, etc) can lock things up regardless.

Well, java itself locks up, which locks the terminal it's in.  I can't
kill -9 the processes, and when I ssh in from elsewhere and "reboot,"
there seems to be some trouble umounting the /home partition (where
all this takes place).

Where should I file this?  Kernel?

Comment 2 Simon Roberts 2004-04-29 10:14:12 UTC
I had exactly the same symptoms. System would work fine for a while,
then the keyboard would stop working (sometimes sticking a key, which
seems to consume all the CPU handling it).  I can still SSH in.
Eventually the system hangs completely.

I blamed my crypto-filesystem, so tried again a bunch of times without
it. I even tried a uniprocessor kernel. Still got the same problem. 
Finally, I downgraded to a -305 kernel I had lying around, and the
problems went away.

The system is a single P4-HT system, 2G memory, matrox video drivers,
also running java. I think my IDE runs under 1.4.1_03 (so it's not
just JDK 1.4.2_04). It's not local-X related, I had the same problem
running X-over-ssh.

Comment 3 Keith Irwin 2004-04-29 15:16:45 UTC
If you can trace it to the kernel, I'm going to move it to that
"component" in hopes that someone will get a look at it sooner rather
than later.

Now, if I could just find that kernel ...

Comment 4 Keith Irwin 2004-04-29 15:20:44 UTC
Adding arjanv to the CC list as he's the contact for kernel related
matters, I think.

Comment 5 Keith Irwin 2004-04-29 15:21:54 UTC
And, finally, changing summary.

Comment 6 Keith Irwin 2004-04-29 15:22:25 UTC
Spelling.  (I need my coffee.)

Comment 7 Arjan van de Ven 2004-04-29 15:23:29 UTC
Can I get slabtop and dmesg output?

Comment 8 Keith Irwin 2004-04-29 16:04:38 UTC
I'd be happy to oblige: are you talking just a normal dmesg output
once the machine is booted?

While java is running (before things crash)?

And how do I give you slabtop info?

Comment 9 Keith Irwin 2004-04-29 16:06:19 UTC
Created attachment 99791
dmesg output

Here's the dmesg output.

Comment 10 Keith Irwin 2004-04-29 16:14:17 UTC
Created attachment 99792
Snapshot of slabtop output.

Here's the slabtop output.

Comment 11 Keith Irwin 2004-04-29 16:18:19 UTC
Created attachment 99794
Slabtop output with java running (jboss).

Started jboss, here's slabtop.  Will try to do something "post crash" next.

Comment 12 Keith Irwin 2004-04-29 16:22:35 UTC
Created attachment 99795
Slabtop output after key lock up (in X).

Here it is after the keys have locked.

Comment 13 Simon Roberts 2004-04-30 09:39:07 UTC
Minor correction - I downgraded to kernel-2.6.5-1.315

Comment 14 Brian G. Anderson 2004-05-06 05:17:15 UTC
I run FC2 T3 i386 on an AMD-64.  I can compile the kernel and do other
large compile jobs no problem.  However, when I compile a large java
project we are developing the system locks hard: no response, no ssh.
 It's completely dead.  Reboot time.

I've tried different Sun VMs  1.4.1_03, 1.4.2_4 and they all do it.

I hope this gets fixed before release or FC2 will be completely
useless as my java development platform.

Comment 15 Brian G. Anderson 2004-05-06 13:04:42 UTC
I just wanted to add that on my machine running X has nothing to do
with it.  I booted the machine at init level 3 and ran my large
compile and it locked the machine in the exact same way.

Comment 16 Barry K. Nathan 2004-05-06 20:39:19 UTC
Are there any large open-source (or otherwise freely available) Java
programs that can be compiled in order to reproduce this bug? (If I
could reproduce it then maybe I could try to narrow down the cause of
the bug.)

Comment 17 Rudi Chiarito 2004-05-07 12:32:24 UTC
I can confirm this bug; it bit me the other day. Symptoms are as in
the original report: keyboard not working, the letter "e" repeated
forever in the currently focused window (I was typing "locate" in a
terminal when this occurred), java process resisting any attempt at
a kill -9 (and an empty strace, too).

At the moment the keyboard stopped working, I was simply testing the
code (started through Java Web Start) and nothing was being compiled.
Of course the bug could have been actually triggered a few seconds
before, when it was still compiling.

For what it's worth, my code uses a bit of JNI and two ORBs (MICO on
the native side and Sun's own on the Java side). This was with the
Java 1.5 beta. System is a HT P4 2.8GHz with 2GB of memory, SATA,
GeForce4 (nv driver) and a Radeon9200 PCI (open-source driver).

Comment 18 chris 2004-05-07 20:06:46 UTC
Same problem here. It drives me crazy. After recompiling one kernel 
with the settings I like (e.g. preemptive option), the PC does not 
completely crash, but only the Java process crashes/stalls and uses 
100% CPU time. It can not be killed.
You can use eclipse and try a number of features, sooner or later the 
whole PC crashes. It must be new and related to one of the most 
recent kernels, with test 2 I had no problem. Test 3 does not even 
boot on one of my PCs.

Comment 19 Warren Togami 2004-05-10 01:15:09 UTC
Please try the latest i686 kernel, currently 2.6.5-1.358.  Then try
booting with kernel parameter 'vdso=0'.  If that does not change any
behavior, see if the i586 kernel of the same version works any better.

Comment 20 Warren Togami 2004-05-10 01:17:54 UTC
http://java.sun.com/j2se/1.5.0/download.jsp
Also out of curiosity, is behavior improved any with Sun's Java 1.5.0
beta?

Please test everything and report back.



Comment 21 Brian G. Anderson 2004-05-10 18:51:37 UTC
So far I have tried it with 2.6.5-1.358 i686 with and without 'vdso=0'
and get the same failure.  I would try it with the i586 kernel, but
can someone tell me a way to install it over a i686 kernel without
reinstalling everything?  Removing the kernel seems dangerous. 
Perhaps I do a "rpm -hiv --force"?  I understand this is a test
machine, but I would like to do the replace in the most safe manner
possible.


Comment 22 Barry K. Nathan 2004-05-10 19:03:54 UTC
Safest way would be to install an older kernel (hopefully you still
have one lying around; if not, then
http://people.redhat.com/arjanv/2.6 still had 356 last time I
checked), reboot into that, remove 358, then install the i586 358 and
reboot into it.
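
The sequence above can be sketched as a dry run. This is only an illustration: the function name and the package file names passed to it are hypothetical placeholders, and the function prints the rpm commands rather than running them, so each step can be checked before it is typed for real.

```shell
# Hypothetical helper: print (do not run) the rpm commands for swapping
# an installed i686 kernel for the i586 build of the same version.
plan_i586_swap() {
    fallback_rpm=$1   # package file of an older kernel to boot from meanwhile
    version=$2        # version-release being swapped, e.g. 2.6.5-1.358
    i586_rpm=$3       # i586 package file for that same version-release
    echo "rpm -ivh $fallback_rpm"
    echo "# reboot into the fallback kernel"
    echo "rpm -e kernel-$version"
    echo "rpm -ivh $i586_rpm"
    echo "# reboot into the i586 kernel"
}

# Example call (file names are assumptions, not taken from this bug):
#   plan_i586_swap kernel-2.6.5-1.356.i686.rpm 2.6.5-1.358 kernel-2.6.5-1.358.i586.rpm
```

Removing the running kernel is avoided because the erase step only happens after booting into the fallback kernel.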

Comment 23 Brian G. Anderson 2004-05-11 03:18:40 UTC
Switching to i586 kernel made the problem go away.  I hammered the
system with multiple simultaneous java compiles and runs without a
problem.  With a i686 kernel it would have definitely locked the
system up.  

I've verified this on two different systems.

(The weird thing is that I'm positive I installed the i586 kernel but
doing uname -a yields "Linux kelly 2.6.5-1.358 #1 Sat May 8 09:00:01
EDT 2004 i686 athlon i386 GNU/Linux"; I can't find i586 mentioned
anywhere except in the rpm)

Comment 24 Warren Togami 2004-05-11 04:16:29 UTC
rpm -q --qf '%{name}-%{version}-%{release}.%{arch}\n' kernel

Use this command to display archs of installed kernels.  Use "kernel-smp".


Comment 25 Warren Togami 2004-05-11 04:21:11 UTC
Oops, the above comment might look funny.  If you see a yen symbol
before the "n" at the end, it is really a backslash.  Also I meant use
"kernel-smp" if you have SMP kernels instead of uniprocessor.
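
As a small sketch of how to read that query's output: each line has the form name-version-release.arch, so the build architecture is whatever follows the last dot. The helper name below is hypothetical. This is probably also why `uname -a` showed i686 in comment 23, since uname reports the machine's hardware name rather than the architecture the package was built for.

```shell
# Hypothetical helper: print the architecture suffix of a
# name-version-release.arch string as produced by the rpm query above.
nvra_arch() {
    # Drop everything up to and including the last dot.
    printf '%s\n' "${1##*.}"
}

# Example use against the query from comment 24 (requires rpm):
#   nvra_arch "$(rpm -q --qf '%{name}-%{version}-%{release}.%{arch}\n' kernel | tail -n 1)"
```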

In order to diagnose this kernel problem, the kernel developers will
probably need either Open Source or 'Free as in Beer' test cases that
they can run on their own machines.  If you know of any 100%
reliably reproducible cases that are legally distributable, please
post URLs for download.  It would save kernel developer time if you
can provide detailed installation, build, and reproducing instructions
too.

Comment 26 Brian G. Anderson 2004-05-11 06:08:53 UTC
I tried to reproduce it using compiles of eclipse and jboss, but no
luck.  Still trying to figure out what is special about my build

Comment 27 chris 2004-05-11 08:04:46 UTC
Thank you! With "kernel-2.6.5-1.356.i586" my system can finally run
Java apps again :-)

I have a P4 2.6 GHz (not HT).

Would be interesting to see what the cause is, perhaps a compiler bug
or so?

Comment 28 Didier 2004-05-18 20:26:18 UTC
Complete lockup with FC2 when :

- executing twgcon (IBM Director 4.12 console component), freely (as
in beer) installable on qualified IBM equipment (ThinkPad, xSeries, ...).
- with the IBMJava2-JRE-1.4.1-8 RPM (from RHEL3 lacd) instead of the
standard included IBM Java 1.3 (the latter does not run on FC).

- Reproducible, does not happen on FC1 ;
- machine is pingable, but cannot be ssh'd into ; Alt-SysRq works.


Comment 29 Warren Togami 2004-05-18 22:14:05 UTC
http://people.redhat.com/arjanv/2.6/RPMS.kernel/
Please test the newer i686 kernels from here.

Also note that the i586 kernel can be used as a temporary workaround
for now.

Comment 30 Didier 2004-05-19 08:02:16 UTC
WRT Comment #29 :
I am already running 2.6.6-1.370 (evaluating possible ACPI & IEEE1394
fixes), but, this being my production machine (yeah I know), I am not
very inclined to test complete lockups if there is no indication in
the rpm changelogs that possible fixes/workarounds for e.g. this
particular problem are being worked on.

In other words, I really appreciate the (sparse) bugzilla #xyz
references in the changelogs.  :)

Comment 31 Sean Kennedy 2004-05-22 18:00:14 UTC
I'm getting my machine locking up whenever I load an applet in Epiphany.
Also, if I start Eclipse through a Gnome launcher my machine locks up
too.  But if I start Eclipse from the command line, or configure the
launcher to "run in a terminal" it seems to work fine, though I
haven't had the system installed long enough to know if I run into
other problems with more use of Eclipse.

Comment 32 Need Real Name 2004-05-24 11:38:46 UTC
I started having this problem after fc2t2 as well.  I'm trying to
stress test apache tomcat 5.x.  I get a complete lock up of fc2 when
starting tomcat using the ibm sdk 141.  If I use sun sdk 1.4.2_04,
tomcat starts but then folds 15 minutes into a stress test (out of
memory errors in catalina.out file however "top" shows fc2 has plenty
of physical memory available).

Comment 33 Bert 2004-05-26 16:50:07 UTC
I also experienced a java process under Fedora Core 2 (kernel
2.6.5-1.358smp) failing to terminate... consuming 99% cpu and kill -9
has no effect. Had to reboot to clear it, though the machine was not
"locked up". Sun java version 1.4.2_04-b05.  I was running just a
regular Java program which I tried to kill with "ctrl-C". I don't
believe I was compiling at the time, though I may have been.

I had previously been running core 1 on the same machine for a long
time without ever seeing this problem. 2 days after the core 2
upgrade, this happens.  Definitely seems to be a core 2 specific issue!

I'm not running a "test" release but the "official" release, fully
up-to-date as of today.

Comment 34 Keith Irwin 2004-05-26 19:39:30 UTC
How do you install a 586 kernel?  I get conflicts with the 686 kernel.
 Do you do a force, or do you boot in rescue mode off the CD?

Comment 35 Need Real Name 2004-06-01 12:56:27 UTC
I've gotten my java apps to work using "setarch -3 i386 (command)"

Comment 36 Rob S. 2004-06-01 22:57:31 UTC
Vanilla RHEL 3 Update 2 on Opteron + Sun JVM 1.4.2 == Abort.  Just 
typing "java" to get command line args causes it.


Comment 37 Warren Togami 2004-06-07 23:25:43 UTC
Please try kernel-2.6.6-1.422.  I think mingo found a kernel fix for
this, but I am not sure if it made it into the test kernels yet.

Comment 38 Brian G. Anderson 2004-06-08 12:12:18 UTC
First of all, I have been running the 414 i686 kernel with no
problems, where I had a problem with 358 i686 and had to run with a
i586 kernel.

I tried to install 422, but I get the following error when running as
root.

Preparing...               
########################################### [100%]
   1:kernel                
########################################### [100%]
memlock: Cannot allocate memory
Couldn't lock into memory, exiting.
mkinitrd failed

so I do it by hand:
sudo mkinitrd  /boot/initrd-2.6.6-1.422.img 2.6.6-1.422
memlock: Cannot allocate memory
Couldn't lock into memory, exiting.

I tried this twice.  Have others successfully installed this kernel?

Comment 39 Didier 2004-06-08 14:07:09 UTC
With kernel-2.6.6-1.422, IBM Director 4.12 actually runs without
hardlocking my machine.

Comment 40 Brian G. Anderson 2004-06-08 17:10:51 UTC
I've overcome my mkinitrd problem (losetup couldn't allocate memory
for a new loop device?) and installed 422.  I am not seeing the lockups
that I saw before.

Comment 41 Eric Hedström 2004-06-08 18:17:34 UTC
The 686 build of kernel-2.6.6-1.422 is working for me too, running
WebSphere Application Server 5.1 with IBM's JDK 1.4.1. kernel-2.6.5
would hard-lock the system when starting the app server; I had been
using the 586 build as a workaround.

Comment 42 Warren Togami 2004-06-09 03:57:14 UTC
I am guessing the memlock problem is due to a bug in that kernel
revision that was since fixed.  In any case I believe this bug is now
fixed for the next FC2 update kernel.  Keeping bug open for now so it
is easier for others to find.

Comment 43 Pascal Chong 2004-06-10 04:53:04 UTC
Can someone provide more information about the fix? Or will this
information be included inside the changelog?

Comment 44 Tarjei Knapstad 2004-06-18 09:24:27 UTC
Pascal, you can find more information in this kernel issue:

http://bugzilla.kernel.org/show_bug.cgi?id=2839

Comment 45 Jim Redman 2004-09-11 19:31:19 UTC
Created attachment 103728
syslog

Comment 46 Jim Redman 2004-09-11 19:32:53 UTC
I've just updated FC1 to FC2.  Java is unstable, I have some 50+ logs
from yesterday (Unexpected Signal : 11).  This morning, the system
hung up with this in the syslog (the whole file is attached):

Sep 11 12:38:46 charizard kernel: kernel BUG at mm/rmap.c:348!
[...]
Sep 11 12:38:46 charizard kernel: Process java (pid: 7850,
threadinfo=e12c0000 task=e024cc50)

Linux charizard 2.6.8-1.521.stk16 #1 Fri Sep 3 08:45:37 CDT 2004 i686
i686 i386 GNU/Linux

I'll try a 586 kernel, but have to build a >4K stack if the system is
to stay the same (NVidia driver - hence the taint).

Can provide logs if useful, open new bug, report elsewhere, etc.etc.,
please let me know.



Comment 47 Warren Togami 2004-09-11 22:16:49 UTC
Jim, we absolutely cannot support any use with the nvidia driver. 
Also we do not support if you rebuild the kernel yourself, or use
non-standard kernels from 3rd parties.  Given that nobody else
complained about this problem for a LONG TIME, I wonder if there is
something wrong with your configuration or 3rd party kernel.

Comment 48 Pascal Chong 2004-09-12 03:08:52 UTC
Hi Warren, the problem seems to have come back again. I updated the
kernel to 2.6.8-1.521, and this is the error I get with IBM's Java SDK
1.4.1 :

JVMDG080: Cannot find class com/ibm/jvm/Trace
JVMXM012: Error occurred in diagnostics initialization(2)
Could not create the Java virtual machine.

Blackdown's Java 1.4.1 still runs OK, though I'm not sure if it will
hang after a long period of time.

Comment 49 Warren Togami 2004-09-12 03:24:52 UTC
Does Sun java have this problem?
It causes the kernel and entire system to hang?

The original problem was that Java would cause the kernel to fail.  If
Java itself is failing, then maybe it is just a java problem?


Comment 50 Jim Redman 2004-09-14 23:06:32 UTC
After trying lots of kernels and after a complete reinstall (no NVIDIA
driver yet), I changed out the memory.   This has definitely improved
the situation, although galeon, some gnome_applets and Java still
occasionally crash.

FC1 had been stable for some considerable time (>year?), so either
there was a hardware failure coincident with the FC2 upgrade, or
something in FC2 works the memory harder/differently from FC1 exposing
a pre-existing, but formerly benign, hardware problem.  (I considered
installing FC1 to determine this, but couldn't be bothered.)

Either way, absent new syslog messages, there's no reason to believe
the current crashes are kernel related.  Sorry to cause trouble.

PS The IBM problem seems well known (also bizarre).  The advice seems
to be to wait for the next update from IBM.  Why this would be
considered even vaguely kernel related is a mystery to me.




Comment 51 Eric Hedström 2004-09-20 20:01:51 UTC
Updating to kernel 2.6.8-1.521 from kernel-2.6.7-1.494.2.2 causes this
com/ibm/jvm/Trace error for me as well. According to the IBM folks
this will be fixed in their next JDK update, and in the meantime there
is a workaround of setting LD_ASSUME_KERNEL=2.4 .

http://www-106.ibm.com/developerworks/forums/dw_thread.jsp?message=4203085&cat=10&thread=60016&forum=367#4203085
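
A minimal sketch of applying that workaround to a single command, rather than exporting the variable for the whole login session. The wrapper name is hypothetical, and the value 2.4 is simply the one quoted above. LD_ASSUME_KERNEL is read by the glibc dynamic linker, so it should only be set for the program that needs it.

```shell
# Hypothetical wrapper: run one command with LD_ASSUME_KERNEL exported,
# without changing the calling shell's environment.
with_assumed_kernel() {
    (
        # The subshell keeps the export local to this one invocation.
        LD_ASSUME_KERNEL=$1
        export LD_ASSUME_KERNEL
        shift
        "$@"
    )
}

# Example (assumes the affected JVM's java is on PATH):
#   with_assumed_kernel 2.4 java -version
```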

Comment 52 Eric Hedström 2004-10-28 17:44:07 UTC
In case anyone is still following along, IBM's Java SDK 1.4.2 build
cxia32142-20040926 works on kernel 2.6.8-1.521. If you are a nut case
like me trying to run WebSphere App Server on this kernel, you need
WAS 5.1.1 with the JDK update from here:

http://www-1.ibm.com/support/docview.wss?rs=180&uid=swg24007893

.. at least until 5.1.2 is out. :)

Comment 53 Jesus Salvo Jr. 2004-11-15 01:05:57 UTC
That fix mentioned in  
 
http://bugzilla.kernel.org/show_bug.cgi?id=2839 
 
... does it only apply to x86_64?
Given what was changed was: 
 
--- arch/x86_64/mm/fault.c.orig	2004-06-10 19:51:45.000000000 +0200 
+++ arch/x86_64/mm/fault.c	2004-06-10 20:38:38.000000000 +0200 

