Bug 102683

Summary: LTC3974-[BETA] Unable to run Domino with Java using LinuxThreads
Product: Red Hat Enterprise Linux 3
Reporter: IBM Bug Proxy <bugproxy>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 3.0
CC: bennet, dff, drepper, fitzsim, fweimer, roland
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-28 04:19:18 UTC
Bug Blocks: 101028    

Description IBM Bug Proxy 2003-08-19 20:15:31 UTC
The following has been reported by IBM LTC:
[BETA] Unable to run Domino with Java using LinuxThreads

Hardware Environment: PIII 500 MHz

Software Environment: RHAS 3.0 Beta with latest online updates


Steps to Reproduce:
1. Install Domino 6.x 
2. export LD_ASSUME_KERNEL=2.4.1 to use LinuxThreads
3. Run setup

Actual Results: The Java setup program crashes with a segmentation fault.  I
expected this possibility using NPTL (and it crashes the same way with NPTL),
but I did not expect to see it with LinuxThreads - this breaks backwards
compatibility and means that Domino 6 is not runnable on RHAS 3.0 as it stands.

Expected Results:  The Java setup program should display a splash screen and
let you begin setup.

Additional Information:

Here is the stack from the crash with a gdb run; also shows mapped libraries:

GNU gdb Red Hat Linux (5.3.90-0.20030710.14rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
 
(gdb) run -ss512k -Xoss5M -cp jhall.jar:cfgdomserver.jar:Notes.jar lotus.domino.setup.WizardManagerDomino -data /opt/d6/notesdata
Starting program: /opt/d6/lotus/notes/65000/linux/jvm/bin/exe/java -ss512k -Xoss5M -cp jhall.jar:cfgdomserver.jar:Notes.jar lotus.domino.setup.WizardManagerDomino -data /opt/d6/notesdata
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 2576)]
 
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 2576)]
0x004c5b89 in allocGuardPage ()
    at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
2264    /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s: No such file or directory.
        in /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s
Current language:  auto; currently asm
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x00319200  0x00321ebc  Yes         /lib/i686/libpthread.so.0
0x00b7ec30  0x00b8aac8  Yes         /lib/libnsl.so.1
0x0057fe90  0x00580d50  Yes         /lib/libdl.so.2
0x00126a10  0x00225ee8  Yes         /lib/i686/libc.so.6
0x007ebc00  0x007fd63f  Yes         /lib/ld-linux.so.2
0x003e0960  0x004cd620  Yes         /opt/d6/lotus/notes/65000/linux/jvm/bin/classic/libjvm.so
0x0024a500  0x00261dac  Yes         /lib/i686/libm.so.6
0x0026aff0  0x0026c600  Yes         /opt/d6/lotus/notes/65000/linux/jvm/bin/libxhpi.so
0x00277220  0x0027f860  Yes         /opt/d6/lotus/notes/65000/linux/jvm/bin/libhpi.so
(gdb) where
#0  0x004c5b89 in allocGuardPage ()
    at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
#1  0x004c5f19 in xeThreadInit ()
    at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
#2  0x004c78be in eeInitNewThis ()
    at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
#3  0x004cc756 in xmInitializeJVM ()
    at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
#4  0x003fafd9 in ciCreateJVM ()
   from /opt/d6/lotus/notes/65000/linux/jvm/bin/classic/libjvm.so
#5  0x00404f96 in JNI_CreateJavaVM ()
   from /opt/d6/lotus/notes/65000/linux/jvm/bin/classic/libjvm.so
#6  0x0804989e in InitializeJVM ()
#7  0x08048f4e in main ()
(gdb)

Ken - did you try LD_ASSUME_KERNEL=2.2.5?

Yes, setting LD_ASSUME_KERNEL=2.2.5 does work.  However, that is not an option
because 1) that threading system has a hard-coded 2 MB stack size per thread,
and 2) that model does not have floating stacks, which puts the JVM stacks in
the same stack space as our threads and has been known to cause issues.  In
addition, I don't think the 1.3.1 JVM is even "supported" running in that
configuration.  Also, at 2 MB per stack, you have basically "crippled" Domino,
since it uses a thread per connection, and thus I really do not have a product
to sell to enterprise customers - the stack I presently ask for is 256 KB, far
less, which gives me an 8-to-1 performance difference.

Ken - I'd like to ask you to try one more thing:  LD_ASSUME_KERNEL=2.4.19.
This resolved another bug which also seg faults and displays nothing.
Please let me know.  Thanks.

Just tried with LD_ASSUME_KERNEL=2.4.19 and we still get a SIGSEGV.

Greg/Glen - please submit this to Red Hat.  This bug was found on RHEL3
beta1 using the old pthreads library (LD_ASSUME_KERNEL=2.4.1).  Thanks.

Glen/Greg - please note the severity is BLOCKING.

Ken - do you have plans to fix this on NPTL?  Using LD_ASSUME_KERNEL=2.4.1
to enable the old pthreads library should only be a work-around...

We do not have plans to support NPTL in the Domino 6 or Domino 5 codestreams,
as they are in "maintenance" mode and cannot take such a large change.  In
addition, there are no plans for the 1.3.1 JVM to support NPTL either.  FYI:
as stated earlier, NPTL fails the same way within the JVM.

thanks!

kenbo
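
The stack-size concern raised in the description (a fixed 2 MB stack per thread versus the 256 KB that Domino asks for) can be illustrated with a minimal sketch. It is not Domino or JVM code. It assumes a pthreads implementation that honors pthread_attr_setstacksize, which is exactly what the reporter says the LD_ASSUME_KERNEL=2.2.5 threading model does not do. The names FIXED_STACK_BYTES, REQUESTED_STACK_BYTES, echo_arg and run_with_stack are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Sizes quoted in the report: 2 MB fixed stacks vs. the 256 KB requested. */
#define FIXED_STACK_BYTES     ((size_t)2 * 1024 * 1024)
#define REQUESTED_STACK_BYTES ((size_t)256 * 1024)

/* Trivial thread body (hypothetical): hands its argument back to the joiner. */
void *echo_arg(void *arg)
{
    return arg;
}

/*
 * Hypothetical helper: start one thread with an explicit stack size and
 * wait for it.  Returns 0 on success or a pthreads error number.
 */
int run_with_stack(size_t stack_bytes, void *(*fn)(void *), void *arg,
                   void **result)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    assert(fn != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for a specific stack size instead of the library default. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, result);
}
```

With these figures, FIXED_STACK_BYTES / REQUESTED_STACK_BYTES is 8, so the same address space holds eight times as many 256 KB stacks as 2 MB stacks. That ratio is presumably the 8-to-1 difference mentioned above for a thread-per-connection server.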

Comment 1 Bill Nottingham 2003-08-19 20:18:30 UTC
What JVM are you using?

Comment 2 Ulrich Drepper 2003-08-19 20:25:59 UTC
And which kernel?  Try the normal smp kernel in B1 which doesn't have the 4G
user address space.  This is known to cause severe problems with JVMs.

Comment 3 Greg Kelleher 2003-08-25 17:33:58 UTC
1.3.1 is the JVM version

Comment 4 Greg Kelleher 2003-08-25 18:06:28 UTC
1.3.1 is the JVM version

Comment 5 Jakub Jelinek 2003-09-08 09:30:19 UTC
Please retry with the current sushi kernel/setarch/glibc, running under
setarch -3 in case you are using the 4G/4G kernel.

Comment 6 IBM Bug Proxy 2003-09-08 15:20:36 UTC
------ Additional Comments From kenbo.com  2003-08-09 11:14 -------
How do I run "under setarch -3 in case"?  Is this an option I pass to the 
kernel at boot time?  I tried running "setarch -3" as root and it could find no 
such command.  Thanks. 

Comment 7 Ulrich Drepper 2003-09-08 16:30:14 UTC
> I tried running "setarch -3" as root and it could find no 
> such command.

Then you don't have the package installed yet.  Run as root

  up2date --nosig setarch


This should install the package.  The man page should give you more hints on how
to use it if this is needed.

Comment 8 IBM Bug Proxy 2003-09-08 21:08:21 UTC
------ Additional Comments From kenbo.com  2003-08-09 16:42 -------
I finally was able to get through and download setarch and it did not seem to 
make a difference.  I tried it as

"setarch i686 -3"
"setarch i686 -3 /opt/lotus/notes/latest/linux/jvm/bin/java"

(when I ran it just as "setarch -3" it would segv, so I figured from the man 
page that it must need the additional argument).  None of this helped.  I've 
also patched to the latest RPMs as well. 

Comment 9 IBM Bug Proxy 2003-09-16 17:12:27 UTC
------ Additional Comments From kenbo.com  2003-16-09 13:07 -------
It's worse than originally thought.  Java 1.3.1 (tried with latest SR5) will 
not run at all on RHAS 3.0 if you use either NPTL or LinuxThreads with 
Dynamic/Floating stack support - trying to run the simple command
"java -version" will crash with a segmentation fault in the same place.  

This is very very bad.  This breaks all backwards compatibility with existing 
Java 1.3.1 programs - none of them will run on RHAS 3.0. 

Comment 10 Ulrich Drepper 2003-09-16 18:01:07 UTC
I'm still waiting for the obvious next step "... and we asked our colleagues who
wrote the JVM where it crashes".  The JVM is a big black box.  And it is
meanwhile pretty well known that IBM's JRE is violating the ABI in many places.
So we really need some input on where it fails.

Comment 13 Roland McGrath 2003-09-16 20:53:54 UTC
Please confirm the kernel rpm version you are running here (uname -a).


Comment 14 IBM Bug Proxy 2003-09-16 22:51:38 UTC
------ Additional Comments From kenbo.com  2003-16-09 18:35 -------
Since RHAS 3.0 doesn't register by default with our windoze DNS and I'm not at 
the office, I cannot say for certain which kernel I am running.  However, I did
an up2date Monday on my system, so I'm at the latest kernel available as of 
then, since I always update all of the installed packages - I think it is like 
423, and the last one was 414.  Thanks.  kenbo 

Comment 19 Thomas Fitzsimmons 2003-09-17 00:20:49 UTC
The previous strace output was for the NPTL case.  I'm attaching the cases for
LD_ASSUME_KERNEL=2.4.1 and 2.4.19 as well.


Comment 26 Karen Bennet 2003-09-17 13:01:14 UTC
Changing bug category to needing more information as per David Edwards'
comments. If it turns out to be a Red Hat problem just reclassify.  

From David Edwards: We are working on this. We recognise that we have a problem
on SDK 1.3.1/RHEL 3. We thought at first that it is the same guard page problem
that we fixed on SDK 1.4.1, but although it is in the same area it is not the
same problem, and unfortunately it isn't so easy to fix.


Comment 28 Jakub Jelinek 2003-09-17 18:54:32 UTC
glibc with fixed pthread_getattr_np is glibc-2.3.2-87.
Thomas, can you check it out?

Comment 30 Roland McGrath 2003-09-17 19:55:07 UTC
We have traced the problem to the JRE's use of pthread_getattr_np on the initial
thread to get the stack bounds.  In all extant glibc versions, this call
produces bogus results; which random values you get has changed since previous
linuxthreads versions.  Apparently the particular bogus values returned by some
older linuxthreads versions caused the JRE not to crash, though other bad things
are probably happening there that have gone unnoticed.  

We have fixed the call in both NPTL and Linuxthreads; the glibc-2.3.2-87 rpms
(soon to hit Sushi) contain the fix.  However, the JRE does some apparently
unreasonable things with the now correct results for the initial thread's stack
bounds, and still crashes in about the same place.  Reading disassembly of the
allocGuardPage function where it crashes, I cannot understand the intent of its
%esp changes (allocGuardPage+0xb7).  This appears to be a bug in this JRE code
that was masked (possibly in otherwise harmful ways) by the previous bug in
glibc's pthread_getattr_np.

We need some feedback here from the developers of this part of IBM's JRE.
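
For readers unfamiliar with the call, here is a minimal sketch, not the JRE's actual code, of how a program might ask for the calling thread's stack bounds using pthread_getattr_np together with pthread_attr_getstack. The helper names current_stack_bounds and stack_bounds_contain are hypothetical. With a pthread_getattr_np that returns correct values, the address of a local variable should fall inside the reported range, on the initial thread as well as on any other.

```c
#define _GNU_SOURCE   /* exposes the GNU extension pthread_getattr_np */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: report the lowest address and the size of the
 * calling thread's stack.  Returns 0 on success or a pthreads error number.
 */
int current_stack_bounds(void **low, size_t *size)
{
    pthread_attr_t attr;
    int rc;

    assert(low != NULL && size != NULL);

    /* Fill attr with the attributes of the running thread. */
    rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0)
        return rc;

    /* pthread_attr_getstack yields the lowest stack address and its size. */
    rc = pthread_attr_getstack(&attr, low, size);
    pthread_attr_destroy(&attr);
    return rc;
}

/*
 * Hypothetical sanity check: does p lie inside the stack range reported
 * for the calling thread?  Returns 1 if so, 0 otherwise.
 */
int stack_bounds_contain(const void *p)
{
    void *low;
    size_t size;
    uintptr_t addr, base;

    if (current_stack_bounds(&low, &size) != 0)
        return 0;

    addr = (uintptr_t)p;
    base = (uintptr_t)low;
    return addr >= base && addr - base < size;
}
```

A caller could pass the address of one of its own local variables to stack_bounds_contain. A result of 0 on the initial thread would suggest bogus bounds of the kind described in this comment.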

Comment 31 IBM Bug Proxy 2003-09-18 19:55:48 UTC
------ Additional Comments From khoa.com  2003-18-09 15:49 -------
Ken - do you have any update from the Java team on this?  Please see the
analysis from Red Hat above.  Red Hat has fixed the return value of
pthread_getattr_np call, but they still think there is a bug in the JVM
(apparently JVM still crashes with the correct pthread_getattr_np call).
Thanks. 

Comment 32 IBM Bug Proxy 2003-09-18 19:56:22 UTC
------ Additional Comments From khoa.com  2003-18-09 15:51 -------
*** Bug 4438 has been marked as a duplicate of this bug. *** 

Comment 33 Matt Wilson 2003-09-26 00:09:17 UTC
eagerly awaiting feedback


Comment 34 IBM Bug Proxy 2003-09-26 14:39:16 UTC
------ Additional Comments From kenbo.com  2003-26-09 03:58 -------
From what I understand, the guard page issue in 1.4.1 is not the same issue.  
It is a different issue in 1.3.1 because that code is not in 1.3.1.  However, 
the Java team is working on the issue and may have a temporary workaround soon
for us to test.  No ETA on a permanent fix as of yet, but they are working on it.  
Thanks!  kenbo 

Comment 35 IBM Bug Proxy 2003-09-26 14:58:43 UTC
------ Additional Comments From khoa.com  2003-26-09 10:51 -------
Based on Kenbo's comment above, I'd like to defer this bug with resolution
"NeedMoreInfo" until we hear more from the Java team.  Thanks. 

Comment 36 IBM Bug Proxy 2003-10-17 15:00:34 UTC
------ Additional Comments From kenbo.com  2003-16-10 23:08 -------
Ok, at this point we believe it is something memory related with weak 
reference objects in C++/Java.  On the bad side, after 2 patches in the JVM we 
have gotten to a point which fails on all versions of pthreads - 
LD_ASSUME_KERNEL 2.2.5 -> 2.4.19 -> not set, therefore NPTL.  We are working on 
this and have gotten an engineer who knows the Java backend of our Domino 
product working on this, and we are continuing to work with IBM Hursley.  We 
have a test case with Domino which shows the issue and are aggressively working 
the issue (to the detriment of this developer's sleep quota :)  Any ideas are 
greatly appreciated.  Gonna try building a 2.4.21 default kernel to see if we
get the same issue.  Thanks

kenbo 

Comment 37 IBM Bug Proxy 2003-10-23 03:17:46 UTC
------ Additional Comments From khoa.com  2003-22-10 18:10 -------
Kenbo - I'd like to defer this bug with resolution "Need_More_Info" as your
team and the Java team are working (feverishly) on this bug, and at this
point, there is nothing we can give Red Hat to continue.  Once you have
more information, please re-open this bug report.  Good luck.  Thanks. 

Comment 38 IBM Bug Proxy 2003-10-23 13:55:43 UTC
------ Additional Comments From kenbo.com  2003-23-10 09:41 -------
As of yesterday, we are presently testing a fix for this issue.  So far the 
issue has required 2 fixes/workarounds in the 1.3.1 JVM, 1 fix in pthreads, 
and 1 fix in Domino and it looks to be working.  I really would not defer this 
bug because without resolution it would mean Domino could not support RHAS 
3.0 - but it does look like we have the appropriate fixes.

Thanks!

kenbo 

Comment 39 IBM Bug Proxy 2003-10-24 20:11:02 UTC
------ Additional Comments From khoa.com  2003-24-10 16:08 -------
Kenbo - I've reopened this bug report.  Is the fix in pthreads that you
mentioned above provided by Red Hat (see Comment# 23 above) ? 

Comment 40 IBM Bug Proxy 2003-10-25 23:02:39 UTC
------ Additional Comments From kenbo.com  2003-25-10 19:01 -------
The pthread fix is in the latest RHAS 3.0 as part of the standard distro.  The 
JVM fix is in SR6 of 1.3.1.  The Domino fix will be in 6.04, 6.5.1, and beyond. 

Comment 41 IBM Bug Proxy 2003-10-26 18:20:55 UTC
------ Additional Comments From khoa.com  2003-26-10 13:16 -------
Kenbo - thanks for the info!  Since all bug reports in LTC Bugzilla are used 
to track bugs in Linux OS, I'd like to use this bug report to track the fix
in RHEL3 (I assume that the fixes in Java and Domino are tracked elsewhere).
Since Ken mentioned that the fix in Linux is already included in RHEL3, I'd
like to mark this bug accordingly.  Thanks. 

Comment 42 Ulrich Drepper 2004-09-28 04:19:18 UTC
I'm closing this bug now.  The fixed libpthread went out in RHEL3.

Comment 43 IBM Bug Proxy 2004-10-04 16:54:41 UTC
----- Additional Comments From markwiz.com  2004-10-04 12:52 EDT -------
This bug is marked as "closed" on the red hat side. Should this be closed on the
IBM side? 

Comment 44 IBM Bug Proxy 2004-10-04 17:11:18 UTC
----- Additional Comments From kenbo.com  2004-10-04 13:09 EDT -------
yes, this can be closed out as it was fixed in the relevant RHEL 3.0 and 
Domino/JVM versions.  Thanks!