The following has been reported by IBM LTC:
[BETA] Unable to run Domino with Java using LinuxThreads
Please fill in each of the sections below.
Hardware Environment: PIII 500 MHz
Software Environment: RHAS 3.0 Beta with latest online updates
Steps to Reproduce:
1. Install Domino 6.x
2. export LD_ASSUME_KERNEL=2.4.1 to use LinuxThreads
3. Run setup
Actual Results: The Java setup program takes a segmentation fault. I expected
this possibility using NPTL (and it crashes the same way with NPTL), but I did
not expect to see it with LinuxThreads - this breaks backwards compatibility
and means that Domino 6 is not runnable on RHAS 3.0 as it stands.
Expected Results: The Java setup program should display a splash screen and
let you begin setup.
Here is the stack from the crash with a gdb run; also shows mapped libraries:
GNU gdb Red Hat Linux (5.3.90-0.20030710.14rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".
(gdb) run -ss512k -Xoss5M -cp jhall.jar:cfgdomserver.jar:Notes.jar
lotus.domino.setup.WizardManagerDomino -data /opt/d6/notesdata
Starting program: /opt/d6/lotus/notes/65000/linux/jvm/bin/exe/java -ss512k
-Xoss5M -cp jhall.jar:cfgdomserver.jar:Notes.jar
lotus.domino.setup.WizardManagerDomino -data /opt/d6/notesdata
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 2576)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 2576)]
0x004c5b89 in allocGuardPage ()
2264 /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s: No
such file or directory.
Current language: auto; currently asm
(gdb) info shared
From To Syms Read Shared Object Library
0x00319200 0x00321ebc Yes /lib/i686/libpthread.so.0
0x00b7ec30 0x00b8aac8 Yes /lib/libnsl.so.1
0x0057fe90 0x00580d50 Yes /lib/libdl.so.2
0x00126a10 0x00225ee8 Yes /lib/i686/libc.so.6
0x007ebc00 0x007fd63f Yes /lib/ld-linux.so.2
0x0024a500 0x00261dac Yes /lib/i686/libm.so.6
#0 0x004c5b89 in allocGuardPage ()
#1 0x004c5f19 in xeThreadInit ()
#2 0x004c78be in eeInitNewThis ()
#3 0x004cc756 in xmInitializeJVM ()
#4 0x003fafd9 in ciCreateJVM ()
#5 0x00404f96 in JNI_CreateJavaVM ()
#6 0x0804989e in InitializeJVM ()
#7 0x08048f4e in main ()
(gdb)
Ken - did you try LD_ASSUME_KERNEL=2.2.5?
Yes, setting LD_ASSUME_KERNEL=2.2.5 does work. However, that is not an option
because 1) that threading system has a hard-coded 2 MB stack size per thread, 2)
that model does not have floating stacks, which puts the JVM stacks in the same
stack space as our threads and which has been known to cause issues. In
addition, I don't even think the 1.3.1 JVM is "supported" running in said
configuration. Also, at 2 MB per stack, you have basically "crippled" Domino
since it is thread-per-connection, and thus I really do not have a product to
sell to enterprise customers - the stack I presently ask for is 256 KB, a lot
less, which gives me an 8-to-1 performance difference.
Ken - I'd like to ask you to try one more thing: LD_ASSUME_KERNEL=2.4.19.
This resolved another bug which also seg faults and displays nothing.
Please let me know. Thanks.
Just tried with LD_ASSUME_KERNEL=2.4.19 and we still
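For reference, the stack-size objection to LD_ASSUME_KERNEL=2.2.5 raised above can be worked through numerically. This is only an illustrative sketch: the 2 MB and 256 KB figures come from the comment, while the 1 GiB stack budget and the helper name max_threads are assumptions made up for the example, not Domino or glibc values.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Per-thread stack sizes quoted in the comment above.
FIXED_STACK = 2 * MIB        # fixed stack under LD_ASSUME_KERNEL=2.2.5
REQUESTED_STACK = 256 * KIB  # stack size Domino asks for

# Hypothetical amount of address space set aside for thread stacks.
STACK_BUDGET = 1 * GIB


def max_threads(stack_budget, stack_size):
    """Number of whole thread stacks that fit into the given budget."""
    return stack_budget // stack_size


if __name__ == "__main__":
    fixed = max_threads(STACK_BUDGET, FIXED_STACK)
    requested = max_threads(STACK_BUDGET, REQUESTED_STACK)
    print("threads with 2 MB stacks:  ", fixed)
    print("threads with 256 KB stacks:", requested)
    print("ratio:", requested // fixed)
```

With one thread per connection, the same ratio carries over to the number of concurrent connections that fit in a given stack budget, which is the 8-to-1 figure mentioned in the comment.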
Greg/Glen - please submit this to Red Hat. This bug was found on RHEL3
beta1 using the old pthreads library (LD_ASSUME_KERNEL=2.4.1). Thanks.
Glen/Greg - please note the severity is BLOCKING.
Ken - do you have plans to fix this on NPTL? Using LD_ASSUME_KERNEL=2.4.1
to enable the old pthreads library should only be a work-around...
We do not have plans to support NPTL in the Domino 6 or Domino 5 codestreams as
they are in "maintenance" mode and cannot take such a large change. In
addition, there are no plans for the 1.3.1 JVM to support NPTL either. FYI:
as stated earlier, NPTL fails the same way within the JVM.
What JVM are you using?
And which kernel? Try the normal smp kernel in B1 which doesn't have the 4G
user address space. This is known to cause severe problems with JVMs.
1.3.1 is the JVM version
Please retry with the current sushi kernel/setarch/glibc, under setarch -3 in
case you'll be running the 4G/4G kernel.
------ Additional Comments From email@example.com 2003-08-09 11:14 -------
How do I run "under setarch -3 in case"? Is this an option I pass to the
kernel at boot time? I tried running "setarch -3" as root and it could find no
such command. Thanks.
> I tried running "setarch -3" as root and it could find no
> such command.
Then you don't have the package installed yet. Run as root
up2date --nosig setarch
This should install the package. The man page should give you more hints on how
to use it if this is needed.
------ Additional Comments From firstname.lastname@example.org 2003-08-09 16:42 -------
I finally was able to get through and download setarch and it did not seem to
make a difference. I tried it as
"setarch i686 -3"
"setarch i686 -3 /opt/lotus/notes/latest/linux/jvm/bin/java"
(when I ran it just as "setarch -3" it would segv, so I figured from the man
page that it must need the additional argument). None of this helped. I've
also patched to the latest RPMs as well.
------ Additional Comments From email@example.com 2003-16-09 13:07 -------
It's worse than originally thought. Java 1.3.1 (tried with latest SR5) will
not run at all on RHAS 3.0 if you use either NPTL or LinuxThreads with
Dynamic/Floating stack support - trying to run the simple command "java -
version" will crash with a segmentation fault in the same place.
This is very very bad. This breaks all backwards compatibility with existing
Java 1.3.1 programs - none of them will run on RHAS 3.0.
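The "java -version" check described above can be repeated across the threading configurations tried in this report with a small script. This is a rough sketch, not a tool used by anyone in this thread: the helper names (build_env, describe_exit, run_java_version) are made up for illustration, the java binary is assumed to be on PATH, and LD_ASSUME_KERNEL only selects a different libpthread on systems whose glibc ships the alternative builds.

```python
import os
import signal
import subprocess

# LD_ASSUME_KERNEL values tried in this report; None means "not set" (NPTL).
ASSUME_KERNEL_VALUES = ["2.2.5", "2.4.1", "2.4.19", None]


def build_env(assume_kernel, base_env=None):
    """Copy the environment and set or clear LD_ASSUME_KERNEL."""
    env = dict(os.environ if base_env is None else base_env)
    if assume_kernel is None:
        env.pop("LD_ASSUME_KERNEL", None)
    else:
        env["LD_ASSUME_KERNEL"] = assume_kernel
    return env


def describe_exit(returncode):
    """subprocess reports death by signal N as return code -N."""
    if returncode >= 0:
        return f"exit status {returncode}"
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    return f"killed by {name}"


def run_java_version(java="java"):
    """Run `java -version` once per configuration and summarise the outcome."""
    results = {}
    for value in ASSUME_KERNEL_VALUES:
        proc = subprocess.run(
            [java, "-version"],
            env=build_env(value),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[value or "unset"] = describe_exit(proc.returncode)
    return results
```

Calling run_java_version() returns a dictionary keyed by the LD_ASSUME_KERNEL value; a segmentation fault like the one in this report would be summarised as "killed by SIGSEGV".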
I'm still waiting for the obvious next step "... and we asked our colleagues who
wrote the JVM where it crashes". The JVM is a big black box. And it is
meanwhile pretty well known that IBM's JRE is violating the ABI in many places.
So we really need some input on where it fails.
Please confirm the kernel rpm version you are running here (uname -a).
------ Additional Comments From firstname.lastname@example.org 2003-16-09 18:35 -------
Since RHAS 3.0 doesn't register by default with our windoze DNS and I'm not at
the office, I cannot say for certain what kernel I am at. However, I did do
an up2date Monday on my system and so I'm at the latest kernel available as of
then, since I always update all of the installed packages - I think it is like
423, and the last one was 414. Thanks. kenbo
The previous strace output was for the NPTL case. I'm attaching the cases for
LD_ASSUME_KERNEL=2.4.1 and 2.4.19 as well.
Changing bug category to needing more information as per David Edwards'
comments. If it turns out to be a Red Hat problem just reclassify.
From David Edwards : We are working on this. We recognise that we have a problem
on SDK 1.3.1/RHEL 3. We thought at first that it was the same guard page problem
that we fixed on SDK 1.4.1, but although it is in the same area it is not the
same problem, and unfortunately it isn't so easy to fix.
glibc with fixed pthread_getattr_np is glibc-2.3.2-87.
Thomas, can you check it out?
We have traced the problem to the JRE's use of pthread_getattr_np on the initial
thread to get the stack bounds. In all extant glibc versions, this call
produces bogus results; which random values you get has changed since previous
linuxthreads versions. Apparently the particular bogus values returned by some
older linuxthreads versions caused the JRE not to crash, though other bad things
are probably happening there that have gone unnoticed.
We have fixed the call in both NPTL and Linuxthreads; the glibc-2.3.2-87 rpms
(soon to hit Sushi) contain the fix. However, the JRE does some apparently
unreasonable things with the now correct results for the initial thread's stack
bounds, and still crashes in about the same place. Reading disassembly of the
allocGuardPage function where it crashes, I cannot understand the intent of its
%esp changes (allocGuardPage+0xb7). This appears to be a bug in this JRE code
that was masked (possibly in otherwise harmful ways) by the previous bug in
We need some feedback here from the developers of this part of IBM's JRE.
------ Additional Comments From email@example.com 2003-18-09 15:49 -------
Ken - do you have any update from the Java team on this? Please see the
analysis from Red Hat above. Red Hat has fixed the return value of
pthread_getattr_np call, but they still think there is a bug in the JVM
(apparently JVM still crashes with the correct pthread_getattr_np call).
------ Additional Comments From firstname.lastname@example.org 2003-18-09 15:51 -------
*** Bug 4438 has been marked as a duplicate of this bug. ***
eagerly awaiting feedback
------ Additional Comments From email@example.com 2003-26-09 03:58 -------
From what I understand, the guard page issue in 1.4.1 is not the same issue.
It is a different issue in 1.3.1 because that code is not in 1.3.1. However,
the Java team is working on the issue and may have a temporary workaround soon
for us to test. No ETA on a permanent fix as of yet, but they are working on it.
------ Additional Comments From firstname.lastname@example.org 2003-26-09 10:51 -------
Based on Kenbo's comment above, I'd like to defer this bug with resolution
"NeedMoreInfo" until we hear more from the Java team. Thanks.
------ Additional Comments From email@example.com 2003-16-10 23:08 -------
Ok, at this point we believe it is something memory related with weak
reference objects in C++/Java. On the bad side, after 2 patches in the JVM we
have gotten to a point which fails on all versions of pthreads -
LD_ASSUME_KERNEL 2.2.5 -> 2.4.19 -> not set therefore NPTL. We are working on
this and have gotten an engineer who knows the Java backend of our Domino
product working on this, and we are continuing to work with IBM Hursley. We
have a test case with Domino which shows the issue and are aggressively working
the issue (to the detriment of this developer's sleep quota :) Any ideas are
greatly appreciated. Gonna try building a 2.4.21 default kernel to see if we get
the same issue. Thanks
------ Additional Comments From firstname.lastname@example.org 2003-22-10 18:10 -------
Kenbo - I'd like to defer this bug with resolution "Need_More_Info" as your
team and the Java team are working (feverishly) on this bug, and at this
point, there is nothing we can give Red Hat to continue. Once you have
more information, please re-open this bug report. Good luck. Thanks.
------ Additional Comments From email@example.com 2003-23-10 09:41 -------
As of yesterday, we are presently testing a fix for this issue. So far the
issue has required 2 fixes/workarounds in the 1.3.1 JVM, 1 fix in pthreads,
and 1 fix in Domino and it looks to be working. I really would not defer this
bug because without resolution it would mean Domino could not support RHAS
3.0 - but it does look like we have the appropriate fixes.
------ Additional Comments From firstname.lastname@example.org 2003-24-10 16:08 -------
Kenbo - I've reopened this bug report. Is the fix in pthreads that you
mentioned above provided by Red Hat (see Comment# 23 above) ?
------ Additional Comments From email@example.com 2003-25-10 19:01 -------
The pthread fix is in the latest RHAS 3.0 as part of the standard distro. The
JVM fix is in SR6 of 1.3.1. The Domino fix will be in 6.04, 6.5.1, and beyond.
------ Additional Comments From firstname.lastname@example.org 2003-26-10 13:16 -------
Kenbo - thanks for the info! Since all bug reports in LTC Bugzilla are used
to track bugs in Linux OS, I'd like to use this bug report to track the fix
in RHEL3 (I assume that the fixes in Java and Domino are tracked elsewhere).
Since Ken mentioned that the fix in Linux is already included in RHEL3, I'd
like to mark this bug accordingly. Thanks.
I'm closing this bug now. The fixed libpthread went out in RHEL3.
----- Additional Comments From email@example.com 2004-10-04 12:52 EDT -------
This bug is marked as "closed" on the Red Hat side. Should this be closed on the
----- Additional Comments From firstname.lastname@example.org 2004-10-04 13:09 EDT -------
yes, this can be closed out as it was fixed in the relevant RHEL 3.0 and
Domino/JVM versions. Thanks!