The following has been reported by IBM LTC:
[BETA] Unable to run Domino with Java using LinuxThreads

Hardware Environment:
PIII 500 MHz

Software Environment:
RHAS 3.0 Beta with latest online updates

Steps to Reproduce:
1. Install Domino 6.x
2. export LD_ASSUME_KERNEL=2.4.1 to use LinuxThreads
3. Run setup

Actual Results:
The Java setup program takes a segmentation fault. I expected this possibility using NPTL (and it crashes the same way with NPTL), but I did not expect to see it with LinuxThreads - this breaks backwards compatibility and means that Domino 6 is not runnable on RHAS 3.0 as it stands.

Expected Results:
The Java setup program should display a splash screen and let you begin setup.

Additional Information:
Here is the stack from the crash with a gdb run; it also shows the mapped libraries:

GNU gdb Red Hat Linux (5.3.90-0.20030710.14rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run -ss512k -Xoss5M -cp jhall.jar:cfgdomserver.jar:Notes.jar lotus.domino.setup.WizardManagerDomino -data /opt/d6/notesdata
Starting program: /opt/d6/lotus/notes/65000/linux/jvm/bin/exe/java -ss512k -Xoss5M -cp jhall.jar:cfgdomserver.jar:Notes.jar lotus.domino.setup.WizardManagerDomino -data /opt/d6/notesdata
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 2576)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 2576)]
0x004c5b89 in allocGuardPage () at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
2264    /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s: No such file or directory.
        in /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s
Current language: auto; currently asm
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x00319200  0x00321ebc  Yes         /lib/i686/libpthread.so.0
0x00b7ec30  0x00b8aac8  Yes         /lib/libnsl.so.1
0x0057fe90  0x00580d50  Yes         /lib/libdl.so.2
0x00126a10  0x00225ee8  Yes         /lib/i686/libc.so.6
0x007ebc00  0x007fd63f  Yes         /lib/ld-linux.so.2
0x003e0960  0x004cd620  Yes         /opt/d6/lotus/notes/65000/linux/jvm/bin/classic/libjvm.so
0x0024a500  0x00261dac  Yes         /lib/i686/libm.so.6
0x0026aff0  0x0026c600  Yes         /opt/d6/lotus/notes/65000/linux/jvm/bin/libxhpi.so
0x00277220  0x0027f860  Yes         /opt/d6/lotus/notes/65000/linux/jvm/bin/libhpi.so
(gdb) where
#0  0x004c5b89 in allocGuardPage () at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
#1  0x004c5f19 in xeThreadInit () at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
#2  0x004c78be in eeInitNewThis () at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
#3  0x004cc756 in xmInitializeJVM () at /userlvl/cxia32131/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_invokers.s:2264
#4  0x003fafd9 in ciCreateJVM () from /opt/d6/lotus/notes/65000/linux/jvm/bin/classic/libjvm.so
#5  0x00404f96 in JNI_CreateJavaVM () from /opt/d6/lotus/notes/65000/linux/jvm/bin/classic/libjvm.so
#6  0x0804989e in InitializeJVM ()
#7  0x08048f4e in main ()
(gdb)

Ken - did you try LD_ASSUME_KERNEL=2.2.5 ?

Yes, setting LD_ASSUME_KERNEL=2.2.5 does work. However, that is not an option because 1) that threading system has a hard-coded 2 MB stack size per thread, and 2) that model does not have floating stacks, which puts the JVM stacks in the same stack space as our threads and has been known to cause issues. In addition, I don't even think the 1.3.1 JVM is "supported" running in said configuration. Also, at 2 MB per stack, you have basically "crippled" Domino since it is a thread per connection, and thus I really do not have a product to sell to enterprise customers - the stack I presently ask for is 256 KB, a lot less, which gives me an 8-1 performance difference.

Ken - I'd like to ask you to try one more thing: LD_ASSUME_KERNEL=2.4.19. This resolved another bug which also seg faults and displays nothing. Please let me know. Thanks.

Just tried with LD_ASSUME_KERNEL=2.4.19 and we still sigsegv. Greg/Glen - please submit this to Red Hat. This bug was found on RHEL3 beta1 using the old pthreads library (LD_ASSUME_KERNEL=2.4.1). Thanks.

Glen/Greg - please note the severity is BLOCKING. Ken - do you have plans to fix this on NPTL ? Using LD_ASSUME_KERNEL=2.4.1 to enable the old pthreads library should only be a work-around...

We do not have plans to support NPTL in the Domino 6 or Domino 5 codestreams as they are in "maintenance" mode and cannot take such a large change. In addition, there are no plans for the 1.3.1 JVM to support NPTL either. FYI: as stated earlier, NPTL fails the same way within the JVM. thanks! kenbo
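For reference, the stack-size point made earlier in this thread (256 KB requested per thread versus a fixed 2 MB) can be sketched in C with the standard pthread attribute calls. This is only an illustration, not Domino or JVM code; the helper names `stacks_that_fit` and `run_with_stack_size` are invented here. One plausible reading of the 8-1 figure is simply the ratio of the two stack sizes, which is what the first helper computes.

```c
#include <pthread.h>
#include <stddef.h>

/* Figures quoted in the thread: 256 KB requested, 2 MB fixed. */
#define REQUESTED_STACK_BYTES ((size_t)256 * 1024)
#define FIXED_STACK_BYTES     ((size_t)2 * 1024 * 1024)

/* How many thread stacks of stack_bytes fit into budget_bytes. */
size_t stacks_that_fit(size_t budget_bytes, size_t stack_bytes)
{
    return stack_bytes == 0 ? 0 : budget_bytes / stack_bytes;
}

/* Trivial thread body: records that it ran. */
static void *mark_ran(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/*
 * Hypothetical helper: start one thread with the requested stack size
 * and wait for it. The size recorded in the attribute object is stored
 * in *granted. Returns 0 on success, otherwise a nonzero error value.
 */
int run_with_stack_size(size_t stack_bytes, size_t *granted)
{
    pthread_attr_t attr;
    pthread_t tid;
    int ran = 0;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_attr_getstacksize(&attr, granted);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, mark_ran, &ran);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_join(tid, NULL);
    if (rc != 0)
        return rc;
    return ran ? 0 : -1;
}
```

Calling `stacks_that_fit(FIXED_STACK_BYTES, REQUESTED_STACK_BYTES)` gives the number of 256 KB stacks that fit in the space of one 2 MB stack. A threading library with a hard-coded stack size, as described for LD_ASSUME_KERNEL=2.2.5, would not honor the size requested through `pthread_attr_setstacksize`.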
What JVM are you using?
And which kernel? Try the normal smp kernel in B1 which doesn't have the 4G user address space. This is known to cause severe problems with JVMs.
1.3.1 is the JVM version
Please retry with current sushi kernel/setarch/glibc under setarch -3 in case you'll be running 4G/4G kernel.
------ Additional Comments From kenbo.com 2003-08-09 11:14 ------- How do I run "under setarch -3 in case"? Is this an option I pass to the kernel at boot time? I tried running "setarch -3" as root and it could find no such command. Thanks.
> I tried running "setarch -3" as root and it could find no
> such command.

Then you don't have the package installed yet. Run as root

  up2date --nosig setarch

This should install the package. The man page should give you more hints on how to use it if this is needed.
------ Additional Comments From kenbo.com 2003-08-09 16:42 ------- I finally was able to get through and download setarch, and it did not seem to make a difference. I tried it as "setarch i686 -3" and as "setarch i686 -3 /opt/lotus/notes/latest/linux/jvm/bin/java" (when I ran it just as "setarch -3" it would segv, so I figured from the man page that it must need the additional argument). None of this helped. I've also patched to the latest RPMs.
------ Additional Comments From kenbo.com 2003-16-09 13:07 ------- It's worse than originally thought. Java 1.3.1 (tried with latest SR5) will not run at all on RHAS 3.0 if you use either NPTL or LinuxThreads with Dynamic/Floating stack support - trying to run the simple command "java -version" will crash with a segmentation fault in the same place. This is very very bad. This breaks all backwards compatibility with existing Java 1.3.1 programs - none of them will run on RHAS 3.0.
I'm still waiting for the obvious next step "... and we asked our colleagues who wrote the JVM where it crashes". The JVM is a big black box. And it is meanwhile pretty well known that IBM's JRE is violating the ABI in many places. So we really need some input on where it fails.
Please confirm the kernel rpm version you are running here (uname -a).
------ Additional Comments From kenbo.com 2003-16-09 18:35 ------- Since RHAS 3.0 doesn't register by default with our windoze DNS and I'm not at the office, I cannot say for certain what kernel I am at. However, I did do an up2date Monday on my system, so I'm at the latest kernel available as of then, since I always update all of the installed packages - I think it is like 423, and the last one was 414. Thanks. kenbo
The previous strace output was for the NPTL case. I'm attaching the cases for LD_ASSUME_KERNEL=2.4.1 and 2.4.19 as well.
Changing bug category to needing more information as per David Edwards' comments. If it turns out to be a Red Hat problem just reclassify.

From David Edwards: We are working on this. We recognise that we have a problem on SDK 1.3.1/RHEL 3. We thought at first that it is the same guard page problem that we fixed on SDK 1.4.1, but although it is in the same area it is not the same problem, and unfortunately it isn't so easy to fix.
glibc with fixed pthread_getattr_np is glibc-2.3.2-87. Thomas, can you check it out?
We have traced the problem to the JRE's use of pthread_getattr_np on the initial thread to get the stack bounds. In all extant glibc versions, this call produces bogus results; which random values you get has changed since previous linuxthreads versions. Apparently the particular bogus values returned by some older linuxthreads versions caused the JRE not to crash, though other bad things are probably happening there that have gone unnoticed.

We have fixed the call in both NPTL and Linuxthreads; the glibc-2.3.2-87 rpms (soon to hit Sushi) contain the fix. However, the JRE does some apparently unreasonable things with the now correct results for the initial thread's stack bounds, and still crashes in about the same place. Reading the disassembly of the allocGuardPage function where it crashes, I cannot understand the intent of its %esp changes (allocGuardPage+0xb7). This appears to be a bug in this JRE code that was masked (possibly in otherwise harmful ways) by the previous bug in glibc's pthread_getattr_np. We need some feedback here from the developers of this part of IBM's JRE.
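For readers following the analysis above, here is a minimal C sketch of the kind of query being discussed: asking glibc for the calling thread's stack bounds with pthread_getattr_np and pthread_attr_getstack. It is not the JRE's code, which is not shown in this report; the helper names `current_stack_bounds` and `local_is_inside_reported_stack` are invented for illustration. The assumption being checked is that, on a glibc where the call returns correct results, the address of a local variable falls inside the reported range.

```c
#define _GNU_SOURCE   /* pthread_getattr_np is a GNU extension */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fetch the calling thread's stack as
 * (lowest address, size in bytes). Returns 0 on success or the
 * error number from the failing pthread call.
 */
int current_stack_bounds(void **lowest, size_t *size)
{
    pthread_attr_t attr;
    int rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_getstack(&attr, lowest, size);
    pthread_attr_destroy(&attr);
    return rc;
}

/*
 * Hypothetical sanity check: returns 1 if the address of a local
 * variable lies within the reported bounds, 0 if it does not,
 * and -1 if the bounds could not be obtained.
 */
int local_is_inside_reported_stack(void)
{
    void *lowest = NULL;
    size_t size = 0;
    char probe = 0;

    if (current_stack_bounds(&lowest, &size) != 0)
        return -1;

    /* Compare as integers to avoid relational comparison of unrelated pointers. */
    uintptr_t lo = (uintptr_t)lowest;
    uintptr_t addr = (uintptr_t)&probe;
    return (addr >= lo && addr - lo < size) ? 1 : 0;
}
```

Calling `local_is_inside_reported_stack()` from the initial thread exercises the case described in the analysis. A result of 0 there would point at the bogus-bounds behaviour in the library rather than at the caller.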
------ Additional Comments From khoa.com 2003-18-09 15:49 ------- Ken - do you have any update from the Java team on this? Please see the analysis from Red Hat above. Red Hat has fixed the return value of pthread_getattr_np call, but they still think there is a bug in the JVM (apparently JVM still crashes with the correct pthread_getattr_np call). Thanks.
------ Additional Comments From khoa.com 2003-18-09 15:51 ------- *** Bug 4438 has been marked as a duplicate of this bug. ***
eagerly awaiting feedback
------ Additional Comments From kenbo.com 2003-26-09 03:58 ------- From what I understand, the guard page issue in 1.4.1 is not the same issue. It is a different issue in 1.3.1 because that code is not in 1.3.1. However, the Java team is working on the issue and may have a temporary workaround soon for us to test. No ETA on a permanent fix as of yet, but they are working on it. Thanks! kenbo
------ Additional Comments From khoa.com 2003-26-09 10:51 ------- Based on Kenbo's comment above, I'd like to defer this bug with resolution "NeedMoreInfo" until we hear more from the Java team. Thanks.
------ Additional Comments From kenbo.com 2003-16-10 23:08 ------- Ok, at this point we believe it is something memory related with weak reference objects in C++/Java. On the bad side, after 2 patches in the JVM we have gotten to a point which fails on all versions of pthreads - LD_ASSUME_KERNEL 2.2.5 -> 2.4.19 -> not set, therefore NPTL. We have gotten an engineer who knows the Java backend of our Domino product working on this, and we are continuing to work with IBM Hursley. We have a test case with Domino which shows the issue and are aggressively working the issue (to the detriment of this developer's sleep quota :) Any ideas are greatly appreciated. Gonna try building a 2.4.21 default kernel to see if I get the same issue. Thanks kenbo
------ Additional Comments From khoa.com 2003-22-10 18:10 ------- Kenbo - I'd like to defer this bug with resolution "Need_More_Info" as your team and the Java team are working (feverishly) on this bug, and at this point, there is nothing we can give Red Hat to continue. Once you have more information, please re-open this bug report. Good luck. Thanks.
------ Additional Comments From kenbo.com 2003-23-10 09:41 ------- As of yesterday, we are presently testing a fix for this issue. So far the issue has required 2 fixes/workarounds in the 1.3.1 JVM, 1 fix in pthreads, and 1 fix in Domino and it looks to be working. I really would not defer this bug because without resolution it would mean Domino could not support RHAS 3.0 - but it does look like we have the appropriate fixes. Thanks! kenbo
------ Additional Comments From khoa.com 2003-24-10 16:08 ------- Kenbo - I've reopened this bug report. Is the fix in pthreads that you mentioned above provided by Red Hat (see Comment# 23 above) ?
------ Additional Comments From kenbo.com 2003-25-10 19:01 ------- The pthread fix is in the latest RHAS 3.0 as part of the standard distro. The JVM fix is in SR6 of 1.3.1. The Domino fix will be in 6.04, 6.5.1, and beyond.
------ Additional Comments From khoa.com 2003-26-10 13:16 ------- Kenbo - thanks for the info! Since all bug reports in LTC Bugzilla are used to track bugs in Linux OS, I'd like to use this bug report to track the fix in RHEL3 (I assume that the fixes in Java and Domino are tracked elsewhere). Since Ken mentioned that the fix in Linux is already included in RHEL3, I'd like to mark this bug accordingly. Thanks.
I'm closing this bug now. The fixed libpthread went out in RHEL3.
----- Additional Comments From markwiz.com 2004-10-04 12:52 EDT ------- This bug is marked as "closed" on the red hat side. Should this be closed on the IBM side?
----- Additional Comments From kenbo.com 2004-10-04 13:09 EDT ------- yes, this can be closed out as it was fixed in the relevant RHEL 3.0 and Domino/JVM versions. Thanks!