Bug 102683
Summary: | LTC3974-[BETA] Unable to run Domino with Java using LinuxThreads | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | IBM Bug Proxy <bugproxy> |
Component: | glibc | Assignee: | Jakub Jelinek <jakub> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 3.0 | CC: | bennet, dff, drepper, fitzsim, fweimer, roland |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-09-28 04:19:18 UTC | Type: | --- |
Bug Depends On: | |||
Bug Blocks: | 101028 |
Description
IBM Bug Proxy
2003-08-19 20:15:31 UTC
What JVM are you using? And which kernel? Try the normal smp kernel in B1 which doesn't have the 4G user address space. This is known to cause severe problems with JVMs.

1.3.1 is the JVM version

Please retry with current sushi kernel/setarch/glibc under setarch -3 in case you'll be running 4G/4G kernel.

------ Additional Comments From kenbo.com 2003-08-09 11:14 -------
How do I run "under setarch -3 in case"? Is this an option I pass to the kernel at boot time? I tried running "setarch -3" as root and it could find no such command. Thanks.

> I tried running "setarch -3" as root and it could find no
> such command.
Then you don't have the package installed yet. Run as root
up2date --nosig setarch
This should install the package. The man page should give you more hints on how
to use it if this is needed.
------ Additional Comments From kenbo.com 2003-08-09 16:42 -------
I finally was able to get through and download setarch and it did not seem to make a difference. I tried it as "setarch i686 -3" and "setarch i686 -3 /opt/lotus/notes/latest/linux/jvm/bin/java" (when I ran it just as "setarch -3" it would segv, so I figured from the man page that it must need the additional argument). None of this helped. I've also patched to the latest RPMs as well.

------ Additional Comments From kenbo.com 2003-16-09 13:07 -------
It's worse than originally thought. Java 1.3.1 (tried with latest SR5) will not run at all on RHAS 3.0 if you use either NPTL or LinuxThreads with Dynamic/Floating stack support - trying to run the simple command "java -version" will crash with a segmentation fault in the same place. This is very very bad. This breaks all backwards compatibility with existing Java 1.3.1 programs - none of them will run on RHAS 3.0.

I'm still waiting for the obvious next step "... and we asked our colleagues who wrote the JVM where it crashes". The JVM is a big black box. And it is meanwhile pretty well known that IBM's JRE is violating the ABI in many places. So we really need some input on where it fails.

Please confirm the kernel rpm version you are running here (uname -a).

------ Additional Comments From kenbo.com 2003-16-09 18:35 -------
Since RHAS 3.0 doesn't register by default with our windoze DNS and I'm not at the office, I cannot say for certain what kernel I am at. However, I did do an up2date Monday on my system and so I'm at the latest kernel available as of then, since I always update all of the installed packages - I think it is like 423, and the last one was 414. Thanks. kenbo

The previous strace output was for the NPTL case. I'm attaching the cases for LD_ASSUME_KERNEL=2.4.1 and 2.4.19 as well.

Changing bug category to needing more information as per David Edwards' comments. If it turns out to be a Red Hat problem just reclassify.
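
The comments above switch between NPTL and LinuxThreads by setting LD_ASSUME_KERNEL. A minimal sketch of how a process can check which thread library it actually got, assuming a glibc that supports the `_CS_GNU_LIBPTHREAD_VERSION` confstr name; the helper names `thread_library_version` and `is_nptl` are hypothetical and not part of this bug report:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy glibc's thread library identification string into buf.
 * Returns 0 on success, -1 if the value is unavailable or does not fit.
 * Assumption: the C library is glibc and defines _CS_GNU_LIBPTHREAD_VERSION. */
int thread_library_version(char *buf, size_t len)
{
    /* With a NULL buffer confstr reports the size it needs, including NUL. */
    size_t needed = confstr(_CS_GNU_LIBPTHREAD_VERSION, NULL, 0);
    if (needed == 0 || needed > len)
        return -1;
    confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    return 0;
}

/* Return 1 if the identification string names NPTL, 0 otherwise. */
int is_nptl(const char *version)
{
    return strncmp(version, "NPTL", 4) == 0;
}
```

Printing the string from a small test program, once with LD_ASSUME_KERNEL unset and once with it set to one of the values mentioned above, would show whether the setting took effect on a system that still ships both libraries.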
From David Edwards: We are working on this. We recognise that we have a problem on SDK 1.3.1/RHEL 3. We thought at first that it is the same guard page problem that we fixed on SDK 1.4.1, but although it is in the same area it is not the same problem, and unfortunately it isn't so easy to fix.

glibc with fixed pthread_getattr_np is glibc-2.3.2-87. Thomas, can you check it out?

We have traced the problem to the JRE's use of pthread_getattr_np on the initial thread to get the stack bounds. In all extant glibc versions, this call produces bogus results; which random values you get has changed since previous linuxthreads versions. Apparently the particular bogus values returned by some older linuxthreads versions caused the JRE not to crash, though other bad things are probably happening there that have gone unnoticed. We have fixed the call in both NPTL and LinuxThreads; the glibc-2.3.2-87 rpms (soon to hit Sushi) contain the fix. However, the JRE does some apparently unreasonable things with the now correct results for the initial thread's stack bounds, and still crashes in about the same place. Reading disassembly of the allocGuardPage function where it crashes, I cannot understand the intent of its %esp changes (allocGuardPage+0xb7). This appears to be a bug in this JRE code that was masked (possibly in otherwise harmful ways) by the previous bug in glibc's pthread_getattr_np. We need some feedback here from the developers of this part of IBM's JRE.

------ Additional Comments From khoa.com 2003-18-09 15:49 -------
Ken - do you have any update from the Java team on this? Please see the analysis from Red Hat above. Red Hat has fixed the return value of pthread_getattr_np call, but they still think there is a bug in the JVM (apparently JVM still crashes with the correct pthread_getattr_np call). Thanks.

------ Additional Comments From khoa.com 2003-18-09 15:51 -------
*** Bug 4438 has been marked as a duplicate of this bug. ***

eagerly awaiting feedback

------ Additional Comments From kenbo.com 2003-26-09 03:58 -------
From what I understand, the guard page issue in 1.4.1 is not the same issue. It is a different issue in 1.3.1 because that code is not in 1.3.1. However, the Java team is working on the issue and may have a temporary workaround soon for us to test. No eta on a permanent fix as of yet, but they are working on it. Thanks! kenbo

------ Additional Comments From khoa.com 2003-26-09 10:51 -------
Based on Kenbo's comment above, I'd like to defer this bug with resolution "NeedMoreInfo" until we hear more from the Java team. Thanks.

------ Additional Comments From kenbo.com 2003-16-10 23:08 -------
Ok, at this point we believe it is something memory related with weak reference objects in C++/Java. On the bad side, after 2 patches in the JVM we have gotten to a point which fails on all versions of pthreads - LD_ASSUME_KERNEL 2.2.5 -> 2.4.19 -> not set therefore NPTL. We are working on this and have gotten an engineer knowing the Java backend of our Domino product working on this and we are continuing to work with IBM Hursley. We have a test case with Domino which shows the issue and are aggressively working the issue (to the decrement of this developer's sleep quotas :) Any ideas are greatly appreciated. Gonna try building a 2.4.21 default kernel to see if I get the same issue. Thanks kenbo

------ Additional Comments From khoa.com 2003-22-10 18:10 -------
Kenbo - I'd like to defer this bug with resolution "Need_More_Info" as your team and the Java team are working (feverishly) on this bug, and at this point, there is nothing we can give Red Hat to continue. Once you have more information, please re-open this bug report. Good luck. Thanks.

------ Additional Comments From kenbo.com 2003-23-10 09:41 -------
As of yesterday, we are presently testing a fix for this issue.
So far the issue has required 2 fixes/workarounds in the 1.3.1 JVM, 1 fix in pthreads, and 1 fix in Domino and it looks to be working. I really would not defer this bug because without resolution it would mean Domino could not support RHAS 3.0 - but it does look like we have the appropriate fixes. Thanks! kenbo

------ Additional Comments From khoa.com 2003-24-10 16:08 -------
Kenbo - I've reopened this bug report. Is the fix in pthreads that you mentioned above provided by Red Hat (see Comment# 23 above)?

------ Additional Comments From kenbo.com 2003-25-10 19:01 -------
The pthread fix is in the latest RHAS 3.0 as part of the standard distro. The JVM fix is in SR6 of 1.3.1. The Domino fix will be in 6.04, 6.5.1, and beyond.

------ Additional Comments From khoa.com 2003-26-10 13:16 -------
Kenbo - thanks for the info! Since all bug reports in LTC Bugzilla are used to track bugs in Linux OS, I'd like to use this bug report to track the fix in RHEL3 (I assume that the fixes in Java and Domino are tracked elsewhere). Since Ken mentioned that the fix in Linux is already included in RHEL3, I'd like to mark this bug accordingly. Thanks.

I'm closing this bug now. The fixed libpthread went out in RHEL3.

----- Additional Comments From markwiz.com 2004-10-04 12:52 EDT -------
This bug is marked as "closed" on the Red Hat side. Should this be closed on the IBM side?

----- Additional Comments From kenbo.com 2004-10-04 13:09 EDT -------
Yes, this can be closed out as it was fixed in the relevant RHEL 3.0 and Domino/JVM versions. Thanks!
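
For reference, the Red Hat analysis in this report traces the crash to pthread_getattr_np being called on the initial thread to obtain its stack bounds. The following is a minimal sketch of that kind of query, not the JRE's actual code; the `stack_bounds` struct and both function names are hypothetical. It reads the calling thread's stack region and checks that a local variable's address falls inside it, which is the property the pre-fix glibc did not guarantee for the initial thread:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Stack region as reported by glibc for the calling thread (hypothetical type). */
struct stack_bounds {
    void  *low;   /* lowest address of the stack region */
    size_t size;  /* size of the region in bytes */
};

/* Query the calling thread's stack via pthread_getattr_np.
 * Returns 0 on success or a nonzero error number on failure. */
int current_stack_bounds(struct stack_bounds *out)
{
    pthread_attr_t attr;
    int rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_getstack(&attr, &out->low, &out->size);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Return 1 if address p lies inside the reported region, 0 otherwise. */
int address_in_bounds(const struct stack_bounds *b, const void *p)
{
    uintptr_t lo = (uintptr_t)b->low;
    uintptr_t a  = (uintptr_t)p;
    return a >= lo && a - lo < b->size;
}
```

Calling `current_stack_bounds` from `main` and passing the address of a local variable to `address_in_bounds` gives a quick sanity check of the reported range; older glibc versions may need `-pthread` at link time.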