Bug 172417 - LTC19618- IBM JRE crashes in Admin Server on EM64T arch
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: java-1.4.2-ibm (Show other bugs)
Version: 4.0
Hardware: i386 Linux
Priority: high, Severity: high
Assigned To: Thomas Fitzsimmons
Chandrasekar Kannan
Depends On:
Blocks:
Reported: 2005-11-04 01:36 EST by Steve Parkinson
Modified: 2015-01-04 18:19 EST (History)

See Also:
Fixed In Version: 1.4.2.7-1jpp.4.el4
Doc Type: Bug Fix
Last Closed: 2007-05-02 21:28:35 EDT


Attachments
DMESG output from EM64T machine (16.55 KB, text/plain), 2005-11-04 01:41 EST, Steve Parkinson
dmesg.out (15.08 KB, text/plain), 2005-11-09 11:55 EST, IBM Bug Proxy
cpuinfo.txt (2.32 KB, text/plain), 2005-11-09 13:55 EST, IBM Bug Proxy


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 19618 None None None Never

Description Steve Parkinson 2005-11-04 01:36:25 EST
This is a critical customer escalation.

CS 7.1 on a Dell 1425 with 2 physical Intel EM64T processors, each showing as
2 CPUs in /proc/cpuinfo, for a total of 4 CPUs.

OS: RHEL4 update 2. 32-bit mode

Customer reports that admin server crashes. After reproducing, I
found the crash in the IBM JRE (1.4.2 base) with the following stack trace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 180235 (LWP 6345)]
0x4441d8d0 in InitAtomicTestAndSet ()
   from /export/cs/bin/base/jre-142-base/bin/libjitc.so
(gdb) where
#0  0x4441d8d0 in InitAtomicTestAndSet ()
   from /export/cs/bin/base/jre-142-base/bin/libjitc.so
#1  0x441de30e in java_lang_Compiler_start (CompiledCodeLinkVector=0x4153d80c)
    at /userlvl/cxia32142ifx/src/jit/sov/java_hook/jit_compiler_dllmain.c:1224
#2  0x414b34dc in JVM_InitializeCompiler (env=0x80c9f10, compCls=0xbe5ff360)
    at /userlvl/cxia32142ifx/src/jvm/sov/xe/common/jit.c:1560
#3  0x414e767e in mmisInvoke_V_VHelper (o=0x441a3dd8, mb=0x824aa0c,
    args_size=0, ee=0x80c9f10, optop=0xbe5ff380)
    at /userlvl/cxia32142ifx/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_jni_invokers.c:420
#4  0x414b50cf in getee_end_13 ()
    at
/userlvl/cxia32142ifx/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_custom_invokers.s:3114
#5  0x0824aa0c in ?? ()
#6  0x414bec9f in isq_doinvoke_V__ ()
    at /userlvl/cxia32142ifx/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_execute0.s:30283
#7  0x0824ae70 in ?? ()
#8  0x414bec9f in isq_doinvoke_V__ ()
    at /userlvl/cxia32142ifx/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_execute0.s:30283
#9  0x00000000 in ?? ()


(gdb) disassemble 0x4441d8d0
Dump of assembler code for function InitAtomicTestAndSet:
0x4441d8d0 <InitAtomicTestAndSet+0>:    cmpl   $0x1,0x4457265c
0x4441d8d7 <InitAtomicTestAndSet+7>:    jle    0x4441d8e0 <skip_add_lock>
0x4441d8d9 <InitAtomicTestAndSet+9>:    movb   $0xf0,0x4441d8ec
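
For reference, 0xf0 is the x86 LOCK prefix byte, and the store target
0x4441d8ec lies 0x1c bytes past the function entry, so the three instructions
shown appear to patch a LOCK prefix into the routine's own code when the value
at 0x4457265c is greater than 1. The Python sketch below only simulates that
pattern on a byte buffer. It assumes the compared value is a CPU count and that
the patched byte starts out as a NOP (0x90); neither is visible in the
disassembly, and the function name is hypothetical.

```python
LOCK_PREFIX = 0xF0  # x86 LOCK prefix, the byte stored by the movb above
NOP = 0x90          # assumed initial placeholder byte (not shown in the dump)


def patch_lock_prefix(code, patch_offset, cpu_count):
    """Simulate the self-patching step on a mutable byte buffer.

    Mirrors 'cmpl $0x1, <count>; jle skip_add_lock; movb $0xf0, <addr>':
    the LOCK prefix is written only when more than one CPU is present.
    """
    if cpu_count > 1:
        code[patch_offset] = LOCK_PREFIX
    return code


# Offset 0x1c is 0x4441d8ec - 0x4441d8d0, taken from the disassembly.
routine = bytearray([NOP] * 0x20)
patch_lock_prefix(routine, 0x1C, cpu_count=4)
print(hex(routine[0x1C]))
```

A routine that does this for real needs its memory to be writable (for the
store) and executable (to run at all), which matters on machines that enforce
NX.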


The admin server is a Netscape Enterprise Server webserver, which initializes
a JVM in-process using the JNI_CreateJavaVM call to handle servlets

Other notes:
Does not crash if booted in non-SMP kernel
Does not crash if JIT turned off (-Djava.compiler=NONE)
Does not crash with older 2-CPU hyperthreaded Xeons running same O/S
Does not crash if java program invoked from command line with 'java Program',
[however this is not viable for our program]
Still crashes with kernel.exec-shield turned off
Still crashes with kernel.exec-shield-randomize off
Still crashes with SELinux turned off
Still crashes with LD_ASSUME_KERNEL=2.2.5
Still crashes with IBM 1.4.2 SR2, SR3

IBM JRE 1.5 beta does not crash, but JNI_CreateJavaVM does return an error (-1)
A beta is not a viable fix anyway.
Comment 1 Steve Parkinson 2005-11-04 01:41:46 EST
Created attachment 120716
DMESG output from EM64T machine
Comment 2 Thomas Fitzsimmons 2005-11-04 10:39:13 EST
Reassigning to Mark Wisner.
Comment 3 Steve Parkinson 2005-11-04 12:08:08 EST
Turning off execshield as described in this Dell technote resolves the problem:
http://support.dell.com/support/topics/global.aspx/support/kb/en/document?dn=1091548&l=en&langid=1&c=us&cs=&s=gen

We have proposed this workaround to the customer, however I would still like to
hear a comment from IBM. Thanks
Comment 5 IBM Bug Proxy 2005-11-07 11:27:10 EST
---- Additional Comments From markwiz@us.ibm.com  2005-11-07 11:23 EDT -------
Red Hat, 

Can IBM contact the customer directly?

If so, can we have the contact information? 
Comment 6 Steve Parkinson 2005-11-07 13:09:09 EST
I cannot give out customer details in this case.

However, we have reproduced this problem in our lab with identical hardware.
So, if you need more info, we can help
Comment 7 IBM Bug Proxy 2005-11-07 14:07:11 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-07 14:03 EDT -------
This bug looks very similar to a bug reported in RIT54773 / LTC bug #12450. That
report mentions the following (it involved an AMD64 CPU, not Intel, but still the
x86-64 arch, though using a 32-bit Hugemem kernel on RHEL 3):

"The problem is, the JIT is trying to jump to a function that is written in
assembler in what must be the data section. The reason it causes a problem here
is that the kernel on AMD64 uses the NX feature to enforce non-executable
permissions on VMAs that are marked as not executable.

This will require the JIT to be fixed, but a workaround can be used. Use the
kernel parameter "noexec=off" to get things going."

I am going to try to track down the PMR that was previously submitted to the Java
folks for that bug to see what the resolution was. According to other comments,
it should have been something like this:

"I understand that the problem is in the InitAtomicTestAndSet() method (implemented
in assembly).....

Since this method has self-modifying code, we need to have the segment
as "writable", hence we have written this method in the data segment. With the NX
feature enabled, this will cause a problem, because the data segment is not
"executable" by default.

Earlier we had a similar problem on Windows. The problem affected some assembly
files. The fix was in the makefile, where we linked the specific data
segment with the "executable" attribute on (by passing appropriate linker flags
as "SHLDFLAGS += -SECTION:_TEXT_ATOMIC,ERW").

I am looking into the gcc man pages to check if we have similar options in
Linux (i.e. setting a specific data segment with "executable" permission)."
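
One way to check this diagnosis on an affected machine is to look up the
faulting address in /proc/<pid>/maps and see whether the mapping that contains
it carries the x permission. A minimal Python sketch follows. The sample maps
text is hypothetical (the real layout of libjitc.so was not posted in this
bug); only the addresses 0x4441d8d0 and 0x441de30e come from the stack trace in
the description, and the function name is made up for illustration.

```python
def find_mapping(maps_text, address):
    """Return perms/path of the /proc/<pid>/maps entry containing address."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a mapping line
        start_s, _, end_s = fields[0].partition("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= address < end:
            perms = fields[1]
            path = fields[5] if len(fields) > 5 else ""
            return {"perms": perms, "path": path, "executable": "x" in perms}
    return None


# Hypothetical layout: a read/execute text mapping followed by a
# read/write data mapping of the same library.
SAMPLE_MAPS = """\
44180000-4441d000 r-xp 00000000 08:01 123456 /export/cs/bin/base/jre-142-base/bin/libjitc.so
4441d000-44580000 rw-p 0029d000 08:01 123456 /export/cs/bin/base/jre-142-base/bin/libjitc.so
"""

print(find_mapping(SAMPLE_MAPS, 0x4441D8D0))  # InitAtomicTestAndSet
print(find_mapping(SAMPLE_MAPS, 0x441DE30E))  # java_lang_Compiler_start
```

If the entry for the crash address shows rw-p rather than r-xp or rwxp, that
would be consistent with the NX explanation quoted above.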
Comment 8 IBM Bug Proxy 2005-11-07 15:19:27 EST
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hariprasad@in.ibm.com




------- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-07 15:12 EDT -------
I sent an email to the Java JITC developer to find out more about the resolution to
RIT54773 since I can't find the PMR in the system. 
Comment 9 IBM Bug Proxy 2005-11-08 11:23:40 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-08 11:17 EDT -------
What would it take to be able to recreate/see the problem here at IBM? The Java
JITC developer would like to see the problem. He believes it was fixed but would
still investigate. I've asked him if he could provide a debug JVM in the interim
that would help in remote analysis if required and am awaiting a response. 
Comment 10 Steve Parkinson 2005-11-08 14:11:56 EST
The simplest way (so far) to see this problem is to download Fedora Directory
Server and install it. It uses the same component (called admin server) that
I am using in Certificate System (But certificate system is not available for
download at this time).

The download page is here:
http://directory.fedora.redhat.com/wiki/Download

fedora-ds-7.1.tar.gz

NOTE: the crash will happen during the next step, and will cause the setup
process to fail. So, boot the system with execshield off in order to do the
next step.

Once you unpack the tar file, follow the steps under 'creating an instance'
on this page:
http://directory.fedora.redhat.com/wiki/Install_Guide

You will see in the <server-root>/admin-serv/logs/errors file, the following
entries, indicating a successful install:

[04/Nov/2005:19:41:53] info ( 3266): A new configuration was successfully installed
[04/Nov/2005:19:41:53] info ( 3266): Using the Classic VM v1.4.2 from IBM
Corporation

Now, reboot with execshield turned on, and start admin-server by running
<server-root>/start-admin

You will see the following in the log:

[04/Nov/2005:19:41:53] info ( 3266): A new configuration was successfully installed

But the entry about starting the JVM is not there - the system crashes
before that happens.
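
The check described above (is the "Using the Classic VM" entry present after
the last successful configuration install?) can be scripted. This is a minimal
Python sketch, not part of the product; the function name is hypothetical and
the two sample logs are copied from the entries quoted in this comment.

```python
INSTALLED = "A new configuration was successfully installed"
JVM_MARKER = "Using the Classic VM"


def jvm_started(log_text):
    """True if the JVM marker appears after the last 'installed' entry."""
    last = log_text.rfind(INSTALLED)
    if last == -1:
        return False  # server never got as far as installing a configuration
    return JVM_MARKER in log_text[last:]


GOOD_LOG = """\
[04/Nov/2005:19:41:53] info ( 3266): A new configuration was successfully installed
[04/Nov/2005:19:41:53] info ( 3266): Using the Classic VM v1.4.2 from IBM
Corporation
"""

CRASH_LOG = """\
[04/Nov/2005:19:41:53] info ( 3266): A new configuration was successfully installed
"""

print(jvm_started(GOOD_LOG), jvm_started(CRASH_LOG))
```

In practice the text would be read from <server-root>/admin-serv/logs/errors
after running start-admin.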

Comment 11 Steve Parkinson 2005-11-08 14:16:35 EST
Note that running the app under gdb is quite complex. If you need to do that,
you might have some problems.  But if you must, try:

modify start-admin:
Change
PRODUCT_BIN=uxwdog
to 
PRODUCT_BIN=ns-httpd

and later,
                ./$PRODUCT_BIN -d $PRODUCT_SUBDIR/config $@
to
 -start)
                echo ./$PRODUCT_BIN -d $PRODUCT_SUBDIR/config $@
                gdb ./$PRODUCT_BIN

But gdb has lots of problems handling forking of our process.


Also, I was told:  turning execshield off for the entire system is WAY overkill; use
setarch i386 -X  <app>
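
For anyone scripting the launch, the setarch form above can be built as an
argument list. The sketch below is only an illustration in Python; the helper
name is hypothetical, and the ns-httpd arguments are modelled on the
start-admin line quoted earlier in this comment. The -X option of setarch turns
on READ_IMPLIES_EXEC for the launched process only.

```python
import shlex


def wrap_with_read_implies_exec(command, arch="i386"):
    """Return an argv list that runs command under 'setarch <arch> -X'."""
    return ["setarch", arch, "-X"] + list(command)


argv = wrap_with_read_implies_exec(["./ns-httpd", "-d", "admin-serv/config"])
# Print the command line; pass argv to subprocess.run() to execute it.
print(" ".join(shlex.quote(arg) for arg in argv))
```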
Comment 12 IBM Bug Proxy 2005-11-08 15:34:52 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-08 15:34 EDT -------
I located an IBM xSeries 226 (2-way Xeon 3.2GHz HT) that has RHEL 4 GA on it.
I'll try to update the kernel and install some of the components mentioned and
see what I get. 
Comment 13 IBM Bug Proxy 2005-11-08 17:44:55 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-08 17:41 EDT -------
Must I use the tar.gz or can I just install the fedora-ds-7.1-2.RHEL4.i386.opt.rpm ? 
Comment 14 Chandrasekar Kannan 2005-11-08 18:07:04 EST
the rpm should work as well
Comment 15 IBM Bug Proxy 2005-11-09 11:55:06 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-09 11:52 EDT -------
I did the following:

1. Booted kernel with noexec=off on kernel command line
2. Installed IBMJava2-142-ia32-JRE-1.4.2-3.0.i386.rpm (SR3)
3. installed the fedora-ds-7.1-2.RHEL4.i386.opt.rpm
4. Ran /opt/fedora-ds/setup/setup and took all defaults to setup server
5. After installation, it indicated the server was started
6. /opt/fedora-ds/admin-serv/logs/error file showed message "Using the Classic
VM v1.4.2 from IBM Corporation"
7. I executed /opt/fedora-ds/stop-admin to shutdown
8. I renamed the /opt/fedora-ds/bin/base/jre containing what looks like 1.4.2
SR1a and placed a symlink for jre to /opt/IBMJava2-142/jre to try the SR3 release
9. I executed /opt/fedora-ds/start-admin and the server indicated it was started
10. Checked the log and it gave the same messages as step 6
11. Rebooted without the noexec kernel command line parm 
12. Repeated start-admin but still saw the message indicating it was using the
IBM JVM and that it started

What am I not doing correctly to see the problem? I am using the
kernel-smp-2.6.9-22.0.1.EL.x86_64.rpm. Should I be using the i386 SMP kernel?

I'll attach the dmesg output of the system so you can see if it is similar to
yours. 
Comment 16 IBM Bug Proxy 2005-11-09 11:55:30 EST
Created attachment 120847
dmesg.out
Comment 17 IBM Bug Proxy 2005-11-09 11:55:50 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-09 11:54 EDT -------
 
dmesg output from IBM xSeries 226 (2-Way Xeon 3.2Ghz HT) 
Comment 18 Steve Parkinson 2005-11-09 12:13:49 EST
Hmm, do you have the NX capability on your processor? You can look at the
'flags' line in  /proc/cpuinfo:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 3000.557
cache size      : 2048 KB
physical id     : 3
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr
bogomips        : 5996.54
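
The same check can be done programmatically. The following Python sketch (a
hypothetical helper, not something used in this bug) summarizes
/proc/cpuinfo-style text: number of logical CPUs, number of distinct physical
ids, and whether nx is among the flags. It tolerates a flags line that has been
wrapped across several lines, as in the paste above; the sample text is a
trimmed copy of that paste.

```python
def summarize_cpuinfo(text):
    """Count logical CPUs and physical packages and look for the nx flag."""
    logical = 0
    packages = set()
    flags = set()
    in_flags = False
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            key, value = key.strip(), value.strip()
            in_flags = False
            if key == "processor":
                logical += 1
            elif key == "physical id":
                packages.add(value)
            elif key == "flags":
                flags.update(value.split())
                in_flags = True
        elif in_flags:
            # Continuation of a flags line that was wrapped when pasted.
            flags.update(line.split())
    return {
        "logical_cpus": logical,
        "physical_packages": len(packages),
        "nx": "nx" in flags,
    }


SAMPLE_CPUINFO = """\
processor       : 3
physical id     : 3
siblings        : 2
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr
bogomips        : 5996.54
"""

print(summarize_cpuinfo(SAMPLE_CPUINFO))
```

On a live system the input would be the contents of /proc/cpuinfo.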

Comment 19 IBM Bug Proxy 2005-11-09 13:55:03 EST
Created attachment 120851
cpuinfo.txt
Comment 20 IBM Bug Proxy 2005-11-09 13:55:19 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-09 13:50 EDT -------
 
full /proc/cpuinfo output

I've attached the full cpuinfo, and the flags show nx as best I can tell. Also,
I didn't have an EM64T box locally, so I had to borrow one remotely from
Beaverton, OR (I'm in Austin, TX). It had RHEL 4 GA and I just upgraded the
kernel, so I hope there isn't a requirement to have the full U2 installed. It
sounds like this should be restricted to using a specific kernel feature
without a glibc dependency.

Also, the Java developer has provided me a PD (problem determination) build of
the SDK which I will try and place somewhere for you this afternoon for
collecting more diagnostic information. I\'ll most likely place it on
testcase.software.ibm.com ftp site.

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 4
model name	:		    Intel(R) Xeon(TM) CPU 3.20GHz
stepping	: 1
cpu MHz 	: 3200.242
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id 	: 0
cpu cores	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor
ds_cpl cid cx16 xtpr
bogomips	: 6324.22
clflush size	: 64
cache_alignment : 128
address sizes	: 36 bits physical, 48 bits virtual
power management: 
Comment 21 IBM Bug Proxy 2005-11-09 15:25:05 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-09 15:21 EDT -------
You should be able to download the two files I was provided by Java developer
using (note that the files will remain on the ftp server for at most 3 or 4 days
before a cron job removes them):

wget
testcase.software.ibm.com:/linux/fromibm/RH172417/cxia32142-20050929-sdk.jar.zip
wget
testcase.software.ibm.com:/linux/fromibm/RH172417/cxia32142-20050929-jre_pd.jar.zip

Here are the instructions I was given which I need to try myself as well:

Note:- cxia32142-20050929-jre_pd.jar is a debug version of libraries.

Please follow the steps to use the same.
1)	Download the files from the above location in binary mode.
2)	Extract cxia32142-20050929-sdk.jar to get a folder sdk.
3)	Go to the sdk/jre folder.
4)	Extract cxia32142-20050929-jre_pd.jar to get a pd folder in sdk/jre.
5)	Change the script to use "java -Xpd classfile".

Provide the core file if system crashes. 
Comment 22 IBM Bug Proxy 2005-11-09 17:20:14 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-09 17:19 EDT -------
I had to chmod +x some of the bin files in the pd sdk after unzipping it in
order for it to run. I also changed the symlink for /opt/fedora-ds/bin/base/jre
to point to the pd sdk/jre directory. It came up OK. I then edited
/opt/fedora-ds/jvm12.conf, thinking that's the place to add the -Xpd option, but
when I restart the admin-server I get:

JVMCI167: Bad pCluster initial size -Xpd
JVMCI123: Unable to parse 1.2 format supplied options - rc=-6

and it doesn't start. :( Taking the option out of the jvm12.conf allows it to
start again. 
Comment 23 IBM Bug Proxy 2005-11-09 17:30:07 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-09 17:27 EDT -------
FWIW, I am not sure that those messages are coming from the java interpreter,
since from the command line it seems to acknowledge the option:

$ /opt/fedora-ds/bin/base/jre/bin/java -Xpd -version
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM build cxia32142-20050929 (SR3) [PD
Build] (JIT enabled: jitc)) 
Comment 24 IBM Bug Proxy 2005-11-10 00:34:38 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-11-10 00:34 EDT -------
Adding -Xpd to the file /opt/fedora-ds/jvm12.conf causes it to be treated as a
JVM option. However, -Xpd must be provided at the place where the java command
is invoked.
-Xpd should not be given in the option file. 
Comment 25 IBM Bug Proxy 2005-11-10 00:54:55 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-11-10 00:52 EDT -------
While trying to recreate the problem, please make sure that:
1) We are on EM64T.
2) We are on the latest RHEL kernel.
3) We are at IBM JDK 142SR3. 
Comment 26 Steve Parkinson 2005-11-10 16:32:32 EST
hariprasad:
We don't actually 'invoke the java command'. We are initializing the JVM
using the JNI_CreateJavaVM() call from our webserver (admin server) process.

So, I am confused by what you said - all the options in jvm12.conf are
passed to the JNI_CreateJavaVM(&jvm, &env, &vm_args) call (as the vm_args argument.)

I am downloading the pd jvm now. I hope I can get it to work.


Comment 27 Steve Parkinson 2005-11-10 20:56:51 EST
I understand now. The -Xpd option is only recognized by the java executable,
which just uses that flag to determine which vm to load.

I worked around this by overlaying the pd/bin and pd/lib directories over
the regular jre/bin and jre/lib directories

Now I can run 'java -version' and it shows
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM build cxia32142-20050929 (SR3) [PD
Build] (JIT enabled: jitc)) 


As you can see, PD is enabled, even though I didn't specify the -Xpd flag.

Now, when I run the server, I get:
[root@localhost redhat-cs]# ./start-admin
Netscape-Enterprise/6.2 B04/18/2005 13:49
warning: daemon is running as super-user
[LS ls1] http://localhost.localdomain, port 35198 ready to accept requests
UTE011: Active tracepoint array length for JIT is 307;  should be 43

The program then coredumps. The stacktrace is as follows:
(gdb) where
#0  0x00000000 in ?? ()
#1  0x085b658d in skip_handler0_check () at rtx86catch_.s:1949
#2  0x085b6540 in ask_os0 () at rtx86catch_.s:1949
#3  0x05dccb14 in ?? ()
#4  0x05dccb00 in ?? ()
#5  0x05dccdf0 in ?? ()
#6  0x075bd5aa in getsehr_end_2 ()
    at /userlvl/cxia32142ifx/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_exception_stub.s:922
#7  0x05dccb00 in ?? ()
#8  0x067c3d34 in ?? ()
#9  0x05dccb14 in ?? ()
#10 0x05dccd24 in ?? ()
#11 0x0000000b in ?? ()
#12 0x00000000 in ?? ()


I'm not exactly clear what else I am supposed to see from the PD build.

I can send you the core file if you like. It's 652MB, and compresses down
to 1.9MB.

There is still no indication in the logs/errors file that the JVM is starting.

Comment 28 IBM Bug Proxy 2005-11-11 01:10:06 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-11-11 01:07 EDT -------

If you are launching the VM through the JNI call,
it is better to prepend the PD libs to LIBPATH.
Even after setting LIBPATH, we need some scripts that
pass the VM arguments during startup (not as VM options).
.
However, I am not sure about the settings used in this server; I
may be able to figure it out once I get access to the machine.
Meanwhile, please run with the normal SDK (non-PD version).
. 
Comment 29 IBM Bug Proxy 2005-11-14 11:51:50 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-14 11:50 EDT -------
I have a new IBM x366 (2 Intel(R) Xeon(TM) CPU 3.60GHz with HT and NX feature
enabled) installed with RHEL 4 U2, the latest 2.6.22-0.1-ELsmp x86_64 kernel,
the same fedora-ds-7.1-2.RHEL4.i386.opt.rpm installed and cat
/proc/sys/kernel/exec-shield says "1" but everything still works when I exec
start-admin. That makes two machines I have yet to be able to see the problem
on. Any ideas why I can't see the problem reported?

Where can we find the coredump, BTW? 
Comment 30 IBM Bug Proxy 2005-11-18 10:01:48 EST
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO




------- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-18 10:00 EDT -------
Any help Red Hat can offer so we're not idle with this high severity bug is
appreciated. I have the Java JITC developer standing by, but either he needs
me to be able to recreate the bug for him or, at the very least, some additional
debug information from you (even the coredump you have may show something). 
Comment 32 Steve Parkinson 2005-11-18 16:18:15 EST
Yes, I owe you an update - sorry for the delay.

The core file I have is with different (private) binaries than I had you install,
so you would not be able to read it.

I will re-run the test with the fedora binaries.

Comment 33 Steve Parkinson 2005-11-18 18:36:39 EST
I have uploaded the core files.

Here is the first one, with the PD build:
crash1 - pd build
corefile name - core.32486
uncompressed: -rw-------  1 root root 685330432 Nov 18 23:20 ibmjre_core.32486
compressed:   -rw-------  1 root root 1996988 Nov 18 23:20 ibmjre_core.32486.gz
Download location: wget ftp://enterprise.redhat.com/incoming/ibmjre_core.32486.gz

Stack trace:
#0  0xb5a268d4 in search_cached_cc0 ()
   from /opt/redhat-cs/bin/base/jre_combined/bin/libjitc.so
#1  0xb590b58d in skip_handler0_check () at rtx86catch_.s:1949
#2  0xb590b540 in ask_os0 () at rtx86catch_.s:1949
#3  0x072101c4 in ?? ()
#4  0x072101b0 in ?? ()
#5  0x00000000 in ?? ()
Comment 34 Steve Parkinson 2005-11-18 18:37:48 EST
crash2 - non pd build

corefile name - core.32562
uncompressed: -rw-------  1 root root 685715456 Nov 18 23:28 ibmjre_core.32562
compressed:   -rw-------  1 root root 2009552 Nov 18 23:28 ibmjre_core.32562.gz
Download location: wget ftp://enterprise.redhat.com/incoming/ibmjre_core.32562.gz

Stack Trace:
#0  0x04d65a04 in search_cached_cc0 ()
   from /opt/fedora-ds/bin/base/jre_ibm142sr3_nonpd/bin/libjitc.so
#1  0x04d3660d in skip_handler0_check ()
   from /opt/fedora-ds/bin/base/jre_ibm142sr3_nonpd/bin/libjitc.so
#2  0x04d365c0 in ask_os0 ()
   from /opt/fedora-ds/bin/base/jre_ibm142sr3_nonpd/bin/libjitc.so
#3  0x06fbc1c4 in ?? ()
#4  0x06fbc1b0 in ?? ()
#5  0x00000000 in ?? ()

Comment 35 IBM Bug Proxy 2005-11-22 05:52:38 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-11-22 05:49 EDT -------
I downloaded both core files.
I noticed the same failure occurring in both of them.
.
The JVM is crashing while moving a reference from the stack to a register.
.
I will continue my analysis and post it. 
Comment 36 IBM Bug Proxy 2005-11-22 09:58:48 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-11-22 09:57 EDT -------
Update:- 
.
I have instrumented a JDK and placed it in the following path.
.
ftp://testcase.boulder.ibm.com/ns/fromibm/20421/sdk.tar.gz
.
Please download it and repackage it for the WAS environment.
.
run it on the failing machine.
.
Thanks
-Hari.. 
Comment 37 Steve Parkinson 2005-11-22 19:22:15 EST
Hmm - I am not able to run this build?  I get the following error:

JVMDG218: JVM is not fully initialized - will not do dump processing.
Segmentation fault

My preparation:
[root@localhost base]# mkdir test
[root@localhost base]# cd test
[root@localhost test]# gtar zxf ../sdk.1122.tar.gz
[root@localhost base]# cd test/sdk/jre
[root@localhost jre]# chmod -R a+x .
[root@localhost jre]# bin/java -version
JVMDG218: JVM is not fully initialized - will not do dump processing.
Segmentation fault


This does not appear to be a PD build as before. However I tried also to
apply it over the IBM 1.4.2SR3 release described in comment 21 and I get the
same results.
Comment 38 IBM Bug Proxy 2005-11-23 09:53:10 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-11-23 09:48 EDT -------
I have placed cxia32142-20051121.tar.gz file on 
ftp://testcase.boulder.ibm.com/ns/fromibm/20421 folder.
.
Please download it and test it again.
.
Thanks
-Hari.. 
Comment 39 Steve Parkinson 2005-11-23 20:38:33 EST
I reran the test with the SDK from comment 38. The core file is here:

-rw-------  1 root root 682446848 Nov 24 01:25 ibmjre_core.17150
-rw-------  1 root root   1955391 Nov 24 01:25 ibmjre_core.17150.gz

please download from here:
ftp://enterprise.redhat.com/incoming/ibmjre_core.17150.gz

I'm not sure how long it will stay there. If it is gone by the time you
try it, I will not be able to re-upload until next Monday.
Thanks
Comment 40 IBM Bug Proxy 2005-11-28 17:04:11 EST
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |OPEN




------- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-11-28 17:00 EDT -------
Hari is on vacation and I just came back from one so I downloaded the file and
uploaded the file to
ftp://testcase.boulder.ibm.com/linux/fromibm/20421/ibmjre_core.17150.gz
internally and sent a note to Hari's backup to help analyze it. 
Comment 41 IBM Bug Proxy 2005-12-05 05:15:24 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-12-05 05:14 EDT -------
When we launch gdb on the new core file that was sent across, the modules do 
not get loaded as the core file seems to have been generated using :

./ns-httpd -d /opt/redhat-cs/admin-serv/config

as opposed to a java process. 

As a result we are unable to look at the stack trace. 
We expect the changes to prevent the crash, and hence want to ensure that the
older and the newer crashes are the same. 

We'd like to request the gdb trace so we can look at the failure stack and
understand whether the earlier and the current crashes are the same. 
Comment 42 Steve Parkinson 2005-12-06 18:28:36 EST
The core file in comment 39 might have been bad - it wasn't necessarily running
with the identical set of binaries you had.

So, I have uploaded a new core:
ftp://enterprise.redhat.com/incoming/ibmjre-core.3243.gz


Yes, the core is from ns-httpd - as I stated in my first comment, the JRE is
being started from JNI:

  The admin server is a Netscape Enterprise Server webserver, which initializes
  a JVM in-process using the JNI_CreateJavaVM call to handle servlets

The ns-httpd binary is in bin/httpd/bin/

The stack trace from this core is:
(gdb) where
#0  0x06c80ee4 in search_cached_cc0 ()
   from /opt/fedora-ds/bin/base/fixed_jre/bin/libjitc.so
#1  0x06c54d0d in skip_handler0_check ()
   from /opt/fedora-ds/bin/base/fixed_jre/bin/libjitc.so
#2  0x06c54cc0 in ask_os0 ()
   from /opt/fedora-ds/bin/base/fixed_jre/bin/libjitc.so
#3  0x073eb1c4 in ?? ()
#4  0x073eb1b0 in ?? ()
#5  0x00000000 in ?? ()

Which to me looks the same as before (comment 34).


Also, let me elaborate how I installed the JRE this time. I simply copied the
jre subdirectory out from under the 'sdk' directory after untarring your
JRE - I copied it into the bin/base/fixedjre directory, and made a symlink
from bin/base/jre to bin/base/fixedjre. The only jre was moved out of the way.




Comment 43 IBM Bug Proxy 2005-12-12 10:42:04 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-12-12 10:38 EDT -------
Hari,

Is there something more we can obtain to move forward?

Steve,

I still really want to get this recreated here within IBM so Hari can get
whatever he needs without impacting your time. Would it be worth you simply
creating a tar.gz of your /opt/fedora-ds tree to install here in case there is
some minute difference that will show us the problem? Any problems doing that?
Would I have to tweak anything to make it work? 
Comment 44 Chandrasekar Kannan 2005-12-12 10:53:00 EST
Steve is gonna be busy in meetings all through this week. 
let me see what I can do. 

I do have the .tar.gz. It's about 165 MB. Trying to get it to our ftp site.
Comment 45 Chandrasekar Kannan 2005-12-12 17:02:18 EST
couldn't upload to enterprise.redhat.com. 
download from here - http://people.redhat.com/ckannan/fedora-ds.tar.gz

Comment 46 Chandrasekar Kannan 2005-12-12 17:03:48 EST
untar into /opt/ directory. also make sure your hostname is
"localhost.localdomain". and then try to start the admin server like this - 

/opt/fedora-ds/start-admin

Comment 47 IBM Bug Proxy 2005-12-13 09:42:15 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-12-13 09:38 EDT -------
I downloaded the tar.gz file. I'll give it a try today. Thanks. 
Comment 48 IBM Bug Proxy 2005-12-13 10:47:17 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-12-13 10:45 EDT -------
I renamed the original /opt/fedora-ds and unpacked the supplied one.

[root@lola fedora-ds]# uname -a
Linux lola.ltc.austin.ibm.com 2.6.9-22.0.1.ELsmp #1 SMP Tue Oct 18 18:39:02 EDT
2005 x86_64 x86_64 x86_64 GNU/Linux

[root@lola fedora-ds]# cat /proc/cmdline
ro root=/dev/VolGroup00/LogVol00 rhgb quiet noexec=on console=tty0

System has two  Intel(R) Xeon(TM) CPU 3.60GHz CPUs with HT enabled and NX flag
displays in flags for /proc/cpuinfo which lists 4 CPUs

./start-admin output:

Netscape-Enterprise/6.2 B04/18/2005 13:49
warning: daemon is running as super-user
[LS ls1] http://localhost.localdomain, port 1143 ready to accept requests
startup: server started successfully

Log shows:

[13/Dec/2005:09:33:46] info ( 3351): Installing a new configuration
[13/Dec/2005:09:33:46] info ( 3351): [LS ls1] http://localhost.localdomain, port
1143 ready to accept requests
[13/Dec/2005:09:33:46] info ( 3351): A new configuration was successfully installed
[13/Dec/2005:09:33:46] info ( 3351): Using the Classic VM v1.4.2 from IBM
Corporation

Unfortunately, I don't think it failed, so the difference must be in hardware,
timing, or somewhere else in the OS. Anything look different to y'all from what
I have set up? 
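
When comparing setups like this, one scriptable check is whether a given
parameter (here noexec) is present on the kernel command line. A minimal
Python sketch, with a hypothetical helper name; the first sample string is the
/proc/cmdline output shown above, and the second is an example of a command
line that does not set the parameter.

```python
def kernel_param(cmdline, name):
    """Return the value of name on a kernel command line, or None if absent.

    Bare flags such as 'quiet' yield an empty string.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


WITH_NOEXEC = "ro root=/dev/VolGroup00/LogVol00 rhgb quiet noexec=on console=tty0"
WITHOUT_NOEXEC = "ro root=LABEL=/ rhgb quiet"

print(kernel_param(WITH_NOEXEC, "noexec"))
print(kernel_param(WITHOUT_NOEXEC, "noexec"))
```

On a live system the input would come from reading /proc/cmdline.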
Comment 49 Chandrasekar Kannan 2005-12-13 11:21:45 EST
just for comparison ... 

###########################################################################

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.9-11.ELsmp #1 SMP Fri May 20 18:26:27 EDT 2005
i686 i686 i386 GNU/Linux
[root@localhost ~]# cat /proc/cmdline
ro root=LABEL=/ rhgb quiet
[root@localhost ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 3000.594
cache size      : 2048 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr
bogomips        : 5947.39

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 3000.594
cache size      : 2048 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr
bogomips        : 5996.54

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 3000.594
cache size      : 2048 KB
physical id     : 3
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr
bogomips        : 5996.54

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 3000.594
cache size      : 2048 KB
physical id     : 3
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr
bogomips        : 5996.54

[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# tail /opt/fedora-ds/admin-serv/logs/error
[06/Dec/2005:23:12:50] info ( 3243): A new configuration was successfully installed
[12/Dec/2005:15:45:46] info (16345): successful server startup
[12/Dec/2005:15:45:46] info (16345): Netscape-Enterprise/6.2 B04/18/2005 13:49
[12/Dec/2005:15:45:46] warning (16345): admin40_check_ds_availability_init():
WARNING: Configuration Directory Server is down or unreachable (Can't contact
LDAP server)
[12/Dec/2005:15:45:46] failure (16345): Warning! admin40_task_eval_init():
unable to set User/Group baseDN
[12/Dec/2005:15:45:46] info (16345): Access Host filter is: *.localdomain
[12/Dec/2005:15:45:46] info (16345): Access Address filter is: *
[12/Dec/2005:15:45:46] info (16346): Installing a new configuration
[12/Dec/2005:15:45:46] info (16346): [LS ls1] http://localhost.localdomain, port
1143 ready to accept requests
[12/Dec/2005:15:45:46] info (16346): A new configuration was successfully installed
[root@localhost ~]#

###########################################################################

You seem to be running a 64-bit operating system, as reported by uname -a.
We are running a 32-bit OS.

Can you try on the same hardware but with 32-bit RHEL 4?
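
The comparison above comes down to a few fields: each "processor" block, its "physical id", and the "lm" (long mode) flag, which marks a 64-bit-capable CPU even when a 32-bit kernel is running. A minimal sketch of pulling those out of /proc/cpuinfo text follows. The function name and the trimmed SAMPLE are illustrative only, and the parser assumes the layout pasted in this comment, including flag lists that wrap onto continuation lines.

```python
def summarize_cpuinfo(text):
    """Count logical CPUs and physical packages in /proc/cpuinfo-style text."""
    cpus, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends one "processor" block.
            if current:
                cpus.append(current)
            current, last_key = {}, None
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
        elif last_key is not None:
            # Continuation line (for example a wrapped "flags" list).
            current[last_key] += " " + line.strip()
    if current:
        cpus.append(current)

    packages = {cpu.get("physical id") for cpu in cpus} - {None}
    long_mode = bool(cpus) and all(
        "lm" in cpu.get("flags", "").split() for cpu in cpus
    )
    return {"logical": len(cpus), "packages": len(packages), "long_mode": long_mode}


# Trimmed, hypothetical sample modeled on the output above (not the full listing).
SAMPLE = """processor       : 0
physical id     : 0
siblings        : 2
flags           : fpu vme sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr

processor       : 1
physical id     : 0
siblings        : 2
flags           : fpu vme sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr

processor       : 2
physical id     : 3
siblings        : 2
flags           : fpu vme sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr

processor       : 3
physical id     : 3
siblings        : 2
flags           : fpu vme sse2 ss ht tm pbe nx lm pni monitor
ds_cpl cid xtpr
"""

if __name__ == "__main__":
    print(summarize_cpuinfo(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/cpuinfo directly. It reports only what the file says and does not by itself tell whether the running kernel is 32-bit or 64-bit; uname -a remains the check for that.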



Comment 50 IBM Bug Proxy 2005-12-13 14:57:25 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-12-13 14:56 EDT -------
I tried just running the i686 kernel, but that hung miserably, so I'll re-install
the whole system using the i386 (not the x86_64) RHEL 4 U2 install CD, though it
looks as if you are running U1? I gather this issue still exists with U2?
Comment 51 Chandrasekar Kannan 2005-12-13 15:04:51 EST
Yes, we are at RHEL 4 (32-bit), Update 1:

[root@localhost ~]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 1)


We have also seen the problem occur with Update 2.
Comment 52 IBM Bug Proxy 2005-12-13 16:47:26 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-12-13 16:47 EDT -------
Finally! After re-installing with the i386 32-bit RHEL 4 U2 and not the x86_64
version I think I get the expected results. Looking at the
/opt/fedora-ds/admin-serv/logs/error, all I see is:

[13/Dec/2005:15:40:05] info ( 3452): successful server startup
[13/Dec/2005:15:40:05] info ( 3452): Netscape-Enterprise/6.2 B04/18/2005 13:49
[13/Dec/2005:15:40:05] warning ( 3452): admin40_check_ds_availability_init():
WARNING: Configuration Directory Server is down or unreachable (Can't contact
LDAP server)
[13/Dec/2005:15:40:05] failure ( 3452): Warning! admin40_task_eval_init():
unable to set User/Group baseDN
[13/Dec/2005:15:40:05] info ( 3452): Access Host filter is: *.localdomain
[13/Dec/2005:15:40:05] info ( 3452): Access Address filter is: *
[13/Dec/2005:15:40:05] info ( 3453): Installing a new configuration
[13/Dec/2005:15:40:05] info ( 3453): [LS ls1] http://localhost.localdomain, port
1143 ready to accept requests
[13/Dec/2005:15:40:05] info ( 3453): A new configuration was successfully installed

No messages about using the IBM JVM. I'll send an email to Hari to see if he can
access my system now. Thanks!
Comment 53 Chandrasekar Kannan 2005-12-13 18:29:38 EST
thanks for the update!.
Comment 54 IBM Bug Proxy 2005-12-14 13:07:30 EST
---- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2005-12-14 13:04 EDT -------
Hari is on vacation until next Monday, but his backup was able to access the
system. I do have a question about debugging this (at least with gdb). I made
the changes to start-admin to change the product bin and to start up gdb as
previously described. I then issue the following commands before issuing the run
command:

(gdb) set args -d /opt/fedora-ds/admin-serv/config
(gdb) set follow-fork-mode child

but I am still having "issues" with gdb following the fork. I can't seem to get
it to drop into gdb on the segfault. Any advice?
Comment 55 Chandrasekar Kannan 2005-12-16 19:06:23 EST
We have an updated ns-httpd and libns-httpd.so that doesn't fork,
so you should be able to get it into gdb.

the stack trace that I obtained using this updated version is shown below:

######################################################################

[root@localhost fedora-ds]# ./start-admin
./ns-httpd -d /opt/fedora-ds/admin-serv/config
GNU gdb Red Hat Linux (6.3.0.0-0.31rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

(gdb) run -d /opt/fedora-ds/admin-serv/config
Starting program: /opt/fedora-ds/bin/https/bin/ns-httpd -d
/opt/fedora-ds/admin-serv/config
[Thread debugging using libthread_db enabled]
[New Thread -1208117568 (LWP 9560)]
Netscape-Enterprise/6.2SP1 B11/02/2005 09:53
[New Thread 71007152 (LWP 9563)]
Detaching after fork from child process 9564.
Detaching after fork from child process 9565.
Detaching after fork from child process 9566.

Program received signal SIG33, Real-time event 33.
[Switching to Thread 71007152 (LWP 9563)]
0x009db7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0x009db7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00118cfc in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#2  0x006f8c32 in pt_TimedWait () from ../lib/libnspr4.so
#3  0x006f8fbb in PR_WaitCondVar () from ../lib/libnspr4.so
#4  0x006fe46e in PR_Sleep () from ../lib/libnspr4.so
#5  0x00b483f6 in StatsRunningThread::run (this=0x9119120)
    at statsmanager.cpp:230
#6  0x006a9f57 in Thread::run_ () from ../lib/libnsprwrap.so
#7  0x006a9f8b in ThreadMain () from ../lib/libnsprwrap.so
#8  0x006fdbd3 in _pt_root () from ../lib/libnspr4.so
#9  0x00116341 in start_thread () from /lib/tls/libpthread.so.0
#10 0x00320fee in clone () from /lib/tls/libc.so.6
(gdb) info thread
* 2 Thread 71007152 (LWP 9563)  0x009db7a2 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
  1 Thread -1208117568 (LWP 9560)  0x009db7a2 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
(gdb) info threads
* 2 Thread 71007152 (LWP 9563)  0x009db7a2 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
  1 Thread -1208117568 (LWP 9560)  0x009db7a2 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
(gdb) cont
Continuing.
warning: daemon is running as super-user
[New Thread 34560944 (LWP 9567)]
[New Thread 45050800 (LWP 9568)]
[New Thread 102583216 (LWP 9569)]
[New Thread 55540656 (LWP 9570)]
[New Thread 85818288 (LWP 9571)]
[Thread 85818288 (LWP 9571) exited]
[New Thread 127716272 (LWP 9572)]
[New Thread 85818288 (LWP 9573)]
[LS ls1] http://localhost.localdomain, port 1143 ready to accept requests
[New Thread 113073072 (LWP 9574)]
[New Thread 145017776 (LWP 9575)]
[New Thread -1216349264 (LWP 9576)]
[New Thread 4967344 (LWP 9577)]
[Thread 4967344 (LWP 9577) exited]
[New Thread 5544880 (LWP 9578)]
[New Thread 5811120 (LWP 9579)]
[New Thread 6077360 (LWP 9580)]
[New Thread 7646128 (LWP 9581)]
[New Thread 9608112 (LWP 9582)]
[New Thread 9874352 (LWP 9583)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1216349264 (LWP 9576)]
0x054478d4 in skip_add_lock_7 ()
   from /opt/fedora-ds/bin/base/fixed_jre/bin/libjitc.so
(gdb) where
#0  0x054478d4 in skip_add_lock_7 ()
   from /opt/fedora-ds/bin/base/fixed_jre/bin/libjitc.so
#1  0x05211308 in replace_a_method (mb=0xb780731c,
    c_index=JavaUtilRandomClass, m_index=0)
    at /home/havenkat/cxia32142-20051121/src/jit/sov/lib/replace_methods.c:298
#2  0x051f048b in jit_setup_methods (cb=0xb6dcd428, ee=0x9190924)
    at /home/havenkat/cxia32142-20051121/src/jit/sov/xjit/jit_compctrl.c:982
#3  0x051f5b8f in _jitc_InitializeForCompiler (cb=0xb6dcd428,
    isInitializeLoadedClasses=1, hasBinLock=TRUE)
    at /home/havenkat/cxia32142-20051121/src/jit/sov/java_hook/jit_hookfuncs.c:102
#4  0x051f49d8 in InitializeClassforJIT (cb=0xb6dcd428, hasBinClassLock=TRUE)
    at
/home/havenkat/cxia32142-20051121/src/jit/sov/java_hook/jit_compiler_dllmain.c:273
#5  0x051f4af4 in visit (i=127, n=260)
    at
/home/havenkat/cxia32142-20051121/src/jit/sov/java_hook/jit_compiler_dllmain.c:319
#6  0x051f4d8c in InitializeLoadedClasses ()
    at
/home/havenkat/cxia32142-20051121/src/jit/sov/java_hook/jit_compiler_dllmain.c:393
#7  0x051f5892 in java_lang_Compiler_start (CompiledCodeLinkVector=0x47d624c)
    at
/home/havenkat/cxia32142-20051121/src/jit/sov/java_hook/jit_compiler_dllmain.c:1420
---Type <return> to continue, or q <return> to quit---
#8  0x0474bd8c in JVM_InitializeCompiler (env=0x9190924, compCls=0xb77fece0)
    at /userlvl/cxia32142/src/jvm/sov/xe/common/jit.c:1560
#9  0x0477ff3e in mmisInvoke_V_VHelper (o=0xb49846c8, mb=0xb789517c,
    args_size=0, ee=0x9190924, optop=0xb77fed00)
    at /userlvl/cxia32142/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_jni_invokers.c:420
#10 0x0474d97f in getee_end_13 ()
    at /userlvl/cxia32142/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_custom_invokers.s:3114
#11 0xb789517c in ?? ()
#12 0x0475755f in isq_doinvoke_V__ ()
    at /userlvl/cxia32142/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_execute0.s:30283
#13 0xb78955e0 in ?? ()
#14 0x0475755f in isq_doinvoke_V__ ()
    at /userlvl/cxia32142/obj/x86_linux_2/jvm/sov/xe/mmi/mmi_execute0.s:30283
#15 0x00000000 in ?? ()
(gdb)
(gdb)

[root@localhost opt]#

######################################################################


Will be uploading a tarball shortly to ftp://enterprise.redhat.com/incoming/ ..
Comment 56 Chandrasekar Kannan 2005-12-16 19:24:45 EST
Ok. download from here ftp://enterprise.redhat.com/incoming/fedora-ds-1.tar.gz

Comment 57 IBM Bug Proxy 2005-12-20 05:16:03 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-12-20 05:12 EDT -------
I am able to access the machine and could see the following message in the 
error file.
---------------------------------------------------------------------------
[20/Dec/2005:04:07:24] info ( 7833): Netscape-Enterprise/6.2 B04/18/2005 13:49
[20/Dec/2005:04:07:24] warning ( 7833): admin40_check_ds_availability_init(): 
WARNING: Configuration Directory Server is down or unreachable (Can't contact 
LDAP server)
[20/Dec/2005:04:07:24] failure ( 7833): Warning! admin40_task_eval_init(): 
unable to set User/Group baseDN
[20/Dec/2005:04:07:24] info ( 7833): Access Host filter is: *.localdomain
[20/Dec/2005:04:07:24] info ( 7833): Access Address filter is: *
[20/Dec/2005:04:07:28] info ( 7834): Installing a new configuration
[20/Dec/2005:04:07:28] info ( 7834): [LS ls1] http://localhost.localdomain, 
port 1143 ready to accept requests
[20/Dec/2005:04:07:28] info ( 7834): A new configuration was successfully 
installed
------------------------------------------------------------------------------
Please do let me know if this is an error.
.
I tried using the gdb version of the start-admin script; however, I did not
receive any SIGSEGV.
.
gdb says the program exited normally.
.
Thanks
-Hari.. 
Comment 58 Chandrasekar Kannan 2005-12-20 10:02:03 EST
Are you using the new tarball (
ftp://enterprise.redhat.com/incoming/fedora-ds-1.tar.gz ) that I sent?

If JVM initialization was successful, then you should see this message: "Using
the Classic VM v1.4.2 from IBM Corporation".
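
That check can be scripted. Below is a minimal sketch, assuming the marker string quoted above is the only signal needed; the function name and the two trimmed samples are illustrative, modeled on log excerpts pasted in this bug. Whitespace is normalized first because the pasted excerpts wrap the message across lines.

```python
JVM_MARKER = "Using the Classic VM v1.4.2 from IBM Corporation"


def jvm_initialized(log_text, marker=JVM_MARKER):
    """Return True if the marker appears in the log, ignoring line wrapping."""
    normalized_log = " ".join(log_text.split())
    normalized_marker = " ".join(marker.split())
    return normalized_marker in normalized_log


# Trimmed samples modeled on excerpts from this bug (hypothetical, not full logs).
FAILED_LOG = """[13/Dec/2005:15:40:05] info ( 3453): Installing a new configuration
[13/Dec/2005:15:40:05] info ( 3453): [LS ls1] http://localhost.localdomain, port
1143 ready to accept requests
[13/Dec/2005:15:40:05] info ( 3453): A new configuration was successfully installed
"""

OK_LOG = """[21/Dec/2005:00:42:43] info ( 3144): A new configuration was successfully
installed
[21/Dec/2005:00:42:43] info ( 3144): Using the Classic VM v1.4.2 from IBM
Corporation
"""

if __name__ == "__main__":
    print("failed run:", jvm_initialized(FAILED_LOG))
    print("good run:", jvm_initialized(OK_LOG))
```

A missing marker only means the JVM start-up message was never logged. It does not say why, so a gdb backtrace is still needed to confirm the crash.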
Comment 59 IBM Bug Proxy 2005-12-21 01:49:11 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-12-21 01:46 EDT -------
Yes, with noexec set to off, I could see the following statements:
-----------------------------------------------------------------------------
[21/Dec/2005:00:42:43] failure ( 3143): Warning! admin40_task_eval_init(): 
unable to set User/Group baseDN
[21/Dec/2005:00:42:43] info ( 3143): Access Host filter is: *.localdomain
[21/Dec/2005:00:42:43] info ( 3143): Access Address filter is: *
[21/Dec/2005:00:42:43] info ( 3144): Installing a new configuration
[21/Dec/2005:00:42:43] info ( 3144): [LS ls1] http://localhost.localdomain, 
port 1143 ready to accept requests
[21/Dec/2005:00:42:43] info ( 3144): A new configuration was successfully 
installed
[21/Dec/2005:00:42:43] info ( 3144): Using the Classic VM v1.4.2 from IBM 
Corporation
[21/Dec/2005:00:42:43] info ( 3144): Java VM classpath: /opt/fedora-
.
.
.
.......sessions is 1000
---------------------------------------------------------------------------
I will proceed further. 
Comment 60 IBM Bug Proxy 2005-12-21 09:49:07 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-12-21 09:45 EDT -------
I am seeing the same problem even after providing execute permission to all the 
binaries.
.
Same case with the new tarball provided.
. 
Comment 61 Chandrasekar Kannan 2005-12-21 10:22:23 EST
Are you saying that you cannot reproduce the original problem?

See comment #52 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172417#c52
where it looks like you guys were able to see the problem.

Comment 62 IBM Bug Proxy 2005-12-22 00:39:14 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-12-22 00:38 EDT -------
Oops...
I am sorry if my updates were not clear.
I will try to summarize again.
.
- with noexec=off
  1) I can see that IBM Java is being used; the log contains the statement
     (Using the Classic VM v1.4.2 from IBM Corporation).
- with noexec=on
  1) I do not see that statement, hence I am treating this as a FAILURE.
  2) Even after providing execute permission to all the binaries, we still do
     not see the above statement, which is CONFUSING. However, I will ignore
     this and work on the FAILURE condition.
.
Thanks
-Hari.. 
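
When comparing runs like the two cases above, it helps to record which setting was actually in effect. A minimal sketch follows, assuming the option is passed on the kernel command line in the form noexec=on or noexec=off, as the wording in this comment suggests; the exact boot line used on the test machine is not shown in this bug, and the function name is hypothetical.

```python
def noexec_setting(cmdline):
    """Return the value of the last noexec= option on a kernel command line.

    Returns None when the option is absent, meaning the kernel default applies.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("noexec="):
            value = token.split("=", 1)[1]
    return value


if __name__ == "__main__":
    # The first line matches the /proc/cmdline shown earlier in this bug;
    # the second is a hypothetical variant with the option added.
    for line in ("ro root=LABEL=/ rhgb quiet",
                 "ro root=LABEL=/ rhgb quiet noexec=off"):
        print(repr(line), "->", noexec_setting(line))
```

On a live system the input would be the contents of /proc/cmdline.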
Comment 63 IBM Bug Proxy 2005-12-26 01:41:34 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-12-26 01:41 EDT -------
I have asked our test team to run our scripts on this machine.
.
This is just to ensure that the JVM works outside httpd.
.
Thanks
-Hari.. 
Comment 64 IBM Bug Proxy 2005-12-26 08:01:45 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-12-26 08:01 EDT -------
I see that our stress test is passing and running fine.
.
I don't see any Java test case failing.
.
It would be a great help in diagnosing this if we could get information on how
to use our debug build with httpd.
. 
Comment 65 IBM Bug Proxy 2005-12-28 06:06:38 EST
---- Additional Comments From hariprasad@in.ibm.com  2005-12-28 06:09 EDT -------
I was able to get the stack trace for the failing scenario:
#0  0xb6226f34 in search_cached_cc0 () from /root/sdk/jre/bin/libjitc.so
#1  0xb610b9ed in skip_handler0_check () at rtx86catch_.s:1943
#2  0xb610b9a0 in ask_os0 () at rtx86catch_.s:1943
#3  0x0804c1c4 in ?? ()
#4  0x0804c1b0 in ?? ()
#5  0x00000000 in ?? () 
Comment 66 IBM Bug Proxy 2006-01-02 03:41:24 EST
---- Additional Comments From hariprasad@in.ibm.com  2006-01-02 03:43 EDT -------
I see that the JVM is crashing while we try to access a location relative to
the stack pointer.
.
mov eax,[esp+4] is the failing instruction.
.
My understanding is that even a read through the stack pointer is causing the
crash.
.
Comment 67 IBM Bug Proxy 2006-01-03 08:56:25 EST
---- Additional Comments From hariprasad@in.ibm.com  2006-01-03 08:57 EDT -------
I have placed the planned fix at the following path:
ftp://javaserv.hursley.ibm.com/pmrs/20421/sdk.tar.gz
.
Please use the following credentials to log in.
UID:- anonymous
password :- ident
.
I have tested the fix locally, and I have also opened
defect 99066 to integrate the code changes.
.
Note:- external customers won't be able to access the above ftp site.
LTC should make it available to the RH team.
.
Thanks
-Hari.. 
Comment 68 IBM Bug Proxy 2006-01-03 12:16:28 EST
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |FIXEDAWAITINGTEST
         Resolution|                            |FIX_BY_IBM




------- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2006-01-03 12:15 EDT -------
I uploaded the test fix to an external ftp server. It can be downloaded with
wget (or whatever ftp client software) as:

wget testcase.software.ibm.com:/linux/fromibm/RH172417/sdk.tar.gz

It'll only be out there for at most 3 or 4 working days so please try to
download it soon.

Hari,

Thanks for the help! 
Comment 69 Chandrasekar Kannan 2006-01-03 15:51:54 EST
Steve, 

I have this downloaded on 10.14.1.29:/export/ckannan/downloads/sdk.tar.gz
When you get time, please try this out.

thanks,
--Chandra

Comment 70 Chandrasekar Kannan 2006-01-10 17:31:25 EST
I tried this today. The start-admin command hangs after starting up, but it
looks like the JVM has initialized.


[root@localhost fedora-ds]# ./start-admin
Netscape-Enterprise/6.2SP1 B11/02/2005 09:53
warning: daemon is running as super-user
[LS ls1] http://localhost.localdomain, port 1143 ready to accept requests
startup: server started successfully


I expected start-admin to return to the prompt/command-line. But it doesn't. 

From the admin-server logs ...

========================================
/Jan/2006:22:22:07] failure (23420): Warning! admin40_task_eval_init(): unable
to set User/Group baseDN
[10/Jan/2006:22:22:07] info (23420): Access Host filter is: *.localdomain
[10/Jan/2006:22:22:07] info (23420): Access Address filter is: *
[10/Jan/2006:22:22:07] info (23420): Installing a new configuration
[10/Jan/2006:22:22:07] info (23420): [LS ls1] http://localhost.localdomain, port
1143 ready to accept requests
[10/Jan/2006:22:22:07] info (23420): A new configuration was successfully installed
[10/Jan/2006:22:22:08] info (23420): Using the Classic VM v1.4.2 from IBM
Corporation
[10/Jan/2006:22:22:08] info (23420): Java VM classpath:
/opt/fedora-ds/bin/https/jar/NSServletLayer.jar:/opt/fedora-ds/bin/https/jar/NSJavaUtil.jar:/opt/fedora-ds/bin/https/jar/NSJavaMiscUtil.jar:/opt/fedora-ds/bin/https/jar/servlet.jar:/opt/fedora-ds/bin/https/jar/servlet-2.3-filters-api.jar:/opt/fedora-ds/bin/https/jar/jaxp.jar:/opt/fedora-ds/bin/https/jar/crimson.jar:/opt/fedora-ds/bin/https/jar/xalan.jar:/opt/fedora-ds/bin/https/jar/jspengine.jar:/opt/fedora-ds/bin/https/jar/jakarta-naming.jar:/opt/fedora-ds/java/ldapjdk.jar:/opt/fedora-ds/java/jss3.jar:
[10/Jan/2006:22:22:08] info (23420): Loading IWSSessionManager by default.
[10/Jan/2006:22:22:08] info (23420): IWSSessionManager: Maximum number of
sessions is 1000

========================================

running strace shows this ...

============================================
...

clone(child_stack=0x983784c4,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0x98378bf8, {entry_number:6, base_addr:0x98378bb0, limit:1048575,
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0,
useable:1}, child_tidptr=0x98378bf8) = 23380
sched_setscheduler(23380, SCHED_OTHER, { 0 }) = 0
futex(0x98378d94, FUTEX_WAKE, 1)        = 1
gettimeofday({1136960325, 753743}, NULL) = 0
mprotect(0xb4b95000, 4096, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x96f77000
mprotect(0x96f77000, 4096, PROT_NONE)   = 0
sched_get_priority_min(SCHED_OTHER)     = 0
sched_get_priority_max(SCHED_OTHER)     = 0
clone(child_stack=0x979774c4,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0x97977bf8, {entry_number:6, base_addr:0x97977bb0, limit:1048575,
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0,
useable:1}, child_tidptr=0x97977bf8) = 23381
sched_setscheduler(23381, SCHED_OTHER, { 0 }) = 0
futex(0x97977d94, FUTEX_WAKE, 1)        = 1
gettimeofday({1136960325, 754255}, NULL) = 0
mprotect(0xb4b96000, 4096, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x96576000
mprotect(0x96576000, 4096, PROT_NONE)   = 0
sched_get_priority_min(SCHED_OTHER)     = 0
sched_get_priority_max(SCHED_OTHER)     = 0
clone(child_stack=0x96f764c4,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0x96f76bf8, {entry_number:6, base_addr:0x96f76bb0, limit:1048575,
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0,
useable:1}, child_tidptr=0x96f76bf8) = 23382
sched_setscheduler(23382, SCHED_OTHER, { 0 }) = 0
futex(0x96f76d94, FUTEX_WAKE, 1)        = 1
gettimeofday({1136960325, 754754}, NULL) = 0
mprotect(0xb4b97000, 4096, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x95b75000
mprotect(0x95b75000, 4096, PROT_NONE)   = 0
sched_get_priority_min(SCHED_OTHER)     = 0
sched_get_priority_max(SCHED_OTHER)     = 0
clone(child_stack=0x965754c4,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0x96575bf8, {entry_number:6, base_addr:0x96575bb0, limit:1048575,
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0,
useable:1}, child_tidptr=0x96575bf8) = 23383
sched_setscheduler(23383, SCHED_OTHER, { 0 }) = 0
futex(0x96575d94, FUTEX_WAKE, 1)        = 1
write(2, "startup: server started successf"..., 36startup: server started
successfully) = 36
write(2, "\n", 1
)                       = 1
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
futex(0x82ee2e0, FUTEX_WAIT, 1, NULL

============================================
Any clues ... ?

Comment 71 Chandrasekar Kannan 2006-01-10 18:20:03 EST
Sorry for that little hiccup. 

Steve told me that the ns-httpd in my install
was actually patched for some other purpose. 

So, I removed that installation of fedora-ds,
re-installed it, and applied this JRE fix.

I can now confirm that the admin-server starts up fine. I do see
that the IBM JRE 1.4.2 (JVM) is being initialized, and I can
access the admin server just fine.



Comment 72 Chandrasekar Kannan 2006-01-12 21:10:09 EST
I guess the next question is: when can we get an official build of this
from IBM so that we can give it to our customer?

Comment 73 IBM Bug Proxy 2006-01-17 03:56:15 EST
---- Additional Comments From hariprasad@in.ibm.com  2006-01-17 03:59 EDT -------
Since SR4 is targeted to be published this month or in early Feb 2006, it is
not possible to provide the fix in SR4. However, it will be made available in
SR5, which is tentatively scheduled for April 2006.
Comment 74 IBM Bug Proxy 2006-05-26 09:17:21 EDT
----- Additional Comments From chavez@us.ibm.com(prefers email via lnx1138@us.ibm.com)  2006-05-26 09:21 EDT -------
Java 1.4.2 SR5 is now available for download via
http://www-128.ibm.com/developerworks/java/jdk/linux/download.html

As I understand it, the fix was integrated in this release. Can you please verify it
so we can close the bug? Thanks!
Comment 75 IBM Bug Proxy 2006-06-21 15:16:58 EDT
----- Additional Comments From markwiz@us.ibm.com  2006-06-21 15:22 EDT -------
Steve Parkinson at Red Hat,
have you tested or has the customer tested  the new Java pointed to on 5/26?

Can we close this bug? 
Comment 76 IBM Bug Proxy 2006-09-14 23:51:13 EDT
----- Additional Comments From chavez@us.ibm.com (prefers email at lnx1138@us.ibm.com)  2006-09-14 23:49 EDT -------
IBM Java 1.4.2 SR6 is now available for download from
http://www-128.ibm.com/developerworks/java/jdk/linux/download.html . 

Can the submitter of the RH bug please verify that the problem no longer exists with
SR6 so we can close?
Comment 77 IBM Bug Proxy 2006-09-26 15:26:01 EDT
----- Additional Comments From chavez@us.ibm.com (prefers email at lnx1138@us.ibm.com)  2006-09-26 15:24 EDT -------
YARFSTVF (Yet Another Request For Submitter To Verify Fix)
Comment 78 IBM Bug Proxy 2006-09-29 08:00:48 EDT
----- Additional Comments From chavez@us.ibm.com (prefers email at lnx1138@us.ibm.com)  2006-09-29 07:55 EDT -------
I should have been clear: by submitter I meant Steve Parkinson at Red Hat.
Comment 79 Thomas Fitzsimmons 2006-10-30 15:47:32 EST
I've sent an email to Steve, asking him to confirm that
java-1.4.2-ibm-1.4.2.6-1jpp.2.el4 fixes the problem he reported here.  Proposing
this for RHEL-4.5.
Comment 80 RHEL Product and Program Management 2006-10-30 16:06:41 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 81 Steve Parkinson 2006-10-30 17:19:20 EST
My apologies for missing this.

I discussed this with Chandra, and we're not sure if we tested this specific
build. So, we're going to re-run the test when we get some spare cycles during
this week. I'm not sure of the schedule for 4.5 - I hope this won't cause
additional delay.

Comment 83 IBM Bug Proxy 2007-01-26 10:30:47 EST
----- Additional Comments From chavez@us.ibm.com (prefers email at lnx1138@us.ibm.com)  2007-01-26 10:24 EDT -------
SR7 is now the latest IBM Java 1.4.2. If you can verify this soon so we can
finally close this out, I would appreciate it. Bug is getting really old now.
Thanks. 
Comment 84 IBM Bug Proxy 2007-02-28 10:36:06 EST
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ACCEPTED                    |CLOSED
             Impact|------                      |Functionality




------- Additional Comments From chavez@us.ibm.com (prefers email at lnx1138@us.ibm.com)  2007-02-28 10:28 EDT -------
java-1.4.2-ibm-1.4.2.7-1jpp.4.el4.i386.rpm is on the LACD for RHEL 4.5 Beta. 
Closing... 
