Bug 50989 - Segmentation fault in pthread_alt_lock
Summary: Segmentation fault in pthread_alt_lock
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 7.1
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Aaron Brown
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-08-06 05:51 UTC by Martin Strassburger
Modified: 2016-11-24 15:12 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2001-08-15 11:42:56 UTC
Embargoed:



Description Martin Strassburger 2001-08-06 05:51:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2 i686)

Description of problem:
During tests of SAP's latest Business Application Server on Red Hat Linux 7.1,
the application aborts with a segmentation fault. The executable was compiled
and linked on RH 6.1.

How reproducible:
Sometimes

Steps to Reproduce:
Core was generated by `dw.sapWAS_DVEBMGS18 pf=/usr/sap/WAS/SYS/profile/WAS_DVEBMGS18_lwsp700'.
Program terminated with signal 11, Segmentation fault. 
Reading symbols from /lib/libdl.so.2...done. 
Loaded symbols for /lib/libdl.so.2 
Reading symbols from /lib/i686/libpthread.so.0...done. 

warning: Unable to set global thread event mask: generic error 
[New Thread 1024 (LWP 11580)] 
Error while reading shared library symbols: 
Cannot enable thread event reporting for Thread 1024 (LWP 11580): generic error
Reading symbols from /usr/sap/WAS/SYS/exe/run/dw_xml.so...done.
Loaded symbols for /usr/sap/WAS/SYS/exe/run/dw_xml.so 
Reading symbols from /usr/sap/WAS/SYS/exe/run/dw_xtc.so...done. 
Loaded symbols for /usr/sap/WAS/SYS/exe/run/dw_xtc.so 
Reading symbols from /usr/sap/WAS/SYS/exe/run/dw_stl.so...done. 
Loaded symbols for /usr/sap/WAS/SYS/exe/run/dw_stl.so 
Reading symbols from /usr/lib/libstdc++-libc6.1-2.so.3...done. 
Loaded symbols for /usr/lib/libstdc++-libc6.1-2.so.3 
Reading symbols from /lib/i686/libm.so.6...done. 
Loaded symbols for /lib/i686/libm.so.6 
Reading symbols from /lib/i686/libc.so.6...done. 
Loaded symbols for /lib/i686/libc.so.6 
Reading symbols from /lib/ld-linux.so.2...done. 
Loaded symbols for /lib/ld-linux.so.2 
#0  __pthread_alt_lock (lock=0x40869124, self=0x0) at spinlock.c:407 
407     spinlock.c: No such file or directory. 
        in spinlock.c 
(gdb) bt 
#0  __pthread_alt_lock (lock=0x40869124, self=0x0) at spinlock.c:407 
#1  0x40028c86 in __pthread_mutex_lock (mutex=0x40869114) at mutex.c:120 
#2  0x40862e31 in __register_frame_info (begin=0x402a3d80, ob=0x402d5cc0) 
    at ../../gcc/frame.c:627 
#3  0x400c52d2 in _init () from /usr/sap/WAS/SYS/exe/run/dw_xml.so 
#4  0x400c0131 in _init () from /usr/sap/WAS/SYS/exe/run/dw_xml.so 
#5  0x4000df57 in _dl_init () at eval.c:41 
(gdb) [wasadm@lwsp700 work]$ 


Additional info:

The same executable runs without problems on SuSE 7.2.
Both distributions claim to have glibc 2.2.2.

Comment 1 Jakub Jelinek 2001-08-07 08:57:59 UTC
Can you try running it with LD_ASSUME_KERNEL=2.2.5 in the environment?
AFAIK SuSE does not enable floating thread stacks (the above environment
variable disables them in RHL 7.1).
Can you attach an strace log?
Also, can you run it under a debugger and see what the %gs register contains
when it crashes?
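
For anyone who wants to script the suggestion above, here is a minimal sketch in Python of launching a command with LD_ASSUME_KERNEL set only for that child process. The helper names `env_with_assumed_kernel` and `run_with_assumed_kernel` are made up for illustration, and the default value 2.2.5 is simply the one proposed in this comment for RHL 7.1; nothing here is specific to SAP.

```python
import os
import subprocess


def env_with_assumed_kernel(version="2.2.5", base=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set.

    `base` defaults to the current process environment; the caller's
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = version
    return env


def run_with_assumed_kernel(argv, version="2.2.5"):
    """Run `argv` with LD_ASSUME_KERNEL set; return the child's exit status."""
    completed = subprocess.run(argv, env=env_with_assumed_kernel(version))
    return completed.returncode
```

A call such as `run_with_assumed_kernel(["./myserver", "pf=myprofile"])` (both arguments are placeholders) would then start the program with the variable set while leaving the parent shell's environment untouched.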

Comment 2 Martin Strassburger 2001-08-08 05:47:39 UTC
I started the problematic executable with strace -ff -o trace <command> but got
no output files (I noticed strange thread behaviour that I cannot explain).
Running under gdb does not show the segmentation fault.
Setting the environment variable LD_ASSUME_KERNEL seems to be a workaround: I
no longer see the segmentation fault.
Thanks for the workaround.


Comment 3 Jakub Jelinek 2001-08-08 15:26:14 UTC
Can you try LD_DEBUG=all to see whether it really crashes before the
transferring control: foobarbaz
line? If so, would it be possible to pack the main binary and its
DT_NEEDED libraries somewhere, so that I could check it out myself?

Comment 4 Martin Strassburger 2001-08-09 12:24:53 UTC
I started the application with LD_DEBUG=all (without LD_ASSUME_KERNEL); the
segmentation fault does not occur. About 120MB of output was generated (I do
not think you are interested in it).
To reproduce the error yourself you would need to install the SAP Application
Server software (several executables need to run in parallel) and the SAPDB 7.3
database software, and create an initial database with SAP business software in
it (because the executables do not run without a db connection). During the
initialization of the SAP system, without LD_ASSUME_KERNEL in the environment,
the application server stops with a segmentation fault (reproducible, but only
in the initialization phase). To replay this you need ~750MB of compressed apps
and db data. I cannot imagine you are really interested in that.
I downloaded 7.1.93 (Roswell) and replayed the initialization without
LD_ASSUME_KERNEL: no segmentation fault. There seems to be a change between
glibc 2.2.2 and glibc 2.2.3 that makes the problem disappear.

Comment 5 Martin Strassburger 2001-08-14 13:36:48 UTC
I just tried to create a DB2 database (DB2 Version 7.1). The database
tools crashed with the same behaviour:
[root@lwsp700 install]# gdb /usr/IBMdb2/V7.1/instance/db2icknm core
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Core was generated by `/usr/IBMdb2/V7.1/instance/db2icknm db2xxx dbxxxadm'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/IBMdb2/V7.1/lib/libdb2.so.1...done.
Loaded symbols for /usr/IBMdb2/V7.1/lib/libdb2.so.1
Reading symbols from /usr/lib/libstdc++-libc6.1-1.so.2...done.
Loaded symbols for /usr/lib/libstdc++-libc6.1-1.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libpthread.so.0...done.

warning: Unable to set global thread event mask: generic error
[New Thread 1024 (LWP 17864)]
Error while reading shared library symbols:
Cannot enable thread event reporting for Thread 1024 (LWP 17864): generic error
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  __pthread_alt_lock (lock=0x40a58e58, self=0x0) at spinlock.c:407
407     spinlock.c: No such file or directory.
        in spinlock.c
(gdb) bt
#0  __pthread_alt_lock (lock=0x40a58e58, self=0x0) at spinlock.c:407
#1  0x40a4ec86 in __pthread_mutex_lock (mutex=0x40a58e48) at mutex.c:120
#2  0x40a4f162 in __pthread_atfork (prepare=0x40962864 <ptmalloc_lock_all>,
    parent=0x40966f80 <ptmalloc_unlock_all>,
    child=0x40967070 <ptmalloc_init_all>) at ptfork.c:60
#3  0x40962a6d in ptmalloc_init () at malloc.c:1716
#4  0x40967184 in malloc_hook_ini (sz=24, caller=0x40a4f13e) at malloc.c:1765
#5  0x4096309d in __libc_malloc (bytes=24) at malloc.c:2701
#6  0x40a4f13e in __pthread_atfork (prepare=0, parent=0,
    child=0x404eafc0 <sqlo_child_reset_pid(void)>) at ptfork.c:57
#7  0x404eb012 in sqlo_init_pid () at eval.c:41
#8  0x404eb054 in global constructors keyed to waste_time () at eval.c:41
#9  0x4055ee34 in __do_global_ctors_aux () at eval.c:41
#10 0x400e4dc6 in _init () at eval.c:41
#11 0x4000df57 in _dl_init () at eval.c:41

It is probably easier for you to reproduce the error in this environment, which
is less complex than an SAP system.
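
Both backtraces in this report share one signature: frame #0 is __pthread_alt_lock and its self argument is 0x0. As a rough triage aid, the following Python sketch checks gdb backtrace text for that signature. The function name `matches_signature` is hypothetical, and the check only looks at what is visible in the traces above; it says nothing about the root cause.

```python
import re

# Frame #0 as gdb prints it, with or without a leading address.
FRAME0 = re.compile(
    r"^#0\s+(?:0x[0-9a-f]+ in )?__pthread_alt_lock \((?P<args>[^)]*)\)",
    re.MULTILINE,
)


def matches_signature(backtrace):
    """Return True if frame #0 is __pthread_alt_lock called with self=0x0."""
    match = FRAME0.search(backtrace)
    if match is None:
        return False
    # Turn "lock=0x40a58e58, self=0x0" into {"lock": "...", "self": "0x0"}.
    args = dict(
        part.strip().split("=", 1)
        for part in match.group("args").split(",")
        if "=" in part
    )
    return args.get("self") == "0x0"
```

Feeding it the text printed by `bt` in either session above should return True, while a trace whose innermost frame is a different function, or one with a non-null self, returns False.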



Comment 6 Martin Strassburger 2001-08-14 13:37:50 UTC
The workaround with LD_ASSUME_KERNEL=2.2.5 works.

Comment 7 Jakub Jelinek 2001-08-14 13:43:24 UTC
Can you try 2.2.8 kernel or at least the changes to
arch/i386/kernel/{ldt,process}.c and include/asm-i386/{mmu,mmu_context}.h?
It contains an important SMP LDT handling fix.

Comment 8 Martin Strassburger 2001-08-15 05:41:22 UTC
I suppose you mean kernel 2.4.8 (as far as I know, no 2.2.x kernel is supported
on RH 7.1).
I downloaded 2.4.8 (Linus's version) and installed it. The problem did not occur.

Comment 9 Jakub Jelinek 2001-08-15 07:47:30 UTC
Typo, sorry.
If you could test ftp://people.redhat.com/jakub/kernel/
2.4.7-2 too, it would be great, but I hope it is this patch which matters.

Comment 10 Martin Strassburger 2001-08-15 11:42:50 UTC
I upgraded my RH 7.1 to kernel 2.4.7-2enterprise (including the required
mkinitrd and e2fsprogs, which I got with Roswell).
I restarted a complete SAP installation without problems. The core dump problem
did not occur.


