Bug 115867

Summary: Seg fault in threaded application during dl_sysinfo_int80
Product: Red Hat Enterprise Linux 3
Reporter: Scott Christley <schristley>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium    
Version: 3.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-08-26 05:23:52 UTC

Description Scott Christley 2004-02-16 18:50:05 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922

Description of problem:
I am working with GNUstep and Swarm, which are Objective-C libraries. My
application runs fine on Red Hat Linux 9 (glibc-2.3.2-5) on a Pentium III
machine I have, but I installed AS 3.0 on a new dual-Xeon machine and now
the application crashes with a segmentation fault.

In both cases I'm using gcc 3.2.2, the latest CVS version of GNUstep,
and a development version of Swarm. Notably, both GNUstep and Swarm use
the ffcall library (ffcall-1.8d) for dynamic method invocation; however,
I don't think ffcall is involved in this bug, because I don't believe it
is actually used within my application.

My application has two threads: one runs the GNUstep GUI, while the
second runs the Swarm simulation. The crash is not deterministic; it
occurs in different parts of the application, which leads me to believe
it is related to threading. What is consistent, though, is that one of
the threads is in the _dl_sysinfo_int80 routine while the other thread
is executing unrelated code. Sometimes the application crashes with a
segmentation fault and sometimes it just hangs.
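To make the hang trace below easier to read, here is a minimal C sketch of how one thread can legitimately sit in pthread_cond_wait for a long time. It assumes POSIX threads; the type cond_lock and the functions cond_lock_when, cond_unlock_with and condition_lock_demo are hypothetical names, loosely modeled on -[NSConditionLock lockWhenCondition:] from the backtrace, and are not GNUstep's actual implementation. A thread parked like this is blocked inside a system call. On the i686 system in this report, gdb shows such a thread with _dl_sysinfo_int80 at the top of its stack (see the pthread_cond_wait frame in the hang trace), so by itself that frame most likely just means "waiting".

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical C analogue of an NSConditionLock: a mutex, a condition
   variable, and the integer "condition" the lock is tagged with. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             condition;
} cond_lock;

static void cond_lock_init(cond_lock *cl, int initial) {
    pthread_mutex_init(&cl->mutex, NULL);
    pthread_cond_init(&cl->cond, NULL);
    cl->condition = initial;
}

/* Like lockWhenCondition: return holding the mutex once condition == wanted.
   While it waits, the thread is inside pthread_cond_wait, i.e. blocked in a
   system call. */
static void cond_lock_when(cond_lock *cl, int wanted) {
    pthread_mutex_lock(&cl->mutex);
    while (cl->condition != wanted)   /* re-check: wakeups can be spurious */
        pthread_cond_wait(&cl->cond, &cl->mutex);
}

/* Like unlockWithCondition: store a new condition, wake waiters, unlock.
   The caller must hold the mutex. */
static void cond_unlock_with(cond_lock *cl, int value) {
    cl->condition = value;
    pthread_cond_broadcast(&cl->cond);
    pthread_mutex_unlock(&cl->mutex);
}

static cond_lock handoff;
static int shared_result;

/* Worker: parks until the main thread sets condition 1, then publishes. */
static void *worker(void *arg) {
    (void)arg;
    cond_lock_when(&handoff, 1);
    shared_result = 42;
    cond_unlock_with(&handoff, 2);
    return NULL;
}

/* Runs one handoff and returns the value the worker produced. */
static int condition_lock_demo(void) {
    pthread_t t;
    int value;

    cond_lock_init(&handoff, 0);
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    pthread_mutex_lock(&handoff.mutex);
    cond_unlock_with(&handoff, 1);    /* release the worker */

    cond_lock_when(&handoff, 2);      /* wait until it has finished */
    value = shared_result;            /* read under the mutex */
    pthread_mutex_unlock(&handoff.mutex);

    pthread_join(t, NULL);
    pthread_cond_destroy(&handoff.cond);
    pthread_mutex_destroy(&handoff.mutex);
    return value;
}
```

Calling condition_lock_demo() from main should return 42. While the worker is parked in cond_lock_when, attaching gdb would show it inside pthread_cond_wait, just like thread 2 in the hang trace.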

Here are the stack traces of the two threads in a hanging run:

(gdb) info threads
  2 Thread -1237058640 (LWP 9487)  0xb75ebc32 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
* 1 Thread -1230684032 (LWP 9484)  0xb72c318c in objc_msg_lookup (
    receiver=0x8331388, op=0xb6d786c0) at sendmsg.c:167

(gdb) bt
#0  0xb72c318c in objc_msg_lookup (receiver=0x8331388, op=0xb6d786c0)
    at sendmsg.c:167
#1  0xb6d57dce in -[Index(any) _findNext:] (self=0x8331388, _cmd=0xb6d7ba98,
    anObject=0x8379c60) at Collection.m:354
#2  0xb6d5fa16 in -[List(mlinks) _addLast:] (self=0x8127050, _cmd=0xb6dd7460,
    anObject=0x8379c60) at List_GEN.m:150
#3  0xb6daf542 in -[Zone(c) _allocIVars:] (self=0x8126ff8, _cmd=0xb6d7b108,
    aClass=0xb6d7af00) at Zone.m:191
#4  0xb6d5e7e4 in -[List(linked) _begin:] (self=0xb6d786c0, _cmd=0xb6d78438,
    aZone=0x8126ff8) at List_GEN.m:243

(gdb) thread 2
[Switching to thread 2 (Thread -1237058640 (LWP 9487))]#0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb70281ee in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#2  0xb72c3dac in __objc_condition_wait (condition=0xfffffffc, 
    mutex=0xfffffffc) at gthr-posix.h:408
#3  0xb72c3770 in objc_condition_wait (condition=0xb5900780, mutex=0xb5900790)
    at thr.c:493
#4  0xb7165ef2 in -[NSConditionLock lockWhenCondition:] ()
   from /usr/local/gnustep/3.2.2/System/Library/Libraries/libgnustep-base.so.1

Here is an example of a segmentation fault:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1237058640 (LWP 9493)]
0xb6d5fa5a in -[List(mlinks) _addLast:] (self=0x8127050, _cmd=0xb6dd7460, 
    anObject=0xb59009c0) at List_GEN.m:162
162           firstLink->prevLink->nextLink = newLink;
(gdb) info threads
* 2 Thread -1237058640 (LWP 9493)  0xb6d5fa5a in -[List(mlinks) _addLast:] (
    self=0x8127050, _cmd=0xb6dd7460, anObject=0xb59009c0) at List_GEN.m:162
  1 Thread -1230684032 (LWP 9492)  0xb75ebc32 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2

(gdb) bt
#0  0xb6d5fa5a in -[List(mlinks) _addLast:] (self=0x8127050, _cmd=0xb6dd7460,
    anObject=0xb59009c0) at List_GEN.m:162
#1  0xb6daf542 in -[Zone(c) _allocIVars:] (self=0x8126ff8, _cmd=0xb6d7ce58,
    aClass=0xb6d7c680) at Zone.m:191
#2  0xb6d635e4 in -[Map(c) _begin:] (self=0xb5900480, _cmd=0xb6e0fda8, 
    aZone=0x8126ff8) at Map.m:649
#3  0xb6e00995 in removeObsoleteMerges (currentIndex=0xb6e0fda8)
    at XActivity.m:120
#4  0xb6e00ae3 in -[Activity(c) __run:] (self=0x83c5e30, _cmd=0xb6e0fd98)
    at XActivity.m:240

(gdb) thread 1
[Switching to thread 1 (Thread -1230684032 (LWP 9492))]#0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb702a8eb in __write_nocancel () from /lib/tls/libpthread.so.0
#2  0xb6bdf7c0 in XUnlockDisplay () from /usr/X11R6/lib/libX11.so.6
#3  0xb6be03ff in _X11TransWrite () from /usr/X11R6/lib/libX11.so.6
#4  0xb6bc0022 in _XFlush () from /usr/X11R6/lib/libX11.so.6
#5  0xb6bbfea9 in _XFlush () from /usr/X11R6/lib/libX11.so.6
#6  0xb6ba6368 in XFillRectangle () from /usr/X11R6/lib/libX11.so.6


I'm not sure how I could provide a test case, because I don't know what
_dl_sysinfo_int80 is doing.

It may be that I have some non-thread-safe code somewhere, so I am
continuing to investigate.
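One common source of exactly this kind of crash is an unsynchronized linked-list update. The segfault above happens at List_GEN.m:162, in firstLink->prevLink->nextLink = newLink. The C sketch below is hypothetical: list_t, list_link, list_add_last and list_demo are illustrative names, not Swarm's code, and it assumes the list is a circular doubly linked list, which the field names suggest but the report does not confirm. It shows a tail append whose pointer updates are serialized with a pthread mutex. Without the lock, two threads appending at once can interleave those updates and leave the ring inconsistent, and the failure then tends to surface later, in whatever code next walks the list.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical circular doubly linked list node. */
typedef struct list_link {
    struct list_link *prevLink;
    struct list_link *nextLink;
    void             *item;
} list_link;

typedef struct {
    list_link       *firstLink;
    size_t           count;
    pthread_mutex_t  lock;    /* serializes every structural change */
} list_t;

static void list_init(list_t *l) {
    l->firstLink = NULL;
    l->count = 0;
    pthread_mutex_init(&l->lock, NULL);
}

/* Append at the tail. All pointer updates happen under the mutex, so no
   other thread can observe or modify a half-linked ring. */
static int list_add_last(list_t *l, void *item) {
    list_link *newLink = malloc(sizeof *newLink);
    if (newLink == NULL)
        return -1;
    newLink->item = item;

    pthread_mutex_lock(&l->lock);
    if (l->firstLink == NULL) {
        newLink->prevLink = newLink->nextLink = newLink;
        l->firstLink = newLink;
    } else {
        list_link *first = l->firstLink;
        newLink->prevLink = first->prevLink;    /* old tail */
        newLink->nextLink = first;
        first->prevLink->nextLink = newLink;    /* the List_GEN.m:162 update */
        first->prevLink = newLink;
    }
    l->count++;
    pthread_mutex_unlock(&l->lock);
    return 0;
}

/* Returns 1 if the ring is consistent and holds exactly `expected` nodes. */
static int list_check(const list_t *l, size_t expected) {
    size_t n = 0;
    const list_link *p = l->firstLink;
    if (p == NULL)
        return expected == 0;
    do {
        if (p->nextLink->prevLink != p)
            return 0;
        p = p->nextLink;
        n++;
    } while (p != l->firstLink && n <= expected);
    return n == expected && l->count == expected;
}

static void list_destroy(list_t *l) {
    list_link *p = l->firstLink;
    for (size_t i = 0; i < l->count; i++) {
        list_link *next = p->nextLink;
        free(p);
        p = next;
    }
    l->firstLink = NULL;
    l->count = 0;
    pthread_mutex_destroy(&l->lock);
}

#define THREADS    4
#define PER_THREAD 1000

static void *appender(void *arg) {
    list_t *l = arg;
    for (int i = 0; i < PER_THREAD; i++)
        list_add_last(l, NULL);
    return NULL;
}

/* Appends from several threads at once; returns 1 if the ring survived. */
static int list_demo(void) {
    list_t l;
    pthread_t t[THREADS];
    list_init(&l);
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, appender, &l);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    int ok = list_check(&l, (size_t)THREADS * PER_THREAD);
    list_destroy(&l);
    return ok;
}
```

Removing the lock/unlock pair in list_add_last turns this into a data race. The check may then fail or the program may crash, but not on every run, which matches the non-deterministic behavior described above.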



Version-Release number of selected component (if applicable):
glibc-2.3.2-95.6

How reproducible:
Always

Steps to Reproduce:
1. openapp cCulture.app
2. select new simulation
3. select start
    

Additional info:

Comment 1 Scott Christley 2004-02-17 14:22:46 UTC
Swarm makes extensive use of nested functions, i.e. functions that are
defined only within the scope of a method. I recall hearing of issues
with nested functions, though I'm not sure whether they were specific
to certain gcc versions or architectures; could this be part of the
problem?


Comment 2 Jakub Jelinek 2004-02-17 14:31:10 UTC
A crash in _dl_sysinfo_int80 means a crash in some syscall; _dl_sysinfo_int80
is the int $0x80 instruction which enters the kernel.
You can try to see whether strace (or strace -f) reveals bad arguments
being passed to the kernel.
Certainly from the backtraces I see no reason to suspect glibc, unless
you manage to create a small self-contained testcase which points to a glibc bug.
As for nested functions, they are usually implemented with trampolines
on the stack, so they don't work with a non-executable stack unless
the binary/library has an appropriate PT_GNU_STACK PF_R|PF_W|PF_X program
header (or no PT_GNU_STACK header at all). But the stack is executable
on RHEL3 (only on FC1 is it not), so this certainly shouldn't be an issue.
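Following the PT_GNU_STACK rule described above, a small C sketch can report, for each object loaded into a process, whether that object requests an executable stack. It assumes glibc's dl_iterate_phdr from <link.h>, which is declared only when _GNU_SOURCE is defined. The helper names wants_exec_stack, report_cb and report_stack_requirements are hypothetical. Calling it from the application, or from a test program linked against the same libraries, could help confirm how libraries built from code that uses nested functions are marked. As noted above, this matters only on systems where the stack is not executable by default.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Apply the rule from Comment 2: an object with no PT_GNU_STACK header,
   or with one whose flags include PF_X, requests an executable stack. */
static int wants_exec_stack(const struct dl_phdr_info *info) {
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_GNU_STACK)
            return (info->dlpi_phdr[i].p_flags & PF_X) != 0;
    return 1;   /* no PT_GNU_STACK header at all */
}

/* dl_iterate_phdr callback: print one line per loaded object. */
static int report_cb(struct dl_phdr_info *info, size_t size, void *data) {
    int *visited = data;
    const char *name = info->dlpi_name;
    (void)size;
    if (*visited == 0)
        name = "(main program)";      /* glibc visits the main program first */
    else if (name == NULL || name[0] == '\0')
        name = "(unnamed object)";
    printf("%-50s %s\n", name,
           wants_exec_stack(info) ? "executable stack" : "non-executable stack");
    ++*visited;
    return 0;                         /* 0 = keep iterating */
}

/* Prints the report and returns how many objects were examined. */
static int report_stack_requirements(void) {
    int visited = 0;
    dl_iterate_phdr(report_cb, &visited);
    return visited;
}
```

The same information is also visible from outside the process: the GNU_STACK entry in the program-header listing printed by readelf -l shows the flags for a given binary or shared library.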

Comment 3 Ulrich Drepper 2004-08-26 05:23:52 UTC
No reply in 6 months.  Closing.