From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922

Description of problem:
I am working with GNUstep and Swarm, which are Objective-C libraries. My application runs fine on RedHat v9.0 (glibc-2.3.2-5) on a PIII machine I have, but I got a new dual Xeon and put AS v3.0 on it. Now my application is crashing with a segmentation fault. In both cases, I'm using gcc 3.2.2, the latest CVS release of GNUstep and a development version of Swarm. Of particular note is that both GNUstep and Swarm use the ffcall library (ffcall-1.8d) to perform dynamic invocation of methods; however, I don't think that is an issue in this bug because I don't believe ffcall is actually used within my application.

My application has two threads: one runs the GNUstep GUI while the second runs the Swarm simulation. The crash is not deterministic; it happens in different parts of the application, which leads me to believe it's related to the threads. What is consistent, though, is that one of the threads is in the _dl_sysinfo_int80 routine while the other thread is executing some unrelated code. Sometimes it crashes with a seg fault and sometimes it just hangs.

Here is the stack trace of the two threads for a hanging example:

(gdb) info threads
  2 Thread -1237058640 (LWP 9487)  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
* 1 Thread -1230684032 (LWP 9484)  0xb72c318c in objc_msg_lookup (receiver=0x8331388, op=0xb6d786c0) at sendmsg.c:167
(gdb) bt
#0  0xb72c318c in objc_msg_lookup (receiver=0x8331388, op=0xb6d786c0) at sendmsg.c:167
#1  0xb6d57dce in -[Index(any) _findNext:] (self=0x8331388, _cmd=0xb6d7ba98, anObject=0x8379c60) at Collection.m:354
#2  0xb6d5fa16 in -[List(mlinks) _addLast:] (self=0x8127050, _cmd=0xb6dd7460, anObject=0x8379c60) at List_GEN.m:150
#3  0xb6daf542 in -[Zone(c) _allocIVars:] (self=0x8126ff8, _cmd=0xb6d7b108, aClass=0xb6d7af00) at Zone.m:191
#4  0xb6d5e7e4 in -[List(linked) _begin:] (self=0xb6d786c0, _cmd=0xb6d78438, aZone=0x8126ff8) at List_GEN.m:243
(gdb) thread 2
[Switching to thread 2 (Thread -1237058640 (LWP 9487))]#0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb70281ee in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0xb72c3dac in __objc_condition_wait (condition=0xfffffffc, mutex=0xfffffffc) at gthr-posix.h:408
#3  0xb72c3770 in objc_condition_wait (condition=0xb5900780, mutex=0xb5900790) at thr.c:493
#4  0xb7165ef2 in -[NSConditionLock lockWhenCondition:] () from /usr/local/gnustep/3.2.2/System/Library/Libraries/libgnustep-base.so.1

Here is an example of seg fault:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1237058640 (LWP 9493)]
0xb6d5fa5a in -[List(mlinks) _addLast:] (self=0x8127050, _cmd=0xb6dd7460, anObject=0xb59009c0) at List_GEN.m:162
162         firstLink->prevLink->nextLink = newLink;
(gdb) info threads
* 2 Thread -1237058640 (LWP 9493)  0xb6d5fa5a in -[List(mlinks) _addLast:] (self=0x8127050, _cmd=0xb6dd7460, anObject=0xb59009c0) at List_GEN.m:162
  1 Thread -1230684032 (LWP 9492)  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0xb6d5fa5a in -[List(mlinks) _addLast:] (self=0x8127050, _cmd=0xb6dd7460, anObject=0xb59009c0) at List_GEN.m:162
#1  0xb6daf542 in -[Zone(c) _allocIVars:] (self=0x8126ff8, _cmd=0xb6d7ce58, aClass=0xb6d7c680) at Zone.m:191
#2  0xb6d635e4 in -[Map(c) _begin:] (self=0xb5900480, _cmd=0xb6e0fda8, aZone=0x8126ff8) at Map.m:649
#3  0xb6e00995 in removeObsoleteMerges (currentIndex=0xb6e0fda8) at XActivity.m:120
#4  0xb6e00ae3 in -[Activity(c) __run:] (self=0x83c5e30, _cmd=0xb6e0fd98) at XActivity.m:240
(gdb) thread 1
[Switching to thread 1 (Thread -1230684032 (LWP 9492))]#0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb702a8eb in __write_nocancel () from /lib/tls/libpthread.so.0
#2  0xb6bdf7c0 in XUnlockDisplay () from /usr/X11R6/lib/libX11.so.6
#3  0xb6be03ff in _X11TransWrite () from /usr/X11R6/lib/libX11.so.6
#4  0xb6bc0022 in _XFlush () from /usr/X11R6/lib/libX11.so.6
#5  0xb6bbfea9 in _XFlush () from /usr/X11R6/lib/libX11.so.6
#6  0xb6ba6368 in XFillRectangle () from /usr/X11R6/lib/libX11.so.6

I'm not sure how I could provide a test case because I don't know what _dl_sysinfo_int80 is doing. I may have some non-thread-safe code somewhere, so I am continuing to investigate.

Version-Release number of selected component (if applicable):
glibc-2.3.2-95.6

How reproducible:
Always

Steps to Reproduce:
1. openapp cCulture.app
2. select new simulation
3. select start

Additional info:
Swarm makes extensive use of nested functions, i.e. functions which are defined only within the scope of a method. I recall hearing of issues with nested functions, though I am not sure whether they were related to a specific version of gcc or to specific architectures; maybe this is part of the problem?
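For readers unfamiliar with the construct, here is a minimal sketch of a GCC nested function in plain C. It is a GNU C extension, so it assumes gcc; the names for_each and sum_scaled are hypothetical and are not taken from Swarm. Taking the nested function's address, as done when it is passed as a callback, is the case where GCC has traditionally built a small trampoline on the stack.

```c
#include <stddef.h>

/* Hypothetical helper: calls fn once per element through a plain function pointer. */
static void for_each(const int *values, size_t n, void (*fn)(int))
{
    for (size_t i = 0; i < n; i++)
        fn(values[i]);
}

/* Sums values[i] * scale using a GCC nested function (GNU C extension). */
static int sum_scaled(const int *values, size_t n, int scale)
{
    int total = 0;

    /* Nested function: it reads and writes locals of the enclosing function. */
    void add(int v)
    {
        total += v * scale;
    }

    /* Passing `add` as a pointer takes its address; this is the situation in
       which GCC traditionally emits a trampoline on the stack. */
    for_each(values, n, add);
    return total;
}
```

Calling sum_scaled on the array {1, 2, 3} with a scale of 2 accumulates 2 + 4 + 6 through the callback.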
A crash in _dl_sysinfo_int80 means a crash in some syscall; _dl_sysinfo_int80 is the int $0x80 instruction which enters the kernel. You can try to see if strace (or strace -f) reveals some bad arguments passed to the kernel. From the backtraces I certainly see no reason to suspect glibc, unless you manage to create a small self-contained testcase which points to a glibc bug.

As for nested functions, they are usually implemented with trampolines on the stack, so they don't work with a non-executable stack unless the binary/library has an appropriate PT_GNU_STACK PF_R|PF_W|PF_X program header (or no PT_GNU_STACK header at all). But the stack is executable on RHEL3 (only on FC1 is it not), so this certainly shouldn't be an issue.
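One way to read the two backtraces: in the hang, thread 1 is inside -[List(mlinks) _addLast:] with self=0x8127050, and in the seg fault thread 2 is inside the same method with the same self, in both cases reached from -[Zone(c) _allocIVars:] on zone 0x8126ff8. If both threads really do allocate from one zone without locking, appends to that list could race, which would fit the reporter's suspicion of non-thread-safe code. The following is only a sketch of guarding such an append with a mutex; the Link and List types and list_add_last are hypothetical stand-ins, not Swarm's actual List_GEN.m code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a circular doubly linked list; not Swarm's types. */
typedef struct Link {
    struct Link *prevLink;
    struct Link *nextLink;
} Link;

typedef struct {
    Link *firstLink;       /* NULL when the list is empty */
    size_t count;
    pthread_mutex_t lock;  /* guards firstLink, count, and every link in the ring */
} List;

#define LIST_INITIALIZER { NULL, 0, PTHREAD_MUTEX_INITIALIZER }

/* Append newLink at the end of the ring while holding the list's mutex. */
static void list_add_last(List *list, Link *newLink)
{
    assert(list != NULL && newLink != NULL);

    pthread_mutex_lock(&list->lock);
    if (list->firstLink == NULL) {
        /* First element: it is its own predecessor and successor. */
        newLink->prevLink = newLink;
        newLink->nextLink = newLink;
        list->firstLink = newLink;
    } else {
        Link *last = list->firstLink->prevLink;
        newLink->prevLink = last;
        newLink->nextLink = list->firstLink;
        /* Same kind of store as the statement shown at List_GEN.m:162. */
        last->nextLink = newLink;
        list->firstLink->prevLink = newLink;
    }
    list->count++;
    pthread_mutex_unlock(&list->lock);
}
```

With a lock like this, the GUI thread and the simulation thread would both go through list_add_last, so neither could observe a half-linked ring. Whether Swarm's zones are meant to be shared between threads at all is a separate question that this sketch does not answer.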
No reply in 6 months. Closing.