Bug 59158 - Threaded program with threads blocked gets SEGV after gdb is attached
Status: CLOSED DEFERRED
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 7.3
Hardware: alpha Linux
Priority: medium Severity: medium
Assigned To: Beth Uptagrafft
Beth Uptagrafft
Reported: 2002-01-31 17:24 EST by harry.heinisch
Modified: 2007-04-18 12:39 EDT (History)
Last Closed: 2003-04-24 09:50:21 EDT
Attachments
Source code for the "phil" dining philosophers example program (8.84 KB, text/plain)
2002-01-31 17:25 EST, harry.heinisch

Description harry.heinisch 2002-01-31 17:24:22 EST
Description of problem:
I have a threaded program that works fine until I attach gdb to it, and when 
I "continue" in gdb,  the program crashes with a SEGV in __pthread_alt_unlock.  

Version-Release number of selected component (if applicable):
 
glibc 2.2.5, I believe

How reproducible:
Always

Steps to Reproduce:
1. Run the "phil" program (I'll put source as an attachment)
         phil &
2. gdb /bla/bla/phil pid  (pick any of the threads)
3. gdb> ...  0x200000c91ac in __sigsuspend ()
(gdb) continue
Continuing.


Actual Results:  0x200000c91ac in __sigsuspend ()
(gdb) cont
Continuing.

  Philosopher 1 eating with chopsticks 1 and 2

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 9459]

Program received signal SIGSEGV, Segmentation fault.
0x20000056854 in __pthread_alt_unlock (lock=0x1201018c0) at spinlock.c:651
spinlock.c:651: No such file or directory.

Expected Results:  The program should have finished normally, as it does when I 
don't try to attach gdb.

Additional info:

Gdb version is: GNU gdb 4.17.0.14 with Linux support
uname -a: Linux xxx 2.4.9-14a   (I don't really know if it's 'Roswell', but
                                 it's the Beta for Linux Alpha with this
                                 kernel.)

It appears that one of the queues that alt_unlock uses has been corrupted. 
Hypothesis: the signal from the debugger is waking up a thread blocked on a 
mutex which then maybe thinks the resource is available because it got a signal.
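
To make the hypothesis above concrete: the gdb output in this report shows the
blocked threads parked in sigsuspend(). Below is a minimal, single-threaded C
sketch (not the LinuxThreads code) of a waiter that rechecks its own flag after
every wakeup, so a stray signal cannot be mistaken for the expected one. The
names wait_for_restart, on_restart and on_other, and the use of SIGUSR1 and
SIGUSR2, are assumptions made purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Set only by the handler for the signal we are really waiting for. */
static volatile sig_atomic_t restart_seen;

static void on_restart(int sig) { (void)sig; restart_seen = 1; }
static void on_other(int sig)   { (void)sig; /* unrelated signal */ }

/* Waits for SIGUSR1 (the hypothetical "restart" signal) and returns how
 * many times sigsuspend() woke up before the restart really arrived. */
int wait_for_restart(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    int wakeups = 0;

    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_restart;
    sigaction(SIGUSR1, &sa, NULL);
    sa.sa_handler = on_other;
    sigaction(SIGUSR2, &sa, NULL);

    /* Block both signals so they are only delivered inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigaddset(&block, SIGUSR2);
    sigprocmask(SIG_BLOCK, &block, &old);

    restart_seen = 0;
    raise(SIGUSR2);                 /* stray signal, pending while blocked */

    wait_mask = old;
    sigdelset(&wait_mask, SIGUSR1);
    sigdelset(&wait_mask, SIGUSR2);

    while (!restart_seen) {         /* recheck the flag after every wakeup */
        sigsuspend(&wait_mask);
        wakeups++;
        if (!restart_seen)
            raise(SIGUSR1);         /* now queue the real restart signal */
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return wakeups;
}
```

In this sketch the first wakeup is caused by the unrelated signal and only the
second one by the restart. A waiter that returned after the first wakeup would
carry on without actually having been granted the resource, which is the kind
of failure the hypothesis describes.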

This problem can only be seen with threads doing a lot of blocking.  The 
example program is the "dining philosophers" problem which actually goes into a 
deadlock where all threads are blocked, so it's easy to reproduce with this 
(source code in the attachment).
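
For readers without the attachment, here is a minimal C sketch of the circular
wait described above. It is not the attached phil.c: the names dine_once and
count_blocked, the two barriers, and the use of pthread_mutex_trylock (so that
the sketch reports the would-be deadlock instead of hanging) are all
assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

#define MAX_PHIL 16

static pthread_mutex_t chopstick[MAX_PHIL];
static pthread_barrier_t all_hold_left;    /* everyone holds one chopstick */
static pthread_barrier_t all_tried_right;  /* everyone has tried the second */
static int n_phil;
static int blocked[MAX_PHIL];

static void *dine_once(void *arg)
{
    int id = *(int *)arg;          /* pass a pointer, not an int cast */
    int left = id;
    int right = (id + 1) % n_phil;

    pthread_mutex_lock(&chopstick[left]);
    pthread_barrier_wait(&all_hold_left);

    /* A plain pthread_mutex_lock() here would block forever, because the
     * right-hand chopstick is the neighbour's left-hand one. */
    int rc = pthread_mutex_trylock(&chopstick[right]);
    blocked[id] = (rc == EBUSY);

    /* Nobody puts a chopstick down until all have tried the second one. */
    pthread_barrier_wait(&all_tried_right);
    if (rc == 0)
        pthread_mutex_unlock(&chopstick[right]);
    pthread_mutex_unlock(&chopstick[left]);
    return NULL;
}

/* Returns how many of n philosophers would have blocked on the second
 * chopstick, or -1 if n is out of range. */
int count_blocked(int n)
{
    pthread_t tid[MAX_PHIL];
    int ids[MAX_PHIL];
    int total = 0;

    if (n < 2 || n > MAX_PHIL)
        return -1;
    n_phil = n;
    pthread_barrier_init(&all_hold_left, NULL, (unsigned)n);
    pthread_barrier_init(&all_tried_right, NULL, (unsigned)n);
    for (int i = 0; i < n; i++)
        pthread_mutex_init(&chopstick[i], NULL);
    for (int i = 0; i < n; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, dine_once, &ids[i]);
    }
    for (int i = 0; i < n; i++) {
        pthread_join(tid[i], NULL);
        total += blocked[i];
    }
    for (int i = 0; i < n; i++)
        pthread_mutex_destroy(&chopstick[i]);
    pthread_barrier_destroy(&all_hold_left);
    pthread_barrier_destroy(&all_tried_right);
    return total;
}
```

With three philosophers, as in the runs shown in this report, every thread
finds its second chopstick already taken, so count_blocked(3) reports all three
as blocked. That is the state in which all threads sit in a blocking wait and
in which attaching the debugger is easiest to try.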
Comment 1 harry.heinisch 2002-01-31 17:25:47 EST
Created attachment 44179 [details]
Source code for the "phil" dining philosophers example program
Comment 2 harry.heinisch 2002-02-01 08:40:45 EST
I just realized that I said the normal case is for the application to "finish". 
That was true for the program where I originally saw this problem, but the 
"phil" program in my attachment behaves slightly differently. "phil" always 
deadlocks eventually, so a "successful" run simply hangs. The real difference 
is that it never gets the segfault unless I try to attach the debugger.
Comment 3 Beth Uptagrafft 2002-04-24 13:57:46 EDT
Harry, we have not been able to reproduce this one.  Can you try this again on 
the latest source code?  
Thanks, Beth
Comment 4 Phil Copeland 2002-04-29 12:57:41 EDT
[root@vegetta root]# gcc -g -O2 -o test test.c -lpthread
test.c: In function `main':
test.c:154: warning: cast to pointer from integer of different size
test.c: In function `dine':
test.c:186: warning: cast from pointer to integer of different size
test.c:203: warning: cast to pointer from integer of different size

[root@vegetta root]# ./test &
[1] 5509
Using 3 philosophers
[root@vegetta root]# Philosopher 0 has chopstick 0 ... attempt to pick up
chopstick 1
  Philosopher 1 has chopstick 1 ... attempt to pick up chopstick 2
    Philosopher 2 has chopstick 2 ... attempt to pick up chopstick 0

[root@vegetta root]# gdb ./test 5509
GNU gdb Red Hat Linux (5.1-1)
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "alpha-redhat-linux"...
/root/5509: No such file or directory.
Attaching to program: /root/./test, process 5509
Reading symbols from /lib/libpthread.so.0...done.
[New Thread 1024 (LWP 5509)]
[New Thread 2049 (LWP 5510)]
[New Thread 1026 (LWP 5511)]
[New Thread 2051 (LWP 5512)]
[New Thread 3076 (LWP 5513)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6.1...done.
Loaded symbols for /lib/libc.so.6.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x200000cae0c in sigsuspend () from /lib/libc.so.6.1
(gdb) cont
Continuing.
<CTRL-C>
                   
Program received signal SIGINT, Interrupt.
[Switching to Thread 3076 (LWP 5513)]
0x200000cae0c in sigsuspend () from /lib/libc.so.6.1
(gdb) cont


Now then. The one caveat that I can think of is that you're not attaching to the
parent (5509 in my case). If you don't, then you will see the SEGV behavior.

Just gdb the root process,
use 'info threads' to see what's available,
and then
thread <number> to switch between them.

Phil
=--=

<looks fine>

Comment 5 Christopher Holmes 2002-04-30 10:51:16 EDT
I've tried to reproduce this with the latest binaries and I concur with 
Copeland's remarks: you need to attach to the parent in order to access the 
child threads. The child threads don't allow you to attach to them (see below).

[cholmes@amt73 ~/phil]$ uname -a
Linux amt73.zko.dec.com 2.4.9-31.Jsmp #1 SMP Fri Apr 26 18:18:30 EDT 2002 alpha 
unknown
[cholmes@amt73 ~/phil]$ cat /proc/cpuinfo
cpu			: Alpha
cpu model		: EV67
cpu variation		: 7
cpu revision		: 0
cpu serial number	: AY03108715
system type		: Tsunami
system variation	: Clipper
system revision		: 0
system serial number	: 4032DPSZ1000
cycle frequency [Hz]	: 666666666 
timer frequency [Hz]	: 1024.00
page size [bytes]	: 8192
phys. address bits	: 44
max. addr. space #	: 255
BogoMIPS		: 1330.04
kernel unaligned acc	: 0 (pc=0,va=0)
user unaligned acc	: 0 (pc=0,va=0)
platform string		: Compaq AlphaServer ES40
cpus detected		: 4
cpus active		: 4
cpu active mask		: 000000000000000f
[cholmes@amt73 ~/phil]$ ps
  PID TTY          TIME CMD
15249 pts/0    00:00:00 emacs
15219 pts/0    00:00:00 csh
15291 pts/0    00:00:00 phil
15290 pts/0    00:00:00 phil
15289 pts/0    00:00:00 phil
15288 pts/0    00:00:00 phil
15287 pts/0    00:00:00 phil
15345 pts/0    00:00:00 ps
[cholmes@amt73 ~/phil]$ gdb ./phil 15288
GNU gdb Red Hat Linux (5.1-1)
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "alpha-redhat-linux"...
/usr/usersc/cholmes/phil/15288: No such file or directory.
Attaching to program: /usr/usersc/cholmes/phil/./phil, process 15288
Child process unexpectedly missing: No child processes.

Program terminated with signal ?, Unknown signal.
The program no longer exists.
(gdb) q
[cholmes@amt73 ~/phil]$ ps
  PID TTY          TIME CMD
15249 pts/0    00:00:00 emacs
15219 pts/0    00:00:00 csh
15291 pts/0    00:00:00 phil
15290 pts/0    00:00:00 phil
15289 pts/0    00:00:00 phil
15287 pts/0    00:00:00 phil
15288 pts/0    00:00:00 phil
15356 pts/0    00:00:00 emacs
15357 pts/0    00:00:00 ps
[cholmes@amt73 ~/phil]$ gdb ./phil 15287
GNU gdb Red Hat Linux (5.1-1)
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "alpha-redhat-linux"...
/usr/usersc/cholmes/phil/15287: No such file or directory.
Attaching to program: /usr/usersc/cholmes/phil/./phil, process 15287
Reading symbols from /lib/libpthread.so.0...done.
[New Thread 1024 (LWP 15287)]
[New Thread 2049 (LWP 15288)]
[New Thread 1026 (LWP 15289)]
[New Thread 2051 (LWP 15290)]
[New Thread 3076 (LWP 15291)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6.1...done.
Loaded symbols for /lib/libc.so.6.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x200000c728c in sigsuspend () from /lib/libc.so.6.1
(gdb) bt
#0  0x200000c728c in sigsuspend () from /lib/libc.so.6.1
#1  0x20000051f60 in __pthread_wait_for_restart_signal (self=0x2000006f760)
    at pthread.c:969
#2  0x2000004e6f4 in pthread_join (thread_id=1026, thread_return=0x0)
    at restart.h:34
#3  0x120000d18 in main (argc=0, argv=0x20) at phil.c:162
#4  0x200000af10c in __libc_start_main (main=0x120000b10 <main>, argc=1, 
    ubp_av=0x11ffff908, init=0x1200009b8 <_init>, 
    fini=0x2000002d778 <_dl_debug_mask>, rtld_fini=0, stack_end=0x11ffff8f0)
    at ../sysdeps/generic/libc-start.c:129
(gdb) info threads
  5 Thread 3076 (LWP 15291)  0x200000c728c in sigsuspend ()
   from /lib/libc.so.6.1
  4 Thread 2051 (LWP 15290)  0x200000c728c in sigsuspend ()
   from /lib/libc.so.6.1
  3 Thread 1026 (LWP 15289)  0x200000c728c in sigsuspend ()
   from /lib/libc.so.6.1
  2 Thread 2049 (LWP 15288)  0x2000017c5cc in __poll (fds=0x120015830, nfds=1, 
    timeout=2000) at ../sysdeps/unix/sysv/linux/poll.c:63
  1 Thread 1024 (LWP 15287)  0x200000c728c in sigsuspend ()
   from /lib/libc.so.6.1
(gdb) thread 5
[Switching to thread 5 (Thread 3076 (LWP 15291))]#0  0x200000c728c in 
sigsuspend () from /lib/libc.so.6.1
(gdb) bt
#0  0x200000c728c in sigsuspend () from /lib/libc.so.6.1
#1  0x20000051f60 in __pthread_wait_for_restart_signal (self=0x20001a01a60)
    at pthread.c:969
#2  0x20000054940 in __pthread_alt_lock (lock=0x0, self=0x6f) at restart.h:34
#3  0x20000050220 in __pthread_mutex_lock (mutex=0x120011878) at mutex.c:120
#4  0x120000fc4 in dine (arg=0x0) at phil.c:222
#5  0x2000004eeb8 in pthread_start_thread (arg=0x0) at manager.c:284
(gdb) q
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/usersc/cholmes/phil/./phil, process 15287
[cholmes@amt73 ~/phil]$ ps
  PID TTY          TIME CMD
15249 pts/0    00:00:00 emacs
15219 pts/0    00:00:00 csh
15356 pts/0    00:00:00 emacs
15291 pts/0    00:00:00 phil
15290 pts/0    00:00:00 phil
15289 pts/0    00:00:00 phil
15288 pts/0    00:00:00 phil
15287 pts/0    00:00:00 phil
15376 pts/0    00:00:00 ps
[cholmes@amt73 ~/phil]$ gdb ./phil 15291
GNU gdb Red Hat Linux (5.1-1)
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "alpha-redhat-linux"...
/usr/usersc/cholmes/phil/15291: No such file or directory.
Attaching to program: /usr/usersc/cholmes/phil/./phil, process 15291
Child process unexpectedly missing: No child processes.

Program terminated with signal ?, Unknown signal.
The program no longer exists.
(gdb) 

Comment 6 Christopher Holmes 2002-04-30 10:57:38 EDT
glibc has been upgraded in the latest binaries...

[cholmes@amt73 ~/phil]$ rpm -qf /lib/libc.so.6.1
glibc-2.2.4-24
Comment 7 harry.heinisch 2002-05-14 14:24:31 EDT
In your examples, I don't see that you tried to "continue" after switching to 
one of the other threads. When I attach to the root process, and then switch to 
another thread/process, and then say "continue", that's when I see the segv.

(gdb) thread 3
[Switching to Thread 24791]
#0  0x200000cae0c in __sigsuspend ()
(gdb) cont
Continuing.

    Philosopher 2 eating with chopsticks 2 and 0

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 24793]

Program received signal SIGSEGV, Segmentation fault.
__pthread_alt_unlock (lock=0x120101870) at spinlock.c:628
spinlock.c:628: No such file or directory.


In your example, it looks like you switched to the other thread and then 
detached instead of continuing.
