Bug 13785

Summary: Bug in pthreads blocks ability to preemptively suspend and resume threads on SMP machines
Product: [Retired] Red Hat Linux
Reporter: Thor Nolen <nolen>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
Severity: medium
Priority: high
Version: 6.2
CC: fv_tnagrax
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2000-07-12 13:11:35 UTC

Description Thor Nolen 2000-07-12 13:11:33 UTC
In order to suspend and resume threads preemptively, a signal needs to be
sent to the thread to tell it to pause execution.  If a certain part of
the pthread library (__pthread_wait_for_restart_signal) is being executed
at the time that the signal is sent, it may result in a hang where the
threads get stuck in a wait state.  This seems to happen if the preempting
signal was received when __pthread_wait_for_restart_signal is waiting for
the availability of the pthread descriptor block.
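
For readers without the demo program, here is a minimal sketch of the
signal-based suspend/resume pattern described above. It is not the demo
program and does not reproduce the hang. The signal choice (SIGUSR1 to
suspend, SIGUSR2 to resume) and all names in it (run_demo, worker,
suspend_handler, resume_handler, wait_for, sleep_ms) are assumptions made
for illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Assumed signal choice for this sketch; the demo program's is not shown. */
#define SIG_SUSPEND SIGUSR1
#define SIG_RESUME  SIGUSR2

static atomic_int suspended; /* 1 while the worker is parked in the handler */
static atomic_int counter;   /* progress made by the worker */
static atomic_int stop;      /* tells the worker to exit */

static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL); /* may return early on a signal; callers loop */
}

static void resume_handler(int signo)
{
    (void)signo; /* exists only so that sigsuspend() returns */
}

static void suspend_handler(int signo)
{
    sigset_t wait_mask;
    (void)signo;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIG_RESUME);
    atomic_store(&suspended, 1);
    sigsuspend(&wait_mask); /* park here until SIG_RESUME is delivered */
    atomic_store(&suspended, 0);
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop)) {
        atomic_fetch_add(&counter, 1);
        sleep_ms(1);
    }
    return NULL;
}

/* Poll until *v >= at_least; gives up after roughly two seconds. */
static int wait_for(atomic_int *v, int at_least)
{
    for (int i = 0; i < 2000; i++) {
        if (atomic_load(v) >= at_least)
            return 1;
        sleep_ms(1);
    }
    return 0;
}

/* Suspends and resumes one worker thread; returns 0 if it behaved. */
int run_demo(void)
{
    struct sigaction sa;
    pthread_t tid;
    int ok = 1, before, after;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = resume_handler;
    if (sigaction(SIG_RESUME, &sa, NULL) != 0)
        return -1;

    /* Keep SIG_RESUME blocked inside the suspend handler, so a resume
       sent early stays pending until sigsuspend() unblocks it. */
    sigaddset(&sa.sa_mask, SIG_RESUME);
    sa.sa_handler = suspend_handler;
    if (sigaction(SIG_SUSPEND, &sa, NULL) != 0)
        return -1;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    ok &= wait_for(&counter, 1);

    pthread_kill(tid, SIG_SUSPEND);      /* ask the worker to pause */
    ok &= wait_for(&suspended, 1);
    before = atomic_load(&counter);
    sleep_ms(50);
    after = atomic_load(&counter);
    ok &= (before == after);             /* no progress while suspended */

    pthread_kill(tid, SIG_RESUME);       /* let the worker continue */
    ok &= wait_for(&counter, after + 1);

    atomic_store(&stop, 1);
    pthread_join(tid, NULL);
    return ok ? 0 : 1;
}
```

Call run_demo() from main() and build with -pthread. The resuming side
goes through pthread_kill, the same call that appears in frame #3 of the
initial thread's backtrace below.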

Mail me <thor> for a demo program that demonstrates the bug. 
Here is a sample execution and backtrace:

      bash$ ./pthreadhang
      Pthread Hang C Test Case

      Current Count: 1    Total Created: 1
      Thread going into 0 second sleep!
      Thread awaking from 0 second sleep!
      Thread going into 0 second sleep!
      Thread awaking from 0 second sleep!
      Thread going into 0 second sleep!
      Thread awaking from 0 second sleep!
      Thread going into 0 second sleep!
      Thread awaking from 0 second sleep!
      Thread going into 0 second sleep!
      Thread awaking from 0 second sleep!
      Thread going into 0 second sleep!
      Thread awaking from 0 second sleep!
      
The demo program creates threads that sleep for a certain amount of time.
When a thread goes to sleep, it prints a line to the screen, and when it
wakes up, it prints a line to the screen.  You know that the program has
hung when it stops printing to the screen.  Please see the source code
comments for more information about the sleep time and max threads
parameters and what they signify.
          
      [tcard@em4d pthreadhang]$ ps ax | grep pthreadhang  
      17265 pts/3    S      0:00 ./pthreadhang
      17266 pts/3    S      0:00 ./pthreadhang      
      17273 pts/3    S      0:00 ./pthreadhang
      17300 pts/4    S      0:00 grep pthreadhang
      [tcard@em4d pthreadhang]$ gdb pthreadhang 17265
      Type "show copying" to see the conditions.
      There is absolutely no warranty for GDB.  Type "show warranty" for details.
      This GDB was configured as "i386-redhat-linux"...

      /home/tcard/pthreadhang/17265: No such file or directory.
      Attaching to program: /home/tcard/pthreadhang/pthreadhang, Pid 17265
      Reading symbols from /lib/libpthread.so.0...done.
      Reading symbols from /lib/libdl.so.2...done.
      Reading symbols from /lib/libc.so.6...done.
      Reading symbols from /lib/ld-linux.so.2...done.
      0x40050deb in __sigsuspend (set=0xbffffa24)
          at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
      48        ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
      (gdb) i thr
        3 Thread 17273  0x40050deb in __sigsuspend (set=0xbf5ffa24)
          at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
      * 2 Thread 17265 (initial thread)  0x40050deb in __sigsuspend (set=0xbffffa24)
          at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
        1 Thread 17266 (manager thread)  0x400ddf50 in __poll (fds=0x804c990,
          nfds=1, timeout=2000) at ../sysdeps/unix/sysv/linux/poll.c:45
      (gdb) bt
      #0  0x40050deb in __sigsuspend (set=0xbffffa24)
          at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
      #1  0x40022c82 in __pthread_wait_for_restart_signal (self=0x4002a940)
          at pthread.c:785
      #2  0x400242c2 in __pthread_lock (lock=0x40026910, self=0x4002a940)
          at spinlock.c:68
      #3  0x40023437 in pthread_kill (thread=7171, signo=12) at signals.c:58
      #4  0x8048c2d in resumeSuspendedThread ()
      #5  0x8049246 in main ()
      #6  0x4004a9cb in __libc_start_main (main=0x8049178 <main>, argc=1,
          argv=0xbffffb74, init=0x8048700 <_init>, fini=0x80492bc <_fini>,
          rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbffffb6c)
          at ../sysdeps/generic/libc-start.c:92
      #0  0x40050deb in __sigsuspend (set=0xbf5ffa24)
          at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
      48        in ../sysdeps/unix/sysv/linux/sigsuspend.c
      (gdb) bt
      #0  0x40050deb in __sigsuspend (set=0xbf5ffa24)
          at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
      #1  0x8048ee2 in signalHandler ()
      #2  0x40023601 in pthread_sighandler_rt (signo=12, si=0xbf5ffae0,
          uc=0xbf5ffb60) at signals.c:119
      #3  0x40050c60 in __restore_rt ()
          at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
      #4  0x40022c82 in __pthread_wait_for_restart_signal (self=0xbf5ffe40)
          at pthread.c:785
      #5  0x400242c2 in __pthread_lock (lock=0x40026910, self=0xbf5ffe40)
          at spinlock.c:68
      #6  0x40022a8a in __pthread_set_own_extricate_if (self=0xbf5ffe40,
          peif=0xbf5ffd64) at pthread.c:770
      #7  0x4002544d in __old_sem_wait (sem=0x804aac0) at oldsemaphore.c:93
      #8  0x8048ccf in startSleep ()
      #9  0x40020b85 in pthread_start_thread (arg=0xbf5ffe40) at manager.c:241

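gdb was loaded and attached to the initial thread of the demo program
(pid 17265).  The thread listing shows three threads, one of which is the
manager thread, which is not involved in the hang.  I backtraced thread 2,
switched to thread 3, and backtraced it as well.  The backtraces show where
the program hangs: both threads are blocked in __sigsuspend, and both got
there through __pthread_lock on the same lock (0x40026910).  The backtrace
of thread 3 shows the signal handler running on top of
__pthread_wait_for_restart_signal, which is how the wait inside
__pthread_lock caused the suspend that stops the program in this test case.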

Comment 1 Jakub Jelinek 2000-08-25 08:21:27 UTC
Fixed in glibc 2.1.92