Bug 60674 - Default pthread mutexes cause SIGSEGV on SMP IA64 machines.
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 7.2
Hardware: ia64 Linux
Priority: high  Severity: high
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2002-03-04 12:37 EST by IBM Bug Proxy
Modified: 2005-10-31 17:00 EST (History)
Doc Type: Bug Fix
Last Closed: 2002-03-04 12:37:36 EST


Description IBM Bug Proxy 2002-03-04 12:37:32 EST

Hardware Environment:
4-way SMP IA64 with 9 GB RAM

Software Environment:
Red Hat Linux 7.2 for IA64 with glibc-2.2.4.

Steps to Reproduce:
1. Compile the example test case program (listed below) with:
     gcc alt_test.c -o alt_test -lpthread
2. Run it with ./alt_test
3. Examine the generated core file with gdb. (The program fails much faster,
   and consistently, when gdb is not attached during the run.)

Actual Results:
"Segmentation Fault"

Expected Results:
The test case should run indefinitely.

Additional Information:

We have found that the default type of pthread mutex (TIMED) does not function 
properly on SMP machines. The problem is timing-dependent, so it SIGSEGVs in 
slightly different ways, but it all appears to be related to wait_node 
handling.

Here is an example backtrace:

#0  __pthread_alt_unlock (lock=0x80000fffffffb7b8) at spinlock.c:396
#1  0x2000000000068df0 in __pthread_mutex_unlock (mutex=0x80000fffffffb7a0) at mutex.c:195
#2  0x4000000000000c20 in lock_unlock ()
#3  0x4000000000000ca0 in critical ()
#4  0x2000000000066e20 in pthread_start_thread (arg=0x2000000000effa60) at manager.c:284
#5  0x2000000000249d00 in __clone2 () at soinit.c:56
#6  0x2000000000068df0 in __pthread_mutex_unlock (mutex=0x0) at mutex.c:195
#7  0x00000000 in ?? ()

The example test case below creates 2 threads that loop forever, locking and
unlocking a mutex. The main loop repeatedly creates a thread that locks and 
unlocks the mutex once and then exits. As wait_nodes are allocated on the stack,
the short-lived thread is what appears to be provoking the SIGSEGV, but I 
suspect that there are locking issues within wait_node_dequeue. I have been 
unable to recreate this problem on a 4-way PowerPC 64 machine.

I have tried using ADAPTIVE mutexes; however, the problem remains because some 
C-library functions, such as calloc, use mutexes too. 
Also, we can't use NGPT.
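
For reference, the ADAPTIVE mutex type mentioned above can be selected roughly as in the following minimal sketch. It assumes glibc's non-portable PTHREAD_MUTEX_ADAPTIVE_NP constant is available; the helper name init_adaptive_mutex is hypothetical and is not part of glibc or of the original test case. As noted above, this only changes the mutexes the application creates itself, not the ones used inside the C library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: building against glibc, which provides the *_NP types */
#endif
#include <pthread.h>

/* Hypothetical helper: initialise *mux as an ADAPTIVE mutex.
   Returns 0 on success, or the first pthread error code encountered. */
int init_adaptive_mutex(pthread_mutex_t *mux)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    if (rc == 0)
        rc = pthread_mutex_init(mux, &attr);

    /* The attribute object is not needed once the mutex is initialised. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Unlike the test case, this sketch checks every return code, so a failure to set the type is reported rather than silently falling back to the default mutex.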

/* alt_test.c */
#include <pthread.h>
#include <sched.h>   /* sched_yield() */
#include <stdio.h>

/* Take and release the mutex once, yielding around the lock call
   to encourage contention between threads. */
void lock_unlock(pthread_mutex_t *mux)
{
  sched_yield();
  pthread_mutex_lock(mux);
  sched_yield();
  pthread_mutex_unlock(mux);
}

/* Short-lived thread: lock and unlock once, then exit. */
void *critical(void *arg)
{
  pthread_mutex_t *mux = (pthread_mutex_t *) arg;

  lock_unlock(mux);

  return NULL;
}

/* Long-lived thread: lock and unlock forever. */
void *loopcritical(void *arg)
{
  pthread_mutex_t *mux = (pthread_mutex_t *) arg;

  do
  {
    sched_yield();
    sched_yield();
    lock_unlock(mux);
  } while (1);

  return NULL;   /* not reached */
}

int main(int argc, char *argv[])
{
  pthread_mutex_t mux;
  pthread_mutexattr_t ma;
  pthread_t thr[3];

  (void) argv;

  pthread_mutexattr_init(&ma);
  /* Any command-line argument selects an ADAPTIVE mutex instead of
     the default (TIMED) type. */
  if (argc > 1)
    pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ADAPTIVE_NP);
  pthread_mutex_init(&mux, &ma);

  pthread_create(&thr[0], NULL, loopcritical, (void *) &mux);
  pthread_create(&thr[2], NULL, loopcritical, (void *) &mux);

  /* Repeatedly spawn a short-lived thread that contends for the mutex. */
  do
  {
    pthread_create(&thr[1], NULL, critical, (void *) &mux);
    sched_yield();
    lock_unlock(&mux);
    pthread_join(thr[1], NULL);
  } while (1);
}
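
For automated runs, a bounded variant of the test case may be more convenient than one that loops forever. The sketch below is an adaptation under stated assumptions, not part of the original report: run_bounded, struct shared, the counter and the stop flag are all hypothetical names introduced here, and the mutex uses the default type. It keeps the same shape (two long-lived looping threads plus a stream of short-lived threads) but stops after a fixed number of iterations.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical shared state; counter and stop are protected by mux. */
struct shared {
    pthread_mutex_t mux;
    long counter;   /* number of completed lock/unlock cycles */
    int stop;       /* set to 1 to ask the looping threads to exit */
};

static void lock_unlock(struct shared *s)
{
    sched_yield();
    pthread_mutex_lock(&s->mux);
    s->counter++;
    sched_yield();
    pthread_mutex_unlock(&s->mux);
}

/* Short-lived thread: one lock/unlock cycle, then exit. */
static void *critical(void *arg)
{
    lock_unlock(arg);
    return NULL;
}

/* Long-lived thread: cycle until the stop flag is observed. */
static void *loopcritical(void *arg)
{
    struct shared *s = arg;

    for (;;) {
        int done;

        lock_unlock(s);
        pthread_mutex_lock(&s->mux);
        done = s->stop;
        pthread_mutex_unlock(&s->mux);
        if (done)
            return NULL;
    }
}

/* Runs `iterations` rounds of the short-lived-thread pattern.
   Returns 0 on success, -1 if a thread or the mutex could not be created.
   On return *total_out (if non-NULL) holds the total cycle count. */
int run_bounded(int iterations, long *total_out)
{
    struct shared s;
    pthread_t loopers[2], shortlived;
    int started = 0, failed = 0, i;

    s.counter = 0;
    s.stop = 0;
    if (pthread_mutex_init(&s.mux, NULL) != 0)   /* NULL selects the default type */
        return -1;

    while (started < 2 && !failed) {
        if (pthread_create(&loopers[started], NULL, loopcritical, &s) == 0)
            started++;
        else
            failed = 1;
    }

    for (i = 0; i < iterations && !failed; i++) {
        if (pthread_create(&shortlived, NULL, critical, &s) != 0) {
            failed = 1;
            break;
        }
        sched_yield();
        lock_unlock(&s);
        pthread_join(shortlived, NULL);
    }

    /* Ask the looping threads to stop, then wait for them before
       the stack-allocated shared state goes out of scope. */
    pthread_mutex_lock(&s.mux);
    s.stop = 1;
    pthread_mutex_unlock(&s.mux);
    for (i = 0; i < started; i++)
        pthread_join(loopers[i], NULL);

    if (total_out != NULL)
        *total_out = s.counter;
    pthread_mutex_destroy(&s.mux);
    return failed ? -1 : 0;
}
```

A caller might invoke run_bounded(1000, &total) from main and treat a crash or a nonzero return as a failure. Because the bug is timing-dependent, a clean run of this variant does not prove that the problem is absent.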

*******************PATCH ********************
The source code mentioned in this defect is in
   glibc-*/linuxthreads/spinlock.c

266,268c266,268
<   struct wait_node *next;     /* Next node in null terminated linked list */
<   pthread_descr thr;          /* The thread waiting with this node */
<   int abandoned;              /* Atomic flag */
---
>   volatile struct wait_node *next;    /* Next node in null terminated linked list */
>   volatile pthread_descr thr;         /* The thread waiting with this node */
>   volatile int abandoned;             /* Atomic flag */

********************PATCH *******************************************
Comment 1 Jakub Jelinek 2002-04-05 04:01:32 EST
Fixed in glibc-2.2.4-24.
