Bug 60674

Summary: Default pthread mutexes cause SIGSEGV on SMP IA64 machines.
Product: [Retired] Red Hat Linux
Reporter: IBM Bug Proxy <bugproxy>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 7.2
CC: fweimer
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2002-03-04 17:37:36 UTC

Description IBM Bug Proxy 2002-03-04 17:37:32 UTC

Hardware Environment:
4-way SMP IA64 with 9 GB RAM

Software Environment:
Red Hat Linux 7.2 for IA64 with glibc-2.2.4.

Steps to Reproduce:
1. Compile the example testcase program (listed below) with:
     gcc alt_test.c -o alt_test -lpthread
2.  Run with ./alt_test
3.  Examine the generated core file with gdb. (The program fails much faster,
and consistently, when gdb is not attached during the run.)

Actual Results:
"Segmentation Fault"

Expected Results:
The testcase should run indefinitely.

Additional Information:

We have found that the default type of pthread mutex (TIMED) does not function
properly on SMP machines. The problem is timing-dependent, so the program faults
in slightly different places from run to run, but every failure appears to
involve wait_node handling.

Here is an example backtrace:

#0  __pthread_alt_unlock (lock=0x80000fffffffb7b8) at spinlock.c:396
#1  0x2000000000068df0 in __pthread_mutex_unlock (mutex=0x80000fffffffb7a0) at mutex.c:195
#2  0x4000000000000c20 in lock_unlock ()
#3  0x4000000000000ca0 in critical ()
#4  0x2000000000066e20 in pthread_start_thread (arg=0x2000000000effa60) at manager.c:284
#5  0x2000000000249d00 in __clone2 () at soinit.c:56
#6  0x2000000000068df0 in __pthread_mutex_unlock (mutex=0x0) at mutex.c:195
#7  0x00000000 in ?? ()

The example testcase below creates 2 threads that repeatedly loop, locking and
unlocking a mutex. The main loop repeatedly creates a thread that locks and
unlocks the mutex once and then exits, while the main thread itself also locks
and unlocks the mutex before joining that thread. Because wait_nodes are
allocated on the stack, the short-lived thread appears to be what provokes the
SIGSEGV, but I suspect there are also locking issues within wait_node_dequeue.
I have been unable to reproduce this problem on a 4-way PowerPC 64 machine.

I have tried using ADAPTIVE mutexes (the testcase selects
PTHREAD_MUTEX_ADAPTIVE_NP when given any command-line argument), but the problem
remains, because some C library functions, such as calloc, also use mutexes
internally. We also cannot use NGPT.

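/* alt_test.c */
#define _GNU_SOURCE    /* for PTHREAD_MUTEX_ADAPTIVE_NP on newer glibc */
#include <pthread.h>
#include <sched.h>     /* sched_yield */
#include <stddef.h>    /* NULL */

/* Yield, lock the mutex, yield while holding it, then unlock. */
void lock_unlock(pthread_mutex_t *mux)
{
  sched_yield();
  pthread_mutex_lock(mux);
  sched_yield();
  pthread_mutex_unlock(mux);
}

/* Short-lived thread: lock and unlock the mutex once, then exit. */
void *critical(void *arg)
{
  pthread_mutex_t *mux = (pthread_mutex_t *) arg;

  lock_unlock(mux);

  return NULL;
}

/* Long-lived thread: lock and unlock the mutex forever. */
void *loopcritical(void *arg)
{
  pthread_mutex_t *mux = (pthread_mutex_t *) arg;

  do
  {
    sched_yield();
    sched_yield();
    lock_unlock(mux);
  } while (1);

  return NULL; /* not reached */
}

/* Run with any argument to use an ADAPTIVE mutex instead of the
   default (TIMED) type. */
int main(int argc, char *argv[])
{
  pthread_mutex_t mux;
  pthread_mutexattr_t ma;
  pthread_t thr[3];

  (void) argv;

  pthread_mutexattr_init(&ma);
  if (argc > 1)
    pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ADAPTIVE_NP);
  pthread_mutex_init(&mux, &ma);

  /* Two threads that contend for the mutex indefinitely. */
  pthread_create(&thr[0], NULL, loopcritical, (void *) &mux);
  pthread_create(&thr[2], NULL, loopcritical, (void *) &mux);

  do
  {
    /* Start a short-lived thread; its stack, which holds any wait_node
       it enqueued while blocking, goes away when it exits. */
    pthread_create(&thr[1], NULL, critical, (void *) &mux);
    sched_yield();
    lock_unlock(&mux);
    pthread_join(thr[1], NULL);
  } while (1);

  return 0; /* not reached */
}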

*******************PATCH ********************
The source code mentioned in this defect is in
   glibc-*/linuxthreads/spinlock.c

266,268c266,268
<   struct wait_node *next;     /* Next node in null terminated linked list */
<   pthread_descr thr;          /* The thread waiting with this node */
<   int abandoned;              /* Atomic flag */
---
>   volatile struct wait_node *next;    /* Next node in null terminated linked list */
>   volatile pthread_descr thr;         /* The thread waiting with this node */
>   volatile int abandoned;             /* Atomic flag */

********************PATCH *******************************************

Comment 1 Jakub Jelinek 2002-04-05 09:01:32 UTC
Fixed in glibc-2.2.4-24.