Red Hat Bugzilla – Bug 115349
mutex hang when using pthread_cond_broadcast() under high contention
Last modified: 2007-11-30 17:07:00 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6)
Description of problem:
The attached test program hangs when run on a dual Xeon 2.4 GHz box.
The main thread (and some of the worker threads) blocks in futex_wait,
waiting to acquire the mutex "mtx", which is unlocked. Attaching and
detaching a debugger causes the program to continue, as does sending
the process a STOP and CONT signal.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Compile the attached program with
cc -o cvtest cvtest.c -lpthread
2. In one window, run a server process
3. In the other window, run the test client
Actual Results: The test client will hang within minutes. Attach a
debugger and examine the main thread--it will be in the futex syscall,
inside __lll_mutex_lock_wait. The futex for the associated mutex will
have value 0.
Expected Results: The test client continues to print '.' characters.
If you examine the worker threads, you will find some also hanging in
the futex syscall for the same unlocked futex.
If you instead run cvtest with no arguments, causing it to never use
pthread_cond_broadcast(), it will not hang.
Created attachment 97574
Kernel is 2.4.21-9.ELsmp
Could you please try ftp://people.redhat.com/jakub/glibc/errata/2.3.2-95.10/
These packages have temporarily disabled FUTEX_REQUEUE.
The bug does not reproduce with 2.3.2-95.10.
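For readers unfamiliar with the operation being disabled, here is a rough sketch of the two wake-up strategies at the raw futex level on Linux. It is not glibc's code: the helper names wake_all and wake_one_requeue_rest are hypothetical, the 32-bit words stand in for the condvar and mutex futexes, and it uses FUTEX_CMP_REQUEUE (the checked variant named later in this thread) rather than plain FUTEX_REQUEUE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: wake every thread sleeping on cond_word.
   All of them then compete for the mutex in user space.
   Returns the number of threads woken, or -1 on error. */
long wake_all(uint32_t *cond_word)
{
    return syscall(SYS_futex, cond_word, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

/* Hypothetical helper: wake one waiter and move the remaining waiters
   from cond_word to mutex_word without waking them. The kernel does
   this only if *cond_word still equals expected; otherwise the call
   fails with EAGAIN. The requeue limit travels in the timeout slot. */
long wake_one_requeue_rest(uint32_t *cond_word, uint32_t *mutex_word,
                           uint32_t expected)
{
    assert(cond_word != mutex_word);
    return syscall(SYS_futex, cond_word, FUTEX_CMP_REQUEUE, 1,
                   (long)INT_MAX, mutex_word, expected);
}
```

With requeueing, the moved threads sleep on the mutex word and depend on a later mutex unlock to wake them, which is presumably why a thread can end up sleeping on a futex whose value says "unlocked" if that hand-off goes wrong. With the wake-all strategy no thread is ever parked on the mutex word by a broadcast.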
I've seen this same bug with the Boehm-Demers-Weiser conservative
garbage collector (aka libgc):
It was fixed by the updated glibc I got from here:
An errata has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen
this bug report if the solution does not work for you.
Here is a simplified reproducer (it hangs with -b on a glibc that does not
have FUTEX_REQUEUE (or FUTEX_CMP_REQUEUE) commented out):
#define _XOPEN_SOURCE 500
tf (void *arg)
pthread_cond_wait (&cv, &mtx);
main (int argc, char **argv)
int i, spins = 0;
pthread_mutexattr_settype (&mtxa, PTHREAD_MUTEX_ERRORCHECK_NP);
pthread_mutex_init (&mtx, &mtxa);
pthread_cond_init (&cv, NULL);
if (argc > 1)
if (!strcmp (argv[1], "-b"))
broadcast = 1;
else if (!strcmp (argv[1], "-B"))
broadcast = 2;
for (i = 0; i < 40; i++)
pthread_create (&th, NULL, tf, NULL);
if ((spins++ % 1000) == 0)
write (1, ".", 1);
int njobs = rand () % 41;
nn = njobs;
if (broadcast && (broadcast > 1 || (rand () % 30) == 0))
The hang occurs even if cond->__data.__lock is held during the futex (FUTEX_REQUEUE)
syscall. It hangs only with the -b option, not without any options or with -B,
so mixing pthread_cond_broadcast with pthread_cond_signal calls is essential.
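The reproducer as shown in this thread is incomplete, so here is a bounded sketch of the same signal/broadcast mix that can be compiled as-is. Several parts are assumptions and not taken from the original: the worker loop body, the stop flag and consumed counter, the fixed round count in place of an endless loop, the choice to signal once per job in the non-broadcast branch, and the function name run_rounds. The mode argument mirrors the options above (0 for no option, 1 for -b, 2 for -B).

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 40

static pthread_mutex_t mtx;
static pthread_cond_t cv;
static int nn;         /* jobs currently available to workers */
static int stop;       /* hypothetical shutdown flag, not in the original */
static long consumed;  /* jobs taken by workers, protected by mtx */

/* Worker: take one job at a time, sleeping on cv when none are posted. */
static void *tf(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&mtx);
        while (nn == 0 && !stop)
            pthread_cond_wait(&cv, &mtx);
        if (nn == 0) {  /* stop requested and nothing left to take */
            pthread_mutex_unlock(&mtx);
            return NULL;
        }
        nn--;
        consumed++;
        pthread_mutex_unlock(&mtx);
    }
}

/* Runs the producer loop for a fixed number of rounds.
   Returns jobs posted minus jobs consumed, which should be 0. */
long run_rounds(int rounds, int mode)
{
    pthread_t th[NTHREADS];
    pthread_mutexattr_t mtxa;
    long posted = 0, lost;
    int i;

    assert(rounds >= 0 && mode >= 0 && mode <= 2);
    pthread_mutexattr_init(&mtxa);
    pthread_mutexattr_settype(&mtxa, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mtx, &mtxa);
    pthread_mutexattr_destroy(&mtxa);
    pthread_cond_init(&cv, NULL);
    nn = 0;
    stop = 0;
    consumed = 0;

    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&th[i], NULL, tf, NULL) != 0)
            abort();

    for (i = 0; i < rounds; i++) {
        pthread_mutex_lock(&mtx);
        int njobs = rand() % 41;
        posted += njobs - nn;  /* nn = njobs overwrites any leftover jobs */
        nn = njobs;
        if (mode && (mode > 1 || (rand() % 30) == 0))
            pthread_cond_broadcast(&cv);
        else
            while (njobs--)    /* assumption: one signal per posted job */
                pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mtx);
    }

    /* Shut down: let the workers drain the remaining jobs and exit. */
    pthread_mutex_lock(&mtx);
    stop = 1;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);

    lost = posted - consumed;
    pthread_cond_destroy(&cv);
    pthread_mutex_destroy(&mtx);
    return lost;
}
```

Calling run_rounds (100000, 1) from a small main corresponds to the -b case. On an affected glibc the expectation is that the call never returns; on a fixed one it should return 0. Being bounded, a clean run here is weaker evidence than the original endless loop.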
*** Bug 121283 has been marked as a duplicate of this bug. ***
Was this bug accidentally linked to the wrong errata?
I fail to see how an updated shadow-utils rpm resolves a problem with
No, the reference is correct. shadow-utils has to be updated in
addition to glibc.