Bug 115349 - mutex hang when using pthread_cond_broadcast() under high contention
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: glibc
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: Jakub Jelinek
Duplicates: 121283
Reported: 2004-02-10 20:57 EST by John G. Myers
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-05-11 21:28:24 EDT

Attachments
Test program (4.16 KB, text/plain)
2004-02-10 20:58 EST, John G. Myers

External Trackers
Tracker                 ID             Priority  Status        Summary                                 Last Updated
Red Hat Product Errata  RHBA-2004:143  normal    SHIPPED_LIVE  GNU C Library bugfix update             2004-05-11 00:00:00 EDT
Red Hat Product Errata  RHBA-2004:212  normal    SHIPPED_LIVE  Updated shadow-utils package available  2004-05-11 00:00:00 EDT
Red Hat Product Errata  RHBA-2004:213  normal    SHIPPED_LIVE  Updated ypserv package available        2004-05-11 00:00:00 EDT

Description John G. Myers 2004-02-10 20:57:21 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6)

Description of problem:
The attached test program hangs when run on a dual Xeon 2.4 GHz box.

The main thread (and some of the worker threads) blocks in futex_wait,
waiting to acquire the mutex "mtx", which is unlocked.  Attaching and
detaching a debugger causes the program to continue, as does sending
the process a STOP and CONT signal.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Compile the attached program with
cc -o cvtest cvtest.c -lpthread

2. In one window, run a server process
./cvtest -s

3. In the other window, run the test client
./cvtest -b

Actual Results:  The test client will hang within minutes.  Attach a
debugger and examine the main thread--it will be in the futex syscall,
inside __lll_mutex_lock_wait.  The futex for the associated mutex will
have value 0.

Expected Results:  The test client continues to print '.' characters.

If you examine the worker threads, you will find some also hanging in
the futex syscall for the same unlocked futex.

Additional info:

If you instead run cvtest with no arguments, causing it to never use
pthread_cond_broadcast(), it will not hang.
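
One way to observe the stall described above without attaching a debugger (which itself lets the program continue) is to take the mutex with a deadline. The following is only a sketch: lock_with_watchdog is a hypothetical helper, not part of the attached test program, and replacing pthread_mutex_lock with pthread_mutex_timedlock may change the timing enough to hide the hang.

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical diagnostic helper (an assumption, not from cvtest.c).
   Tries to lock M, giving up after TIMEOUT_SEC seconds.
   Returns 0 on success, ETIMEDOUT if the lock could not be taken
   before the deadline, or another errno value on failure.  */
int
lock_with_watchdog (pthread_mutex_t *m, int timeout_sec)
{
  struct timespec deadline;

  /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline.  */
  if (clock_gettime (CLOCK_REALTIME, &deadline) != 0)
    return errno;
  deadline.tv_sec += timeout_sec;

  return pthread_mutex_timedlock (m, &deadline);
}
```

A caller could replace pthread_mutex_lock (&mtx) in the main loop with lock_with_watchdog (&mtx, 10) and log a message when it returns ETIMEDOUT. The 10 second value is arbitrary.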
Comment 1 John G. Myers 2004-02-10 20:58:21 EST
Created attachment 97574
Test program
Comment 2 John G. Myers 2004-02-11 14:04:19 EST
Kernel is 2.4.21-9.ELsmp
Comment 3 Jakub Jelinek 2004-02-13 02:18:43 EST
Could you please try ftp://people.redhat.com/jakub/glibc/errata/2.3.2-95.10/
These packages have temporarily disabled FUTEX_REQUEUE.
Comment 4 John G. Myers 2004-02-13 15:56:38 EST
The bug does not reproduce with 2.3.2-95.10.
Comment 5 Kenneth C. Schalk 2004-04-29 17:03:59 EDT
I've seen this same bug with the Boehm-Demers-Weiser conservative
garbage collector (aka libgc):


It was fixed by the updated glibc I got from here:

Comment 6 John Flanagan 2004-05-11 21:28:25 EDT
An errata has been issued which should help the problem described in this
bug report. This report is therefore being closed with a resolution of
ERRATA. For more information on the solution and/or where to find the
updated files, please follow the link below. You may reopen this bug report
if the solution does not work for you.

Comment 7 Jakub Jelinek 2004-05-12 09:42:54 EDT
Here is a simplified reproducer (hangs with -b with glibc which
doesn't have FUTEX_REQUEUE (or FUTEX_CMP_REQUEUE) commented out):

#define _XOPEN_SOURCE 500
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

pthread_mutex_t mtx;
pthread_cond_t cv;
int broadcast;  /* 0: signal only; 1 (-b): occasional broadcast; 2 (-B): broadcast only */
int nn;         /* job count published by main; workers wait while it is zero */

/* Worker thread: wait on cv until main publishes a nonzero job count.  */
void *
tf (void *arg)
{
  for (;;)
    {
      pthread_mutex_lock (&mtx);
      while (!nn)
        pthread_cond_wait (&cv, &mtx);
      pthread_mutex_unlock (&mtx);
    }
}

int
main (int argc, char **argv)
{
  int i, spins = 0;
  pthread_mutexattr_t mtxa;

  pthread_mutexattr_init (&mtxa);
  pthread_mutexattr_settype (&mtxa, PTHREAD_MUTEX_ERRORCHECK_NP);
  pthread_mutex_init (&mtx, &mtxa);
  pthread_cond_init (&cv, NULL);

  if (argc > 1)
    {
      if (!strcmp (argv[1], "-b"))
        broadcast = 1;
      else if (!strcmp (argv[1], "-B"))
        broadcast = 2;
    }

  /* Start 40 worker threads.  */
  for (i = 0; i < 40; i++)
    {
      pthread_t th;
      pthread_create (&th, NULL, tf, NULL);
    }

  pthread_mutex_lock (&mtx);
  for (;;)
    {
      /* Progress indicator: one '.' per 1000 iterations.  */
      if ((spins++ % 1000) == 0)
        write (1, ".", 1);

      /* Drop and retake the mutex so workers get a chance to run.  */
      pthread_mutex_unlock (&mtx);

      pthread_mutex_lock (&mtx);

      int njobs = rand () % 41;
      nn = njobs;
      if (broadcast && (broadcast > 1 || (rand () % 30) == 0))
        pthread_cond_broadcast (&cv);
      else
        while (njobs--)
          pthread_cond_signal (&cv);
    }
}

The hang occurs even if cond->__data.__lock is held during the futex
(FUTEX_REQUEUE) syscall. It hangs only with the -b option, not without
any options or with -B, so mixing pthread_cond_broadcast calls with
pthread_cond_signal calls is essential.
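
Since neither the signal-only run nor the broadcast-only run hangs, an application that cannot yet install the updated glibc might try to stick to a single wake-up primitive per condition variable. The following is a minimal, untested sketch of that idea: wake_workers and its use_broadcast parameter are hypothetical names, and nothing in this report confirms that this avoids the hang in every program.

```c
#include <pthread.h>

/* Hypothetical helper (an assumption, not from the reproducer).
   Wakes waiters on CV using exactly one primitive:
   - use_broadcast != 0: a single pthread_cond_broadcast;
   - use_broadcast == 0: NJOBS calls to pthread_cond_signal.
   The caller is expected to pass the same use_broadcast value for
   the whole lifetime of CV, so the two primitives are never mixed.
   Returns 0 on success or the first error code encountered.  */
int
wake_workers (pthread_cond_t *cv, int njobs, int use_broadcast)
{
  int err;

  if (use_broadcast)
    return pthread_cond_broadcast (cv);

  while (njobs-- > 0)
    {
      err = pthread_cond_signal (cv);
      if (err != 0)
        return err;
    }
  return 0;
}
```

In the reproducer above, this would correspond to replacing the if/else at the end of the main loop with wake_workers (&cv, njobs, broadcast > 1), which removes the mixed -b mode.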
Comment 8 Van Okamura 2004-05-28 23:49:05 EDT
*** Bug 121283 has been marked as a duplicate of this bug. ***
Comment 9 Paul Waterman 2004-06-02 15:56:02 EDT
Was this bug accidentally linked to the wrong errata?

I fail to see how an updated shadow-utils rpm resolves a problem with
Comment 10 Ulrich Drepper 2004-06-02 19:31:13 EDT
No, the reference is correct.  shadow-utils has to be updated in
addition to glibc.
