Bug 115349
Summary: | mutex hang when using pthread_cond_broadcast() under high contention
---|---
Product: | Red Hat Enterprise Linux 3
Component: | glibc
Version: | 3.0
Hardware: | i686
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Reporter: | John G. Myers <jgmyers>
Assignee: | Jakub Jelinek <jakub>
CC: | drepper, jbs, roland, szabka, tao, van.okamura
Doc Type: | Bug Fix
Last Closed: | 2004-05-12 01:28:24 UTC
Description
John G. Myers
2004-02-11 01:57:21 UTC
Created attachment 97574
Test program
Kernel is 2.4.21-9.ELsmp.

Could you please try
ftp://people.redhat.com/jakub/glibc/errata/2.3.2-95.10/
These packages have FUTEX_REQUEUE temporarily disabled.

The bug does not reproduce with 2.3.2-95.10.

I've seen this same bug with the Boehm-Demers-Weiser conservative garbage collector (aka libgc):
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
It was fixed by the updated glibc I got from here:
ftp://people.redhat.com/jakub/glibc/errata/2.3.2-95.20/

An erratum has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2004-212.html

Here is a simplified reproducer (hangs with -b on a glibc that does not have FUTEX_REQUEUE (or FUTEX_CMP_REQUEUE) commented out):

    #define _XOPEN_SOURCE 500
    #include <unistd.h>
    #include <stdlib.h>
    #include <string.h>   /* strcmp */
    #include <pthread.h>

    pthread_mutex_t mtx;
    pthread_cond_t cv;
    int broadcast;   /* 0: signal only, 1 (-b): mixed, 2 (-B): broadcast only */
    int nn;          /* number of jobs currently available */

    /* Worker: wait for a job, consume it, repeat forever.  */
    void *
    tf (void *arg)
    {
      for (;;)
        {
          pthread_mutex_lock (&mtx);
          while (!nn)
            pthread_cond_wait (&cv, &mtx);
          --nn;
          pthread_mutex_unlock (&mtx);
        }
    }

    int
    main (int argc, char **argv)
    {
      int i, spins = 0;
      pthread_mutexattr_t mtxa;

      pthread_mutexattr_init (&mtxa);
      pthread_mutexattr_settype (&mtxa, PTHREAD_MUTEX_ERRORCHECK_NP);
      pthread_mutex_init (&mtx, &mtxa);
      pthread_cond_init (&cv, NULL);

      if (argc > 1)
        {
          if (!strcmp (argv[1], "-b"))
            broadcast = 1;
          else if (!strcmp (argv[1], "-B"))
            broadcast = 2;
        }

      for (i = 0; i < 40; i++)
        {
          pthread_t th;
          pthread_create (&th, NULL, tf, NULL);
        }

      pthread_mutex_lock (&mtx);
      for (;;)
        {
          /* Print a progress dot every 1000 iterations.  */
          if ((spins++ % 1000) == 0)
            write (1, ".", 1);
          /* Give the workers a chance to take the mutex.  */
          pthread_mutex_unlock (&mtx);
          pthread_mutex_lock (&mtx);
          int njobs = rand () % 41;
          nn = njobs;
          /* With -b, broadcast roughly once in 30 iterations and signal
             otherwise; with -B, always broadcast.  */
          if (broadcast && (broadcast > 1 || (rand () % 30) == 0))
            pthread_cond_broadcast (&cv);
          else
            while (njobs--)
              pthread_cond_signal (&cv);
        }
    }

It happens even if cond->__data.__lock is held during the futex (FUTEX_REQUEUE) syscall. It hangs only with the -b option; it does not hang without any options or with -B, so mixing pthread_cond_broadcast with pthread_cond_signal calls is essential.

*** Bug 121283 has been marked as a duplicate of this bug. ***

Was this bug accidentally linked to the wrong errata? I fail to see how an updated shadow-utils rpm resolves a problem with glibc/pthreads...

No, the reference is correct. shadow-utils has to be updated in addition to glibc.
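For anyone re-testing: the simplified reproducer in this report loops forever, so a hang only shows up as the progress dots stopping. Below is a bounded variant, offered as a rough sketch rather than the original test program. The names run_rounds, worker, done and consumed are made up for illustration. It also differs from the reproducer in three ways: it uses a default mutex instead of PTHREAD_MUTEX_ERRORCHECK_NP, it adds jobs to nn instead of overwriting it (so the totals can be compared), and it shuts the workers down with a final broadcast. On a glibc without the bug it should return 0 in every mode; on an affected glibc the mixed mode would presumably never return.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 40

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int nn;         /* jobs currently pending */
static int done;       /* hypothetical flag: producer has finished */
static long consumed;  /* hypothetical counter: jobs taken by workers */

/* Worker: take jobs until the producer is done and nothing is pending.  */
static void *
worker (void *arg)
{
  (void) arg;
  for (;;)
    {
      pthread_mutex_lock (&mtx);
      while (!nn && !done)
        pthread_cond_wait (&cv, &mtx);
      if (!nn)
        {
          /* done is set and no job is left.  */
          pthread_mutex_unlock (&mtx);
          return NULL;
        }
      --nn;
      ++consumed;
      pthread_mutex_unlock (&mtx);
    }
}

/* mode 0: signal only; 1: mostly signal, occasional broadcast (like -b);
   2: broadcast only (like -B).  Returns 0 if every produced job was
   consumed.  Does not return if the workers hang.  */
int
run_rounds (int mode, int rounds)
{
  pthread_t th[NTHREADS];
  long produced = 0;
  int i, rc;

  nn = 0;
  done = 0;
  consumed = 0;

  for (i = 0; i < NTHREADS; i++)
    {
      rc = pthread_create (&th[i], NULL, worker, NULL);
      assert (rc == 0);
      (void) rc;
    }

  for (i = 0; i < rounds; i++)
    {
      int njobs = rand () % (NTHREADS + 1);

      pthread_mutex_lock (&mtx);
      nn += njobs;          /* accumulate, unlike the original reproducer */
      produced += njobs;
      if (mode && (mode > 1 || (rand () % 30) == 0))
        pthread_cond_broadcast (&cv);
      else
        while (njobs--)
          pthread_cond_signal (&cv);
      pthread_mutex_unlock (&mtx);
    }

  /* Tell the workers to exit once the remaining jobs are drained.  */
  pthread_mutex_lock (&mtx);
  done = 1;
  pthread_cond_broadcast (&cv);
  pthread_mutex_unlock (&mtx);

  for (i = 0; i < NTHREADS; i++)
    pthread_join (th[i], NULL);

  return consumed == produced ? 0 : -1;
}
```

Calling run_rounds (1, 100000) from main corresponds to the -b case; wrapping the run in an external timeout turns "never returns" into a visible failure. Build with -pthread.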