Bug 552960 - Possible deadlock in pthread_mutex_lock/pthread_cond_wait
Summary: Possible deadlock in pthread_mutex_lock/pthread_cond_wait
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
Version: 6.0
Hardware: x86_64
OS: Linux
Target Milestone: rc
Assignee: Siddhesh Poyarekar
QA Contact: Arjun Shankar
Duplicates: 854725
Depends On:
Blocks: 782183
Reported: 2010-01-06 16:43 UTC by sdh4
Modified: 2018-12-04 14:18 UTC
CC List: 24 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Prior to this update, there were multiple synchronization bugs in pthread_cond_wait and pthread_cond_timedwait on x86_64 and i686, such that when a multithreaded program uses a priority-inherited mutex to synchronize access to a condition variable, some threads may deadlock when woken using pthread_cond_signal or when cancelled. This update fixes all such known problems related to condition variable synchronization.
Clone Of:
Last Closed: 2013-11-21 10:38:17 UTC

Attachments (Terms of Use)
Deadlock test case. Note the suggested compile parameters (4.45 KB, text/x-csrc)
2010-01-19 03:44 UTC, sdh4
simplified reproducer (1.88 KB, text/plain)
2011-08-16 14:10 UTC, Frantisek Hrbata
makefile for simplified reproducer (844 bytes, text/plain)
2011-08-16 14:11 UTC, Frantisek Hrbata
proposed patch (3.57 KB, text/plain)
2011-08-16 14:12 UTC, Frantisek Hrbata
proposed patch V2 with cond_lock (4.05 KB, text/plain)
2011-08-16 17:34 UTC, Frantisek Hrbata
Consolidated patch backported from upstream (34.06 KB, patch)
2012-10-17 13:04 UTC, Siddhesh Poyarekar
output of custom ps command during deadlock (8.66 KB, application/octet-stream)
2012-10-17 13:08 UTC, IBM Bug Proxy
'ps -ALo pid,tid,pri,rtprio,stat,status,wchan:30,cmd' taken while the threads were "frozen" (43.02 KB, text/plain)
2012-10-17 13:08 UTC, IBM Bug Proxy
'ps -ALo pid,tid,pri,rtprio,stat,status,wchan:30,ucomm' taken while the threads were "frozen" (25.76 KB, text/plain)
2012-10-17 13:08 UTC, IBM Bug Proxy
'ps -ALo pid,tid,pri,rtprio,stat,status,wchan:30,cmd' taken while the threads were "frozen" (41.40 KB, application/octet-stream)
2012-10-17 13:08 UTC, IBM Bug Proxy
ftrace events trace captured during the test (3.38 MB, application/x-bzip2)
2012-10-17 13:09 UTC, IBM Bug Proxy

System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2013:1605 | normal | SHIPPED_LIVE | Moderate: glibc security, bug fix, and enhancement update | 2013-11-20 21:54:09 UTC
IBM Linux Technology Center | 84844 | None | None | None | 2019-04-27 20:57:39 UTC

Description sdh4 2010-01-06 16:43:12 UTC
I have a possible deadlock condition in the pthreads library. It occurs very rarely and at random in very simple code that dispatches jobs from a main thread to a pool of worker threads. 

This code uses pthread_mutex_lock(), pthread_cond_wait(), and pthread_cond_signal() to control access to the list of jobs. The mutex is never held for more than a few lines of code and there are no code paths that could allow the mutex to be left held. pthread_cond_signal() and pthread_cond_wait() are always called with the mutex locked. 

The symptom is that a random thread gets stuck in pthread_mutex_lock(). In the deadlocked state, the mutex __lock field contains the thread ID of one of the other threads that is sitting in pthread_cond_wait(), OR'd with 0x80000000. The count, I believe, is 1; owner is 0; nusers is 9. All of the other threads are waiting in pthread_cond_wait(). 
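The __lock value described above is consistent with how a priority-inheritance mutex stores its state: the futex word holds the owner's thread ID, and the kernel ORs in FUTEX_WAITERS (0x80000000) when another thread is blocked on it. A minimal sketch for decoding such a value, assuming the flag constants from <linux/futex.h>; the struct and helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <linux/futex.h>  /* FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK */

/* Fields packed into the 32-bit lock word of a priority-inheritance futex. */
struct pi_lock_state {
    uint32_t owner_tid;  /* TID of the thread the kernel considers the owner */
    int has_waiters;     /* FUTEX_WAITERS (0x80000000): someone is blocked in the kernel */
    int owner_died;      /* FUTEX_OWNER_DIED (0x40000000): a robust-mutex owner exited */
};

/* Hypothetical helper: split a raw PI lock word into its parts. */
static struct pi_lock_state decode_pi_lock(uint32_t word)
{
    struct pi_lock_state s;

    s.owner_tid   = word & FUTEX_TID_MASK;
    s.has_waiters = (word & FUTEX_WAITERS) != 0;
    s.owner_died  = (word & FUTEX_OWNER_DIED) != 0;
    return s;
}
```

Applied to the state reported here, owner_tid would be the ID of a thread parked in pthread_cond_wait() and has_waiters would be 1: the kernel believes that waiting thread owns the mutex, so every other locker blocks behind it.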

Attempting to call pthread_mutex_lock() from another thread at that point blocks that thread as well.

The problem is observed on Fedora 12 on a 4-core Core I7 with hyperthreading, so 8 cpus. 

The mutex has PTHREAD_PRIO_INHERIT set. Setting PTHREAD_MUTEX_ERRORCHECK does not prevent the problem (and none of the pthread_mutex_*() calls return an error), but setting ERRORCHECK does seem to make the problem occur less frequently. 
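For reference, a mutex with the attributes described here (PTHREAD_PRIO_INHERIT, optionally PTHREAD_MUTEX_ERRORCHECK) can be set up as below. This is a minimal sketch, not the reporter's code; `init_pi_mutex` is a hypothetical helper, while the pthread_mutexattr_* calls are standard POSIX. Every return code is checked, which is how one would confirm that no call reports an error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: initialise `m` as a priority-inheriting mutex,
 * optionally with PTHREAD_MUTEX_ERRORCHECK. Returns 0 on success or the
 * first non-zero pthread error code. */
static int init_pi_mutex(pthread_mutex_t *m, int errorcheck)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0 && errorcheck)
        rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);  /* the mutex keeps its own copy */
    return rc;
}
```

Passing errorcheck = 0 gives a plain PI mutex. Comment 8 reports that leaving out the setprotocol call entirely was what avoided the hang on affected glibc versions.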

In this situation there isn't a whole lot of work for the worker threads to do. I suspect that there might be a race condition in the response of multiple threads in pthread_cond_wait() that leads to a problem in the mutex. 

glibc version: glibc-2.11-2.x86_64

Comment 1 Andreas Schwab 2010-01-11 13:30:44 UTC
Please provide a complete test case.

Comment 2 sdh4 2010-01-11 16:30:12 UTC
Working on it... As this is a heisenbug, a simple test case may not be possible. The hardware on which this failed isn't mine; it belongs to one of my research partners.

Do you know if a nonzero __lock field with a zero owner is a legitimate mutex state? 

That looked very suspicious to me. 

I'm pasting the relevant code below, which I will be using to assemble the test case. 

Code snippets

Mutex Creation: 
#ifdef WFMMATH_DEBUG // optional... this seems to make the problem happen less frequently 

Thread creation:


	/* SCHED_OTHER requires a sched_priority of 0 */

	for (Cnt=0;Cnt < md->actual_threads;Cnt++) {

Queuing work: 
	dgl_AddTail((struct dgl_List *)&md->PendingComputation,(struct dgl_Node *)calcfcn); // This is just a simple linked-list add. 

Worker thread loop: 
	for (;;) {
		todo=(struct MathFcn *)dgl_RemHead((struct dgl_List *)&md->PendingComputation); // this is a simple linked-list remove
		if (todo) {
			todo->CalcFcn(md,todo); // Actual work done here.

			dgl_AddHead((struct dgl_List *)&md->CompletedComputation,(struct dgl_Node *)todo); // simple linked list add

			write(md->parentnotifypipe[1]," ",1); //# Notify parent process of completion
			pthread_mutex_lock(&md->WorkNotifyMutex); //# Must be locked before returning to main loop

		else {
			/* Wait for something to do */

Comment 3 sdh4 2010-01-11 16:35:28 UTC
Oops. Add this code for dequeuing the completed computation:

	fcn=(struct MathFcn *)dgl_RemTail((struct dgl_List *)&md->CompletedComputation); // Simple linked-list remove

Other than these few snippets NOTHING touches this particular mutex.
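The snippets above are incomplete (the waiting and signalling lines did not survive), so the following is a hedged, self-contained sketch of the same pattern rather than the reporter's code: a main thread queues jobs and signals a condition variable while holding a PI mutex, and workers wait for work. All names (`struct job`, `struct pool`, `worker`, `run_pool`) are hypothetical; a LIFO list stands in for the dgl_* list helpers, doubling a number stands in for CalcFcn, and a second condition variable replaces the notification pipe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct job { struct job *next; int value; };  /* hypothetical job record */

struct pool {
    pthread_mutex_t lock;       /* PI mutex protecting every field below */
    pthread_cond_t work_ready;  /* signalled when a job is queued or on shutdown */
    pthread_cond_t work_done;   /* signalled when a job completes */
    struct job *pending;
    int completed;
    long sum;
    int shutdown;
};

static void *worker(void *arg)
{
    struct pool *p = arg;

    pthread_mutex_lock(&p->lock);
    for (;;) {
        /* Re-test the predicate after every wakeup; wakeups can be spurious. */
        while (p->pending == NULL && !p->shutdown)
            pthread_cond_wait(&p->work_ready, &p->lock);
        if (p->pending == NULL)  /* shutting down and queue drained */
            break;
        struct job *j = p->pending;
        p->pending = j->next;
        pthread_mutex_unlock(&p->lock);  /* do the work without the lock */

        long result = 2L * j->value;
        free(j);

        pthread_mutex_lock(&p->lock);
        p->sum += result;
        p->completed++;
        pthread_cond_signal(&p->work_done);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Runs `njobs` jobs on `nthreads` workers; returns the sum of results or -1. */
static long run_pool(int nthreads, int njobs)
{
    struct pool p = { .pending = NULL };
    pthread_mutexattr_t attr;
    pthread_t tids[64];

    if (nthreads < 1 || nthreads > 64)
        return -1;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    int rc = pthread_mutex_init(&p.lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;
    pthread_cond_init(&p.work_ready, NULL);
    pthread_cond_init(&p.work_done, NULL);

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, worker, &p) != 0)
            abort();

    for (int i = 1; i <= njobs; i++) {
        struct job *j = malloc(sizeof *j);
        if (j == NULL)
            abort();
        j->value = i;
        pthread_mutex_lock(&p.lock);  /* queue and signal with the lock held */
        j->next = p.pending;
        p.pending = j;
        pthread_cond_signal(&p.work_ready);
        pthread_mutex_unlock(&p.lock);
    }

    pthread_mutex_lock(&p.lock);
    while (p.completed < njobs)
        pthread_cond_wait(&p.work_done, &p.lock);
    p.shutdown = 1;
    pthread_cond_broadcast(&p.work_ready);
    pthread_mutex_unlock(&p.lock);

    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&p.work_done);
    pthread_cond_destroy(&p.work_ready);
    pthread_mutex_destroy(&p.lock);
    return p.sum;
}
```

On a correct glibc, `run_pool(4, 100)` returns 10100 (the sum of 2*i for i = 1..100). The code follows the usual condition-variable rules, so a hang here would point at the library, which is the argument this report makes.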

Comment 4 Andreas Schwab 2010-01-11 16:45:10 UTC
Please provide a _complete_ test case.

Comment 5 sdh4 2010-01-19 03:44:59 UTC
Created attachment 385307 [details]
Deadlock test case. Note the suggested compile parameters

Here is a test case. On a Core I7 motherboard (64 bit OS), three trials lead to: 

1st trial: the program stopped at 500 iterations
2nd trial: stopped at 1000 iterations
3rd trial: stopped at 1200 iterations.

I am unable to reproduce (i.e. runs forever) on my core duo laptop (32 bit) or a dual quad core Opteron (64 bit).

Comment 6 Andreas Schwab 2010-02-15 13:31:34 UTC
Cannot reproduce.

Comment 7 sdh4 2010-02-15 17:47:37 UTC
It seems quite repeatable on this end. 

Ending up in the deadlock state might be very dependent on timing details of context switches, CPU core assignments, etc. Could be motherboard dependent.

Would a coredump of the testcase from kill -SEGV be helpful?

Comment 8 sdh4 2010-08-16 15:14:01 UTC
New information on this bug: 
  * The problem seems to be related to priority inheritance. Removing pthread_mutexattr_setprotocol(&md->MutexAttr,PTHREAD_PRIO_INHERIT) seems to work around the problem.
  * It has been observed on at least two different ASUS P6TD Deluxe motherboards with Core I7 920 CPUs. 
  * The problem has also been observed (using the previously attached test-case) under Red Hat Enterprise 6 public beta 2.

Comment 9 Bug Zapper 2010-11-04 01:42:25 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 

Comment 10 sdh4 2010-11-04 04:14:15 UTC
As noted above
 * The problem has also been observed (using the previously attached
test-case) under Red Hat Enterprise 6 public beta 2.

So I'm changing product to "Red Hat Enterprise 6"

Comment 12 Andreas Schwab 2010-11-19 16:46:06 UTC
Looks like a futex bug.

Comment 13 sdh4 2010-12-09 16:15:48 UTC
Confirmed on the RHEL6 release version on a dual quad-core Opteron (64 bit), using the testcase above (attachment 385307 [details]).

Comment 14 Konrad Karl 2011-01-11 17:14:08 UTC
confirmed on Fedora 13 on x86_64 on an Intel(R) Core(TM)2 CPU 
Kontron Board ICH8 chipset.

(Linux version,glibc-2.12.2-1.x86_64)

when running "taskset -c 0 ./deadlockbug" it hangs immediately (iter=0)
(without PTHREAD_PRIO_INHERIT on the mutex it does not hang)

using both CPUs I see numbers from 200..500 or similar.

this could explain some random hangs I observed in one of my programs
(had attributed them to an oversight of mine but now...)
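Running everything on a single CPU (`taskset -c 0`) made the hang appear immediately in the report above. If a test program should apply that restriction itself instead of relying on the command line, a minimal sketch using the real glibc `sched_setaffinity()` and `CPU_*` macros (which need `_GNU_SOURCE`) could look like this; `pin_to_cpu` is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to one CPU. Threads it
 * creates afterwards inherit the mask, so calling this at the top of main()
 * has roughly the same effect as starting under `taskset -c <cpu>`.
 * Returns 0 on success, or -1 with errno set. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);  /* pid 0: calling thread */
}
```

`pin_to_cpu(0)` mirrors the command in the report, but CPU 0 may not be available inside a restricted cpuset, so picking the first CPU from `sched_getaffinity()` is the safer choice.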


Comment 15 RHEL Product and Program Management 2011-04-04 01:44:31 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 16 Frantisek Hrbata 2011-08-16 14:10:41 UTC
Created attachment 518495 [details]
simplified reproducer

Comment 17 Frantisek Hrbata 2011-08-16 14:11:26 UTC
Created attachment 518496 [details]
makefile for simplified reproducer

Comment 18 Frantisek Hrbata 2011-08-16 14:12:46 UTC
Created attachment 518497 [details]
proposed patch

Comment 19 Frantisek Hrbata 2011-08-16 14:23:28 UTC
This is a bug in glibc, not the kernel. I have attached a patch with a proposed solution. It seems to me that requeue_pi in libc could never have worked, but of course I could be wrong. I do not know what has changed over time. Anyway, I think the requeue_pi code deserves a review.

Here follows description from the attached patch:

The current implementation of wait_requeue_pi in libc does not handle the situation where
the kernel returns -EAGAIN. The kernel tests whether the actual futex (cond_futex) value is
as expected (passed as the 3rd ($rdx) parameter to the futex syscall). If it is not,
some other thread has increased the cond_futex value and we need to call
wait_requeue_pi again. Not handling this situation means:

1) incorrect locking
Even a thread that does not own the mutex is woken. In the current implementation, if
wait_requeue_pi fails, the code path continues with a non-PI futex_wait, which is
obviously wrong. This leads to a situation where pthread_cond_wait returns
but the mutex is held by a different thread, so the mutex protection does not work
at all.

2) deadlock
This is a consequence of 1). If a thread is woken when it should not be, because
it does not hold the mutex, woken_seq is increased anyway. This leads to
a deadlock: when the correct thread, which actually owns the mutex, is woken,
it fails the woken_seq < wakeup_seq test. pthread_cond_wait then restarts,
but the thread already owns the mutex.

pthread_cond_signal               pthread_cond_wait

                                  wait_requeue_pi is OK(this thread owns mutex)
                                  woken_seq >= wakeup_seq
                                  wait_requeue_pi again
                                  wait on cond_futex
lock mutex
signal cond_futex
unlock mutex

pthread_cond_signal is waiting for the mutex, which is held by the pthread_cond_wait
thread, while the pthread_cond_wait thread, which owns the mutex, waits on cond_futex,
which only pthread_cond_signal would signal.
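The control flow the patch description argues for can be modelled in C as a retry loop. This is a hedged sketch, not glibc's code (the x86_64 implementation discussed here is assembly): `wait_requeue_pi_fn` is a hypothetical stand-in for the raw FUTEX_WAIT_REQUEUE_PI syscall, and the fake "kernel" exists only to exercise the loop. It deliberately omits the cond_lock handling raised in comment 20 and the total_seq bookkeeping raised in comment 34.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the raw FUTEX_WAIT_REQUEUE_PI call:
 * returns 0 on success or a negative errno value, as the kernel does. */
typedef long (*wait_requeue_pi_fn)(unsigned *cond_futex, unsigned expected, void *ctx);

/* -EAGAIN means cond_futex no longer holds the value we passed in, so
 * re-read it and retry the PI wait instead of falling through to a plain
 * (non-PI) futex wait. */
static long wait_requeue_pi_retry(unsigned *cond_futex, wait_requeue_pi_fn op, void *ctx)
{
    for (;;) {
        unsigned expected = __atomic_load_n(cond_futex, __ATOMIC_ACQUIRE);
        long rc = op(cond_futex, expected, ctx);
        if (rc != -EAGAIN)
            return rc;  /* 0: requeued and owning the mutex; <0: real error */
    }
}

/* Fake kernel: each of the first `*ctx` calls simulates a concurrent
 * signaller bumping cond_futex, so the value check fails with -EAGAIN. */
static long fake_wait_requeue_pi(unsigned *cond_futex, unsigned expected, void *ctx)
{
    int *pending_bumps = ctx;

    if (*cond_futex != expected)
        return -EAGAIN;
    if (*pending_bumps > 0) {
        (*pending_bumps)--;
        (*cond_futex)++;
        return -EAGAIN;
    }
    return 0;
}
```

With cond_futex starting at 7 and two simulated bumps, the loop retries twice and then returns 0 with cond_futex at 9, instead of returning after the first -EAGAIN without owning the mutex.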

Comment 20 Frantisek Hrbata 2011-08-16 15:49:32 UTC
Oops, looking at the patch again, it will need to hold cond_lock while increasing cond_futex and setting %edx; as it stands there is a race. I'll wait to see what you think about it, and I can fix that if desired. But I expect some libc guru will come up with something better.

Comment 21 Frantisek Hrbata 2011-08-16 17:34:15 UTC
Created attachment 518541 [details]
proposed patch V2 with cond_lock

second version of the proposed patch with cond_lock

Comment 22 Frantisek Hrbata 2011-08-17 10:26:48 UTC
Just one more quick note. The patch is for x86_64 only. I haven't checked other archs.

Comment 30 Jeff Law 2011-12-22 16:00:13 UTC
Patch is causing problems in Fedora & Debian.  Disabled while issues are resolved.

Comment 33 Jeff Law 2012-01-09 20:40:18 UTC
For future reference, to reproduce the problems, install glibc with the 552960 patch on F16.  Then :

AUDIODRIVER=pulseaudio play -n -c1  synth whitenoise band -n 100 20 \
        band -n 50 20 gain +25 fade h 1 864000 1

Fails maybe 1 in 20 times, typically within the first 2-3 seconds.  Unfortunately all this code interacts poorly with gdb, so it's been quite difficult to determine what's going on on the user side.  Kernel side, printks are my best friend.

Comment 34 Jeff Law 2012-01-13 04:46:42 UTC
It looks like the EAGAIN path in the upstream fix is failing to bump total_seq.  With that fixed, the simplified test referenced in c#14 and c#15 runs millions of times.  And a test utilizing "play" from the sox package runs forever as well.

I'm going to need to sit down and look more closely at how the total_seq counter is used, but we may have this nailed down.

Comment 36 Jeff Law 2012-02-29 18:05:10 UTC
Unfortunately after making my patch to fix the total_seq counter available for wider testing, additional issues have been reported.

At this point I do not believe we can safely address this bug without a fairly high chance of introducing new regressions which I consider unacceptable for RHEL 6.3.  Thus I'm going to regretfully have to change this to a dev_nak and queue it for RHEL 6.4.

Comment 37 RHEL Product and Program Management 2012-07-10 07:23:43 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 38 RHEL Product and Program Management 2012-07-10 23:17:24 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 40 Siddhesh Poyarekar 2012-09-10 14:12:48 UTC
I'm trying to look at why the repeated call for the
(In reply to comment #19)
> Current implementation of wait_requeue_pi in libc does not handle situation
> when kernel returns -EAGAIN. Kernel tests if the actual futex(cond_futex)
> value is as expected(handled as 3rd($rdx) parameter to futex syscall). If
> it's not, it means some other thread increased the cond_futex value and we
> need to call wait_requeue_pi again. Not handling this situation means:

The trouble with this justification is that EAGAIN means the above only in case of FUTEX_CMP_REQUEUE, at least according to the man page for futex.

Anyway, I picked this up from the point of Andreas' patch, which is the following:


and used the reproducer in the upstream bug report to try and get the reason for the EAGAIN:


by using systemtap to figure out where the EAGAIN is coming from and I have narrowed it down to futex_wait_setup so far. The probe to see this is:

probe kernel.function("futex_wait_setup").return {
        if (execname() == "ld-linux-x86-64") {
                printf ("futex_wait_setup returned %ld\n", $return);
                print_backtrace ();
        }
}

where the next-to-last output is seen as:

futex_wait_setup returned -11
Returning from:  0xffffffff810aed90 : futex_wait_setup+0x0/0xf0 [kernel]
Returning to  :  0xffffffff810af751 : futex_wait_requeue_pi+0x161/0x410 [kernel]
 0xffffffff810b0970 : do_futex+0x2f0/0xa50 [kernel]
 0xffffffff810b11da : sys_futex+0x10a/0x1a0 [kernel]
 0xffffffff81604729 : system_call_fastpath+0x16/0x1b [kernel]

the last being the one that is hung.

This is intriguing because I don't think futex_wait_setup is supposed to return an EAGAIN at all - it is documented in the code as only being able to return EWOULDBLOCK and EFAULT. I'm still trying to figure out what this means since there must be some case I may have missed.

Comment 41 Siddhesh Poyarekar 2012-09-11 11:42:41 UTC
I feel stupid - EWOULDBLOCK is EAGAIN, so that is how we get the EAGAIN.
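The point above can be checked at compile time: on Linux, glibc's <errno.h> defines EWOULDBLOCK as EAGAIN, so a kernel path that returns -EWOULDBLOCK (futex_wait_setup() does this when the futex value is not the expected one) is seen by user space as EAGAIN. A minimal sketch; `futex_value_mismatch` is a hypothetical helper that takes a raw kernel return value (0 or -errno):

```c
#include <assert.h>
#include <errno.h>

/* On Linux the two codes share one value, so they cannot be told apart. */
_Static_assert(EWOULDBLOCK == EAGAIN, "expected EWOULDBLOCK == EAGAIN on Linux");

/* Hypothetical helper: did a raw futex wait fail because the futex word
 * no longer held the expected value? */
static int futex_value_mismatch(long raw_ret)
{
    return raw_ret == -EAGAIN;
}
```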

Comment 43 Siddhesh Poyarekar 2012-10-17 12:56:54 UTC
*** Bug 854725 has been marked as a duplicate of this bug. ***

Comment 44 Siddhesh Poyarekar 2012-10-17 13:04:42 UTC
Created attachment 628788 [details]
Consolidated patch backported from upstream

Comment 45 IBM Bug Proxy 2012-10-17 13:08:29 UTC
Created attachment 628789 [details]
output of custom ps command during deadlock

Comment 46 IBM Bug Proxy 2012-10-17 13:08:38 UTC
Created attachment 628790 [details]
'ps -ALo pid,tid,pri,rtprio,stat,status,wchan:30,cmd' taken while the threads were "frozen"

Comment 47 IBM Bug Proxy 2012-10-17 13:08:46 UTC
Created attachment 628791 [details]
'ps -ALo pid,tid,pri,rtprio,stat,status,wchan:30,ucomm' taken while the threads were "frozen"

Comment 48 IBM Bug Proxy 2012-10-17 13:08:53 UTC
Created attachment 628792 [details]
'ps -ALo pid,tid,pri,rtprio,stat,status,wchan:30,cmd' taken while the threads were "frozen"

Comment 49 IBM Bug Proxy 2012-10-17 13:09:04 UTC
Created attachment 628796 [details]
ftrace events trace captured during the test

Comment 50 IBM Bug Proxy 2012-10-24 17:23:10 UTC
------- Comment From mkravetz@us.ibm.com 2012-10-24 17:14 EDT-------
Will this patch apply to the latest published version of glibc for RHEL 6?  I believe the version is:


Wanted to ask before I attempt to build a test RPM.

Comment 51 IBM Bug Proxy 2012-10-25 00:42:44 UTC
------- Comment From mkravetz@us.ibm.com 2012-10-25 00:37 EDT-------

The simplified reproducer hangs for me (always at a different count). Perhaps I built the test RPMs incorrectly...

Does anyone from Red Hat have test RPMs available (i686 and x86_64)?

Comment 52 Siddhesh Poyarekar 2012-10-25 01:35:13 UTC
I have resubmitted a test build, so I should be able to get you test packages soon. The fix is scheduled for inclusion in rhel-6.5, so you won't see the fix in any of the published builds.

Comment 54 Siddhesh Poyarekar 2012-10-25 03:19:25 UTC
I have uploaded the test packages here:


Comment 55 IBM Bug Proxy 2012-10-26 00:33:08 UTC
------- Comment From mkravetz@us.ibm.com 2012-10-26 00:23 EDT-------
Thank you for the test images.  They appear to work well.

The IBM Java group should now perform some validation with these images in their test environment.

Comment 56 IBM Bug Proxy 2012-11-05 22:42:40 UTC
------- Comment From tpnoonan@us.ibm.com 2012-11-05 22:39 EDT-------
hi red hat, ibm's WebSphere RealTime will not work on MRG 2.x  due to this defect, can the fix for this defect be considered for rhel6.4 instead of rhel6.5? thanks

Comment 58 Siddhesh Poyarekar 2012-11-09 14:12:31 UTC
Please get in touch with your support contacts if you need the fix expedited.

Comment 59 Joseph Kachuck 2012-11-16 14:57:06 UTC
Per Comment 56, I am requesting an exception for this in RHEL 6.4.

Thank You
Joe Kachuck

Comment 60 IBM Bug Proxy 2012-11-16 17:08:23 UTC
------- Comment From tpnoonan@us.ibm.com 2012-11-16 16:41 EDT-------
(In reply to comment #46)
> hi red hat, ibm's WebSphere RealTime will not work on MRG 2.x  due to this
> defect, can the fix for this defect be considered for rhel6.4 instead of
> rhel6.5? thanks

Our product has only been certified on the MRG 1.3 release which is
no longer supported by Red Hat.

Comment 64 Jeff Law 2012-11-23 15:41:30 UTC
This is not suitable for RHEL 6.4; it needs considerably more upstream and Fedora testing.  Getting this wrong has serious consequences for our customer base.

The upstream exposure is still relatively small at the moment as it's limited to upstream developer builds and Fedora rawhide. We have one report which might be related to installing Siddhesh's patches into rawhide (we're still waiting for a core file from the reporter for analysis).

This is really a 6.5 issue.

Comment 66 IBM Bug Proxy 2012-11-23 19:43:38 UTC
------- Comment From mstoodle@ca.ibm.com 2012-11-23 19:32 EDT-------
Ok, let me make sure I've got this right (I'm going to take some of your points out of order after starting off with my own point):
1) it's broken right now (hence this bug)
2) "The upstream exposure is still relatively small at the moment"

but from there, you went to :
3) "Getting this wrong has serious consequences for our customer base"
4) "This is not suitable for RHEL 6.4"
and capping off with:
5) "This is really a 6.5 issue"

I'm having trouble understanding the 3->4->5 sequence given 1 and 2, but maybe that's because I'm not appreciating the scope of the patch.

Does the patch affect non-MRG customers as well whereas this bug only concerns MRG customers?  Is that the issue?

How far away from RHEL 6.5 release are we? For the meantime...if we officially request a patch on RHEL 6.3 for this bug, will you officially support any customer using MRG 2.x on RHEL 6.3 with that patch (don't worry, we would send them to you directly to get the patch :) ).

Comment 67 Jeff Law 2012-11-27 19:34:34 UTC
The problem is there have been several attempts to fix this problem over the last few years, each of which has caused regressions of one form or another.  In every case I would consider the regressions caused by attempts to fix this bug to actually be worse than the original bug that's being fixed here (in terms of scope of the problem, number of users affected, etc).

Those regressions were not caught by the existing test suites, but by wide scale deployments of the patches by way of Debian, Fedora & Ubuntu.

Given the history of "fixes" for this problem causing regressions, the lack of widespread testing of the current proposed fix, the fact that this potentially affects every program using pthread condition variables and the fact that we're very late in the RHEL 6.4 release cycle, I can't in good conscience propose this fix be included in RHEL 6.4.

RHEL 6.4 hasn't even been released yet, so it's probably safe to assume it'll be several months before RHEL 6.5 would be released.

The support implications for MRG 2.x are something you'd need to discuss with your support contacts.

Comment 71 sdh4 2013-08-29 13:06:47 UTC
Is there any way to detect the fix for this bug from userland, so that we can tell whether it's safe to use priority-inherited mutexes?

Comment 72 Siddhesh Poyarekar 2013-08-30 02:22:22 UTC
Comment 44 has test cases that can be adapted to test independently.  Just replace 'do_test' with 'main' and compile as you normally would.
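If adapting those tests is inconvenient, a rough standalone smoke test could look like the following. It is a hypothetical sketch, not the glibc test-suite code: it performs many lock/signal/unlock hand-offs to waiters on a PI mutex and uses alarm() so that a hang terminates the process (SIGALRM's default action) instead of going unnoticed. A pass only means no hang was observed; because the bug is timing dependent, it cannot prove a given glibc contains the fix.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical shared state, all guarded by the PI mutex `pi_lock`. */
static pthread_mutex_t pi_lock;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static long tokens;    /* signals sent but not yet consumed */
static long consumed;  /* signals consumed by waiters */
static int stop;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&pi_lock);
    for (;;) {
        while (tokens == 0 && !stop)
            pthread_cond_wait(&cv, &pi_lock);
        if (tokens == 0)  /* stop requested and nothing left to consume */
            break;
        tokens--;
        consumed++;
    }
    pthread_mutex_unlock(&pi_lock);
    return NULL;
}

/* Returns 0 if all `rounds` signals were consumed, -1 on setup failure.
 * A hang is turned into process termination after `timeout_s` seconds. */
static int pi_condvar_smoke_test(int nwaiters, long rounds, unsigned timeout_s)
{
    pthread_mutexattr_t attr;
    pthread_t tids[16];

    if (nwaiters < 1 || nwaiters > 16)
        return -1;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    int rc = pthread_mutex_init(&pi_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;

    tokens = consumed = 0;
    stop = 0;
    alarm(timeout_s);  /* watchdog: default SIGALRM action kills a hung run */
    for (int i = 0; i < nwaiters; i++)
        if (pthread_create(&tids[i], NULL, waiter, NULL) != 0)
            abort();

    for (long r = 0; r < rounds; r++) {
        pthread_mutex_lock(&pi_lock);
        tokens++;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&pi_lock);
    }

    pthread_mutex_lock(&pi_lock);
    stop = 1;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&pi_lock);
    for (int i = 0; i < nwaiters; i++)
        pthread_join(tids[i], NULL);

    alarm(0);  /* cancel the watchdog */
    pthread_mutex_destroy(&pi_lock);
    return consumed == rounds ? 0 : -1;
}
```

Called from main, for example `return pi_condvar_smoke_test(8, 100000, 60) == 0 ? 0 : 1;`, and, following comment 14, also run under `taskset -c 0`, where affected systems hung almost immediately.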

Comment 73 errata-xmlrpc 2013-11-21 10:38:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 74 IBM Bug Proxy 2013-11-22 14:52:21 UTC
------- Comment From mstoodle@ca.ibm.com 2013-11-22 14:48 EDT-------
Is there a minimum MRG 2 level needed to use RHEL 6.5?

Comment 75 Beth Uptagrafft 2013-11-22 16:32:25 UTC
The most recent Realtime release is the supported version and we use the most recent RHEL available at the time for our testing. RHEL6.5 was used for testing our MRG-2.4 release, which features the 3.8 kernel.
