Bug 1302389 - Deadlocks Occurring in Shared Mutex Code
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
6.7
x86_64 Linux
unspecified Severity high
Target Milestone: rc
Target Release: ---
Assigned To: Torvald Riegel
qe-baseos-tools
Depends On:
Blocks: 1269194
 
Reported: 2016-01-27 12:05 EST by Alan Matsuoka
Modified: 2017-04-03 11:48 EDT (History)
9 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-03 11:48:10 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
reproducer (7.75 KB, text/x-csrc)
2016-01-27 12:05 EST, Alan Matsuoka
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Sourceware 19004 None None None 2016-01-27 12:07 EST
Sourceware 19402 None None None 2016-01-27 12:08 EST

Description Alan Matsuoka 2016-01-27 12:05:07 EST
Description of problem:
The test code creates an anonymous block of shared memory, initializes a robust process-shared mutex in the memory and forks child processes. Each child process publishes its process ID in the shared memory and then acquires and releases the mutex before exiting normally. The parent picks a child process, waits for it to publish its PID and then terminates it via kill. This creates a race condition where the child may or may not receive an asynchronous signal while acquiring, holding or releasing the mutex. The parent attempts to keep a certain number of child processes running by waiting for finished children and forking new ones.

The observed behaviour on multi-core hosts is that the program sometimes deadlocks, with “ps -l” reporting several processes in the tree blocked on the futex WCHAN. The expected behaviour is that the program runs to completion in a reasonable amount of time.

We’ve also noticed that in some cases applying “strace -p” to one of the blocked child processes causes the program to proceed further, although not necessarily to completion. Therefore, as a debugging aid the parent process also forks a “dummy” child which does not interact with the mutex but does have the shared memory mapped. This allows gdb to attach to this particular child process without interfering with any blocked futex system calls, and inspect the state of the mutex. See example session below:

$ gcc -Wall -std=gnu99 -g -o robust_test robust_test.c -lpthread
$ ./robust_test &
[1] 21937
$ Done 0 iterations, 11 pre_lock, 3 locks, 2 unlocks, 2 post_unlock 0 cleanups, 1 not done
Done 5 iterations, 21 pre_lock, 8 locks, 7 unlocks, 7 post_unlock 0 cleanups, 6 not done
Done 10 iterations, 46 pre_lock, 28 locks, 27 unlocks, 27 post_unlock 0 cleanups, 11 not done
Done 15 iterations, 76 pre_lock, 56 locks, 55 unlocks, 53 post_unlock 2 cleanups, 14 not done

$ ps -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S 780379 20009 45137  0 80   0 -  6749 wait   pts/14   00:00:00 bash
0 S 780379 21937 20009  0 80   0 -  1526 futex_ pts/14   00:00:00 robust_test
1 S 780379 21938 21937  0 80   0 -  1525 pipe_w pts/14   00:00:00 robust_test
1 S 780379 22003 21937  0 80   0 -  1526 futex_ pts/14   00:00:00 robust_test
1 Z 780379 22007 21937  0 80   0 -     0 exit   pts/14   00:00:00 robust_test <defunct>
1 Z 780379 22008 21937  0 80   0 -     0 exit   pts/14   00:00:00 robust_test <defunct>
1 S 780379 22009 21937  0 80   0 -  1526 futex_ pts/14   00:00:00 robust_test
1 S 780379 22010 21937  0 80   0 -  1526 futex_ pts/14   00:00:00 robust_test
1 S 780379 22011 21937  0 80   0 -  1526 futex_ pts/14   00:00:00 robust_test
1 S 780379 22013 21937  0 80   0 -  1526 futex_ pts/14   00:00:00 robust_test
1 Z 780379 22014 21937  0 80   0 -     0 exit   pts/14   00:00:00 robust_test <defunct>
0 R 780379 22015 20009  0 80   0 -  2820 -      pts/14   00:00:00 ps
0 S 780379 45137 42638  0 80   0 -  6779 wait   pts/14   00:00:00 bash

$ strace -p 22013
Process 22013 attached - interrupt to quit
futex(0x7f6a66369000, FUTEX_WAIT, 2147505585, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f6a66369000, FUTEX_WAKE, 1)    = 1
exit_group(0)                           = ?
Process 22013 detached
$ Done 20 iterations, 105 pre_lock, 81 locks, 80 unlocks, 76 post_unlock 4 cleanups, 17 not done
Done 25 iterations, 115 pre_lock, 86 locks, 85 unlocks, 81 post_unlock 4 cleanups, 22 not done
Done 30 iterations, 139 pre_lock, 105 locks, 104 unlocks, 100 post_unlock 4 cleanups, 27 not done
Done 35 iterations, 163 pre_lock, 124 locks, 123 unlocks, 119 post_unlock 4 cleanups, 32 not done

$ gdb -p 21938
GNU gdb (GDB) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 21938
Reading symbols from robust_test...done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
0x00000035ab00e740 in __read_nocancel () from /lib64/libpthread.so.0
(gdb) up
#1  0x0000000000401389 in main (argc=1, argv=0x7fff038d30d8) at robust_test.c:251
251               read(shared->pipe_fds[0], &dummy, sizeof(dummy));
(gdb) print shared->top_mutex
$1 = {
  __data = {
    __lock = 0, 
    __count = 1, 
    __owner = 0, 
    __nusers = 0, 
    __kind = 144, 
    __spins = 0, 
    __list = {
      __prev = 0x0, 
      __next = 0x0
    }
  }, 
  __size = "\000\000\000\000\001", '\000' <repeats 11 times>, "\220", '\000' <repeats 22 times>, 
  __align = 4294967296
}

Version-Release number of selected component (if applicable):

6.7 
How reproducible:
Always on hosts with several CPU cores; occasionally on hosts with only two CPU cores.

Host has glibc-2.12-1.166.el6_7.3 and kernel-2.6.32-573.7.1.el6.x86_64.

Steps to Reproduce:
1. Compile and run the attached reproducer.

Actual results:

The program deadlocks.

Expected results:

The program runs to completion.

Additional info:

Issue: the program deadlocks (waiting on a futex). Please read through the case description and comment #2, which explains two scenarios.
The first happens on a system with two CPUs. However, the second test case (comment #2) happens on both single- and multiple-CPU systems.

This is reproducible on my test RHEL 6.7 system with latest glibc and kernel versions.

We need help analysing the program and taking this to Engineering if it turns out to be a bug.
kernel-2.6.32-573.8.1.el6.x86_64
glibc-2.12-1.166.el6_7.3.x86_64

# ./robust_test 
Done 0 iterations, 6 pre_lock, 2 locks, 1 unlocks, 1 post_unlock 0 cleanups, 0 not done
Done 5 iterations, 31 pre_lock, 21 locks, 20 unlocks, 19 post_unlock 1 cleanups, 4 not done
Done 10 iterations, 42 pre_lock, 26 locks, 25 unlocks, 24 post_unlock 1 cleanups, 9 not done
Done 15 iterations, 52 pre_lock, 31 locks, 30 unlocks, 29 post_unlock 1 cleanups, 14 not done
Done 20 iterations, 62 pre_lock, 36 locks, 35 unlocks, 34 post_unlock 1 cleanups, 19 not done
Done 25 iterations, 72 pre_lock, 41 locks, 40 unlocks, 39 post_unlock 1 cleanups, 24 not done
Done 30 iterations, 94 pre_lock, 58 locks, 57 unlocks, 56 post_unlock 1 cleanups, 29 not done
Done 35 iterations, 109 pre_lock, 68 locks, 67 unlocks, 66 post_unlock 1 cleanups, 34 not done
Done 40 iterations, 119 pre_lock, 73 locks, 72 unlocks, 71 post_unlock 1 cleanups, 39 not done
Done 45 iterations, 129 pre_lock, 78 locks, 77 unlocks, 76 post_unlock 1 cleanups, 44 not done
Success 144 pre_lock, 96 locks, 96 unlocks, 95 post_unlock 1 cleanups, 48 not done

# ./robust_test 
Done 0 iterations, 5 pre_lock, 2 locks, 1 unlocks, 1 post_unlock 0 cleanups, 0 not done
Done 5 iterations, 19 pre_lock, 8 locks, 7 unlocks, 6 post_unlock 0 cleanups, 4 not done
Done 10 iterations, 30 pre_lock, 14 locks, 13 unlocks, 12 post_unlock 0 cleanups, 9 not done
Done 15 iterations, 40 pre_lock, 19 locks, 18 unlocks, 17 post_unlock 0 cleanups, 14 not done
Done 20 iterations, 51 pre_lock, 25 locks, 24 unlocks, 23 post_unlock 0 cleanups, 19 not done
Done 25 iterations, 61 pre_lock, 30 locks, 29 unlocks, 28 post_unlock 0 cleanups, 24 not done
Done 30 iterations, 71 pre_lock, 35 locks, 34 unlocks, 33 post_unlock 0 cleanups, 29 not done

This happens occasionally; you may need to repeat the ./robust_test run a few times (it happens once in every 5-6 runs).
# ps axwll
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0     0  2774  2333  20   0   6096   512 futex_ S+   pts/0      0:00 ./robust_test
1     0  2775  2774  20   0   6092   156 pipe_w S+   pts/0      0:00 ./robust_test
1     0  2811  2774  20   0      0     0 exit   Z+   pts/0      0:00 [robust_test] <defunct>
1     0  2812  2774  20   0   6096   184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2813  2774  20   0   6096   184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2814  2774  20   0      0     0 exit   Z+   pts/0      0:00 [robust_test] <defunct>
1     0  2815  2774  20   0   6096   184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2816  2774  20   0   6096   184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2817  2774  20   0   6096   184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2818  2774  20   0      0     0 exit   Z+   pts/0      0:00 [robust_test] <defunct>


Running this under Valgrind's Helgrind tool showed attempts to re-lock a non-recursive lock that is already held:

valgrind -v --tool=helgrind --log-file=test3.log ./robust_test

==36495== Thread #1: Attempt to re-lock a non-recursive lock I already hold

==36495== ---Thread-Announcement------------------------------------------
==36495==
==36495== Thread #1 is the program's root thread
==36495==
==36495== ----------------------------------------------------------------
==36495==
==36495== Thread #1: Attempt to re-lock a non-recursive lock I already hold
==36495==    at 0x4A0EAB9: pthread_mutex_lock (in /opt/rh/devtoolset-3/root/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==36495==    by 0x40106A: lock_mutex (robust_test.c:172)
==36495==    by 0x401175: run_child (robust_test.c:212)
==36495==    by 0x4011D2: fork_child (robust_test.c:227)
==36495==    by 0x401450: main (robust_test.c:297)
==36495==  Lock was previously acquired
==36495==    at 0x4A0EB61: pthread_mutex_lock (in /opt/rh/devtoolset-3/root/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==36495==    by 0x40106A: lock_mutex (robust_test.c:172)
==36495==    by 0x4013BA: main (robust_test.c:287)
==36495==

Changing the mutex type in the attached reproducer from PTHREAD_MUTEX_NORMAL to PTHREAD_MUTEX_RECURSIVE
[alanm@vmw102 C01551754]$ diff -dcp robust_test.c robust_testR.c 
*** robust_test.c       2015-12-10 09:50:09.000000000 -0500
--- robust_testR.c      2015-12-10 10:40:08.804550857 -0500
*************** void init_mutex(pthread_mutex_t* mutex)
*** 71,77 ****
  {
        pthread_mutexattr_t attr;
        CHECKED(pthread_mutexattr_init, &attr);
!       CHECKED(pthread_mutexattr_settype, &attr, PTHREAD_MUTEX_NORMAL);
        CHECKED(pthread_mutexattr_setpshared, &attr, PTHREAD_PROCESS_SHARED);
        CHECKED(pthread_mutexattr_setrobust, &attr, PTHREAD_MUTEX_ROBUST);
  
--- 71,77 ----
  {
        pthread_mutexattr_t attr;
        CHECKED(pthread_mutexattr_init, &attr);
!       CHECKED(pthread_mutexattr_settype, &attr, PTHREAD_MUTEX_RECURSIVE);
        CHECKED(pthread_mutexattr_setpshared, &attr, PTHREAD_PROCESS_SHARED);
        CHECKED(pthread_mutexattr_setrobust, &attr, PTHREAD_MUTEX_ROBUST);

still caused it to go into deadlock.

There are a couple of upstream BZs that seem to apply to this:
https://sourceware.org/bugzilla/show_bug.cgi?id=19004
and https://sourceware.org/bugzilla/show_bug.cgi?id=19402
Comment 1 Alan Matsuoka 2016-01-27 12:05 EST
Created attachment 1118834 [details]
reproducer
Comment 27 Carlos O'Donell 2017-04-03 11:48:10 EDT
Summary:
========

RHEL 6.10 is in production phase 3 and only selected Urgent Priority Bug Fixes will be considered. The upstream bugs that have been fixed are all pre-existing issues in RHEL 6 and not new regressions. Fixing these bugs in RHEL 6 could have significant destabilizing effects on the robust mutex implementation. It is with this consideration that I'm marking this issue as CLOSED/WONTFIX for RHEL 6.

We will continue to enhance RHEL 7 with any fixes required for robust mutex support.

Technical details:
==================

The process-shared robust mutex support was audited by the Red Hat Platform Tools Base OS team (glibc team), and several key defects were found that could cause deadlocks and hangs in the implementation under certain conditions.

The team has since fixed several upstream issues, including:

Fix lost wakeups with robust mutex:
https://sourceware.org/bugzilla/show_bug.cgi?id=20973

Fix x86 robust mutex assembly code:
https://sourceware.org/bugzilla/show_bug.cgi?id=20985

Deadlock with robust shared mutex:
https://sourceware.org/bugzilla/show_bug.cgi?id=19402

Correction of compiler barriers around robust shared mutex code:
Commit 8f9450a0b7a9e78267e8ae1ab1000ebca08e473e

The most complex problem here is that the existing x86 assembly-optimized robust mutex code is simply wrong with regard to condition handling. Its removal allows the use of the generic C version, relying on modern compiler optimizations to achieve an optimal balance between provable correctness and performance. This kind of change is not possible in the RHEL 6 release, since it would drastically change the behaviour of the robust mutex.

To compound things there is still one more case which is too costly to be fixed without kernel help. One might call it a design flaw in the shared algorithm between the kernel and userspace:
"File corruption race condition in robust mutex unlocking"
https://sourceware.org/bugzilla/show_bug.cgi?id=14485

Bug 14485 is not intended to be alarmist; the race window is considered exceedingly small, but it is still there, like other latent bugs we do not know about in the implementation.

Thus the amount of change required, and the future design constraints, make this risky for RHEL 6. The risk is quite high that the changes could destabilize products already built upon RHEL 6. In RHEL 7 we have a lot more flexibility when it comes to fixing these issues (except bug 14485, which needs a broader fix upstream).

As a final note, here is Torvald Riegel's excellent technical summary of the current status:
https://sourceware.org/ml/libc-alpha/2016-12/msg00950.html

Again, we recommend working with Red Hat to ensure RHEL 7 meets your needs with respect to robust mutexes.
