Description of problem: The test code creates an anonymous block of shared memory, initializes a robust process-shared mutex in the memory and forks child processes. Each child process publishes its process ID in the shared memory and then acquires and releases the mutex before exiting normally. The parent picks a child process, waits for it to publish its PID and then terminates it via kill. This creates a race condition where the child may or may not receive an asynchronous signal while acquiring, holding or releasing the mutex. The parent attempts to keep a certain number of child processes running by waiting for finished children and forking new ones. The observed behaviour on multi-core hosts is that the program sometimes deadlocks, with “ps -l” reporting several processes in the tree blocked on the futex WCHAN. The expected behaviour is that the program runs to completion in a reasonable amount of time. We’ve also noticed that in some cases applying “strace -p” to one of the blocked child processes causes the program to proceed further, although not necessarily to completion. Therefore, as a debugging aid the parent process also forks a “dummy” child which does not interact with the mutex but does have the shared memory mapped. This allows gdb to attach to this particular child process without interfering with any blocked futex system calls, and inspect the state of the mutex. 
See example session below:

$ gcc -Wall -std=gnu99 -g -o robust_test robust_test.c -lpthread
$ ./robust_test &
[1] 21937
$ Done 0 iterations, 11 pre_lock, 3 locks, 2 unlocks, 2 post_unlock 0 cleanups, 1 not done
Done 5 iterations, 21 pre_lock, 8 locks, 7 unlocks, 7 post_unlock 0 cleanups, 6 not done
Done 10 iterations, 46 pre_lock, 28 locks, 27 unlocks, 27 post_unlock 0 cleanups, 11 not done
Done 15 iterations, 76 pre_lock, 56 locks, 55 unlocks, 53 post_unlock 2 cleanups, 14 not done
$ ps -l
F S    UID   PID  PPID  C PRI  NI ADDR  SZ WCHAN  TTY        TIME CMD
0 S 780379 20009 45137  0  80   0 - 6749 wait   pts/14   00:00:00 bash
0 S 780379 21937 20009  0  80   0 - 1526 futex_ pts/14   00:00:00 robust_test
1 S 780379 21938 21937  0  80   0 - 1525 pipe_w pts/14   00:00:00 robust_test
1 S 780379 22003 21937  0  80   0 - 1526 futex_ pts/14   00:00:00 robust_test
1 Z 780379 22007 21937  0  80   0 -    0 exit   pts/14   00:00:00 robust_test <defunct>
1 Z 780379 22008 21937  0  80   0 -    0 exit   pts/14   00:00:00 robust_test <defunct>
1 S 780379 22009 21937  0  80   0 - 1526 futex_ pts/14   00:00:00 robust_test
1 S 780379 22010 21937  0  80   0 - 1526 futex_ pts/14   00:00:00 robust_test
1 S 780379 22011 21937  0  80   0 - 1526 futex_ pts/14   00:00:00 robust_test
1 S 780379 22013 21937  0  80   0 - 1526 futex_ pts/14   00:00:00 robust_test
1 Z 780379 22014 21937  0  80   0 -    0 exit   pts/14   00:00:00 robust_test <defunct>
0 R 780379 22015 20009  0  80   0 - 2820 -      pts/14   00:00:00 ps
0 S 780379 45137 42638  0  80   0 - 6779 wait   pts/14   00:00:00 bash
$ strace -p 22013
Process 22013 attached - interrupt to quit
futex(0x7f6a66369000, FUTEX_WAIT, 2147505585, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f6a66369000, FUTEX_WAKE, 1)    = 1
exit_group(0)                           = ?
Process 22013 detached
$ Done 20 iterations, 105 pre_lock, 81 locks, 80 unlocks, 76 post_unlock 4 cleanups, 17 not done
Done 25 iterations, 115 pre_lock, 86 locks, 85 unlocks, 81 post_unlock 4 cleanups, 22 not done
Done 30 iterations, 139 pre_lock, 105 locks, 104 unlocks, 100 post_unlock 4 cleanups, 27 not done
Done 35 iterations, 163 pre_lock, 124 locks, 123 unlocks, 119 post_unlock 4 cleanups, 32 not done
$ gdb -p 21938
GNU gdb (GDB) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 21938
Reading symbols from robust_test...done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
0x00000035ab00e740 in __read_nocancel () from /lib64/libpthread.so.0
(gdb) up
#1  0x0000000000401389 in main (argc=1, argv=0x7fff038d30d8) at robust_test.c:251
251         read(shared->pipe_fds[0], &dummy, sizeof(dummy));
(gdb) print shared->top_mutex
$1 = {__data = {__lock = 0, __count = 1, __owner = 0, __nusers = 0, __kind = 144,
    __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
  __size = "\000\000\000\000\001", '\000' <repeats 11 times>, "\220", '\000' <repeats 22 times>,
  __align = 4294967296}

Version-Release number of selected component (if applicable):
6.7

How reproducible:
Always on hosts with multiple CPU cores; occasionally on hosts with two CPU cores.
Host has glibc-2.12-1.166.el6_7.3 and kernel 2.6.32-573.7.1.el6.x86_64.

Steps to Reproduce:
1. Compile and run the attached reproducer.

Actual results:
The program runs into a deadlock.

Expected results:
The program runs to completion.

Additional info:
Issue: the program goes into a deadlock (waiting on a futex). Please read through the case description and comment #2, which explains two scenarios. This happens on a system with two CPUs; however, the second test case (comment #2) happens on both single- and multi-CPU systems. This is reproducible on my test RHEL 6.7 system with the latest glibc and kernel versions. We need help analysing the program and taking this to Engineering if it turns out to be a bug.
kernel-2.6.32-573.8.1.el6.x86_64
glibc-2.12-1.166.el6_7.3.x86_64

# ./robust_test
Done 0 iterations, 6 pre_lock, 2 locks, 1 unlocks, 1 post_unlock 0 cleanups, 0 not done
Done 5 iterations, 31 pre_lock, 21 locks, 20 unlocks, 19 post_unlock 1 cleanups, 4 not done
Done 10 iterations, 42 pre_lock, 26 locks, 25 unlocks, 24 post_unlock 1 cleanups, 9 not done
Done 15 iterations, 52 pre_lock, 31 locks, 30 unlocks, 29 post_unlock 1 cleanups, 14 not done
Done 20 iterations, 62 pre_lock, 36 locks, 35 unlocks, 34 post_unlock 1 cleanups, 19 not done
Done 25 iterations, 72 pre_lock, 41 locks, 40 unlocks, 39 post_unlock 1 cleanups, 24 not done
Done 30 iterations, 94 pre_lock, 58 locks, 57 unlocks, 56 post_unlock 1 cleanups, 29 not done
Done 35 iterations, 109 pre_lock, 68 locks, 67 unlocks, 66 post_unlock 1 cleanups, 34 not done
Done 40 iterations, 119 pre_lock, 73 locks, 72 unlocks, 71 post_unlock 1 cleanups, 39 not done
Done 45 iterations, 129 pre_lock, 78 locks, 77 unlocks, 76 post_unlock 1 cleanups, 44 not done
Success 144 pre_lock, 96 locks, 96 unlocks, 95 post_unlock 1 cleanups, 48 not done

# ./robust_test
Done 0 iterations, 5 pre_lock, 2 locks, 1 unlocks, 1 post_unlock 0 cleanups, 0 not done
Done 5 iterations, 19 pre_lock, 8 locks, 7 unlocks, 6 post_unlock 0 cleanups, 4 not done
Done 10 iterations, 30 pre_lock, 14 locks, 13 unlocks, 12 post_unlock 0 cleanups, 9 not done
Done 15 iterations, 40 pre_lock, 19 locks, 18 unlocks, 17 post_unlock 0 cleanups, 14 not done
Done 20 iterations, 51 pre_lock, 25 locks, 24 unlocks, 23 post_unlock 0 cleanups, 19 not done
Done 25 iterations, 61 pre_lock, 30 locks, 29 unlocks, 28 post_unlock 0 cleanups, 24 not done
Done 30 iterations, 71 pre_lock, 35 locks, 34 unlocks, 33 post_unlock 0 cleanups, 29 not done

The hang happens occasionally; you may need to repeat the ./robust_test run a few times.
(It happens roughly once in every 5-6 runs.)

# ps axwll
F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
0     0  2774  2333  20   0  6096  512 futex_ S+   pts/0      0:00 ./robust_test
1     0  2775  2774  20   0  6092  156 pipe_w S+   pts/0      0:00 ./robust_test
1     0  2811  2774  20   0     0    0 exit   Z+   pts/0      0:00 [robust_test] <defunct>
1     0  2812  2774  20   0  6096  184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2813  2774  20   0  6096  184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2814  2774  20   0     0    0 exit   Z+   pts/0      0:00 [robust_test] <defunct>
1     0  2815  2774  20   0  6096  184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2816  2774  20   0  6096  184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2817  2774  20   0  6096  184 futex_ S+   pts/0      0:00 ./robust_test
1     0  2818  2774  20   0     0    0 exit   Z+   pts/0      0:00 [robust_test] <defunct>

Running this under valgrind showed that there are recursive locks:

valgrind -v --tool=helgrind --log-file=test3.log ./robust_test

==36495== Thread #1: Attempt to re-lock a non-recursive lock I already hold
==36495== ---Thread-Announcement------------------------------------------
==36495==
==36495== Thread #1 is the program's root thread
==36495==
==36495== ----------------------------------------------------------------
==36495==
==36495== Thread #1: Attempt to re-lock a non-recursive lock I already hold
==36495==    at 0x4A0EAB9: pthread_mutex_lock (in /opt/rh/devtoolset-3/root/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==36495==    by 0x40106A: lock_mutex (robust_test.c:172)
==36495==    by 0x401175: run_child (robust_test.c:212)
==36495==    by 0x4011D2: fork_child (robust_test.c:227)
==36495==    by 0x401450: main (robust_test.c:297)
==36495==  Lock was previously acquired
==36495==    at 0x4A0EB61: pthread_mutex_lock (in /opt/rh/devtoolset-3/root/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==36495==    by 0x40106A: lock_mutex (robust_test.c:172)
==36495==    by 0x4013BA: main (robust_test.c:287)
==36495==

Changing the code in the attached reproducer to use a recursive mutex type:

[alanm@vmw102 C01551754]$ diff -dcp robust_test.c robust_testR.c
*** robust_test.c	2015-12-10 09:50:09.000000000 -0500
--- robust_testR.c	2015-12-10 10:40:08.804550857 -0500
*************** void init_mutex(pthread_mutex_t* mutex)
*** 71,77 ****
  {
      pthread_mutexattr_t attr;
      CHECKED(pthread_mutexattr_init, &attr);
!     CHECKED(pthread_mutexattr_settype, &attr, PTHREAD_MUTEX_NORMAL);
      CHECKED(pthread_mutexattr_setpshared, &attr, PTHREAD_PROCESS_SHARED);
      CHECKED(pthread_mutexattr_setrobust, &attr, PTHREAD_MUTEX_ROBUST);
--- 71,77 ----
  {
      pthread_mutexattr_t attr;
      CHECKED(pthread_mutexattr_init, &attr);
!     CHECKED(pthread_mutexattr_settype, &attr, PTHREAD_MUTEX_RECURSIVE);
      CHECKED(pthread_mutexattr_setpshared, &attr, PTHREAD_PROCESS_SHARED);
      CHECKED(pthread_mutexattr_setrobust, &attr, PTHREAD_MUTEX_ROBUST);

still caused it to go into a deadlock.

There are a couple of upstream BZs that seem to apply to this:
https://sourceware.org/bugzilla/show_bug.cgi?id=19004
https://sourceware.org/bugzilla/show_bug.cgi?id=19402
Created attachment 1118834 [details] reproducer
Summary:
========

RHEL 6.10 is in production phase 3 and only selected Urgent Priority Bug Fixes will be considered. The upstream bugs that have been fixed are all pre-existing issues in RHEL 6, not new regressions. Fixing these bugs in RHEL 6 could have significant destabilizing effects on the robust mutex implementation. With this consideration, I'm marking this issue as CLOSED/WONTFIX for RHEL 6. We will continue to enhance RHEL 7 with any fixes required for robust mutex support.

Technical details:
==================

The process-shared robust mutex support was audited by the Red Hat Platform Tools Base OS team (glibc team), and it was found that there were several key defects that could cause deadlocks and hangs in the implementation under certain conditions. The team has since fixed several upstream issues, including:

Fix lost wakeups with robust mutex:
https://sourceware.org/bugzilla/show_bug.cgi?id=20973

Fix x86 robust mutex assembly code:
https://sourceware.org/bugzilla/show_bug.cgi?id=20985

Deadlock with robust shared mutex:
https://sourceware.org/bugzilla/show_bug.cgi?id=19402

Correction of compiler barriers around robust shared mutex code:
commit 8f9450a0b7a9e78267e8ae1ab1000ebca08e473e

The most complex problem here is that the existing x86 assembly-optimized robust mutex code is simply wrong with regard to condition handling; its removal allows use of the generic C version and reliance on modern compiler optimizations to achieve an optimal balance between provable correctness and performance. This kind of change is not possible in the RHEL 6 release, since it would drastically change the behaviour of robust mutexes. To compound things, there is still one more case which is too costly to fix without kernel help.
One might call it a design flaw in the algorithm shared between the kernel and userspace:

"File corruption race condition in robust mutex unlocking"
https://sourceware.org/bugzilla/show_bug.cgi?id=14485

Bug 14485 is not intended to be alarmist, and the race window is considered exceedingly small, but it is still there, like other latent bugs we don't know about in the implementation. Thus the amount of change required, and the future design constraints, make this risky for RHEL 6. The risks are quite high: the changes could destabilize products already built upon RHEL 6. In RHEL 7 we have much more flexibility when it comes to fixing these issues (except 14485, which needs a broader fix upstream).

As a final note, here is Torvald Riegel's excellent technical summary of the current status:
https://sourceware.org/ml/libc-alpha/2016-12/msg00950.html

Again, we recommend working with Red Hat to ensure RHEL 7 meets your needs with respect to robust mutexes.