Bug 865716

Summary: Fix race in takeover of a dead futex in futex_lock_pi
Product: [Fedora] Fedora
Component: kernel
Version: 19
Hardware: Unspecified
OS: Unspecified
Status: CLOSED RAWHIDE
Severity: medium
Priority: unspecified
Keywords: Patch
Reporter: Siddhesh Poyarekar <spoyarek>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: gansalmon, itamar, jforbes, jonathan, kernel-maint, madhu.chinakonda, mnewsome
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-04-08 04:32:19 UTC

Description Siddhesh Poyarekar 2012-10-12 08:47:31 UTC
Description of problem:
The futex_lock_pi operation used by pthread_mutex_lock for robust PI mutexes has a race when trying to take over a dead futex.  The higher-level problem was reported in the upstream glibc bugzilla:

http://sourceware.org/bugzilla/show_bug.cgi?id=14076

The race is between futex_lock_pi_atomic and handle_futex_death.  futex_lock_pi_atomic assumes that a robust futex with TID == 0 can always be taken over directly.  That assumption is wrong when there are waiters on the futex, because handle_futex_death wakes a blocked task with futex_wake.
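
To make the state in question concrete, here is a small user-space model of the futex word. It is only a sketch, not kernel code: the MODEL_* constants mirror the bit values in <linux/futex.h>, while the function names are hypothetical and exist purely for illustration. It assumes, as I understand handle_futex_death, that the dead owner's TID is cleared, FUTEX_OWNER_DIED is set and an existing FUTEX_WAITERS bit is preserved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a futex word; values mirror <linux/futex.h>. */
#define MODEL_FUTEX_WAITERS    0x80000000u
#define MODEL_FUTEX_OWNER_DIED 0x40000000u
#define MODEL_FUTEX_TID_MASK   0x3fffffffu

/* TID of the current owner, or 0 if no owner is recorded. */
uint32_t owner_tid(uint32_t uval)
{
    return uval & MODEL_FUTEX_TID_MASK;
}

/* True if the kernel has marked the word as having blocked waiters. */
bool has_waiters(uint32_t uval)
{
    return (uval & MODEL_FUTEX_WAITERS) != 0;
}

/* True if the previous owner exited while holding the lock. */
bool owner_died(uint32_t uval)
{
    return (uval & MODEL_FUTEX_OWNER_DIED) != 0;
}

/*
 * The assumption questioned in this report (hypothetical helper):
 * "TID == 0" alone is treated as "safe to take over directly".
 */
bool naive_direct_takeover(uint32_t uval)
{
    return owner_tid(uval) == 0;
}

/*
 * Illustrative word value after the owner died while other tasks
 * were already blocked: TID cleared, OWNER_DIED set, WAITERS kept.
 */
uint32_t dead_owner_with_waiters(void)
{
    return MODEL_FUTEX_WAITERS | MODEL_FUTEX_OWNER_DIED;
}
```

For the value returned by dead_owner_with_waiters(), naive_direct_takeover() is true even though has_waiters() is also true, which is the combination described above as unsafe for a direct takeover.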

How reproducible:
Consistently.

Steps to Reproduce:
Compile and run:

http://sourceware.org/bugzilla/attachment.cgi?id=6442

$ gcc -D_GNU_SOURCE futexCase1_r1.c -o futexCase1_r1 -lpthread
$ ./futexCase1_r1
  
Actual results:

8279: created mutex: 0xf7f1a000
8419: pthread_mutex_consistent_np failed: 0xf7f1a000 22 Invalid argument
8438: pthread_mutex_consistent_np failed: 0xf7f1a000 22 Invalid argument
8439: pthread_mutex_consistent_np failed: 0xf7f1a000 22 Invalid argument
…
8279: Done! lock concurrency: 0, max: 7
$

Expected results:

8279: created mutex: 0xf7f1a000
8279: Done! lock concurrency: 0, max: 7
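
For reference, the recovery sequence that the failing pthread_mutex_consistent_np calls belong to looks roughly like the sketch below. This is not the attached reproducer: it is a single-process, uncontended illustration, so it does not trigger the race. The function names lock_and_exit and robust_pi_recovery_demo are made up for this example, and it uses the POSIX spellings pthread_mutexattr_setrobust and pthread_mutex_consistent rather than the _np variants.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>

/* Worker: take the lock and exit without unlocking, so the owner "dies". */
static void *lock_and_exit(void *arg)
{
    pthread_mutex_t *m = arg;

    pthread_mutex_lock(m);
    return NULL;
}

/*
 * Hypothetical demo: returns 0 if recovery followed the expected path,
 * otherwise the number of the step that failed.
 */
int robust_pi_recovery_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    pthread_t worker;

    /* Create a mutex that is both priority-inheriting and robust. */
    if (pthread_mutexattr_init(&attr) != 0)
        return 1;
    if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0)
        return 2;
    if (pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0)
        return 3;
    if (pthread_mutex_init(&mutex, &attr) != 0)
        return 4;
    pthread_mutexattr_destroy(&attr);

    /* Let the worker die while holding the lock. */
    if (pthread_create(&worker, NULL, lock_and_exit, &mutex) != 0)
        return 5;
    pthread_join(worker, NULL);

    /* The next locker gets the lock, flagged with EOWNERDEAD. */
    if (pthread_mutex_lock(&mutex) != EOWNERDEAD)
        return 6;

    /* Mark the protected state consistent again, then release. */
    if (pthread_mutex_consistent(&mutex) != 0)
        return 7;
    if (pthread_mutex_unlock(&mutex) != 0)
        return 8;

    pthread_mutex_destroy(&mutex);
    return 0;
}
```

Called from a main() and linked with -lpthread, this should return 0 when the kernel supports PI futexes. In the failing runs above, the equivalent of step 7 is what returns 22 (EINVAL).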

Additional info:

I have posted a fix that works for me on lkml:

http://lkml.indiana.edu/hypermail/linux/kernel/1210.1/02508.html

Comment 1 Fedora End Of Life 2013-04-03 15:41:29 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information about this action, and the reason for it, is available here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 2 Justin M. Forbes 2013-04-05 19:14:53 UTC
Is this still an issue with the 3.9 kernels in F19?

Comment 3 Siddhesh Poyarekar 2013-04-08 04:32:19 UTC
This was fixed upstream with 59fa6245192159ab5e1e17b8e31f15afa9cff4bf, which has been in the kernel since 3.7.  It works fine on my F17 (3.8.4) now, so closing this as fixed.