Bug 141282

Summary: nptl futex_wait fix
Product: Red Hat Enterprise Linux 3 Reporter: Marty Wesley <mwesley>
Component: kernelAssignee: Ingo Molnar <mingo>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0CC: jakub, peterm, petrides, riel, supermike, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://post-office.corp.redhat.com/archives/tech-list/2004-November/msg00793.html
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-05-18 13:28:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 132991    

Description Marty Wesley 2004-11-30 02:00:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
Post from Jakub about this upstream patch:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10-rc1/2.6.10-rc1-mm5/broken-out/futex_wait-fix.patch

Both 2.4.x and 2.6.x futex.c is broken (racy).
Unfortunately Jamie Lokier does not acknowledge that (yet), though
he promissed he'll reply next day and has not done that in more than a
week
already.
For 2.4.x I believe it is easily fixable, like with the patch below
(untested though).  That is because we hold a spinlock accross the uaccess
anyway (by pinning the page first, then taking the lock, uaccessing,
etc.).
For 2.6.x the fix is harder, but I have posted some ideas to lkml.

Here is a mail I sent in September:

On Tue, Sep 07, 2004 at 08:04:55AM +0200, Ingo Molnar wrote:
> 
> Seto san, please find Jakub's reply below. He too agrees with your
> analysis and patch. Please send it to Andrew for upstream inclusion, i
> think it should get into 2.6.9.

Well, I think it would be cheaper to just move the __queue_me call down,
as in (untested RHEL3 patch below).
The 2.6.9 patch will likely be quite different though.

--- linux-2.4.21-20.EL/kernel/futex.c	2004-08-18 20:29:54.000000000 -0400
+++ linux-2.4.21-20.EL/kernel/futex.c	2004-09-07 03:33:34.035447554 -0400
@@ -346,23 +346,26 @@ static inline int futex_wait(unsigned lo
 		unlock_futex_mm();
 		return -EFAULT;
 	}
-	__queue_me(&q, page, uaddr, offset, -1, NULL);
 
 	/*
 	 * Page is pinned, but may no longer be in this address space.
 	 * It cannot schedule, so we access it with the spinlock held.
 	 */
-	if (!access_ok(VERIFY_READ, uaddr, 4))
-		goto out_fault;
+	if (!access_ok(VERIFY_READ, uaddr, 4)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
 	kaddr = kmap_atomic(page, KM_USER0);
 	curval = *(int*)(kaddr + offset);
 	kunmap_atomic(kaddr, KM_USER0);
 
 	if (curval != val) {
-		unlock_futex_mm();
 		ret = -EWOULDBLOCK;
-		goto out;
+		goto out_unlock;
 	}
+
+	__queue_me(&q, page, uaddr, offset, -1, NULL);
+
 	/*
 	 * The get_user() above might fault and schedule so we
 	 * cannot just set TASK_INTERRUPTIBLE state when queueing
@@ -372,10 +375,9 @@ static inline int futex_wait(unsigned lo
 	 */
 	add_wait_queue(&q.waiters, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
-	if (!list_empty(&q.list)) {
-		unlock_futex_mm();
-		time = schedule_timeout(time);
-	}
+	BUG_ON (list_empty(&q.list));
+	unlock_futex_mm();
+	time = schedule_timeout(time);
 	set_current_state(TASK_RUNNING);
 	/*
 	 * NOTE: we don't remove ourselves from the waitqueue because
@@ -395,10 +397,10 @@ out:
 
 	return ret;
 
-out_fault:
+out_unlock:
 	unlock_futex_mm();
-	ret = -EFAULT;
-	goto out;
+	put_page(q.page);
+	return ret;
 }
 
 long do_futex(unsigned long uaddr, int op, int val, unsigned long
timeout,


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
 

Additional info:

Comment 2 Ernie Petrides 2005-01-29 05:54:48 UTC
A fix for this problem has just been committed to the RHEL3 U5
patch pool this evening (in kernel version 2.4.21-27.10.EL).


Comment 6 Tim Powers 2005-05-18 13:28:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-294.html


Comment 7 mike 2005-09-12 16:41:07 UTC
pls see Bug 166292
patch does not seem to fix issues, (in 2.4.21-32.0.1.ELsmp)