Bug 400541
Summary: hang due to __pause_nocancel - pi-futex fix may be needed from 2.6.21.7
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: 1.0
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Reporter: David Holmes <David.Holmes>
Assignee: Steven Rostedt <srostedt>
CC: roland.westrelin, tglx, williams
Fixed In Version: 2.6.24.4-30.el5rt
Doc Type: Bug Fix
Last Closed: 2008-04-07 14:20:59 UTC

Description
David Holmes
2007-11-27 04:27:45 UTC
We are currently rebasing to 2.6.21.7 (the previous kernel was based on 2.6.21.5). David, we'll let you know when it's available. Could you then test it to see if it solves the issues for you?

We'll run it through our test suite tonight. If it passes, we should have it available tomorrow. Of course, if we have issues with it, it may take a bit more time.

We will install and test as soon as practical. Thanks.

We've done some kernel-level tracing. The failures occur when three threads (t1, t2, t3) operate on the same futex f. Here is the chain of events leading to the problem:

- t1 is the owner of f.
- t2 tries to acquire f. It fails in userland, so it uses the futex syscall with the FUTEX_LOCK_PI command. t2 blocks on the kernel PI mutex associated with the futex, in the rt_mutex_timed_lock() call in the futex_lock_pi() function.
- t1 releases f. It uses the futex syscall with the FUTEX_UNLOCK_PI command. It finds t2 waiting on the futex and elects it as the next owner of the futex. It sets f's userland value to the tid of t2 and releases the kernel PI mutex.
- In the meantime, t2 receives a signal and returns from rt_mutex_timed_lock() with -EINTR. It does not own the kernel PI mutex.
- t3 tries to acquire f. f's userland value contains t2's tid, so f is not free. t3 enters the kernel with the FUTEX_LOCK_PI command and grabs the kernel PI mutex, which is free (t2 failed to acquire it and t1 released it).
- t2 now exits the futex_lock_pi() function and the kernel. It grabs the spinlock, but because rt_mutex_timed_lock() returned with an error and because it cannot grab the kernel PI mutex, the userland value of the futex is not modified: it still contains t2's tid.
- t2 attempts the FUTEX_LOCK_PI command again because the previous attempt failed with an EINTR. One of the first checks performed in futex_lock_pi() is against the userland value of the futex. It contains t2's tid, so the futex syscall returns with EDEADLK. When the libc sees this error code, it hangs t2.
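The chain of events above can be replayed in a toy, single-threaded model. This is only a sketch for following the state changes, not kernel code: the struct, the tid constants and both function names are hypothetical, and the kernel's real bookkeeping (pi_state, hash bucket locks, waiter queues) is left out entirely.

```c
#include <errno.h>
#include <stdint.h>

/* Toy model only: one struct stands in for both the userland futex word
 * and the kernel PI mutex. None of these names exist in the kernel. */
struct pi_futex_model {
	uint32_t uval;     /* userland value: tid of the recorded owner, 0 if free */
	uint32_t rt_owner; /* tid owning the kernel PI mutex, 0 if free */
};

enum { T1 = 1, T2 = 2, T3 = 3 };

/* Models the early check in futex_lock_pi(): a caller whose tid is already
 * recorded in the userland value gets -EDEADLK. */
static int model_lock_pi_check(const struct pi_futex_model *f, uint32_t tid)
{
	return f->uval == tid ? -EDEADLK : 0;
}

/* Replays the reported sequence and returns the result of t2's retry. */
static int replay_reported_race(struct pi_futex_model *f)
{
	/* t1 is the owner of f. */
	f->uval = T1;
	f->rt_owner = T1;

	/* t2 blocks in the kernel: no state change in this model. */

	/* t1 unlocks: elects t2 in the userland value, releases the PI mutex. */
	f->uval = T2;
	f->rt_owner = 0;

	/* t2 is interrupted (-EINTR) before it takes the PI mutex. */

	/* t3 sees a non-free userland value, enters the kernel and takes the
	 * free PI mutex. */
	if (f->uval != 0 && f->rt_owner == 0)
		f->rt_owner = T3;

	/* t2 leaves the kernel without touching the userland value, then
	 * retries FUTEX_LOCK_PI. */
	return model_lock_pi_check(f, T2);
}
```

Calling replay_reported_race() on a zeroed struct returns -EDEADLK and leaves uval at t2's tid while rt_owner is t3's, which is the inconsistent state described in the list.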
The futex will eventually be back to a consistent state: t3 will exit from futex_lock_pi() and, because it owns the kernel PI mutex while not being the recorded owner of the futex, the futex state will be fixed in the process.

An attempted fix follows (patch against the Red Hat RT 2.6.21 kernel). When t2 detects on exit from futex_lock_pi() that it is recorded as owner of the futex while not owning the kernel PI mutex, it changes the userland futex value to the tid of the kernel PI mutex's owner.

--- kernel-2.6.21/linux-2.6.21.i686/kernel/futex.c 2008-01-04 14:54:55.000000000 +0000
+++ kernel-2.6.21-futexfix/linux-2.6.21.i686/kernel/futex.c 2008-01-04 14:02:01.000000000 +0000
@@ -532,22 +532,22 @@
 			 * the refcount and return its pi_state:
 			 */
 			pi_state = this->pi_state;
 			/*
 			 * Userspace might have messed up non PI and PI futexes
 			 */
 			if (unlikely(!pi_state))
 				return -EINVAL;

 			WARN_ON(!atomic_read(&pi_state->refcount));
-			WARN_ON(pid && pi_state->owner &&
-				pi_state->owner->pid != pid);
+/*			WARN_ON(pid && pi_state->owner && */
+/*				pi_state->owner->pid != pid); */

 			atomic_inc(&pi_state->refcount);
 			*ps = pi_state;

 			return 0;
 		}
 	}

 	/*
 	 * We are the first waiter - try to look up the real owner and attach
@@ -1905,20 +1905,41 @@
 			 * Paranoia check. If we did not take the lock
 			 * in the trylock above, then we should not be
 			 * the owner of the rtmutex, neither the real
 			 * nor the pending one:
 			 */
 			if (rt_mutex_owner(&q.pi_state->pi_mutex) == curr)
 				printk(KERN_ERR "futex_lock_pi: ret = %d "
 				       "pi-mutex: %p pi-state %p\n", ret,
 				       q.pi_state->pi_mutex.owner,
 				       q.pi_state->owner);
+
+			if(q.pi_state->owner == curr) {
+				int ret;
+				struct task_struct *owner = rt_mutex_owner(&q.pi_state->pi_mutex);
+				u32 newtid = owner->pid | FUTEX_WAITERS;
+				u32 uval, curval, newval;
+
+				ret = get_futex_value_locked(&uval, uaddr);
+				while (!ret) {
+					newval = (uval & FUTEX_OWNER_DIED) | newtid;
+					newval |= (uval & FUTEX_WAITER_REQUEUED);
+
+					curval = cmpxchg_futex_value_locked(uaddr, uval, newval);
+
+					if (curval == -EFAULT)
+						ret = -EFAULT;
+					if (curval == uval)
+						break;
+					uval = curval;
+				}
+			}
 		}
 	}

 	/* Unqueue and drop the lock */
 	unqueue_me_pi(&q);
 	futex_unlock_mm(fshared);

 	return ret != -EINTR ? ret : -ERESTARTNOINTR;

  out_unlock_release_sem:

Your analysis is correct. We have a transient state where the user space value is wrong.

The fix is not completely correct, as it creates a new, although extremely tight, race window due to the unlocked access to the rtmutex owner. Not sure yet whether it matters or not. I'll have a closer look.

Thanks,

tglx

Created attachment 291060
mainline fix

The attached patch is the fix for mainline. Roland confirmed that it fixes the bug in mainline. Clark has a backport for rhel-rt for the new release.
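For readers following the attempted fix quoted earlier in this thread, its compare-and-swap retry loop can be sketched in user space with C11 atomics. This is an illustration only and not the mainline fix from the attachment: fixup_owner_word() is a hypothetical name, the two flag values are defined locally to mirror linux/futex.h, the RT-only FUTEX_WAITER_REQUEUED bit is left out, and the sketch does nothing about the unlocked rtmutex owner read that tglx points out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Flag bits of the futex word, defined locally to mirror linux/futex.h. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u

/* Hypothetical helper: rewrite the futex word so that it records
 * new_owner_tid, keeping the OWNER_DIED bit and setting WAITERS.
 * Retries until the compare-and-swap succeeds, as the patch's loop does. */
static uint32_t fixup_owner_word(_Atomic uint32_t *uaddr, uint32_t new_owner_tid)
{
	uint32_t uval = atomic_load(uaddr);
	uint32_t newval;

	do {
		/* On CAS failure uval is refreshed with the current word,
		 * so newval is recomputed from up-to-date flag bits. */
		newval = (uval & FUTEX_OWNER_DIED) | new_owner_tid | FUTEX_WAITERS;
	} while (!atomic_compare_exchange_weak(uaddr, &uval, newval));

	return newval;
}
```

The loop shape is the point: the new value is always derived from the most recently observed word, so a concurrent change to the flag bits is not overwritten blindly.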
Can we confirm that the latest Red Hat RT kernel (2.6.24.1-24.el5rt) has the mainline fix?

I just confirmed that this patch is in the 2.6.24.4-30.el5rt kernel. Roland, if you concur, I think we can close this.

Clark

I confirm that the bug is fixed in 2.6.24.4-30.el5rt. It can be closed.

We commenced more extensive testing on 2.6.24-30.el5rt and are finding a new failure: pthread_mutex_unlock is returning EPERM in cases where we definitely have the mutex locked. I noticed this fix operated on the "owner" a bit and was wondering whether it may have caused this new problem?