The migrate timers code has a while (spin_trylock()); loop so that the spin lock, which -rt converts to a mutex, won't schedule out, because preemption is disabled at this point. This makes the mutex act more like a spinlock. But! If the task preempts the holder of this lock, and the holder has migration disabled (because that's what the RT kernel does to spin locks converted to mutexes; it disables migration when the lock is taken), this task will spin forever. The task has preemption disabled, it preempted the holder of the lock, which is pinned to the current CPU, and now it spins waiting for the one it preempted to finish. But it will never give up the CPU to let the other task finish. Deadlock!
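To make the scenario concrete, here is a minimal userspace sketch of it. This is a toy model, not kernel code: every name in it (toy_lock, toy_trylock, simulate, the enums) is hypothetical, and the "scheduler" is just a loop over ticks on one simulated CPU. It assumes the holder already owns the lock and has been preempted by the waiter, and it compares a waiter that keeps the CPU and retries the trylock against one that blocks and lets the holder run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one CPU, two tasks, one lock. Not kernel code. */

enum waiter_strategy { WAIT_SPIN_TRYLOCK, WAIT_BLOCK };
enum outcome { OUTCOME_COMPLETED, OUTCOME_STUCK };

struct toy_lock {
    bool held;
};

/* Take the lock if it is free; never waits. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

/*
 * holder_steps: units of work the lock holder still needs before it unlocks.
 * max_ticks:    budget of simulated CPU ticks, so the spinning case terminates.
 * Returns OUTCOME_STUCK if the waiter never got the lock within the budget.
 */
enum outcome simulate(enum waiter_strategy strategy, int holder_steps,
                      int max_ticks)
{
    /* The holder took the lock and is pinned to this CPU. */
    struct toy_lock lock = { .held = true };
    /* The waiter has just preempted the holder. */
    bool waiter_on_cpu = true;

    assert(holder_steps >= 0 && max_ticks >= 0);

    for (int tick = 0; tick < max_ticks; tick++) {
        if (waiter_on_cpu) {
            if (toy_trylock(&lock))
                return OUTCOME_COMPLETED;
            /* Lock is busy. A blocking waiter sleeps and gives up the CPU. */
            if (strategy == WAIT_BLOCK)
                waiter_on_cpu = false;
            /* A spinning waiter (preemption disabled) keeps the CPU. */
        } else {
            /* The holder runs, finishes its work, then unlocks. */
            if (holder_steps > 0)
                holder_steps--;
            if (holder_steps == 0) {
                lock.held = false;
                waiter_on_cpu = true;   /* wake the waiter */
            }
        }
    }
    return OUTCOME_STUCK;
}
```

In this model, simulate(WAIT_SPIN_TRYLOCK, 3, 1000) returns OUTCOME_STUCK because the holder never gets a tick, while simulate(WAIT_BLOCK, 3, 1000) returns OUTCOME_COMPLETED once the holder has run its three steps and released the lock.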
Created attachment 567093: Use cpu_local_var() and let the spin lock block (3.2-rt). Patch to fix 3.2-rt.
Created attachment 567094: Use cpu_local_var() and let the spin lock block (3.0-rt). Patch to fix 3.0-rt.
The fix for this is equivalent to 7864ac1:

    git describe --contains 7864ac1
    v3.2.14-rt24~11

Modifying the changelog in kernel-rt.spec to document this.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: spin_trylock in migrate_timers disables preemption.
Consequence: Deadlock.
Fix: Allow the lock to block (sleep), and protect data by disabling CPU migration.
Result: Works as expected, with no deadlock.
The 3.2 version of this patch was picked up and added to 3.2.14-rt24 (upstream stable).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-1282.html