The lglocks on PREEMPT_RT_FULL use for_each_online_cpu() to grab and release the per-CPU lglocks. But if a task takes these locks and a CPU is then taken offline, the task will, on release, unlock every lock except the one that represents the offlined CPU. If that CPU comes back online, its lock is active again, but it still has an owner that never released it. The next task that takes the lglocks will block on this lock. This can even block the original owner of the lock.
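The sequence described above can be replayed in a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the struct, the helper names (model_init, lock_online, unlock_online) and the fixed NR_CPUS of 4 are all made up for illustration, and a lock attempt reports the CPU it would block on instead of actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  4
#define NO_OWNER (-1)

/* Toy model (hypothetical, not the kernel structures): one lock slot per
 * CPU that records the owning task id, plus a simulated online mask. */
struct lglock_model {
    int  owner[NR_CPUS];
    bool online[NR_CPUS];
};

static void model_init(struct lglock_model *lg)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        lg->owner[cpu]  = NO_OWNER;
        lg->online[cpu] = true;
    }
}

/* Take the lock of every CPU that is online right now.
 * Returns -1 on success, or the CPU whose lock is already owned, which
 * is where a real task would block. Locks taken before that point stay
 * held, as they would for a task that is blocked mid-way. */
static int lock_online(struct lglock_model *lg, int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!lg->online[cpu])
            continue;
        if (lg->owner[cpu] != NO_OWNER)
            return cpu;
        lg->owner[cpu] = task;
    }
    return -1;
}

/* Release the locks of every CPU that is online right now. A CPU that
 * went offline since the lock was taken is skipped: this is the bug. */
static void unlock_online(struct lglock_model *lg, int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (lg->online[cpu] && lg->owner[cpu] == task)
            lg->owner[cpu] = NO_OWNER;
    }
}
```

In this model, locking as task 1, marking CPU 1 offline, unlocking, and marking CPU 1 online again leaves owner[1] set to task 1, so the next lock_online() call by any task, including task 1 itself, stops at CPU 1.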
Created attachment 567097
Use a separate cpumask for taking lglocks, not the online mask

The non-RT code for taking lglocks uses its own cpumask to take the locks. A CPU's bit is set in the mask when the CPU comes online and is never cleared. That means the locks will also be taken for CPUs that are offline. But they are released for offline CPUs as well, so a lock can no longer be left with an owner that abandoned it. This patch converts the RT side to mirror the non-RT code and fixes the deadlocks.
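The idea behind the patch, as described in this comment, can be sketched in the same kind of userspace model. Again this is an illustration under assumptions and not the patch itself: lock_cpus is a hypothetical stand-in for the lglock's own cpumask, and the fixed_* helpers and NR_CPUS of 4 are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  4
#define NO_OWNER (-1)

/* Toy model (hypothetical names): per-CPU lock owners, the online mask,
 * and a separate mask that decides which locks are taken and released. */
struct lglock_fixed {
    int  owner[NR_CPUS];
    bool online[NR_CPUS];
    bool lock_cpus[NR_CPUS];
};

static void fixed_init(struct lglock_fixed *lg)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        lg->owner[cpu]     = NO_OWNER;
        lg->online[cpu]    = true;
        lg->lock_cpus[cpu] = true;
    }
}

/* Bringing a CPU up sets its bit in both masks. */
static void fixed_cpu_up(struct lglock_fixed *lg, int cpu)
{
    lg->online[cpu]    = true;
    lg->lock_cpus[cpu] = true;
}

/* Taking a CPU down only clears the online bit; per the description in
 * this comment, the bit in the lock mask stays set. */
static void fixed_cpu_down(struct lglock_fixed *lg, int cpu)
{
    lg->online[cpu] = false;
}

/* Lock by walking lock_cpus instead of the online mask.
 * Returns -1 on success, or the CPU a real task would block on. */
static int fixed_lock(struct lglock_fixed *lg, int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!lg->lock_cpus[cpu])
            continue;
        if (lg->owner[cpu] != NO_OWNER)
            return cpu;
        lg->owner[cpu] = task;
    }
    return -1;
}

/* Unlock walks the same mask, so offline CPUs are released too. */
static void fixed_unlock(struct lglock_fixed *lg, int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (lg->lock_cpus[cpu] && lg->owner[cpu] == task)
            lg->owner[cpu] = NO_OWNER;
    }
}
```

Because lock and unlock walk the same mask and that mask does not shrink in between, a hotplug event between the two calls no longer leaves a lock behind with a stale owner.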
This patch is equivalent to 7837aec:

  $ git describe --contains 7837aec
  v3.2.14-rt24~7

Updating kernel-rt.spec to reflect this.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: The RT versions of lglock use for_each_online_cpu().
Consequence: Locks can be reactivated after a CPU comes online again, but with an owner that has not released them. This causes various problems, such as blocking the original owner of the lock.
Fix: Convert the RT versions to use the lglock-specific cpumasks.
Result: Locks are taken for CPUs that are offline, but they are also released while the CPU is offline, so a lock can no longer be left with an owner that abandoned it.
This patch was added to 3.2.14-rt24 (upstream stable-rt).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-1282.html