Bug 799389 - lglocks can be taken and never released on cpu offline and onlining
Summary: lglocks can be taken and never released on cpu offline and onlining
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 2.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 2.2
Target Release: ---
Assignee: John Kacur
QA Contact: David Sommerseth
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-02 16:13 UTC by Steven Rostedt
Modified: 2016-05-22 23:34 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The RT versions of lglock use for_each_online_cpu(). Consequence: Locks can be reactivated after a CPU comes online again, but with an owner that has not released them. This causes various problems, such as blocking the original owner of the lock. Fix: Convert the RT versions to use the lglock-specific cpumasks. Result: Locks are taken for CPUs that are offline, but they are also released while the CPU is offline, so no lock is left with an owner that abandoned it.
Clone Of:
Environment:
Last Closed: 2012-09-19 18:03:33 UTC
Target Upstream Version:
Embargoed:


Attachments
Use a separate cpumask for taking lglocks, not the online mask (3.62 KB, patch)
2012-03-02 16:15 UTC, Steven Rostedt


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2012:1282 0 normal SHIPPED_LIVE Moderate: kernel-rt security, bug fix, and enhancement update 2012-09-19 22:02:30 UTC

Description Steven Rostedt 2012-03-02 16:13:09 UTC
The lglocks on PREEMPT_RT_FULL use for_each_online_cpu() to grab and release the per-CPU locks. If a task takes these locks and a CPU is then taken offline, the release path skips the lock that belongs to the offline CPU and releases only the others. If that CPU comes back online, its lock is active again, but it still has an owner that never released it. Any task that then takes the lglocks will block on this lock, and that can include the original owner.
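
The sequence described above can be modelled in user space. The following is a minimal sketch, not the kernel code: the arrays and the functions sim_reset(), global_lock_online() and global_unlock_online() are hypothetical names made up for illustration, and a plain owner id stands in for a real lock so that "would block" is reported as a return value instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for the per-CPU locks and the online mask. */
int lock_owner[NR_CPUS];   /* 0 = free, otherwise the owning task id */
bool cpu_online[NR_CPUS];

/* Start with every CPU online and every lock free. */
void sim_reset(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        lock_owner[cpu] = 0;
        cpu_online[cpu] = true;
    }
}

/*
 * Take the lock of every CPU that is online right now.
 * Returns -1 on success, or the index of the first CPU whose lock is
 * already held (the point where a real task would block).
 */
int global_lock_online(int task)
{
    assert(task != 0); /* 0 is reserved for "free" */
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!cpu_online[cpu])
            continue;
        if (lock_owner[cpu] != 0)
            return cpu;
        lock_owner[cpu] = task;
    }
    return -1;
}

/* Release the lock of every CPU that is online right now. */
void global_unlock_online(int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu_online[cpu] && lock_owner[cpu] == task)
            lock_owner[cpu] = 0;
    }
}
```

In this model, taking the locks as task 1, clearing cpu_online[2], unlocking, and then setting cpu_online[2] again leaves lock_owner[2] still set to 1. The next global_lock_online() call then stops at CPU 2, even when the caller is task 1 itself.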

Comment 1 Steven Rostedt 2012-03-02 16:15:52 UTC
Created attachment 567097
Use a separate cpumask for taking lglocks, not the online mask

The non-RT code for taking lglocks uses its own cpumask to take the locks. A CPU's bit is set in the mask when the CPU comes online and is never cleared. That means the locks are also taken for CPUs that are offline, but they are released for those CPUs as well, so no lock is left with an owner that abandoned it.

This patch converts the RT side to mirror the non-RT code and fixes the deadlocks.
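
The idea of the fix can be sketched the same way in user space. This is only an illustration under stated assumptions, not the attached patch: owner[], online[], lock_mask[] and the functions cpu_up(), cpu_down(), mask_reset(), mask_lock_all() and mask_unlock_all() are hypothetical names, and an owner id stands in for a real lock. The point is that the lock and unlock paths walk a mask whose bits are set on first online and never cleared, so both paths always cover the same set of CPUs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-ins; none of these names come from the kernel. */
int owner[NR_CPUS];       /* 0 = free, otherwise the owning task id */
bool online[NR_CPUS];     /* changes on every hotplug event */
bool lock_mask[NR_CPUS];  /* set when a CPU comes online, never cleared */

void cpu_up(int cpu)
{
    online[cpu] = true;
    lock_mask[cpu] = true;
}

void cpu_down(int cpu)
{
    online[cpu] = false;  /* the lock_mask bit deliberately stays set */
}

/* Free all locks, clear both masks, then bring every CPU online. */
void mask_reset(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        owner[cpu] = 0;
        online[cpu] = false;
        lock_mask[cpu] = false;
        cpu_up(cpu);
    }
}

/*
 * Take the lock of every CPU in lock_mask, online or not.
 * Returns -1 on success, or the first CPU whose lock is already held.
 */
int mask_lock_all(int task)
{
    assert(task != 0); /* 0 is reserved for "free" */
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!lock_mask[cpu])
            continue;
        if (owner[cpu] != 0)
            return cpu;
        owner[cpu] = task;
    }
    return -1;
}

/* Release the lock of every CPU in lock_mask, online or not. */
void mask_unlock_all(int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (lock_mask[cpu] && owner[cpu] == task)
            owner[cpu] = 0;
    }
}
```

With this model, a task can call mask_lock_all(), have cpu_down(2) happen in between, and still release CPU 2's lock in mask_unlock_all(), so a later cpu_up(2) finds that lock free.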

Comment 2 John Kacur 2012-05-23 13:09:25 UTC
This patch is equivalent to 7837aec
git describe --contains 7837aec
v3.2.14-rt24~7

Updating kernel-rt.spec to reflect this.

Comment 5 John Kacur 2012-06-20 21:44:38 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: The RT versions of lglock use for_each_online_cpu().
Consequence: Locks can be reactivated after a CPU comes online again, but with an owner that has not released them. This causes various problems, such as blocking the original owner of the lock.
Fix: Convert the RT versions to use the lglock-specific cpumasks.
Result: Locks are taken for CPUs that are offline, but they are also released while the CPU is offline, so no lock is left with an owner that abandoned it.

Comment 7 Steven Rostedt 2012-07-03 14:27:30 UTC
This patch was added to 3.2.14-rt24 (upstream stable-rt).

Comment 10 errata-xmlrpc 2012-09-19 18:03:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1282.html

