Bug 799389 - lglocks can be taken and never released on cpu offline and onlining
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 2.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 2.2
Target Release: ---
Assigned To: John Kacur
QA Contact: David Sommerseth
Reported: 2012-03-02 11:13 EST by Steven Rostedt
Modified: 2016-05-22 19:34 EDT (History)
Doc Type: Bug Fix
Doc Text:
Cause: The RT versions of lglock use for_each_online_cpu().
Consequence: A lock can be reactivated when a CPU comes online again, but with an owner that has not released it. This causes various problems, such as blocking the original owner of the lock.
Fix: Convert the RT versions to use the lglock-specific cpumasks.
Result: Locks are taken for CPUs that are offline, but they are also released while the CPU is offline, so a lock can no longer be left with an owner that abandoned it.
Last Closed: 2012-09-19 14:03:33 EDT


Attachments
Use a separate cpumask for taking lglocks, not the online mask (3.62 KB, patch)
2012-03-02 11:15 EST, Steven Rostedt


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2012:1282 | normal | SHIPPED_LIVE | Moderate: kernel-rt security, bug fix, and enhancement update | 2012-09-19 18:02:30 EDT

Description Steven Rostedt 2012-03-02 11:13:09 EST
On PREEMPT_RT_FULL, the lglocks use for_each_online_cpu() to take and release the per-CPU locks. If a task takes these locks and a CPU is then taken offline, the task releases every lock except the one that represents the CPU that went offline. When that CPU comes back online, its lock is active again, but it still has an owner that never released it. Any task that then takes the lglocks blocks on this lock, and that can include the original owner.
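
The sequence described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the class `LgLockModel`, the function `demo_leak`, and the task name are hypothetical names chosen for illustration, and a raised exception stands in for a task that would block.

```python
class LgLockModel:
    """Toy model of an lglock: one lock slot per CPU, each with an owner."""

    def __init__(self, ncpus):
        self.online = set(range(ncpus))                   # stands in for the online CPU mask
        self.owner = {cpu: None for cpu in range(ncpus)}  # None means unlocked

    def global_lock(self, task):
        # Problematic pattern: walk only the CPUs that are online right now.
        for cpu in sorted(self.online):
            if self.owner[cpu] is not None:
                # A real lock would block here; the model raises instead.
                raise RuntimeError(
                    f"{task} blocks on CPU {cpu}, held by {self.owner[cpu]}"
                )
            self.owner[cpu] = task

    def global_unlock(self, task):
        # Same walk on release, so the lock of an offline CPU is skipped.
        for cpu in sorted(self.online):
            if self.owner[cpu] == task:
                self.owner[cpu] = None


def demo_leak():
    lg = LgLockModel(ncpus=4)
    lg.global_lock("task_a")
    lg.online.discard(2)        # CPU 2 goes offline while task_a holds the locks
    lg.global_unlock("task_a")  # releases CPUs 0, 1 and 3 only
    lg.online.add(2)            # CPU 2 comes back online
    stale = {cpu: o for cpu, o in lg.owner.items() if o is not None}
    try:
        lg.global_lock("task_a")  # the original owner tries to take the locks again
        blocked = False
    except RuntimeError:
        blocked = True
    return stale, blocked
```

In this model, `demo_leak()` reports that the CPU 2 slot is still owned by `task_a` after the unlock, and that the second lock attempt by the same task cannot proceed.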
Comment 1 Steven Rostedt 2012-03-02 11:15:52 EST
Created attachment 567097
Use a separate cpumask for taking lglocks, not the online mask

The non-RT code for taking lglocks uses its own cpumask to take the locks. A CPU's bit is set in the mask when the CPU comes online and is never cleared. That means the locks are also taken for CPUs that are offline. But they are released while the CPU is offline as well, so a lock is never left with an owner that abandoned it.

This patch converts the RT side to mirror the non-RT behavior and fixes the deadlocks.
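
The idea behind the fix can be sketched with the same kind of user-space model. This is an illustration under stated assumptions, not the attached patch: `FixedLgLockModel`, `lock_mask`, `cpu_up`, `cpu_down`, and `demo_fix` are hypothetical names, and the only property modeled is that the mask used for locking gains a bit when a CPU comes online and never loses it.

```python
class FixedLgLockModel:
    """Toy model: locking walks a dedicated mask, not the online mask."""

    def __init__(self, ncpus):
        self.online = set(range(ncpus))     # stands in for the online CPU mask
        self.lock_mask = set(range(ncpus))  # bits are set on online, never cleared
        self.owner = {cpu: None for cpu in range(ncpus)}

    def cpu_down(self, cpu):
        self.online.discard(cpu)            # lock_mask is deliberately left alone

    def cpu_up(self, cpu):
        self.online.add(cpu)
        self.lock_mask.add(cpu)

    def global_lock(self, task):
        # Offline CPUs that were online at some point are locked as well.
        for cpu in sorted(self.lock_mask):
            if self.owner[cpu] is not None:
                raise RuntimeError(
                    f"{task} blocks on CPU {cpu}, held by {self.owner[cpu]}"
                )
            self.owner[cpu] = task

    def global_unlock(self, task):
        # The release walks the same mask, so no slot is skipped.
        for cpu in sorted(self.lock_mask):
            if self.owner[cpu] == task:
                self.owner[cpu] = None


def demo_fix():
    lg = FixedLgLockModel(ncpus=4)
    lg.global_lock("task_a")
    lg.cpu_down(2)                 # CPU 2 goes offline while task_a holds the locks
    lg.global_unlock("task_a")     # CPU 2's slot is released too
    after_unlock = dict(lg.owner)
    lg.cpu_up(2)                   # CPU 2 comes back online
    lg.global_lock("task_b")       # another task can take every lock
    after_relock = dict(lg.owner)
    return after_unlock, after_relock
```

Because lock and unlock iterate over the same never-shrinking mask, the offline/online cycle in `demo_fix()` leaves no slot with a stale owner, and the later lock by `task_b` succeeds.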
Comment 2 John Kacur 2012-05-23 09:09:25 EDT
This patch is equivalent to 7837aec
git describe --contains 7837aec
v3.2.14-rt24~7

Updating kernel-rt.spec to reflect this.
Comment 5 John Kacur 2012-06-20 17:44:38 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: The RT versions of lglock use for_each_online_cpu().
Consequence: A lock can be reactivated when a CPU comes online again, but with an owner that has not released it. This causes various problems, such as blocking the original owner of the lock.
Fix: Convert the RT versions to use the lglock-specific cpumasks.
Result: Locks are taken for CPUs that are offline, but they are also released while the CPU is offline, so a lock can no longer be left with an owner that abandoned it.
Comment 7 Steven Rostedt 2012-07-03 10:27:30 EDT
This patch was added to 3.2.14-rt24 (upstream stable-rt).
Comment 10 errata-xmlrpc 2012-09-19 14:03:33 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1282.html
