Bug 799389

Summary: lglocks can be taken and never released on cpu offline and onlining
Product: Red Hat Enterprise MRG
Reporter: Steven Rostedt <srostedt>
Component: realtime-kernel
Assignee: John Kacur <jkacur>
Status: CLOSED ERRATA
QA Contact: David Sommerseth <davids>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 2.2
CC: bhu, jkacur, jkastner, lgoncalv, ovasik
Target Milestone: 2.2
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The RT versions of lglock use for_each_online_cpu().
Consequence: Locks can be reactivated after a CPU comes online again, but with an owner that hasn't released them. This causes various problems, such as blocking the original owner of the lock.
Fix: Convert the RT versions to use the lglock-specific cpumasks.
Result: Locks are also taken for CPUs that are offline, but they are released for those CPUs as well, so a lock can no longer be left with an owner that abandoned it.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-19 18:03:33 UTC
Type: ---
Attachments:
Description: Use a separate cpumask for taking lglocks, not the online mask
Flags: none

Description Steven Rostedt 2012-03-02 16:13:09 UTC
On PREEMPT_RT_FULL, the lglocks use for_each_online_cpu() to take and release the per-CPU locks. If a task takes the locks and a CPU is then taken offline, the release loop skips that CPU, so the task releases every lock except the one that represents the offlined CPU. When the CPU comes back online, that lock is active again, but it still has an owner that never released it. If another task then takes the lglocks, it will block on this lock. The lock can even block its original owner.
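
The sequence above can be replayed in a small userspace model. This is only an illustrative sketch, not kernel code: the toy_* names, NR_CPUS value and the plain arrays standing in for the per-CPU locks and the online mask are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  4      /* hypothetical CPU count for the model */
#define NO_OWNER (-1)

/* Toy model: one lock per CPU, recording which task owns it. */
struct toy_lglock {
    int  owner[NR_CPUS];   /* task id holding the per-CPU lock, or NO_OWNER */
    bool online[NR_CPUS];  /* stand-in for the kernel's online mask */
};

static void toy_init(struct toy_lglock *lg)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        lg->owner[cpu] = NO_OWNER;
        lg->online[cpu] = true;
    }
}

/* Take the lock of every CPU that is online right now. */
static void toy_global_lock_online(struct toy_lglock *lg, int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!lg->online[cpu])
            continue;
        assert(lg->owner[cpu] == NO_OWNER); /* a real lock would block here */
        lg->owner[cpu] = task;
    }
}

/* Release the lock of every CPU that is online right now. */
static void toy_global_unlock_online(struct toy_lglock *lg, int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (lg->online[cpu] && lg->owner[cpu] == task)
            lg->owner[cpu] = NO_OWNER;
    }
}

/*
 * Replay the reported sequence: lock, offline one CPU, unlock, online it
 * again. Returns the online CPU whose lock still has an owner, or -1.
 */
static int toy_replay_bug(int victim_cpu)
{
    struct toy_lglock lg;

    assert(victim_cpu >= 0 && victim_cpu < NR_CPUS);
    toy_init(&lg);
    toy_global_lock_online(&lg, 1);    /* task 1 takes all locks */
    lg.online[victim_cpu] = false;     /* CPU goes offline while locks are held */
    toy_global_unlock_online(&lg, 1);  /* release loop skips the offline CPU */
    lg.online[victim_cpu] = true;      /* CPU comes back with a stale owner */

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (lg.online[cpu] && lg.owner[cpu] != NO_OWNER)
            return cpu;
    }
    return -1;
}
```

Calling toy_replay_bug(2) from a test program reports CPU 2 as still owned by task 1. In this model, a second toy_global_lock_online() call would trip the assertion at the point where a real lock would block.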

Comment 1 Steven Rostedt 2012-03-02 16:15:52 UTC
Created attachment 567097
Use a separate cpumask for taking lglocks, not the online mask

The non-RT code for taking lglocks uses its own cpumask to take the locks. A CPU's bit is set in the mask when the CPU comes online and is never cleared. That means the locks are also taken for CPUs that are offline, but they are released for those CPUs as well, so a lock is never left with an owner that abandoned it.

This patch converts the RT side to mirror the non-RT behavior and fixes the deadlocks.
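
The idea of a separate mask can be sketched in the same kind of userspace model. This is not the attached patch. The sticky_* names and the array-based mask are assumptions made for illustration, and the mask behaves as described in this comment: a bit is set when the CPU first comes online and is never cleared.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  4      /* hypothetical CPU count for the model */
#define NO_OWNER (-1)

struct sticky_lglock {
    int  owner[NR_CPUS];    /* task id holding the per-CPU lock, or NO_OWNER */
    bool online[NR_CPUS];   /* stand-in for the kernel's online mask */
    bool lg_mask[NR_CPUS];  /* lglock's own mask: set at online, never cleared */
};

static void sticky_init(struct sticky_lglock *lg)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        lg->owner[cpu] = NO_OWNER;
        lg->online[cpu] = false;
        lg->lg_mask[cpu] = false;
    }
}

static void sticky_cpu_online(struct sticky_lglock *lg, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    lg->online[cpu] = true;
    lg->lg_mask[cpu] = true;
}

static void sticky_cpu_offline(struct sticky_lglock *lg, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    lg->online[cpu] = false;   /* lg_mask is deliberately left untouched */
}

/* Lock and unlock iterate over lg_mask, so both loops see the same CPUs. */
static void sticky_global_lock(struct sticky_lglock *lg, int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!lg->lg_mask[cpu])
            continue;
        assert(lg->owner[cpu] == NO_OWNER); /* a real lock would block here */
        lg->owner[cpu] = task;
    }
}

static void sticky_global_unlock(struct sticky_lglock *lg, int task)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (lg->lg_mask[cpu] && lg->owner[cpu] == task)
            lg->owner[cpu] = NO_OWNER;
    }
}

/* Same sequence as in the report; returns how many locks remain owned. */
static int sticky_replay(int victim_cpu)
{
    struct sticky_lglock lg;
    int held = 0;

    sticky_init(&lg);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sticky_cpu_online(&lg, cpu);

    sticky_global_lock(&lg, 1);
    sticky_cpu_offline(&lg, victim_cpu);  /* offline while locks are held */
    sticky_global_unlock(&lg, 1);         /* offline CPU's lock is released too */
    sticky_cpu_online(&lg, victim_cpu);

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (lg.owner[cpu] != NO_OWNER)
            held++;
    }
    return held;
}
```

Because the lock and unlock loops walk the same mask regardless of hotplug events in between, sticky_replay() leaves no lock owned in this model. The cost is that locks are also taken for CPUs that are currently offline.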

Comment 2 John Kacur 2012-05-23 13:09:25 UTC
This patch is equivalent to 7837aec
git describe --contains 7837aec
v3.2.14-rt24~7

Updating kernel-rt.spec to reflect this.

Comment 5 John Kacur 2012-06-20 21:44:38 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: The RT versions of lglock use for_each_online_cpu().
Consequence: Locks can be reactivated after a CPU comes online again, but with an owner that hasn't released them. This causes various problems, such as blocking the original owner of the lock.
Fix: Convert the RT versions to use the lglock-specific cpumasks.
Result: Locks are also taken for CPUs that are offline, but they are released for those CPUs as well, so a lock can no longer be left with an owner that abandoned it.

Comment 7 Steven Rostedt 2012-07-03 14:27:30 UTC
This patch was added to 3.2.14-rt24 (upstream stable-rt).

Comment 10 errata-xmlrpc 2012-09-19 18:03:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1282.html