Bug 177934

Summary: dlm_release_lockspace from app A can cause app B to break
Product: [Retired] Red Hat Cluster Suite
Reporter: Lon Hohberger <lhh>
Component: dlm
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: medium
Version: 4
CC: ccaulfie, cluster-maint
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: RHBA-2006-0558
Doc Type: Enhancement
Last Closed: 2006-08-10 21:26:59 UTC
Attachments:
Description       Flags
Untested patch    none
Tested Patch      none

Description Lon Hohberger 2006-01-16 16:22:43 UTC
Description of problem:

If application A calls dlm_release_lockspace() in libdlm from userland,
application B's reference to the open lockspace becomes invalid if
application B has no locks granted at the time of the release.

e.g.

Program A        Program B
Create LS 'Foo'
                 Open LS 'Foo'
Acquire lock
Release lock
Release LS
                 Acquire lock   <-- returns -1 / ENOENT
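
The sequence above can be modelled with a small in-memory sketch. This is only
a toy: none of the toy_* names are libdlm functions, and the rule that a
release is refused while locks are granted is an assumption of the model,
inferred from the description. It only shows why program B's handle goes stale.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy in-memory model; none of these names are the real libdlm API. */
struct toy_lockspace {
    char name[32];
    int  exists;        /* 1 while the lockspace is alive */
    int  granted_locks; /* locks currently granted in this lockspace */
};

static struct toy_lockspace toy_ls; /* the single lockspace in this model */

/* Program A: create the lockspace; the "handle" is just a pointer. */
static struct toy_lockspace *toy_create(const char *name)
{
    strncpy(toy_ls.name, name, sizeof(toy_ls.name) - 1);
    toy_ls.name[sizeof(toy_ls.name) - 1] = '\0';
    toy_ls.exists = 1;
    toy_ls.granted_locks = 0;
    return &toy_ls;
}

/* Program B: open an existing lockspace by name. */
static struct toy_lockspace *toy_open(const char *name)
{
    if (!toy_ls.exists || strcmp(toy_ls.name, name) != 0) {
        errno = ENOENT;
        return NULL;
    }
    return &toy_ls;
}

static int toy_lock(struct toy_lockspace *ls)
{
    if (!ls->exists) {
        errno = ENOENT; /* stale handle: the lockspace is gone */
        return -1;
    }
    ls->granted_locks++;
    return 0;
}

static int toy_unlock(struct toy_lockspace *ls)
{
    if (!ls->exists || ls->granted_locks == 0) {
        errno = ENOENT;
        return -1;
    }
    ls->granted_locks--;
    return 0;
}

/* Release as modelled here: it ignores other open handles and tears the
 * lockspace down whenever no locks are granted (assumed rule). */
static int toy_release(struct toy_lockspace *ls)
{
    if (ls->granted_locks > 0) {
        errno = EBUSY;
        return -1;
    }
    ls->exists = 0;
    return 0;
}

/* Replays the table above; returns the errno seen by program B's lock call,
 * or 0 if that call succeeded. */
int toy_reproduce(void)
{
    struct toy_lockspace *a = toy_create("Foo");
    struct toy_lockspace *b = toy_open("Foo");
    assert(a != NULL && b != NULL);

    toy_lock(a);
    toy_unlock(a);
    toy_release(a);          /* B holds no locks, so the teardown goes ahead */

    errno = 0;
    return toy_lock(b) == -1 ? errno : 0;
}
```

In this model toy_reproduce() ends with program B's lock request failing, which
matches the -1 / ENOENT result in the table.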

Version-Release number of selected component (if applicable): Current 1/13/2006
CVS - STABLE/RHEL4

How reproducible: 100% 

Expected results: Application B should not have its handle invalidated.

Additional info: A simple, effective workaround is to have application B detect
the ENOENT failure, close and reopen (or recreate) the lockspace, and retry the
lock request. This takes a few seconds, but works in testing.
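
A minimal sketch of that workaround follows. The struct ls_ops callbacks, the
lock_with_reopen() helper and the fake_* test doubles are all hypothetical
names invented for illustration; a real application would put its own libdlm
calls behind the two callbacks. The retry limit is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the application's own wrappers
 * around libdlm; they are not libdlm functions. */
struct ls_ops {
    int (*reopen)(void *ctx); /* close, then reopen/recreate; 0 on success */
    int (*lock)(void *ctx);   /* 0 on success, -1 with errno set on failure */
};

/* Try the lock; on ENOENT reopen the lockspace and retry, at most
 * max_retries times. Any other error is returned to the caller. */
int lock_with_reopen(const struct ls_ops *ops, void *ctx, int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        if (ops->lock(ctx) == 0)
            return 0;
        if (errno != ENOENT || attempt >= max_retries)
            return -1;
        if (ops->reopen(ctx) != 0)
            return -1;
    }
}

/* Test doubles: a handle that starts out invalidated by another program. */
struct fake_ls {
    int valid;
    int reopen_calls;
};

static int fake_lock(void *ctx)
{
    struct fake_ls *ls = ctx;
    if (!ls->valid) {
        errno = ENOENT;
        return -1;
    }
    return 0;
}

static int fake_reopen(void *ctx)
{
    struct fake_ls *ls = ctx;
    ls->valid = 1;
    ls->reopen_calls++;
    return 0;
}

/* Returns how many reopens were needed, or -1 if the lock never succeeded. */
int demo_retry(void)
{
    struct fake_ls ls = { 0, 0 };
    struct ls_ops ops = { fake_reopen, fake_lock };
    assert(ls.valid == 0);

    if (lock_with_reopen(&ops, &ls, 3) != 0)
        return -1;
    return ls.reopen_calls;
}
```

Bounding the retries keeps the caller from looping forever if the lockspace
keeps disappearing; only ENOENT triggers a reopen, so other failures still
surface unchanged.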

Other possible solutions:
- Use AUTOFREE or be able to set this flag from libdlm?
- Have libdlm return EBUSY if there are other lockspace users when
dlm_release_lockspace is called.
- Use a reference count on create/open for the number of local users of a LS,
and decrement the count when it reaches 0.

Comment 1 Lon Hohberger 2006-01-16 17:10:48 UTC
"Use a reference count on create/open for the number of local users of a LS,
and decrement the count when it reaches 0."

should be:

"Use a reference count on create/open for the number of local users of a LS,
and decrement the count on release.  Only actually fully release the LS when the
refcnt reaches 0."
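
The corrected proposal can be sketched as below. This is an illustration of
the idea only, not the patch attached to this bug; the rc_* names and the
struct layout are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-lockspace bookkeeping; not the real device.c structure. */
struct rc_lockspace {
    int exists; /* 1 while the lockspace is alive */
    int users;  /* local create/open references */
};

/* Create or open: the first caller creates, every caller takes a reference. */
int rc_acquire_ref(struct rc_lockspace *ls)
{
    if (!ls->exists) {
        ls->exists = 1;
        ls->users = 0;
    }
    ls->users++;
    return 0;
}

/* Release: drop one reference; only the last user really frees the lockspace. */
int rc_release(struct rc_lockspace *ls)
{
    if (!ls->exists || ls->users == 0) {
        errno = ENOENT;
        return -1;
    }
    if (--ls->users == 0)
        ls->exists = 0;
    return 0;
}

/* A lock request only needs the lockspace to still be there. */
int rc_lock(const struct rc_lockspace *ls)
{
    if (!ls->exists) {
        errno = ENOENT;
        return -1;
    }
    return 0;
}

/* Replays the failing sequence: A creates, B opens, A releases, B locks.
 * Returns 0 if B's lock succeeds and the lockspace is freed only after
 * B's own release, -1 otherwise. */
int rc_demo(void)
{
    struct rc_lockspace ls = { 0, 0 };

    rc_acquire_ref(&ls);            /* program A: create */
    rc_acquire_ref(&ls);            /* program B: open */
    rc_release(&ls);                /* program A: release, refcnt 2 -> 1 */
    assert(ls.exists == 1);         /* still alive for program B */

    int b_lock = rc_lock(&ls);      /* program B: acquire lock */
    rc_release(&ls);                /* program B: release, refcnt 1 -> 0 */

    return (b_lock == 0 && ls.exists == 0) ? 0 : -1;
}
```

With the count in place, program A's release no longer invalidates program
B's handle; the lockspace goes away only when the last local user releases it.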


Comment 2 Christine Caulfield 2006-01-17 15:43:55 UTC
Created attachment 123300 [details]
Untested patch

Here's an (untested) patch that might do the job. It needs testing for all the
open/close/delete, open/delete/close, etc. sequences, of course.

Comment 3 Christine Caulfield 2006-01-23 11:54:46 UTC
Created attachment 123572 [details]
Tested Patch

I've tested this patch and it "works for me" (tm)

Comment 4 Lon Hohberger 2006-02-21 15:01:54 UTC
Patch works for me too.

Comment 5 Christine Caulfield 2006-02-22 09:04:50 UTC
Fix in -rSTABLE:

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/device.c,v  <--  device.c
new revision: 1.24.2.1.4.1.2.7; previous revision: 1.24.2.1.4.1.2.6
done

fix in -rRHEL4 (for U4)

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/device.c,v  <--  device.c
new revision: 1.24.2.7; previous revision: 1.24.2.6
done


Comment 8 Red Hat Bugzilla 2006-08-10 21:26:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0558.html