Bug 177934 - dlm_release_lockspace from app A can cause app B to break
Summary: dlm_release_lockspace from app A can cause app B to break
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-01-16 16:22 UTC by Lon Hohberger
Modified: 2009-04-16 20:00 UTC
CC: 2 users

Fixed In Version: RHBA-2006-0558
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 21:26:59 UTC
Embargoed:


Attachments
Untested patch (1.76 KB, patch)
2006-01-17 15:43 UTC, Christine Caulfield
Tested Patch (1.51 KB, patch)
2006-01-23 11:54 UTC, Christine Caulfield


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2006:0558 0 normal SHIPPED_LIVE dlm-kernel bug fix update 2006-08-10 04:00:00 UTC

Description Lon Hohberger 2006-01-16 16:22:43 UTC
Description of problem:

If you call dlm_release_lockspace() in libdlm from userland from an application
A, application B's reference to the open lockspace will become invalid if
application B has no locks granted at the time of the release.

e.g.

Program A        Program B
Create LS 'Foo'
                 Open LS 'Foo'
Acquire lock
Release lock
Release LS
                 Acquire lock   <-- returns -1 / ENOENT

Version-Release number of selected component (if applicable): Current 1/13/2006
CVS - STABLE/RHEL4

How reproducible: 100% 

Expected results: Application B should not have its handle invalidated.

Additional info: A simple, effective workaround is to have app B detect the
ENOENT failure, close/reopen/recreate the lockspace, and retry the lock request.
This takes a few seconds but works in testing.

Other possible solutions:
- Use AUTOFREE or be able to set this flag from libdlm?
- Have libdlm return EBUSY if there are other lockspace users when
dlm_release_lockspace is called.
- Use a reference count on create/open for the number of local users of a LS,
and decrement the count when it reaches 0.

Comment 1 Lon Hohberger 2006-01-16 17:10:48 UTC
"Use a reference count on create/open for the number of local users of a LS,
and decrement the count when it reaches 0."

should be:

"Use a reference count on create/open for the number of local users of a LS,
and decrement the count on release.  Only actually fully release the LS when the
refcnt reaches 0."


Comment 2 Christine Caulfield 2006-01-17 15:43:55 UTC
Created attachment 123300 [details]
Untested patch

Here's an (untested) patch that might do the job. It needs testing for all the
open/close/delete, open/delete/close, etc. orderings, of course.

Comment 3 Christine Caulfield 2006-01-23 11:54:46 UTC
Created attachment 123572 [details]
Tested Patch

I've tested this patch and it "works for me" (tm)

Comment 4 Lon Hohberger 2006-02-21 15:01:54 UTC
Patch works for me too.

Comment 5 Christine Caulfield 2006-02-22 09:04:50 UTC
Fix in -rSTABLE:

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/device.c,v  <--  device.c
new revision: 1.24.2.1.4.1.2.7; previous revision: 1.24.2.1.4.1.2.6
done

fix in -rRHEL4 (for U4)

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/device.c,v  <--  device.c
new revision: 1.24.2.7; previous revision: 1.24.2.6
done


Comment 8 Red Hat Bugzilla 2006-08-10 21:26:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0558.html


