Bug 280161 - Killing a userland process can leave locks hanging around
Killing a userland process can leave locks hanging around
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm-kernel
Version: 4
Hardware: All
OS: Linux
Priority: high
Severity: low
Assigned To: Christine Caulfield
Cluster QE
: ZStream
Depends On:
Blocks: 319671
Reported: 2007-09-06 05:07 EDT by Christine Caulfield
Modified: 2010-10-22 14:28 EDT (History)
3 users

See Also:
Fixed In Version: RHBA-2007-0995
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-11-21 16:55:55 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Patch to fix (1.14 KB, patch)
2007-09-10 03:51 EDT, Christine Caulfield

Description Christine Caulfield 2007-09-06 05:07:49 EDT
Description of problem:

This problem was seen at PostFinance. Running a script such as:
$ for i in `seq 100 500`
> do
>   ./dlmtest -mEX -d0 testlock$i
> done

on two nodes, then interrupting one of them can leave a lock in the lockspace
even when there are no processes associated with it.


Version-Release number of selected component (if applicable):
RHEL 4U5

How reproducible:
By them, easily: it happens within 2 or 3 iterations at most.
By me, I can't. 

Additional info:
Their systems have 8 processors; my tests were only run on dual-processor
machines, so that might have some bearing.

The dlm_debug log contains the error:

dlm_unlock: xxx busy 1

which indicates that an unlock was attempted while the lock was in progress.
Comment 1 Christine Caulfield 2007-09-10 03:51:30 EDT
Created attachment 191461 [details]
Patch to fix

I managed to reproduce the problem on my roth nodes though it takes many
iterations.

Here's the patch (still including the printk) that fixes it for me.
Comment 2 Christine Caulfield 2007-09-12 04:40:32 EDT
Checked into RHEL4 branch

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.10; previous revision: 1.24.2.9
done
Comment 6 Christine Caulfield 2007-10-01 03:42:33 EDT
Setting to MODIFIED as the patch is in the dlm-kernel-2.6.9-48 package. If the
customer fails it then set it back to ASSIGNED again.
Comment 12 errata-xmlrpc 2007-11-21 16:55:55 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0995.html
Comment 13 Charlie Brady 2007-12-04 16:48:36 EST
(In reply to comment #1)
> Created an attachment (id=191461) [edit]
> Patch to fix
> 
> I managed to reproduce the problem on my roth nodes though it takes many
> iterations.
> 
> Here's the patch (still including the printk) that fixes it for me.

Paul, this patch appears to be missing from the STABLE branch in CVS. It's in
the RHEL46 branch (of course).
Comment 14 Christine Caulfield 2007-12-05 03:46:21 EST
Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.1.4.1.2.10; previous revision: 1.24.2.1.4.1.2.9
done

I didn't realise anyone was still using it!
