Bug 280161

Summary: Killing a userland process can leave locks hanging around
Product: [Retired] Red Hat Cluster Suite
Reporter: Christine Caulfield <ccaulfie>
Component: dlm-kernel
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: low
Docs Contact:
Priority: high
Version: 4
CC: cfeist, cluster-maint, tao
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: RHBA-2007-0995
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-11-21 21:55:55 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 319671
Attachments:
Patch to fix (flags: none)

Description Christine Caulfield 2007-09-06 09:07:49 UTC
Description of problem:

This problem was seen at PostFinance. Running a script such as:
for i in `seq 100 500`
do
  ./dlmtest -mEX -d0 testlock$i
done

on two nodes, then interrupting one of them, can leave a lock in the lockspace
even when there are no processes associated with it.
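
dlmtest itself isn't shipped with these packages, so as a rough illustration of
what each iteration amounts to, here is a minimal libdlm sketch. It is a guess
at dlmtest's behaviour (the meaning of -mEX/-d0 and the use of the default
lockspace are assumptions), using the libdlm convenience calls dlm_lock_wait()
and dlm_unlock_wait():

/* Sketch only: assumes each dlmtest run takes and releases one EX lock
 * on the named resource via libdlm.  Build with: gcc -o lk lk.c -ldlm */
#include <stdio.h>
#include <string.h>
#include <libdlm.h>

int main(int argc, char *argv[])
{
	struct dlm_lksb lksb;
	const char *name = argc > 1 ? argv[1] : "testlock";

	memset(&lksb, 0, sizeof(lksb));

	/* Take an exclusive lock on the named resource in the default
	 * lockspace and block until it is granted. */
	if (dlm_lock_wait(LKM_EXMODE, &lksb, 0, name, strlen(name),
			  0, NULL, NULL, NULL) != 0 || lksb.sb_status != 0) {
		fprintf(stderr, "lock %s failed: status %d\n",
			name, lksb.sb_status);
		return 1;
	}

	/* Killing the process here, between grant and unlock (or while
	 * the request above is still in flight), is what can leave the
	 * lock orphaned in the lockspace. */

	dlm_unlock_wait(lksb.sb_lkid, 0, &lksb);
	return 0;
}

Running that in the loop above on both nodes, then killing one side partway
through, mimics the reported interrupt.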


Version-Release number of selected component (if applicable):
RHEL 4U5

How reproducible:
By them, easily; it happens within 2 or 3 iterations at most.
By me, I can't reproduce it.

Additional info:
Their systems have 8 processors; my tests were only run on dual-processor
machines, so that might have some bearing.

The dlm_debug log contains the error:

dlm_unlock: xxx busy 1

which indicates that an unlock was attempted while the original lock request
was still in progress.
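
The actual fix is in the attached patch, which isn't reproduced in this report.
Purely as an illustration of the shape of the race, a hypothetical sketch
(cleanup_locks, file_info, lock_info and force_unlock are all made-up names,
not the real device.c code): when a process dies, per-process cleanup walks its
lock list and force-unlocks each entry, and an unlock that races with a
still-in-flight grant is refused as busy, orphaning the lock.

/* Hypothetical sketch of the race; not the actual RHEL4 device.c code. */
static void cleanup_locks(struct file_info *fi)
{
	struct lock_info *li, *tmp;

	/* Force-unlock every lock the dead process still holds. */
	list_for_each_entry_safe(li, tmp, &fi->fi_locks, li_list) {
		int status = force_unlock(fi, li);	/* hypothetical helper */

		if (status == -EBUSY) {
			/* The original lock request is still in flight, so
			 * the unlock is refused -- hence the "dlm_unlock:
			 * xxx busy 1" message.  Without the fix the error
			 * is only logged here, and the lock is orphaned
			 * once the grant finally completes. */
			printk("dlm_unlock: %x busy\n", li->li_lkid);
		}
	}
}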

Comment 1 Christine Caulfield 2007-09-10 07:51:30 UTC
Created attachment 191461
Patch to fix

I managed to reproduce the problem on my roth nodes, though it takes many
iterations.

Here's the patch (still including the printk) that fixes it for me.

Comment 2 Christine Caulfield 2007-09-12 08:40:32 UTC
Checked into RHEL4 branch

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.10; previous revision: 1.24.2.9
done


Comment 6 Christine Caulfield 2007-10-01 07:42:33 UTC
Setting to MODIFIED as the patch is in the dlm-kernel-2.6.9-48 package. If the
customer's testing fails, set it back to ASSIGNED.

Comment 12 errata-xmlrpc 2007-11-21 21:55:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0995.html


Comment 13 Charlie Brady 2007-12-04 21:48:36 UTC
(In reply to comment #1)
> Created an attachment (id=191461)
> Patch to fix
> 
> I managed to reproduce the problem on my roth nodes though it takes many
> iterations.
> 
> Here's the patch (still including the printk) that fixes it for me.

Paul, this patch appears to be missing from the STABLE branch in CVS. It's in
the RHEL46 branch (of course).

Comment 14 Christine Caulfield 2007-12-05 08:46:21 UTC
Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.1.4.1.2.10; previous revision: 1.24.2.1.4.1.2.9
done

I didn't realise anyone was still using it!