Bug 280161

Summary: Killing a userland process can leave locks hanging around
Product: [Retired] Red Hat Cluster Suite
Reporter: Christine Caulfield <ccaulfie>
Component: dlm-kernel
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: low
Docs Contact:
Priority: high
Version: 4
CC: cfeist, cluster-maint, tao
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: RHBA-2007-0995
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-11-21 21:55:55 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 319671
Attachments:
Patch to fix (flags: none)

Description Christine Caulfield 2007-09-06 09:07:49 UTC
Description of problem:

This problem was seen at PostFinance. Running a script such as:
for i in `seq 100 500`
do
  ./dlmtest -mEX -d0 testlock$i
done

on two nodes, then interrupting one of them, can leave a lock in the lockspace
even when there are no processes associated with it.
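
dlmtest itself isn't shipped with these packages, so as a rough illustration of
what each iteration amounts to, here is a minimal libdlm sketch. It is a guess
at dlmtest's behaviour (the meaning of -mEX/-d0 and the use of the default
lockspace are assumptions), using the libdlm convenience calls dlm_lock_wait()
and dlm_unlock_wait():

/* Sketch only: assumes each dlmtest run takes and releases one EX lock
 * on the named resource via libdlm.  Build with: gcc -o lk lk.c -ldlm */
#include <stdio.h>
#include <string.h>
#include <libdlm.h>

int main(int argc, char *argv[])
{
	struct dlm_lksb lksb;
	const char *name = argc > 1 ? argv[1] : "testlock";

	memset(&lksb, 0, sizeof(lksb));

	/* Take an exclusive lock on the named resource in the default
	 * lockspace and block until it is granted. */
	if (dlm_lock_wait(LKM_EXMODE, &lksb, 0, name, strlen(name),
			  0, NULL, NULL, NULL) != 0 || lksb.sb_status != 0) {
		fprintf(stderr, "lock %s failed: status %d\n",
			name, lksb.sb_status);
		return 1;
	}

	/* Killing the process here, between grant and unlock (or while
	 * the request above is still in flight), is what can leave the
	 * lock orphaned in the lockspace. */

	dlm_unlock_wait(lksb.sb_lkid, 0, &lksb);
	return 0;
}

Running that in the loop above on both nodes, then killing one side partway
through, mimics the reported interrupt.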


Version-Release number of selected component (if applicable):
RHEL 4U5

How reproducible:
By them, easily; it happens within 2 or 3 iterations at most.
By me, I can't reproduce it.

Additional info:
Their systems have 8 processors; my tests were only run on dual-processor
machines, so that might have some bearing.

The dlm_debug log contains the error:

dlm_unlock: xxx busy 1

which indicates that an unlock was attempted while the original lock request
was still in progress.
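
The actual fix is in the attached patch, which isn't reproduced in this report.
Purely as an illustration of the shape of the race, a hypothetical sketch
(cleanup_locks, file_info, lock_info and force_unlock are all made-up names,
not the real device.c code): when a process dies, per-process cleanup walks its
lock list and force-unlocks each entry, and an unlock that races with a
still-in-flight grant is refused as busy, orphaning the lock.

/* Hypothetical sketch of the race; not the actual RHEL4 device.c code. */
static void cleanup_locks(struct file_info *fi)
{
	struct lock_info *li, *tmp;

	/* Force-unlock every lock the dead process still holds. */
	list_for_each_entry_safe(li, tmp, &fi->fi_locks, li_list) {
		int status = force_unlock(fi, li);	/* hypothetical helper */

		if (status == -EBUSY) {
			/* The original lock request is still in flight, so
			 * the unlock is refused -- hence the "dlm_unlock:
			 * xxx busy 1" message.  Without the fix the error
			 * is only logged here, and the lock is orphaned
			 * once the grant finally completes. */
			printk("dlm_unlock: %x busy\n", li->li_lkid);
		}
	}
}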

Comment 1 Christine Caulfield 2007-09-10 07:51:30 UTC
Created attachment 191461
Patch to fix

I managed to reproduce the problem on my roth nodes, though it takes many
iterations.

Here's the patch (still including the printk) that fixes it for me.

Comment 2 Christine Caulfield 2007-09-12 08:40:32 UTC
Checked into RHEL4 branch

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.10; previous revision: 1.24.2.9
done


Comment 6 Christine Caulfield 2007-10-01 07:42:33 UTC
Setting to MODIFIED as the patch is in the dlm-kernel-2.6.9-48 package. If the
customer's testing fails, set it back to ASSIGNED.

Comment 12 errata-xmlrpc 2007-11-21 21:55:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0995.html


Comment 13 Charlie Brady 2007-12-04 21:48:36 UTC
(In reply to comment #1)
> Created an attachment (id=191461)
> Patch to fix
> 
> I managed to reproduce the problem on my roth nodes though it takes many
> iterations.
> 
> Here's the patch (still including the printk) that fixes it for me.

Paul, this patch appears to be missing from the STABLE branch in CVS. It's in
the RHEL46 branch (of course).

Comment 14 Christine Caulfield 2007-12-05 08:46:21 UTC
Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.1.4.1.2.10; previous revision: 1.24.2.1.4.1.2.9
done

I didn't realise anyone was still using it!