Bug 306391 - Not able to unblock/cancel a thread waiting on a lock from another thread.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm-kernel
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
Reported: 2007-09-26 04:57 UTC by Norm Murray
Modified: 2018-10-19 22:23 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-24 14:45:58 UTC
Embargoed:


Description Wade Mealing 2007-09-26 04:57:43 UTC
Description of problem:

When using DLM locking, the process stays blocked in pthread_join(). The
waiting thread should terminate once the lock is cancelled and released.


Version-Release number of selected component (if applicable):

dlm-1.0.3-1-x86_64

How reproducible:

Every time.

Steps to Reproduce:

1. Save the attached program, lvb2.c.

2. Compile it:
gcc -L/usr/lib64 -g -D_REENTRANT -o lvb2 lvb2.c -ldlm -lpthread

3. Start lvb2 in one window:
$ ./lvb2
sleeping...
Lock ID is 103d0
converting to 5
convert enq succeeded

4. Start a second lvb2 in another window:
$ ./lvb2
sleeping...
Lock ID is 1020d
converting to 5

5. Press <enter> in the second window:
$ ./lvb2
sleeping...
Lock ID is 1020d
converting to 5

unlocking...
unlocked.
  

Actual results:
The second lvb2 does not terminate although the lock was cancelled and released.
lvb2 is actually blocked on pthread_join().

Expected results:
The blocked thread of the second lvb2 should be unblocked by
dlm_unlock_wait, after which the program should terminate.

Additional info:

Waiting on release of source code from customer before I can upload.

Comment 1 Christine Caulfield 2007-09-26 13:57:02 UTC
There are two things going on here. The first is that the customer is using the
synchronous call dlm_unlock_wait(CANCEL) to cancel the lock. This is wrong: it
should be an asynchronous dlm_unlock(CANCEL), so that the cancel AST is
delivered to the waiting process rather than to the cancelling one.

Making that change exposes the second problem, a bug in the DLM: the astparam
is overwritten by the value passed to dlm_unlock, whereas it should be
preserved. With this bug, the waiting routine gets passed a bogus parameter and
the process segfaults.
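
To make the two points above concrete, here is a small self-contained model
using only pthreads. It is not libdlm code and does not reproduce the
customer's lvb2.c. Every name in it (model_lock, waiter, pick_astparam,
model_cancel, run_cancel_demo, MODEL_CANCELLED) is hypothetical and chosen for
illustration. It assumes a waiter that blocks on a condition variable until
its completion routine (the AST) runs, and shows that the waiter can only be
woken if the AST receives the parameter the waiter registered, not the one
supplied by the cancelling caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status code for a cancelled request (not a libdlm value). */
enum { MODEL_CANCELLED = 1 };

/* Hypothetical stand-in for a queued lock request; not a libdlm type. */
struct model_lock {
    void (*ast)(void *astparam); /* completion routine registered by the waiter */
    void *astparam;              /* argument registered by the waiter */
    int status;                  /* 0 while pending, set when the request completes */
};

/* Per-waiter state that the AST reaches through astparam. */
struct waiter {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool done;
};

/* The waiter's AST: wakes the thread blocked in wait_thread(). */
static void waiter_ast(void *astparam)
{
    struct waiter *w = astparam;
    pthread_mutex_lock(&w->mutex);
    w->done = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->mutex);
}

/* Thread body: blocks until the AST has run. */
static void *wait_thread(void *arg)
{
    struct waiter *w = arg;
    pthread_mutex_lock(&w->mutex);
    while (!w->done)
        pthread_cond_wait(&w->cond, &w->mutex);
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

/* Which parameter the AST receives. preserve == true models the fixed
 * behaviour; preserve == false models the overwrite described above. */
void *pick_astparam(const struct model_lock *lk, void *cancel_param, bool preserve)
{
    return preserve ? lk->astparam : cancel_param;
}

/* Asynchronous cancel in the model: complete the request and run the
 * waiter's AST with the waiter's own parameter. */
void model_cancel(struct model_lock *lk, void *cancel_param)
{
    assert(lk != NULL && lk->ast != NULL);
    lk->status = MODEL_CANCELLED;
    lk->ast(pick_astparam(lk, cancel_param, true));
}

/* Start a waiter, cancel its request from this thread, then join it.
 * Returns the final request status, or -1 if the thread could not start. */
int run_cancel_demo(void)
{
    struct waiter w;
    struct model_lock lk = { waiter_ast, &w, 0 };
    pthread_t tid;
    int canceller_param = 0; /* the canceller's own value; must not reach the AST */

    pthread_mutex_init(&w.mutex, NULL);
    pthread_cond_init(&w.cond, NULL);
    w.done = false;

    if (pthread_create(&tid, NULL, wait_thread, &w) != 0)
        return -1;
    model_cancel(&lk, &canceller_param);
    pthread_join(tid, NULL); /* returns because the AST woke the waiter */

    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.mutex);
    return lk.status;
}
```

In this model, run_cancel_demo() returns only because the AST was handed the
waiter's own struct. With preserve set to false, pick_astparam() would hand
the AST the canceller's pointer instead, which is the kind of bogus parameter
described above.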

I have checked in a patch to the RHEL4 branch to fix this behaviour. It will
also need looking into for RHEL5.

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.11; previous revision: 1.24.2.10
done


Comment 4 Christine Caulfield 2007-10-04 10:59:24 UTC
RHEL5 bug cloned as bz#318061

Comment 5 Christine Caulfield 2007-10-04 12:29:29 UTC
This seems to have found its way into 4.6

