Bug 306391 - Not able to unblock/cancel a thread waiting on a lock from another thread.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm-kernel
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Reported: 2007-09-26 00:57 EDT by Norm Murray
Modified: 2010-10-22 14:56 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-04-24 10:45:58 EDT
Description Wade Mealing 2007-09-26 00:57:43 EDT
Description of problem:

A process using DLM locking stays blocked in pthread_join(). It should
terminate once the lock has been cancelled and released.


Version-Release number of selected component (if applicable):

dlm-1.0.3-1-x86_64

How reproducible:

Every time.

Steps to Reproduce:

1. save the attached program lvb2.c

2. compile it:
gcc -L/usr/lib64 -g -D_REENTRANT -o lvb2 lvb2.c -ldlm -lpthread

3. start lvb2 in one window:
$ ./lvb2
sleeping...
Lock ID is 103d0
converting to 5
convert enq succeeded

4. start a second lvb2 in another window:
$ ./lvb2
sleeping...
Lock ID is 1020d
converting to 5

5. press <enter> in the second window:
$ ./lvb2
sleeping...
Lock ID is 1020d
converting to 5

unlocking...
unlocked.
  

Actual results:
The second lvb2 does not terminate although the lock was cancelled and released.
lvb2 is actually blocked on pthread_join().

Expected results:
The blocked thread of the second lvb2 should be unblocked by
dlm_unlock_wait, and the program should then terminate.

Additional info:

Waiting on release of source code from customer before I can upload.
Comment 1 Christine Caulfield 2007-09-26 09:57:02 EDT
There are two things going on here. The first is that the customer is using the
synchronous call dlm_unlock_wait(CANCEL) to cancel the lock. This is wrong: it
should be an asynchronous dlm_unlock(CANCEL), so that the cancel AST is delivered
to the waiting process, not to the cancelling one.

This exposes a bug in the DLM where the astparam is overwritten by the value
passed to dlm_unlock, whereas it should be preserved. With this bug, the waiting
routine gets passed a bogus parameter and the process segfaults.

I have checked in a patch to the RHEL4 branch to fix this behaviour. It will
also need looking into for RHEL5.

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.11; previous revision: 1.24.2.10
done
Comment 4 Christine Caulfield 2007-10-04 06:59:24 EDT
RHEL5 bug cloned as bz#318061
Comment 5 Christine Caulfield 2007-10-04 08:29:29 EDT
This seems to have found its way into 4.6
