Bug 129193 - cluster can hang if lock_gulmd logs out on mounted client
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: michael conrad tadpol tilstra
QA Contact: GFS Bugs
Reported: 2004-08-04 18:38 EDT by Adam "mantis" Manthei
Modified: 2010-01-11 21:55 EST

Doc Type: Bug Fix
Last Closed: 2005-05-25 12:41:08 EDT

Attachments
Drop all lock holds on node logout. (2.68 KB, patch)
2004-08-09 13:56 EDT, michael conrad tadpol tilstra

Description Adam "mantis" Manthei 2004-08-04 18:38:07 EDT
Description of problem:
lock_gulmd can log out of the master on a client node while that
client still has GFS mounted.  When this happens, the client's locks
remain in the locktable, which can cause the cluster to hang while
waiting for the logged-out client to release them.

Mike Tilstra mentioned that a possible solution for this problem
would be a locktable sweep that cleans up a node's locks on shutdown.
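
For illustration only, here is a minimal sketch of what such a sweep
could look like.  The names used (struct lock_entry, struct holder,
drop_holds_for_node) are hypothetical and are not the actual GULM lock
table or code:

#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory lock table; illustrative only. */
struct holder {
    char node[64];              /* node that holds (or waits on) this lock */
    struct holder *next;
};

struct lock_entry {
    struct holder *holders;     /* current holders of this lock */
    struct lock_entry *next;
};

/* Drop every hold owned by `node`, e.g. when that node logs out of the master. */
static void drop_holds_for_node(struct lock_entry *locktable, const char *node)
{
    struct lock_entry *lk;

    for (lk = locktable; lk != NULL; lk = lk->next) {
        struct holder **pp = &lk->holders;

        while (*pp != NULL) {
            if (strcmp((*pp)->node, node) == 0) {
                struct holder *dead = *pp;
                *pp = dead->next;   /* unlink: waiters no longer block on this node */
                free(dead);
            } else {
                pp = &(*pp)->next;
            }
        }
    }
}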

Currently, the workaround for this problem is to start lock_gulmd on
the client node and then force it to expire.  (lock_gulmd cannot
force-expire a node that is not logged in.)

Version-Release number of selected component (if applicable):
GFS-modules-smp-6.0.0-1.2; GFS-6.0.0-1.2

How reproducible:

Steps to Reproduce:
1. start lock servers
2. mount clients
3. gulm_tool shutdown client1
4. cluster can now hang because locks are still in the locktable for client1

Actual results:
If the node tries to mount after it has logged out cleanly from the
lock_gulmd master AND rebooted, it will produce the following error:

lock_gulm: ERROR On lock 0x47040000000000181f587472696e312e67667300   
  Got a drop lcok request for a lock that we don't know of. state:0x3

Expected results:

Additional info:
Comment 1 Adam "mantis" Manthei 2004-08-05 09:44:25 EDT
As an aside, lock_gulmd logs out cleanly when it receives SIGTERM.
Given that "bad things" can happen when the lock server logs out
cleanly while there are active resources (e.g. GFS or GNBD), it might
be justifiable to remove the possibility of accidentally shooting
yourself in the foot with SIGTERM by simply ignoring it altogether.

Part of the original justification for having lock_gulmd shut down
cleanly on SIGTERM was to handle machines shutting down and receiving
SIGTERM from killall.  Since this is generally run *after* networking
has been shut down, the node will be fenced anyway, since it will not
be able to issue a clean shutdown over the downed interface.  Now that
there are init.d scripts for lock_gulmd, it makes even more sense to
ignore SIGTERM.
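
A minimal sketch of what ignoring SIGTERM in the daemon could look
like (illustrative only; the actual lock_gulmd signal setup may differ):

#include <signal.h>
#include <string.h>

/* Ignore SIGTERM so a stray `killall -TERM` during shutdown cannot make the
 * lock server log out cleanly while GFS/GNBD resources are still active. */
static void ignore_sigterm(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTERM, &sa, NULL);
}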

Comment 2 michael conrad tadpol tilstra 2004-08-09 13:56:41 EDT
Created attachment 102529 [details]
Drop all lock holds on node logout.

This implements the drop-locks-on-logout idea.  Not fully sure what all of the
side effects of this are yet.  So give it a whirl, see what breaks.
Comment 3 michael conrad tadpol tilstra 2004-10-14 11:47:40 EDT
CVS head ignores sigterm now.
Comment 4 Adam "mantis" Manthei 2004-10-14 11:51:32 EDT
OK, but what about RHEL3?
Comment 5 michael conrad tadpol tilstra 2004-10-14 13:58:53 EDT
sig_ign on sigterm in 6.0 sources now too.
Comment 6 michael conrad tadpol tilstra 2004-10-21 17:57:42 EDT
in RHEL3 now too.
Comment 7 michael conrad tadpol tilstra 2004-10-29 17:25:55 EDT
In CVS head, the core now gets locked by the gulm kernel module.  Until the
module logs out, the core will ignore shutdown requests.
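
Roughly, the core-locking idea could be sketched as a use count that blocks
shutdown requests (names below are hypothetical, not the actual gulm code):

#include <stdio.h>

static int core_users;                /* bumped while the gulm kernel module is logged in */

void core_module_login(void)  { core_users++; }
void core_module_logout(void) { core_users--; }

/* Called when a shutdown request arrives (e.g. from gulm_tool). */
int core_handle_shutdown_request(void)
{
    if (core_users > 0) {
        fprintf(stderr, "shutdown refused: kernel module still logged in\n");
        return -1;                    /* ignore the request while GFS is mounted */
    }
    /* ... proceed with clean logout and exit ... */
    return 0;
}
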
Comment 8 michael conrad tadpol tilstra 2004-12-01 16:26:23 EST
sigterm ignoring and core locking in 6.0.* now too.
Comment 9 Jay Turner 2005-05-25 12:41:09 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

