Description of problem:
lock_gulmd can log out of the master on a client node while that
client has GFS mounted. The result of this is that locks are still in
the locktable for that client which can cause the cluster to hang
while waiting for the logged out client to release its locks.
Mike Tilstra had mentioned that a possible solution for this problem
would be a locktable sweep that cleans up a node's locks on shutdown.
Currently, the workaround for this problem is to start lock_gulmd on
the client node and then force it to expire. (lock_gulmd cannot
force expire a node that is not logged in.)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. start lock servers
2. mount clients
3. gulm_tool shutdown client1
4. cluster can now hang because locks are still in the locktable for
that client
If the node tries to mount after it has logged out cleanly from the
lock_gulmd master AND rebooted, it will produce the following error:
lock_gulm: ERROR On lock 0x47040000000000181f587472696e312e67667300
Got a drop lcok request for a lock that we don't know of. state:0x3
As an aside, lock_gulmd logs out cleanly when it receives SIGTERM.
Given that "bad things" can happen when the lock server logs out
cleanly while there are active resources (e.g. GFS or GNBD), it might
be justifiable to remove the possibility of accidentally shooting
yourself in the foot with SIGTERM by simply ignoring it altogether.
Part of the original justification for having lock_gulmd shut down
cleanly on SIGTERM was to handle machines shutting down and receiving
SIGTERM from killall. Since this is generally run *after* networking
has been shut down, the node will be fenced anyway, since it will not
be able to issue a clean shutdown over the downed interface. Now that
there are init.d scripts for lock_gulmd, it makes even more sense to
ignore SIGTERM.
Created attachment 102529
Drop all lock holds on node logout.
This implements the drop-locks-on-logout idea. Not fully sure what all
of the side effects of this are yet. So give it a whirl, see what breaks.
CVS head ignores SIGTERM now.
OK, but what about RHEL3?
SIG_IGN on SIGTERM in the 6.0 sources now too.
in RHEL3 now too.
CVS head: core now gets locked by the gulm kernel module. Until the
module logs out, core will ignore shutdown requests.
SIGTERM ignoring and core locking in 6.0.* now too.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.