Description of problem:
lock_gulmd can log out of the master on a client node while that client has GFS mounted. As a result, that client's locks remain in the locktable, which can cause the cluster to hang while waiting for the logged-out client to release its locks. Mike Tilstra had mentioned that a possible solution for this problem would be a locktable sweep that cleans up a node's locks on shutdown. Currently, the workaround is to start lock_gulmd on the client node and then force it to expire. (lock_gulmd cannot force-expire a node that is not logged in.)

Version-Release number of selected component (if applicable):
GFS-modules-smp-6.0.0-1.2; GFS-6.0.0-1.2

How reproducible:
always

Steps to Reproduce:
1. start lock servers
2. mount clients
3. gulm_tool shutdown client1
4. cluster can now hang because locks are still in the locktable for client1

Actual results:
If the node tries to mount after it has logged out cleanly from the lock_gulmd master AND rebooted, it will produce the following error:
lock_gulm: ERROR On lock 0x47040000000000181f587472696e312e67667300 Got a drop lcok request for a lock that we don't know of. state:0x3

Expected results:

Additional info:
As an aside, lock_gulmd logs out cleanly when it receives SIGTERM. Given that "bad things" can happen when the lock server logs out cleanly while there are active resources (e.g. GFS or GNBD), it might be justifiable to remove the possibility of accidentally shooting yourself in the foot with SIGTERM by simply ignoring it altogether. Part of the original justification for having lock_gulmd shut down cleanly on SIGTERM was to handle machines shutting down and receiving SIGTERM from killall. Since killall is generally run *after* networking has been shut down, the node will be fenced anyway, because it cannot issue a clean shutdown over the downed interface. Now that there are init.d scripts for lock_gulmd, it makes even more sense to ignore SIGTERM.
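For illustration only, ignoring SIGTERM amounts to installing the SIG_IGN disposition for that signal. The sketch below is Python on a POSIX system, not lock_gulmd's own code; ignore_sigterm is a hypothetical helper name.

```python
import os
import signal


def ignore_sigterm():
    """Install SIG_IGN for SIGTERM and return the previous handler.

    Hypothetical helper; it only illustrates the idea of a daemon
    that no longer logs out (or exits) when it receives SIGTERM.
    """
    return signal.signal(signal.SIGTERM, signal.SIG_IGN)


if __name__ == "__main__":
    ignore_sigterm()
    # Deliver SIGTERM to ourselves; with SIG_IGN installed the
    # process keeps running instead of terminating.
    os.kill(os.getpid(), signal.SIGTERM)
    print("still running after SIGTERM")
```

With this in place, a stray killall no longer triggers a clean logout; an explicit shutdown request (such as gulm_tool shutdown) is still needed to stop the daemon.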
Created attachment 102529 Drop all lock holds on node logout. This implements the drop-locks-on-logout idea. Not fully sure what all of the side effects of this are yet. So give it a whirl, see what breaks.
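A toy model of the drop-locks-on-logout idea, for readers who want to see the shape of it. This is not the attached patch and does not reflect gulm's real data structures; LockTable, acquire, holders, drop_node and logout are all hypothetical names, and the lock table is assumed to be a simple mapping from lock key to the set of nodes holding it.

```python
from collections import defaultdict


class LockTable:
    """Toy lock table: lock key -> set of node names holding it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def acquire(self, key, node):
        self._holders[key].add(node)

    def holders(self, key):
        # Return a copy so callers cannot mutate the table.
        return set(self._holders.get(key, ()))

    def drop_node(self, node):
        """Sweep the table and remove every hold owned by `node`.

        Returns the number of holds that were dropped.
        """
        dropped = 0
        for key in list(self._holders):
            if node in self._holders[key]:
                self._holders[key].discard(node)
                dropped += 1
            if not self._holders[key]:
                # No holders left: forget the lock entirely.
                del self._holders[key]
        return dropped

    def logout(self, node):
        # On logout, sweep the node's holds so other nodes
        # are not left waiting on a client that is gone.
        return self.drop_node(node)


if __name__ == "__main__":
    table = LockTable()
    table.acquire("lock-a", "client1")
    table.acquire("lock-a", "client2")
    table.acquire("lock-b", "client1")
    print(table.logout("client1"), table.holders("lock-a"))
```

The sweep is linear in the number of locks in the table, which is one of the possible side effects worth checking on a large locktable.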
CVS head ignores sigterm now.
OK, but what about RHEL3?
SIG_IGN on SIGTERM in the 6.0 sources now too.
in RHEL3 now too.
In CVS head, the core now gets locked by the gulm kernel module. Until the module logs out, the core will ignore shutdown requests.
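Roughly, the behaviour described here can be modelled as a flag that the kernel module sets on login and clears on logout, with shutdown requests refused while it is set. This is only a sketch of the idea, not the gulm implementation; Core, module_login, module_logout and request_shutdown are hypothetical names.

```python
class Core:
    """Toy model of a lock server core that a kernel module can lock."""

    def __init__(self):
        self.module_logged_in = False
        self.running = True

    def module_login(self):
        # The kernel module logging in locks the core.
        self.module_logged_in = True

    def module_logout(self):
        # The kernel module logging out releases the core.
        self.module_logged_in = False

    def request_shutdown(self):
        """Return True if the shutdown was honoured, False if ignored."""
        if self.module_logged_in:
            # Still in use by the kernel module: ignore the request.
            return False
        self.running = False
        return True


if __name__ == "__main__":
    core = Core()
    core.module_login()
    print(core.request_shutdown())  # request ignored while locked
    core.module_logout()
    print(core.request_shutdown())  # request honoured after logout
```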
sigterm ignoring and core locking in 6.0.* now too.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-466.html