Bug 129193 - cluster can hang if lock_gulmd logs out on mounted client
Summary: cluster can hang if lock_gulmd logs out on mounted client
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs   
Version: 3
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: michael conrad tadpol tilstra
QA Contact: GFS Bugs
Depends On:
Reported: 2004-08-04 22:38 UTC by Adam "mantis" Manthei
Modified: 2010-01-12 02:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2005-05-25 16:41:08 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
Drop all lock holds on node logout. (2.68 KB, patch)
2004-08-09 17:56 UTC, michael conrad tadpol tilstra

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2005:466 normal SHIPPED_LIVE GFS bug fix update 2005-05-25 04:00:00 UTC

Description Adam "mantis" Manthei 2004-08-04 22:38:07 UTC
Description of problem:
lock_gulmd can log out of the master on a client node while that
client still has GFS mounted.  When this happens, the client's locks
remain in the locktable, and the rest of the cluster can hang while
waiting for the logged-out client to release them.

Mike Tilstra had mentioned that a possible solution for this problem
would be a locktable sweep that cleans up a node's locks on shutdown.

Currently, the workaround for this problem is to start lock_gulmd on
the client node and then force it to expire.  (lock_gulmd cannot
force expire a node that is not logged in.)

Version-Release number of selected component (if applicable):
GFS-modules-smp-6.0.0-1.2; GFS-6.0.0-1.2

How reproducible:

Steps to Reproduce:
1. start lock servers
2. mount clients
3. gulm_tool shutdown client1
4. cluster can now hang because locks are still in the locktable for
Actual results:
If the node tries to mount after it has logged out cleanly from the
lock_gulmd master AND rebooted, it will produce the following error:

lock_gulm: ERROR On lock 0x47040000000000181f587472696e312e67667300   
  Got a drop lcok request for a lock that we don't know of. state:0x3

Expected results:

Additional info:

Comment 1 Adam "mantis" Manthei 2004-08-05 13:44:25 UTC
As an aside, lock_gulmd logs out cleanly when it receives SIGTERM.
Given that "bad things" can happen when the lock server logs out
cleanly while there are active resources (e.g. GFS or GNBD), it might
be justifiable to remove the possibility of accidentally shooting
yourself in the foot with SIGTERM by simply ignoring it altogether.

Part of the original justification for having lock_gulmd shut down
cleanly on SIGTERM was to handle machines that are shutting down and
receive SIGTERM from killall.  Since killall is generally run *after*
networking has been shut down, the node will be fenced anyway, because
it cannot issue a clean shutdown over the downed interface.  Now that
there are init.d scripts for lock_gulmd, it makes even more sense to
ignore SIGTERM.
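
For illustration only, here is a minimal Python sketch of the
ignore-SIGTERM idea described above.  It is not lock_gulmd code (the
daemon is not written in Python); the function names ignore_sigterm and
survives_sigterm are hypothetical, and the only real calls used are
Python's signal.signal, signal.SIG_IGN and os.kill.  It assumes a
POSIX system and that it runs in the main thread.

```python
import os
import signal


def ignore_sigterm():
    """Install SIG_IGN for SIGTERM and return the previous handler.

    Hypothetical helper: mirrors the idea of a daemon that refuses to
    log out just because something sent it SIGTERM.
    """
    return signal.signal(signal.SIGTERM, signal.SIG_IGN)


def survives_sigterm():
    """Send SIGTERM to this process; return True if we are still running."""
    os.kill(os.getpid(), signal.SIGTERM)
    return True
```

With the handler installed, a stray killall no longer triggers a clean
logout; stopping the daemon would then have to go through an explicit
shutdown request instead.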

Comment 2 michael conrad tadpol tilstra 2004-08-09 17:56:41 UTC
Created attachment 102529
Drop all lock holds on node logout.

This implements the drop-locks-on-logout idea.  Not fully sure what all of the
side effects of this are yet.  So give it a whirl, see what breaks.
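
As a rough illustration of what "drop all lock holds on node logout"
could look like, here is a small Python sketch.  It is not the attached
patch and does not reflect the real lock_gulmd data structures; the
LockTable class, its methods, and the lock and node names are all
hypothetical.

```python
from collections import defaultdict


class LockTable:
    """Toy lock table: maps a lock key to the set of nodes holding it."""

    def __init__(self):
        self.holders = defaultdict(set)

    def acquire(self, key, node):
        """Record that `node` holds the lock `key`."""
        self.holders[key].add(node)

    def drop_all_for(self, node):
        """Sweep the table and remove every hold owned by `node`.

        Returns the number of holds that were dropped.
        """
        dropped = 0
        for key in list(self.holders):  # copy keys: we delete while sweeping
            node_set = self.holders[key]
            if node in node_set:
                node_set.discard(node)
                dropped += 1
            if not node_set:
                del self.holders[key]  # nobody holds this lock any more
        return dropped

    def logout(self, node):
        """On logout, sweep the node's holds so no stale locks remain."""
        return self.drop_all_for(node)
```

The point of the sweep is that after logout returns, no entry in the
table still names the departed node, so other nodes are not left
waiting on locks that will never be released.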

Comment 3 michael conrad tadpol tilstra 2004-10-14 15:47:40 UTC
CVS head ignores sigterm now.

Comment 4 Adam "mantis" Manthei 2004-10-14 15:51:32 UTC
OK, but what about RHEL3?

Comment 5 michael conrad tadpol tilstra 2004-10-14 17:58:53 UTC
sig_ign on sigterm in 6.0 sources now too.

Comment 6 michael conrad tadpol tilstra 2004-10-21 21:57:42 UTC
in RHEL3 now too.

Comment 7 michael conrad tadpol tilstra 2004-10-29 21:25:55 UTC
cvs head, core now gets locked by gulm kernel module.  until module
logs out, core will ignore shutdown reqs.
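
A hedged sketch of the behaviour described in this comment, in Python
for illustration only.  The Core class and its method names are
hypothetical and are not taken from the gulm sources; the sketch only
shows the general idea of refusing shutdown requests while a client
(here standing in for the kernel module) is still logged in.

```python
class Core:
    """Toy model of a core that is 'locked' while clients are logged in."""

    def __init__(self):
        self.locked_by = set()  # names of clients currently holding the core
        self.running = True

    def login(self, client):
        """A client (e.g. the kernel module) logs in and locks the core."""
        self.locked_by.add(client)

    def logout(self, client):
        """The client logs out and releases its lock on the core."""
        self.locked_by.discard(client)

    def request_shutdown(self):
        """Honour a shutdown request only if nothing has the core locked.

        Returns True if the core shut down, False if the request was ignored.
        """
        if self.locked_by:
            return False
        self.running = False
        return True
```

In this model a shutdown request made while the file system is mounted
is simply ignored, and the same request succeeds once the module has
logged out.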

Comment 8 michael conrad tadpol tilstra 2004-12-01 21:26:23 UTC
sigterm ignoring and core locking in 6.0.* now too.

Comment 9 Jay Turner 2005-05-25 16:41:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

