Bug 153321

Summary: node locks can deadlock cluster
Product: [Retired] Red Hat Cluster Suite
Reporter: Adam "mantis" Manthei <amanthei>
Component: gulm
Assignee: michael conrad tadpol tilstra <mtilstra>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: medium
Version: 3
CC: cluster-maint, tao
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-05-25 16:41:13 UTC
Bug Blocks: 154397, 160494
Attachments: logs of test run

Description Adam "mantis" Manthei 2005-04-04 19:24:38 UTC
Description of problem:
The node locks for gulm can deadlock the cluster.  This is something that one of
our customers was seeing in the field.  They had a large node count (~24 nodes)
with multiple filesystems (~8 gfs) and were using GNBD multipathing.  I think I
have been able to reproduce their problem using only a single SLM server and two
clients with a single filesystem.

Version-Release number of selected component (if applicable):
GFS-6.0.2.8
GFS-modules-6.0.2.8
kernel-2.4.21-31.EL

How reproducible:
Always

Steps to Reproduce:
1. start with clean server state with an SLM server

2. start lock_gulmd on two nodes in the cluster (node-A and node-B) using
fence_manual for each

3. mount the same GFS on both nodes

4. crash node-A

5. reboot node-A, but do not fence it just yet

6. crash node-B

7. fence_ack_manual node-A before node-B expires

8. log in to node-A now

9. mount gfs on node-A

10. allow node-B to expire

11. fence node-B

At this point, the cluster is deadlocked because node-A is waiting on its
expired nodelock.
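
A minimal command-level sketch of the sequence above (the pool device path
/dev/pool/gfs1, mount point, and node names are hypothetical, and the exact
init-script and fence invocations may differ on a given install):

# on node-A and node-B: start the lock daemon and mount GFS (steps 2-3)
service lock_gulmd start
mount -t gfs /dev/pool/gfs1 /mnt/gfs1

# steps 4-5: crash node-A (e.g. power-cycle it), let it boot back up, but
# do NOT acknowledge its pending fence_manual request yet

# step 6: crash node-B

# step 7: before node-B's heartbeat expires, acknowledge node-A's fence
fence_ack_manual -n node-A

# steps 8-9: log in to node-A and remount GFS
mount -t gfs /dev/pool/gfs1 /mnt/gfs1

# steps 10-11: let node-B expire, reset it, then acknowledge its fence
fence_ack_manual -n node-B

# recovery now hangs: node-A sits waiting on its own expired nodelock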
  
Actual results:
GFS is locked up (recovery isn't happening)

Expected results:
recovery should continue

Additional info:

Comment 1 Adam "mantis" Manthei 2005-04-04 19:30:34 UTC
Created attachment 112680 [details]
logs of test run

I believe that this is the lock blocking all further activity

#=================
key	       : 'R0ZTIE4EZ2ZzMQAadHJpbi0wOC5sYWIubXNwLnJlZGhhdC5jb20A'
ExK	       : GFS , N, 4, gfs1, 26, trin-08.lab.msp.redhat.com
state	       : gio_lck_st_Unlock
LVBlen	       : 0
LVB	       :
HolderCount    : 0
Holders        : 
LVBHolderCount : 0
LVBHolders     : 
ExpiredCount   : 1
ExpiredHolders : [ trin-08.lab.msp.redhat.com ]
reply_waiter   :
Waiters        :
 - key	       : 'R0ZTIE4EZ2ZzMQAadHJpbi0wOC5sYWIubXNwLnJlZGhhdC5jb20A'
   ExK	       : GFS , N, 4, gfs1, 26, trin-08.lab.msp.redhat.com
   name        : trin-08.lab.msp.redhat.com
   state       : gio_lck_st_Exclusive
   flags       : Cachable 
   LVB	       :
   Slave_rply  : 0x0
   Slave_sent  : 0x0
   idx	       : 4
High_Waiters   :
Action_Waiters :
State_Waiters  :

Comment 2 Adam "mantis" Manthei 2005-04-04 19:39:40 UTC
One thing that I did that was a little non-standard in reproducing this bug was
that I set the allowed misses to 100 and used manual fencing so that I could
better control the timing of things.  I'm still not sure why the customer was
seeing this issue, especially given the higher node count.  (I would think that
more nodes would make this case less likely to pop up!)
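
For reference, a minimal sketch of where that knob lives, assuming the GFS
6.0-era lock_gulm keys in cluster.ccs (cluster and server names below are
placeholders; heartbeat_rate is the seconds between heartbeats and
allowed_misses is how many can be missed before a node is expired):

cluster {
    name = "testcluster"
    lock_gulm {
        servers = ["slm-server"]
        heartbeat_rate = 15.0
        allowed_misses = 100
    }
}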

I think that there might be another bug hiding in here too.  I have seen on
my larger setups (~15 nodes) cases where the clients aren't being informed that
there are expired nodes when dealing with large numbers of clients and client
failure testing.  This is an issue that belongs in another bug, but I have not
been able to figure out what the frell is going on.  (I was also doing some
rather unsupported and risky things at the time, so that may have merely been an
artifact of the test I was running.)

Comment 3 michael conrad tadpol tilstra 2005-04-06 19:30:38 UTC
The nodelocks were added to work around a deadlock when a node remounted after failure and tried to
replay its own journal (#1206).  They also helped deal with the version of the jid mapping code that was
present.  Knowing that I've fixed the jid mapper in future versions, and guessing that the previous
workaround is no longer needed, I backported the jid mapping code from 6.1.  This code has no nodelocks
(or listlocks).  Given your test above, this seems to have fixed things.  I am running some other tests to
see if this is a workable solution.


Comment 4 Kiersten (Kerri) Anderson 2005-04-11 19:43:32 UTC
*** Bug 154397 has been marked as a duplicate of this bug. ***

Comment 5 michael conrad tadpol tilstra 2005-05-10 18:37:37 UTC
Fix committed into the rhel3 branch.  Nodelocks have been removed.  The steps above
now work.  Also ran a bunch of basic recovery iterations.

Comment 6 Jay Turner 2005-05-25 16:41:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-466.html