Bug 454355

Summary: dlm: 3 nodes looking for a lock which does not exist?
Product: Red Hat Enterprise Linux 5
Reporter: Lon Hohberger <lhh>
Component: kernel
Assignee: David Teigland <teigland>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: low
Priority: low
Version: 5.3
CC: cluster-maint, cstpierr, edamato, tao
Target Milestone: rc
Target Release: 5.5
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-09-02 21:13:09 UTC
Attachments:
Description                                            Flags
Backtrace of rgmanager                                 none
debugfs DLM information on the rgmanager lockspace     none

Description Lon Hohberger 2008-07-07 21:57:13 UTC
Description of problem:

I helped a community user on #linux-cluster troubleshoot what at first looked
like an rgmanager problem but ended up looking very much like a DLM bug.

* rgmanager-2.0.38-2.el5_2.1
* kernel-2.6.18-92.1.1.el5xen in Xen domU 

(All cluster nodes are domU)

clustat (the rgmanager utility that reports the status of running services)
was timing out. In the past, this symptom has had a number of different causes.

Comment 1 Lon Hohberger 2008-07-07 21:58:12 UTC
Created attachment 311207 [details]
Backtrace of rgmanager

Thread 8 is stuck waiting for a reply from the DLM.

Comment 2 Lon Hohberger 2008-07-07 21:59:18 UTC
Created attachment 311208 [details]
debugfs DLM information on the rgmanager lockspace

This is from all 4 nodes. Several of them are looking up the master of the
"usrm::vf" lock, but none reports being the master.
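Cross-checking the per-node debugfs dumps by hand is error-prone, so here is a minimal Python sketch of the same check: for one resource name, classify each node's view of the master and flag the case where someone is looking up the master but nobody claims it. ASSUMPTION: the dump phrasing matched below ("Master Copy", "Local Copy, Master is node N", "Looking up master (lkid X)") is recalled from the RHEL 5-era DLM debugfs output and should be checked against the actual attachment; the function names and sample dumps are hypothetical.

```python
import re
from typing import Dict

# ASSUMED debugfs layout: a resource header line followed by a master-status line.
HEADER = re.compile(r'^Resource \S+ Name \(len=\d+\) "([^"]*)"$')


def master_state(dump: str, resource: str) -> str:
    """Describe how one node's debugfs dump sees `resource`'s master."""
    lines = [ln.strip() for ln in dump.splitlines()]
    for i, line in enumerate(lines):
        m = HEADER.match(line)
        if m and m.group(1) == resource and i + 1 < len(lines):
            status = lines[i + 1]
            if status.startswith("Master Copy"):
                return "master"  # this node claims mastery
            if status.startswith("Local Copy, Master is node"):
                return "remote master " + status.rsplit(" ", 1)[-1]
            if status.startswith("Looking up master"):
                return "looking up"  # still waiting to learn the master
            return "unknown"
    return "absent"  # resource not present in this node's dump


def orphaned(dumps: Dict[str, str], resource: str) -> bool:
    """True if some node is looking up the master and no node claims to be it."""
    states = [master_state(d, resource) for d in dumps.values()]
    return "looking up" in states and "master" not in states
```

Calling `orphaned({"chico": ..., "zeppo": ..., "harpo": ..., "groucho": ...}, "usrm::vf")` with each node's dump text applies the check described in this comment; it only compares what the dumps claim and says nothing about why the master went missing.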

Comment 3 Chris St. Pierre 2008-07-07 22:03:44 UTC
As requested, group_tool -v on all nodes.

Since it sounds like we'll be doing some detailed troubleshooting on this, we
might as well use the actual node names.

# Node: Chico.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Zeppo.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Harpo.  Status: Functional.  /sys/kernel/debug/dlm/rgmanager* was empty
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Groucho.  Status: rgmanager hosed.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]
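With four nodes, comparing `group_tool -v` output by eye is manageable, but it can also be done mechanically. Below is a minimal Python sketch that parses output shaped like the paste above and checks that every node reports the same groups, IDs, and members, all in state `none`. ASSUMPTIONS: each group row is followed by a `[n n n]` member line, only the first five columns matter, and reading `none` as "no membership change in progress" is our interpretation; `parse_group_tool` and `groups_agree` are hypothetical helpers, not part of the cluster tools.

```python
from typing import Dict, Tuple

Group = Tuple[str, str, Tuple[int, ...]]  # (group id, state, member node ids)


def parse_group_tool(text: str) -> Dict[Tuple[str, str], Group]:
    """Parse group_tool -v style output into {(type, name): (id, state, members)}."""
    groups: Dict[Tuple[str, str], Group] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, '# Node: ...' annotations and the column header.
        if not line or line.startswith("#") or line.startswith("type"):
            continue
        if line.startswith("["):
            # Member list belongs to the group row immediately above it.
            if current is not None:
                gid, state, _ = groups[current]
                members = tuple(int(n) for n in line.strip("[]").split())
                groups[current] = (gid, state, members)
            current = None
            continue
        gtype, _level, name, gid, state = line.split()[:5]
        current = (gtype, name)
        groups[current] = (gid, state, ())
    return groups


def groups_agree(outputs: Dict[str, str]) -> bool:
    """True if all nodes report identical groups and every group is in state 'none'."""
    parsed = [parse_group_tool(t) for t in outputs.values()]
    if not parsed:
        return False
    first = parsed[0]
    return all(p == first for p in parsed) and all(
        state == "none" for _gid, state, _members in first.values()
    )
```

Feeding the four pastes above (keyed by node name) to `groups_agree` performs the comparison; a `False` result would mean the nodes disagree about group IDs, members, or state.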