Bug 454355 - dlm: 3 nodes looking for a lock which does not exist?
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 5.5
Assigned To: David Teigland
QA Contact: Red Hat Kernel QE team
Reported: 2008-07-07 17:57 EDT by Lon Hohberger
Modified: 2010-10-22 22:40 EDT

Doc Type: Bug Fix
Last Closed: 2009-09-02 17:13:09 EDT
Attachments
Backtrace of rgmanager (9.00 KB, text/plain)
2008-07-07 17:58 EDT, Lon Hohberger
debugfs DLM information on the rgmanager lockspace (1.54 KB, text/plain)
2008-07-07 17:59 EDT, Lon Hohberger

Description Lon Hohberger 2008-07-07 17:57:13 EDT
Description of problem:

I was helping a community user on #linux-cluster with what started out looking
like an rgmanager problem, but ended up looking very much like a DLM bug.

* rgmanager-2.0.38-2.el5_2.1
* kernel-2.6.18-92.1.1.el5xen in Xen domU 

(All cluster nodes are domU)

Clustat (rgmanager utility to get info about running services) was timing out. 
In the past, this has been caused by a number of things.
Comment 1 Lon Hohberger 2008-07-07 17:58:12 EDT
Created attachment 311207
Backtrace of rgmanager

Thread 8 is stuck waiting for a reply from the DLM.
Comment 2 Lon Hohberger 2008-07-07 17:59:18 EDT
Created attachment 311208
debugfs DLM information on the rgmanager lockspace

This is from all 4 nodes.  Several are looking for the master holder of the
"usrm::vf" lock.  None are reported to be the master.
Comment 3 Chris St. Pierre 2008-07-07 18:03:44 EDT
As requested, group_tool -v on all nodes.

Since it sounds like we'll be doing some detailed troubleshooting on this, might 
as well use actual node names.

# Node: Chico.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Zeppo.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Harpo.  Status: Functional.  /sys/kernel/debug/dlm/rgmanager* was empty
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Groucho.  Status: rgmanager hosed.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]
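
For comparing this kind of output across nodes, a minimal Python sketch follows. It assumes the "# Node: <name>." header lines used in this comment (they are annotations added here, not something group_tool prints) and the column layout shown above. The function names parse_groups and disagreements are made up for illustration. On the output above it should report no differences, which fits the picture that group membership looks healthy on all four nodes and the problem sits lower down, in the lock state itself.

```python
import re

# Sample copied from the group_tool -v output above (two nodes for brevity).
SAMPLE = """\
# Node: Chico.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Groucho.  Status: rgmanager hosed.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]
"""


def parse_groups(text):
    """Return {node: {(type, name): (state, [member ids])}}."""
    result = {}
    node = None
    pending = None  # group line seen, member list not yet read
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"# Node: (\w+)\.", line)
        if header:
            node = header.group(1)
            result[node] = {}
            continue
        # Skip blanks, the column header row, and anything before a node header.
        if not line or line.startswith("type") or node is None:
            continue
        if line.startswith("["):
            # Member list belonging to the group line just above it.
            if pending is not None:
                key, state = pending
                members = [int(x) for x in line.strip("[]").split()]
                result[node][key] = (state, members)
                pending = None
            continue
        fields = line.split()
        if len(fields) >= 5:
            gtype, _level, name, _gid, state = fields[:5]
            pending = ((gtype, name), state)
    return result


def disagreements(groups):
    """Return the group keys whose state or member list differs between nodes."""
    seen = {}
    for entries in groups.values():
        for key, (state, members) in entries.items():
            seen.setdefault(key, set()).add((state, tuple(members)))
    return sorted(key for key, views in seen.items() if len(views) > 1)


groups = parse_groups(SAMPLE)
print(disagreements(groups))
```

An empty list means every node reports the same state and the same member IDs for each group.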
