454355 – dlm: 3 nodes looking for a lock which does not exist?

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	5.5
Assignee:	David Teigland
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-07-07 21:57 UTC by Lon Hohberger
Modified:	2018-10-20 03:18 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-09-02 21:13:09 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Backtrace of rgmanager (9.00 KB, text/plain) 2008-07-07 21:58 UTC, Lon Hohberger
debugfs DLM information on the rgmanager lockspace (1.54 KB, text/plain) 2008-07-07 21:59 UTC, Lon Hohberger

Description Lon Hohberger 2008-07-07 21:57:13 UTC

Description of problem:

I helped a community user on #linux-cluster with what initially looked like an
rgmanager problem, but ended up looking very much like a DLM bug.

* rgmanager-2.0.38-2.el5_2.1
* kernel-2.6.18-92.1.1.el5xen in Xen domU 

(All cluster nodes are domU)

Clustat (the rgmanager utility that reports on running services) was timing out.
In the past, this has had a number of different causes.

Comment 1 Lon Hohberger 2008-07-07 21:58:12 UTC

Created attachment 311207
Backtrace of rgmanager

Thread 8 is stuck waiting for a reply from the DLM.

Comment 2 Lon Hohberger 2008-07-07 21:59:18 UTC

Created attachment 311208
debugfs DLM information on the rgmanager lockspace

This is from all 4 nodes.  Several are looking for the master holder of the
"usrm::vf" lock.  None are reported to be the master.

Comment 3 Chris St. Pierre 2008-07-07 22:03:44 UTC

As requested, here is the output of group_tool -v from all nodes.

Since it sounds like we'll be doing some detailed troubleshooting on this, we
might as well use the actual node names.

# Node: Chico.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Zeppo.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Harpo.  Status: Functional.  /sys/kernel/debug/dlm/rgmanager* was empty
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Groucho.  Status: rgmanager hosed.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]
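
For comparing listings like the ones above across nodes, a small script can save
some eyeballing. The following is only a minimal sketch: it assumes the column
layout shown in this comment (type, level, name, id, state, followed by a
bracketed member list on the next line), and the function names
parse_group_tool and find_disagreements are made up for illustration; they are
not part of any cluster tool.

```python
def parse_group_tool(text):
    """Parse group_tool -v style output (layout as pasted above; an assumption)
    into {(type, name): {"level", "id", "state", "members"}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, "# Node: ..." annotations, and the column header row.
        if not line or line.startswith("#") or line.startswith("type"):
            continue
        if line.startswith("["):
            # Member list such as "[1 2 3 4]" belongs to the preceding group.
            if current is not None:
                ids = line.strip("[]").split()
                groups[current]["members"] = sorted(int(n) for n in ids)
            continue
        fields = line.split()
        if len(fields) >= 5:
            gtype, level, name, gid, state = fields[:5]
            current = (gtype, name)
            groups[current] = {
                "level": int(level),
                "id": gid,
                "state": state,
                "members": [],
            }
    return groups


def find_disagreements(per_node):
    """per_node maps node name -> parsed groups.
    Returns (group, field, values-by-node) for every field where nodes differ."""
    problems = []
    all_keys = set()
    for parsed in per_node.values():
        all_keys.update(parsed)
    for key in sorted(all_keys):
        for field in ("id", "state", "members"):
            values = {
                node: parsed.get(key, {}).get(field)
                for node, parsed in per_node.items()
            }
            if len({repr(v) for v in values.values()}) > 1:
                problems.append((key, field, values))
    return problems


# The listing is identical on all four nodes in this comment, so one sample is reused.
SAMPLE = """\
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]
"""

per_node = {n: parse_group_tool(SAMPLE) for n in ("Chico", "Zeppo", "Harpo", "Groucho")}
print(find_disagreements(per_node))  # prints []
```

An empty result matches what the listings show: every node reports both groups
in state "none" with members [1 2 3 4], so the group membership view is the same
everywhere, including on the node where rgmanager is stuck.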
