448108 – rgmanager can get stuck forever with gulm

Summary: rgmanager can get stuck forever with gulm

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	rgmanager
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-05-23 14:31 UTC by Corey Marthaler
Modified:	2009-04-16 20:22 UTC
CC List:	2 users
Fixed In Version:	RHBA-2008-0791
Clone Of:
Environment:
Last Closed:	2008-07-25 19:16:22 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0791	0	normal	SHIPPED_LIVE	rgmanager bug fix and enhancement update	2008-07-25 19:14:58 UTC

Description Corey Marthaler 2008-05-23 14:31:37 UTC

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:
I hit this while running 4.7 rgmanager regression tests.
It appears that gulm does not release a lock after the process holding it dies.
Any later attempt to take the same lock then hangs (for example, during
service relocation).


Steps to Reproduce:
1. magma_tool lock mylock
2. ^Z and kill the process
3. try the lock again... it's stuck
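
The steps above can be modeled with a small in-memory lock table. This is a toy sketch only, not gulm or magma code: the class name `ToyLockTable`, the holder ids, and the timeout are all assumptions made for illustration. It shows why a second request blocks when nothing ties a lock's lifetime to the process that took it.

```python
import threading


class ToyLockTable:
    """Hypothetical model: a lock is owned by a holder id and is only
    freed by an explicit unlock. Holder death is never noticed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owners = {}  # lock name -> holder id

    def lock(self, name, holder, timeout=None):
        """Wait until `name` is free, then take it. Returns False on timeout."""
        with self._cond:
            free = self._cond.wait_for(lambda: name not in self._owners, timeout)
            if free:
                self._owners[name] = holder
            return free

    def unlock(self, name, holder):
        """Release `name` only if `holder` owns it."""
        with self._cond:
            if self._owners.get(name) != holder:
                return False
            del self._owners[name]
            self._cond.notify_all()
            return True


table = ToyLockTable()

# Step 1: the first process takes the lock.
first = table.lock("mylock", holder="pid-1001")

# Step 2: the first process is killed. No unlock is ever sent.

# Step 3: a second process tries the same lock. Without the timeout
# (added here so the sketch terminates) it would wait forever.
second = table.lock("mylock", holder="pid-1002", timeout=0.2)

print(first, second)
```

In this model the only ways out are an explicit unlock by the original holder or some policy that reclaims locks when their holder goes away.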

Comment 1 Corey Marthaler 2008-05-23 14:33:22 UTC

RPM versions:
rgmanager-debuginfo-1.9.78-1
rgmanager-1.9.78-1
magma-debuginfo-1.0.8-1
magma-1.0.8-1
magma-plugins-1.0.14-1
magma-plugins-debuginfo-1.0.14-1
magma-devel-1.0.8-1
gulm-devel-1.0.10-0
gulm-1.0.10-0
gulm-debuginfo-1.0.10-0

Comment 2 Lon Hohberger 2008-05-23 15:32:24 UTC

So, this is a bug in gulm which I think is affecting rgmanager.

If you kill a process which has a gulm lock, the lock is never released.

Comment 3 Lon Hohberger 2008-05-23 15:35:19 UTC

Note: it may be a "works as intended" method of operation to support lock
failover.  I'm looking at the gulm code to see if Slave-side caching of locks is
done by-connection (I expect it is, but I don't know).

If so, I am contemplating making a patch to gulm which will allow locks to be
dropped if the connection from Slave->client (on the same host) or
Master->client (on the same host) dies.

The reason we can't do the same for client->Master (on a different host) is
that supporting failover of the masters requires the ability to reconnect
as needed.
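
The policy contemplated in this comment can be sketched as follows. This is a hedged illustration, not the gulm patch (which this report does not show): `ToyLockServer`, `Connection`, and `connection_died` are hypothetical names, and the host names are made up. The idea is that locks are tracked per connection, dropped when a same-host connection dies, and kept when a remote connection dies so that the client can reconnect.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    conn_id: str
    client_host: str
    locks: set = field(default_factory=set)


class ToyLockServer:
    """Hypothetical server that caches held locks per connection."""

    def __init__(self, host):
        self.host = host
        self.connections = {}  # conn_id -> Connection

    def connect(self, conn_id, client_host):
        conn = Connection(conn_id, client_host)
        self.connections[conn_id] = conn
        return conn

    def take(self, conn_id, name):
        self.connections[conn_id].locks.add(name)

    def held(self):
        """All lock names currently recorded as held."""
        return {n for c in self.connections.values() for n in c.locks}

    def connection_died(self, conn_id):
        """Return the set of locks dropped because this connection died."""
        conn = self.connections[conn_id]
        if conn.client_host == self.host:
            # Same host: the client process is gone, so drop its locks.
            del self.connections[conn_id]
            return set(conn.locks)
        # Different host: keep the state so the client can reconnect
        # (for example, after a master failover).
        return set()


server = ToyLockServer("node1")
server.connect("local", client_host="node1")
server.take("local", "svc-a")
server.connect("remote", client_host="node2")
server.take("remote", "svc-b")

dropped_local = server.connection_died("local")
dropped_remote = server.connection_died("remote")
print(sorted(dropped_local), sorted(dropped_remote), sorted(server.held()))
```

The asymmetry is the point of the sketch: only the local case is safe to treat as "the holder is dead", because a broken remote connection may just be a client that is about to reconnect.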

Comment 4 RHEL Program Management 2008-05-23 15:41:42 UTC

This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 5 Lon Hohberger 2008-05-27 22:01:43 UTC

http://sources.redhat.com/git/?p=cluster.git;a=commit;h=e18bde1e5732937e4c7b3e536dfea5bb183f14c2

Comment 6 Lon Hohberger 2008-05-27 22:02:58 UTC

Modified.

Comment 7 Lon Hohberger 2008-05-28 13:40:09 UTC

As it turns out, this bugzilla was due to an incorrect unlock check in
rgmanager; a two-line patch fixes it.

Gulm does hold locks open even if the process dies, but I do not believe this is
a bug; rather, I believe this "works as intended".

Comment 8 Corey Marthaler 2008-06-02 18:23:11 UTC

Fix verified in rgmanager-1.9.80-1.

Comment 10 errata-xmlrpc 2008-07-25 19:16:22 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0791.html
