Description of problem:
Not sure if this is possible, but could a service be retried on the next
server in line in the failover domain if the service fails due to a loss of SAN
connectivity on the currently running node? Or, for that matter, due to a loss
of SAN connectivity on the node being relocated to?
Currently you will just see a failure and the following message:
<err> stop: Could not match /dev/foo/bar with a real device
I was thinking that a solution to this would be to have a self_fence mechanism
similar to the one we have for the filesystem agent. Using this flag we can
trigger a reboot of the node that is not able to clean up its tags. Once the
reboot is performed, the node is no longer part of the cluster, so the other
node can proceed to take ownership of the LVs.
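
For reference, a minimal sketch of where such a check could hook into the stop
path of lvm.sh is shown below. This is not the attached patch; the helper name
strip_tag_and_deactivate and the exact reboot invocation are illustrative
assumptions, with OCF_RESKEY_self_fence standing for a self_fence attribute on
the lvm resource.

    # Illustrative sketch only -- not the attached patch.
    # strip_tag_and_deactivate is a hypothetical helper that removes this
    # node's ownership tag from the VG/LV and deactivates the LV.
    if ! strip_tag_and_deactivate; then
        case "$OCF_RESKEY_self_fence" in
        yes|YES|true|TRUE|1)
            # The LV cannot be released cleanly (e.g. SAN connectivity lost).
            # Reboot so this node drops out of the cluster and the next node
            # in the failover domain can safely take ownership of the LVs.
            ocf_log err "Unable to clean LVM tags; self-fencing (rebooting) this node"
            sync
            reboot -fn
            ;;
        *)
            return $OCF_ERR_GENERIC   # behave as today: the stop just fails
            ;;
        esac
    fi
    return $OCF_SUCCESS

This mirrors the idea behind fs.sh's self_fence option: if the resource cannot
be released during stop, the node removes itself from the cluster rather than
leaving the service stuck in a failed state.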
I have modified lvm.sh, adding the self_fence attribute and the logic to perform
the reboot when the node is not able to clean up its tags. I am attaching the
patch, created against the RHEL 5.1 lvm.sh script.
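For what it's worth, enabling it in cluster.conf would presumably look much like
the fs agent's self_fence option; the resource definition below is only an
illustration (the names are made up), with self_fence being the attribute the
patch proposes:

    <lvm name="lvm_foo" vg_name="foo" lv_name="bar" self_fence="1"/>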
What is your opinion of this approach to resolving the problem? Could you
review the script to see if there might be any problem with it?
I have tested it and it worked for me, but obviously I would like to hear your
opinion on this matter before saying anything to the customer.
Created attachment 280961 [details]
proposed patch to add self_fence to lvm resource agent
Has this patch made it into a build for QA to test?
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.