Bug 242798 - RFE: If HA lvm server lost connection to SAN, relocate to next machine
Summary: RFE: If HA lvm server lost connection to SAN, relocate to next machine
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager (Show other bugs)
Version: 4
Hardware: All Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 440144
TreeView+ depends on / blocked
 
Reported: 2007-06-05 20:30 UTC by Corey Marthaler
Modified: 2009-04-16 20:34 UTC (History)
2 users (show)

Fixed In Version: RHBA-2008-0791
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-07-25 19:15:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
proposed patch to add self_fence to lvm resource agent (1.51 KB, patch)
2007-12-07 13:15 UTC, Marco Ceci
no flags


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0791 normal SHIPPED_LIVE rgmanager bug fix and enhancement update 2008-07-25 19:14:58 UTC

Description Corey Marthaler 2007-06-05 20:30:43 UTC
Description of problem:
Not sure if this is possible, but could a service be attempted on the next
server in line in the failure domain if the service fails due to a loss of SAN
connectivity on the currently running node? Or, for that matter, due to a loss
of SAN connectivity on the node being relocated to?

Currently you will just see a failure and the following message:
<err> stop: Could not match /dev/foo/bar with a real device

Version-Release number of selected component (if applicable):
rgmanager-1.9.67-0

Comment 1 Marco Ceci 2007-12-07 13:12:31 UTC
I was thinking that a solution to this would be a self_fence mechanism
similar to the one we have for the filesystem agent. With this flag set, we can
trigger a reboot of the node that is unable to clean up its tags. Once the
reboot happens, the node is no longer part of the cluster, so the other node
can proceed to take ownership of the LVs.
I have modified lvm.sh, adding the self_fence tag and the logic to perform the
reboot when the node is unable to clean up its tags. I am attaching here
the patch, created against the RHEL 5.1 lvm.sh script.

What is your opinion of this approach to resolving the problem? Could you
review the script to see whether there might be any problem with it?
I have tested it and it worked for me, but obviously I would like to hear your
opinion on this matter before saying anything to the customer.
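The self_fence idea described above can be sketched roughly as follows. This is only an illustration of the mechanism, not the actual code in the attached patch; the function names (strip_tag, do_reboot, self_fence_on_stop_failure) and the use of OCF_RESKEY_self_fence are assumptions.

```shell
#!/bin/sh
# Illustrative sketch of the self_fence approach from Comment 1.
# Names and structure are assumed, not taken from the attached patch.

# Remove this node's ownership tag from the volume group.
strip_tag() {
	vgchange --deltag "$(uname -n)" "$1" > /dev/null 2>&1
}

# Immediate reboot: once the node drops out of the cluster, another
# member can safely take ownership of the LVs.
do_reboot() {
	echo b > /proc/sysrq-trigger
}

# On stop, try to clean the tag; if that fails (e.g. SAN connectivity
# lost) and self_fence is enabled, reboot this node instead of leaving
# the service stuck in a failed state.
self_fence_on_stop_failure() {
	if ! strip_tag "$1"; then
		if [ "$OCF_RESKEY_self_fence" = "yes" ]; then
			do_reboot
		fi
		return 1
	fi
	return 0
}
```

The key point is the ordering: the reboot removes the node from cluster membership, which is what makes it safe for the failover target to grab the tags.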

Comment 2 Marco Ceci 2007-12-07 13:15:09 UTC
Created attachment 280961 [details]
proposed patch to add self_fence to lvm resource agent
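If the patch were accepted, enabling the option would presumably be a resource attribute in cluster.conf, analogous to the fs agent's self_fence flag. The snippet below is an assumed syntax for illustration only, not the shipped configuration format:

```xml
<!-- assumed syntax, modeled on the fs agent's self_fence option -->
<service name="ha_lvm_svc">
  <lvm name="foo" vg_name="foo" lv_name="bar" self_fence="1"/>
  <fs name="foofs" device="/dev/foo/bar" mountpoint="/mnt/foo" fstype="ext3"/>
</service>
```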

Comment 5 Corey Marthaler 2008-04-01 18:51:29 UTC
Has this patch made it into a build for QA to test?

Comment 8 errata-xmlrpc 2008-07-25 19:15:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0791.html


