Bug 516085

Summary: fence_scsi: support 2 node clusters
Product: Red Hat Enterprise Linux 5
Component: cman
Version: 5.5
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Target Milestone: rc
Keywords: ZStream
Reporter: Ryan O'Hara <rohara>
Assignee: Ryan O'Hara <rohara>
QA Contact: Cluster QE <mspqa-list>
CC: ccaulfie, cluster-maint, djansa, edamato, gbarros, jcapel, jkortus, liko, tao, tdunnon
Fixed In Version: cman-2.0.115-6.el5
Doc Type: Bug Fix
Last Closed: 2010-03-30 08:38:52 UTC
Bug Blocks: 499522, 520823
Attachments: Proposed patch for 2 node support with fence_scsi. (flags: none)

Description Ryan O'Hara 2009-08-06 16:49:17 UTC
Modify fence_scsi to support 2 node clusters.

Historically, SAN fencing agents have not been supported in 2 node clusters. However, I believe there is a solution that would allow fence_scsi to work with 2 node clusters.

A description of the problem and the solution will be provided below.

Comment 1 Ryan O'Hara 2009-08-06 17:06:11 UTC
First, it is absolutely required that a node (I_T nexus, actually) be registered with a device in order to unregister a key from that device. This detail is key to the solution.

In the current version of fence_scsi, there are two problems that prevent fence_scsi from working on 2 node clusters.

1. If a node is asked to fence (remove a key) for a device that it is not registered with, the node performing the fence operation will attempt to register with that device on-the-fly (at fence time). This was done in order to prevent fencing failures, although this case should never happen.

2. For SAN environments with multiple LUNs (the common case), it is absolutely crucial that the list of LUNs (devices) from which we need to unregister the key be ordered consistently on all nodes. This is not guaranteed by lvm, since device names vary from one node to the next. Consistent ordering is needed to prevent interleaving of fence operations (sg_persist, unregister key) among devices. Since 2 node fencing is a race, when the two nodes attempt to fence one another it might be possible for each node to fence the other from a subset of the devices. We want to avoid this.
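To illustrate the interleaving in problem #2, here is a rough toy model in Python. It is not fence_scsi code: the device names, the keys "A" and "B", and the helpers fence_step and run_interleaved are all made up for illustration. It assumes a fence step fails when the fencing node's own key is not registered with the device (the registration requirement stated at the top of this comment), and that the two nodes alternate one step at a time.

```python
# Toy model only: each device maps to the set of keys registered with it.
# All names here are hypothetical; nothing below calls sg_persist.

def fence_step(devices, dev, my_key, victim_key):
    """Remove victim_key from dev. Fails if my_key is not registered there."""
    if my_key not in devices[dev]:
        return False
    devices[dev].discard(victim_key)
    return True


def run_interleaved(order_a, order_b):
    """Nodes A and B alternate fence steps, each following its own device order."""
    devices = {dev: {"A", "B"} for dev in order_a}
    for dev_a, dev_b in zip(order_a, order_b):
        fence_step(devices, dev_a, "A", "B")
        fence_step(devices, dev_b, "B", "A")
    return devices


# Different device order on each node: each node ends up owning one LUN.
split = run_interleaved(["lun0", "lun1"], ["lun1", "lun0"])

# Same device order on both nodes: node A wins every LUN.
clean = run_interleaved(["lun0", "lun1"], ["lun0", "lun1"])
```

In this model the mismatched orders leave A registered only with lun0 and B registered only with lun1, which is the partial fencing we want to avoid.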

The solution for problem #1 is to remove the bit of code that registers with a device at fence time (if needed to continue with fencing). Instead, if a node is asked to fence (remove a key) from a device for which it is not registered, fencing will fail.

The solution for problem #2 is to sort the list of devices that the fence_scsi agent extracts using a vgs command. Sorting alphabetically by device name is not sufficient, so instead the agent will extract the device name (pv_name) and uuid (pv_uuid) and build a hash which is keyed on the uuid. The uuid will be consistent on all nodes, and we will sort by uuid. This will ensure that devices are ordered identically on each node.
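A minimal sketch of the uuid-keyed sort, written in Python for illustration rather than taken from the agent. It assumes the vgs output has already been captured as text with one "pv_uuid:pv_name" pair per line (for example from something like `vgs --noheadings --separator : -o pv_uuid,pv_name`, which is an assumption about the invocation, not the patch's actual command). The function name and the sample uuids are hypothetical.

```python
def devices_sorted_by_uuid(vgs_output):
    """Parse 'pv_uuid:pv_name' lines and return (uuid, name) pairs sorted by uuid."""
    by_uuid = {}
    for line in vgs_output.splitlines():
        line = line.strip()
        if not line:
            continue
        uuid, name = line.split(":", 1)
        by_uuid[uuid] = name  # hash keyed on the uuid, as described above
    return [(uuid, by_uuid[uuid]) for uuid in sorted(by_uuid)]


# Hypothetical output from two nodes: same uuids, different device names.
node1 = "  uuid-bbb:/dev/sdb1\n  uuid-aaa:/dev/sdc1\n"
node2 = "  uuid-aaa:/dev/sdb1\n  uuid-bbb:/dev/sdd1\n"

order1 = [uuid for uuid, _ in devices_sorted_by_uuid(node1)]
order2 = [uuid for uuid, _ in devices_sorted_by_uuid(node2)]
```

Both nodes visit uuid-aaa before uuid-bbb even though the local device names differ. Sorting the same sample data by device name would put the two LUNs in opposite orders on the two nodes.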

With these two changes in place, fence_scsi should work in a 2 node cluster. At fence time, a race will occur -- the first node to successfully fence the other will win. The first node to fence will remove the other node's key from the device(s). The second node will then be unable to fence the first, because the second node is no longer registered with the device(s), and its fence operation will fail.
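The expected outcome of the race can be sketched as follows. This is a simplified Python model, not the agent itself: the fence function, the node keys, and the device uuids are hypothetical, and the real agent would issue sg_persist commands rather than edit a dictionary.

```python
def fence(devices, ordered_devs, my_key, victim_key):
    """Remove victim_key from each device in order.

    Fails on the first device where my_key is not registered,
    since there is no on-the-fly registration at fence time.
    """
    for dev in ordered_devs:
        if my_key not in devices[dev]:
            return False
        devices[dev].discard(victim_key)
    return True


# Both nodes start out registered with every device.
devices = {
    "uuid-aaa": {"node1", "node2"},
    "uuid-bbb": {"node1", "node2"},
}
order = sorted(devices)  # identical on both nodes because it is keyed on uuid

first = fence(devices, order, "node1", "node2")   # node1 gets there first
second = fence(devices, order, "node2", "node1")  # node2 has already lost its key
```

Here the first call succeeds and the second fails on the first device, so node1 keeps its registrations on every device.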

Comment 2 Ryan O'Hara 2009-08-06 17:11:45 UTC
Created attachment 356553
Proposed patch for 2 node support with fence_scsi.

This patch removes the on-the-fly registration at fence time. It also contains code to extract the uuid when building the list of devices, which we use to sort the device list. There are a few other very minor code cleanup changes.

Comment 4 Ryan O'Hara 2009-08-31 13:42:14 UTC
commit 66c513bfc91bdd325c3620b4da9c66d5028fcf23

Comment 7 Christine Caulfield 2009-10-14 07:41:39 UTC
Sorry, I should have set this to MODIFIED a while ago.

Comment 8 Perry Myers 2009-10-17 04:55:40 UTC
*** Bug 527546 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2010-03-30 08:38:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0266.html