Bug 741339 - fence_scsi: fence_scsi.dev file gets unlinked on each unfence operation
Summary: fence_scsi: fence_scsi.dev file gets unlinked on each unfence operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: fence-agents
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Ryan O'Hara
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 756082
Reported: 2011-09-26 15:54 UTC by Ryan O'Hara
Modified: 2012-06-20 14:40 UTC (History)

Fixed In Version: fence-agents-3.1.5-11
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-20 14:40:05 UTC
Target Upstream Version:
Embargoed:


Attachments
Remove unlink for fence_scsi.dev file (1.53 KB, patch)
2011-09-27 22:29 UTC, Ryan O'Hara


Links
System: Red Hat Product Errata
ID: RHBA-2012:0943
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: fence-agents bug fix and enhancement update
Last Updated: 2012-06-19 21:00:16 UTC

Description Ryan O'Hara 2011-09-26 15:54:27 UTC
The fence_scsi agent creates a file (/var/run/cluster/fence_scsi.dev) that contains a list of devices that the node registered with during the unfence operation. This file is unlinked on every unfence action, which creates a problem if you use multiple fence device entries in cluster.conf: the fence_scsi.dev file will contain only the devices that the node registered with during the most recent unfence operation.

This is best explained with an example. Consider the following cluster.conf file:

<?xml version="1.0"?>
<cluster config_version="1" name="foobar">
    <cman two_node="1" expected_votes="1" cluster_id="77"/>
    <fence_daemon post_fail_delay="0" post_join_delay="30"/>
    <clusternodes>
        <clusternode name="foo" votes="1" nodeid="3">
            <fence>
            <method name="scsi">
                <device name="scsi_1" key="3"/>
                <device name="scsi_2" key="3"/>
                <device name="scsi_3" key="3"/>
            </method>
            </fence>
            <unfence>
                <device name="scsi_1" key="3" action="on"/>
                <device name="scsi_2" key="3" action="on"/>
                <device name="scsi_3" key="3" action="on"/>
            </unfence>
        </clusternode>
        <clusternode name="bar" votes="1" nodeid="4">
            <fence>
            <method name="scsi">
                <device name="scsi_1" key="4"/>
                <device name="scsi_2" key="4"/>
                <device name="scsi_3" key="4"/>
            </method>
            </fence>
            <unfence>
                <device name="scsi_1" key="4" action="on"/>
                <device name="scsi_2" key="4" action="on"/>
                <device name="scsi_3" key="4" action="on"/>
            </unfence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_scsi" name="scsi_1"
         devices="/dev/sdb,/dev/sdc"
         logfile="/tmp/fence_scsi.log"/>
        <fencedevice agent="fence_scsi" name="scsi_2"
         devices="/dev/sdd,/dev/sde"
         logfile="/tmp/fence_scsi.log"/>
        <fencedevice agent="fence_scsi" name="scsi_3"
         devices="/dev/sdf,/dev/sdg"
         logfile="/tmp/fence_scsi.log"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources/>
    </rm>
</cluster>

This is a valid cluster.conf file in which multiple fencedevice entries exist for the fence_scsi agent, each containing a different list of devices. When unfencing occurs, the fence_scsi agent will be called three times. Each time fence_scsi registers some devices, the fence_scsi.dev file will be unlinked. The result is that once unfencing is complete, the fence_scsi.dev file will contain:

/dev/sdf
/dev/sdg

The expected result is that fence_scsi.dev will contain all the devices:

/dev/sdb
/dev/sdc
/dev/sdd
/dev/sde
/dev/sdf
/dev/sdg
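
The difference between the actual and expected contents can be reproduced outside a cluster with a minimal Python sketch. This is an illustration only, not the fence_scsi agent code: the function names, the temporary file path, and the hard-coded device lists (copied from the cluster.conf above) are assumptions made for the example.

```python
import os
import tempfile

# Device lists copied from the three fencedevice entries in the example cluster.conf.
FENCE_DEVICES = [
    ["/dev/sdb", "/dev/sdc"],  # scsi_1
    ["/dev/sdd", "/dev/sde"],  # scsi_2
    ["/dev/sdf", "/dev/sdg"],  # scsi_3
]


def unfence_with_unlink(dev_file, devices):
    """Hypothetical stand-in for one 'action=on' call that unlinks the file first."""
    if os.path.exists(dev_file):
        os.unlink(dev_file)
    with open(dev_file, "a") as f:
        for dev in devices:
            f.write(dev + "\n")


def simulate(dev_file):
    """Run one unfence call per fencedevice entry and return the file's lines."""
    for devices in FENCE_DEVICES:
        unfence_with_unlink(dev_file, devices)
    with open(dev_file) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in path; the real file lives in /var/run/cluster/.
        print(simulate(os.path.join(tmp, "fence_scsi.dev")))
```

Because every call starts by unlinking the file, only the devices from the last call (/dev/sdf and /dev/sdg) survive, which matches the contents reported above.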

Note that this problem only occurs when devices are manually defined and they are listed in multiple fencedevice entries.
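
A rough way to tell whether a given configuration matches this condition is to count the fence_scsi fencedevice entries that carry a manual devices list. The Python sketch below assumes the cluster.conf layout shown in this report; the helper names and the trimmed sample document are hypothetical, and the check is deliberately simplified (it does not look at which devices a node's unfence section actually references).

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the example cluster.conf (fencedevices section only).
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster config_version="1" name="foobar">
    <fencedevices>
        <fencedevice agent="fence_scsi" name="scsi_1" devices="/dev/sdb,/dev/sdc"/>
        <fencedevice agent="fence_scsi" name="scsi_2" devices="/dev/sdd,/dev/sde"/>
        <fencedevice agent="fence_scsi" name="scsi_3" devices="/dev/sdf,/dev/sdg"/>
    </fencedevices>
</cluster>
"""


def manual_scsi_devices(conf_xml):
    """Map each fence_scsi fencedevice name to its manually defined device list."""
    root = ET.fromstring(conf_xml)
    result = {}
    for entry in root.iter("fencedevice"):
        if entry.get("agent") != "fence_scsi":
            continue
        devices = entry.get("devices")
        if devices:  # entries without a devices attribute are not manual
            result[entry.get("name")] = [
                d.strip() for d in devices.split(",") if d.strip()
            ]
    return result


def may_be_affected(conf_xml):
    """True when more than one fence_scsi entry defines devices manually."""
    return len(manual_scsi_devices(conf_xml)) > 1


if __name__ == "__main__":
    print(manual_scsi_devices(SAMPLE_CONF))
    print(may_be_affected(SAMPLE_CONF))
```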

The fence_scsi.dev file is only used by the fence_scsi_check watchdog script. This file provides a list of devices that fence_scsi_check should check periodically for registrations. If the fence_scsi_check watchdog script is not being used, this problem has no effect.

Comment 3 Ryan O'Hara 2011-09-27 22:29:27 UTC
Created attachment 525221 [details]
Remove unlink for fence_scsi.dev file

This patch removes the unlink call that deletes the fence_scsi.dev file on each unfence (action=on) operation. As explained in comment #1, there is a specific case where unfencing can result in multiple calls of 'fence_scsi -o on ...', where each of those calls would unlink the fence_scsi.dev file.

The fence_scsi.dev file is used to keep track of what devices the local node is currently registered with. Currently it is only used by the fence_scsi_check watchdog script.

Rather than remove the fence_scsi.dev file, this patch checks whether the device currently being registered already exists in the file. If it does not, the device is written to the file.

Note that removing the unlink is safe because the fence_scsi.dev file resides in the /var/run/cluster/ directory and therefore will be removed on reboot.
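
The check-then-append behavior described in this comment can be sketched in Python as follows. This is not the attached patch: the function name `register_device` and the temporary path are hypothetical, and the sketch only illustrates the idea of writing a device to the file when no identical line is already present.

```python
import os
import tempfile


def register_device(dev_file, device):
    """Append device to dev_file unless an identical line is already present.

    Returns True if the device was written, False if it was already listed.
    """
    existing = []
    if os.path.exists(dev_file):
        with open(dev_file) as f:
            existing = f.read().splitlines()
    if device in existing:
        return False  # already recorded; nothing to write
    with open(dev_file, "a") as f:
        f.write(device + "\n")
    return True


if __name__ == "__main__":
    devices = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"]
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in path; the real file lives in /var/run/cluster/.
        path = os.path.join(tmp, "fence_scsi.dev")
        # Registering every device twice mimics a repeated unfence without a reboot.
        for _ in range(2):
            for dev in devices:
                register_device(path, dev)
        with open(path) as f:
            print(f.read(), end="")
```

Running the loop twice leaves each device listed exactly once, so repeated unfence operations accumulate devices without creating duplicates.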

Comment 4 Ryan O'Hara 2011-09-27 22:42:40 UTC
Test result:

* Without the patch, use a cluster.conf file similar to the one in comment #1. The key is to have multiple fencedevice entries for fence_scsi that contain different devices.

# service cman start
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]

# cat /var/run/cluster/fence_scsi.dev
/dev/sdf
/dev/sdg

Here we can see the problem -- only /dev/sdf and /dev/sdg exist in the file because the file was unlinked each time fence_scsi was called with action=on.

* With the patch, same configuration. Before running this test, the fence_scsi.dev file can be removed manually or by rebooting the machine.

# rm -f /var/run/cluster/fence_scsi.dev

# service cman start
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]

# cat /var/run/cluster/fence_scsi.dev
/dev/sdb
/dev/sdc
/dev/sdd
/dev/sde
/dev/sdf
/dev/sdg

* Now we should be able to run 'service cman restart' or 'service cman start' without getting duplicate entries in the /var/run/cluster/fence_scsi.dev file.

# service cman restart
Stopping cluster: 
   Leaving fence domain...                                 [  OK  ]
   Stopping gfs_controld...                                [  OK  ]
   Stopping dlm_controld...                                [  OK  ]
   Stopping fenced...                                      [  OK  ]
   Stopping cman...                                        [  OK  ]
   Waiting for corosync to shutdown:                       [  OK  ]
   Unloading kernel modules...                             [  OK  ]
   Unmounting configfs...                                  [  OK  ]
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]

# cat /var/run/cluster/fence_scsi.dev
/dev/sdb
/dev/sdc
/dev/sdd
/dev/sde
/dev/sdf
/dev/sdg

Comment 7 Ryan O'Hara 2011-12-19 16:43:44 UTC
Pushed to RHEL6 branch upstream.

commit 909d5b2c40b7f9b233a0aa5e19f3d5c83d0577c4

Comment 11 Nate Straz 2012-05-25 14:59:42 UTC
Verified against fence-agents-3.1.5-17.el6.x86_64


[root@smoke-02 ~]# cat /var/run/cluster/fence_scsi.dev
/dev/sdb
/dev/sdc

cluster.conf fragments:

  <clusternodes>
    <clusternode name="smoke-02" votes="1" nodeid="2">
      <fence>
        <method name="scsi">
                <device name="scsi_1" key="2"/>
                <device name="scsi_2" key="2"/>
        </method>
      </fence>
      <unfence>
        <device name="scsi_2" key="1" action="on"/>
        <device name="scsi_1" key="2" action="on"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_scsi" name="scsi_1" devices="/dev/sdb"/>
    <fencedevice agent="fence_scsi" name="scsi_2" devices="/dev/sdc"/>
  </fencedevices>

Comment 12 errata-xmlrpc 2012-06-20 14:40:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0943.html

