When fence_scsi attempts to fence a node (i.e., remove the node's key from all devices), the agent first checks that the key to be removed is actually registered with each device. If the key is not registered with a device, trying to remove it would make fencing appear to fail when in reality there was nothing to do. The check works by getting the list of keys registered with a device and storing that list in a hash. This hash must be cleared each time we get a new list of keys for a device; otherwise it may hold stale entries from a previous device. A stale entry could, for example, make fence_scsi believe the key is registered on a device where it is not, attempt the removal, and incorrectly report failure.
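The pre-removal check described above can be sketched as follows. This is a hypothetical Python model, not fence_scsi's code: the device names, key values, the REGISTRATIONS table, and the helpers get_key_list and fence_node are all illustrative assumptions, and the set update stands in for the real key removal on the device.

```python
# Hypothetical model of fence_scsi's "is the key registered?" pre-check.
# Device names, keys and helper names are illustrative assumptions.

REGISTRATIONS = {
    "/dev/sdb": {"0x1a2b0001", "0x1a2b0002"},
    "/dev/sdc": {"0x1a2b0002"},
}

def get_key_list(device):
    """Return a fresh set of the keys currently registered on `device`."""
    return set(REGISTRATIONS.get(device, ()))

def fence_node(key, devices):
    """Remove `key` from every device; return a dict of device -> outcome."""
    outcome = {}
    for dev in devices:
        keys = get_key_list(dev)  # new key list for each device
        if key not in keys:
            outcome[dev] = "skipped"  # nothing to remove, not a failure
            continue
        REGISTRATIONS[dev].discard(key)  # stands in for the real removal
        outcome[dev] = "removed"
    return outcome

print(fence_node("0x1a2b0001", ["/dev/sdb", "/dev/sdc"]))
# {'/dev/sdb': 'removed', '/dev/sdc': 'skipped'}
```

Because the key list is rebuilt for every device, /dev/sdc is correctly reported as "skipped" rather than as a failed removal.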
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Fixed in RHEL5. It turns out the problem was not that we needed to clear the key_list hash, but that the get_key_list subroutine did not correctly declare key_list. Because Perl treats a variable that is not declared (e.g. with my) as a package global, key_list was global rather than local to the subroutine. As a result, the key_list hash was never updated correctly, and entries from earlier calls could persist. Simple, one-line fix.
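The difference between the buggy and fixed get_key_list can be illustrated with a Python analogue, since the verifier here cannot run Perl. A module-level dict plays the role of the undeclared (global) Perl hash, and a dict created inside the function plays the role of a hash declared with my. The function names and sample data are hypothetical; the actual one-line Perl change is not shown in this report.

```python
# Python analogue of the get_key_list bug. Names and data are hypothetical.

DEVICE_KEYS = {
    "/dev/sdb": ["0x1a2b0001", "0x1a2b0002"],
    "/dev/sdc": ["0x1a2b0002"],
}

key_list = {}  # shared across calls, like an undeclared (global) Perl hash

def get_key_list_buggy(device):
    for key in DEVICE_KEYS[device]:
        key_list[key] = 1  # entries from earlier devices are never removed
    return key_list

def get_key_list_fixed(device):
    local_keys = {}  # fresh dict on every call, like a hash declared with my
    for key in DEVICE_KEYS[device]:
        local_keys[key] = 1
    return local_keys

get_key_list_buggy("/dev/sdb")
print("0x1a2b0001" in get_key_list_buggy("/dev/sdc"))  # True (stale entry)
print("0x1a2b0001" in get_key_list_fixed("/dev/sdc"))  # False
```

Scoping the hash to the subroutine means each call starts empty, so no explicit clearing step is needed.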
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0189.html