Bug 673546

Summary: Fencing fails to operate correctly with pcmk_host_check=none and pcmk_host_check=static-list
Product: Red Hat Enterprise Linux 6 Reporter: Gary Smith <gasmith>
Component: pacemaker Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED NOTABUG QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.0 CC: alain.moulle, cluster-maint, Jean-Olivier.Gerphagnon
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-31 15:04:47 UTC Type: ---

Description Gary Smith 2011-01-28 17:03:22 UTC
Description of problem:


On RHEL6 with pacemaker and fence-agents, fencing fails to operate correctly when fence_ipmilan is configured with either of these two settings:

1) pcmk_host_check=none
* Fencing does not occur (even on a 2 node cluster)


2) pcmk_host_check=static-list
   pcmk_host_list='node1 node2 node3 node4'
* The fencing operation does not occur on the correct node.


Version-Release number of selected component (if applicable):
pacemaker-1.1.2-7.el6.x86_64
fence-agents-3.0.12-8.el6.x86_64


How reproducible:
Force a fence on node2 by bringing down its heartbeat network interface (ifdown on the heartbeat interface).
 
Actual results:
node4 is fenced instead of node2

Expected results:
node2 should be fenced

Additional info:
http://www.gossamer-threads.com/lists/linuxha/users/67266
http://www.gossamer-threads.com/lists/linuxha/users/65098

Attached files for analysis within fence-pb-260111.tar:
crm-configure-show
- shows the crm configuration of the HA cluster

crm_mon.after-ifdown-eth0-on-perou3 
- shows the crm monitoring just after the ifdown

crm_mon.before.ifdown-eth0-on-perou3
- shows the crm monitoring before the ifdown on perou3

syslog.perou2.during-fencing-pb
- shows the status of node perou2

syslog.perou6.during-fencing-pb
- shows the status of node perou6

Comment 2 Andrew Beekhof 2011-01-31 12:10:34 UTC
IIRC, IPMI devices can only fence the machine of which they are a part.
So this device definition looks wrong:

primitive restofenceperou2 stonith:fence_ipmilan \
	params ipaddr="10.11.0.103" login="administrator" passwd="administrator" pcmk_host_check="static-list" pcmk_host_list="perou2 perou3 perou6 perou7" action="reboot"

Contrary to the advice in:
  http://www.gossamer-threads.com/lists/linuxha/users/67410#67410
each device is advertising that it can fence _all_ nodes in the cluster.

Instead, set pcmk_host_list for each device to _only_ the host name associated with that device's ipaddr.

For example (guessing at the node/ip mapping):

primitive restofenceperou2 stonith:fence_ipmilan \
	params ipaddr="10.11.0.103" login="administrator" passwd="administrator" pcmk_host_check="static-list" pcmk_host_list="perou2" action="reboot" \
	meta target-role="Started"
primitive restofenceperou3 stonith:fence_ipmilan \
	params ipaddr="10.11.0.104" login="administrator" passwd="administrator" pcmk_host_check="static-list" pcmk_host_list="perou3" action="reboot" \
	meta target-role="Started"
primitive restofenceperou6 stonith:fence_ipmilan \
	params ipaddr="10.11.0.107" login="administrator" passwd="administrator" pcmk_host_check="static-list" pcmk_host_list="perou6" action="reboot" \
	meta target-role="Started"
primitive restofenceperou7 stonith:fence_ipmilan \
	params ipaddr="10.11.0.108" login="administrator" passwd="administrator" pcmk_host_check="static-list" pcmk_host_list="perou7" action="reboot"
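
The one-device-per-node pattern above can also be generated from a node/IP table, which makes it harder to end up with a device that advertises every node again. This is only a rough sketch: the NODE_TO_IPMI mapping repeats the guessed addresses from the example above, the function names are made up for illustration, and the credentials are the placeholder values from this report, not a recommendation.

```python
# Hypothetical node -> IPMI address mapping. The addresses repeat the guess
# made in the example above and must be checked against the real hardware.
NODE_TO_IPMI = {
    "perou2": "10.11.0.103",
    "perou3": "10.11.0.104",
    "perou6": "10.11.0.107",
    "perou7": "10.11.0.108",
}


def fence_primitive(node, ipaddr, login="administrator", passwd="administrator"):
    """Return a crm shell primitive whose pcmk_host_list names only `node`."""
    return (
        f"primitive restofence{node} stonith:fence_ipmilan \\\n"
        f'\tparams ipaddr="{ipaddr}" login="{login}" passwd="{passwd}" '
        f'pcmk_host_check="static-list" pcmk_host_list="{node}" action="reboot"'
    )


def render_all(mapping):
    """Render one primitive per node, sorted by node name."""
    return "\n".join(fence_primitive(node, ip) for node, ip in sorted(mapping.items()))


if __name__ == "__main__":
    # Print the snippet for review before pasting it into the crm shell.
    print(render_all(NODE_TO_IPMI))
```

Each generated primitive lists exactly one host, so a device can only ever be selected to fence the node whose IPMI controller it addresses.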

Comment 3 Andrew Beekhof 2011-01-31 15:04:47 UTC
Feedback indicates that things are now working. Closing.