Description of problem:
Pacemaker has its own default for pcmk_reboot_action, which is "reboot". It calls the agent with that action by default, which might not always be optimal. By default, each agent performs whatever action you would expect to get a node fenced. Most of the time this is actually the same (i.e. "reboot"), but sometimes it is different. The only agents I found to behave differently are fence_scsi and fence_brocade. If pacemaker used the agents' defaults (i.e. action undefined), these agents would work without any pcmk_reboot_action configuration, just like they did in cman times.

Version-Release number of selected component (if applicable):
pacemaker-1.1.10-12.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. configure fence_scsi without any pcmk_reboot_action
2. fence the node
3. see agents failing and keys still registered

Actual results:
pcmk_reboot_action necessary for certain agents

Expected results:
no pcmk_reboot_action necessary, pacemaker uses agents' own defaults (i.e. no action parameter)

Additional info:
This was fixed in 1.1.10-10
Dammit, wrong bug
I disagree here. Allowing each agent to have its own default results in inconsistent and non-obvious behaviour (as well as increasing the number of places to check for the value being used). The two agents should be doing something sane for 'reboot' instead. There are a number of agents that fake "reboot" by sending "off" + "on"; do they not report success as long as "off" succeeds? This wouldn't seem much different. We'd also not be able to push such a change upstream, as changing the Pacemaker defaults would cause compatibility issues with other tools and agents.
Marek, fence_brocade has been rewritten since this bug was filed, and fence_scsi has to be ported/fixed for rhel7. Can we address those issues without possibly introducing regressions vs currently deployed setups in RHEL6? Otherwise we will need to document it for 6, and I'd like to see it fixed properly in 7 with a consistent reboot action across all agents.
@Fabio, those bugs should be fixed now. But we do not want to have the 'reboot' action everywhere (and we never had), because fabric fence agents (e.g. switches, scsi, ... plus kdump, which is a special case as it does not have 'on') cannot have a reboot action.

Possible solutions:
1) add this information to the fence agents' XML
2) make the cluster aware that the order of default actions is reboot/off - the cluster already has <actions> in the XML, so this should not be a problem.

I prefer the first option because it is not cluster specific.
There is another option which I did not mention and which is available, ready and supported. When the action is not specified, the agent's default action is used, which is reboot or off depending on the fence agent. Is this acceptable?
Probably not, as we need a sane default for the "reboot" action (see the summary and the history of this bug). Pacemaker always calls the agent with reboot and expects the agent to handle that.
(In reply to Jaroslav Kortus from comment #9)
> probably not as we need sane default for "reboot" action (see the summary
> and the history of this bug). Pacemaker always calls it with reboot and
> expects the agent to handle that.

If no reboot is advertised in the agent's metadata xml, we'll revert to 'off'.
https://github.com/ClusterLabs/pacemaker/commit/8383a38a478ed6473ff2179596335ed4de583cfa

I put a big warning message in there so we'd know the 'reboot' to 'off' substitution took place because some agent didn't support 'reboot'. This seems like the path of least resistance in that it allows these couple of agents to work properly with the current pacemaker defaults. If people want the warnings to go away, they can change the default off operation.

-- Vossel
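For readers following the thread, here is a rough sketch of the 'reboot' to 'off' substitution described in the comment above. It is written in Python for brevity and is not the code from the linked commit (Pacemaker itself is C). It assumes the agent metadata lists its actions as <action name="..."/> elements under <actions>; the function names, the agent names and the two sample metadata snippets are hypothetical.

```python
import logging
import xml.etree.ElementTree as ET

# Hypothetical, trimmed metadata for a fabric-style agent: no "reboot" advertised.
FABRIC_METADATA = """
<resource-agent name="fence_example_fabric">
  <actions>
    <action name="on"/>
    <action name="off"/>
    <action name="status"/>
  </actions>
</resource-agent>
"""

# Hypothetical, trimmed metadata for a power-style agent: "reboot" advertised.
POWER_METADATA = """
<resource-agent name="fence_example_power">
  <actions>
    <action name="on"/>
    <action name="off"/>
    <action name="reboot"/>
  </actions>
</resource-agent>
"""


def advertised_actions(metadata_xml):
    """Return the set of action names listed in the agent's metadata."""
    root = ET.fromstring(metadata_xml)
    return {action.get("name") for action in root.findall("./actions/action")}


def choose_fence_action(metadata_xml, requested="reboot"):
    """Fall back to 'off' when 'reboot' is requested but not advertised."""
    if requested == "reboot" and "reboot" not in advertised_actions(metadata_xml):
        # Warn loudly so the substitution is visible in the logs.
        logging.warning("agent does not advertise 'reboot', using 'off' instead")
        return "off"
    return requested


if __name__ == "__main__":
    print(choose_fence_action(FABRIC_METADATA))
    print(choose_fence_action(POWER_METADATA))
```

With this approach an agent that advertises 'reboot' is still called with 'reboot', so existing setups keep their behaviour, and only agents without it get the substituted 'off' plus a warning.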