Red Hat Bugzilla – Bug 1469255
stonith-action=poweroff leads to failure in fence-agent
Last modified: 2018-03-02 18:14:35 EST
Description of problem:
According to the documentation, stonith-action can be set to either reboot or poweroff.
When it is set to poweroff, the value is passed unchanged to the RHCS fence agents, which do not recognise that action.
Version-Release number of selected component (if applicable):
Found with upstream master, but 1.1.17 should behave the same way.
Steps to Reproduce:
1. Set up a configuration with an RHCS fence agent (e.g. fence_sbd)
2. pcs property set stonith-action=poweroff
3. Have Pacemaker trigger fencing, e.g. by cutting the network connection
Actual results:
Jul 10 10:52:46  bsul0799 stonith-ng: warning: log_action: fence_sbd stderr: [ Failed: Unrecognised action 'poweroff' ]
Expected results:
The fence agent properly turns off the fenced node.
Additional info:
A test with 'pcs stonith fence ...' doesn't show the problem.
With stonith-action=off, Pacemaker-initiated fencing works properly, but fencing with pcs fails.
qa-ack+: setting stonith-action=poweroff must work for all the fence agents
moved to rhel-7.6 due to effort constraints
Really? Shouldn't be that hard to map 'poweroff' to 'off' inside the stonith library
Pacemaker currently accepts the values "reboot", "off", or "poweroff" for stonith-action.
LHA-style external/* agents (which are supported upstream, but not in RHEL) do support "poweroff". Remapping "poweroff" to "off" globally would break those.
I see two reasonable approaches:
1. Drop support for stonith-action=poweroff. If someone wants to use poweroff with LHA agents, they must set stonith-action=off and pcmk_off_action=poweroff. (This is the cleanest and easiest option development-wise, but involves some pain for LHA users.)
2. Remap stonith-action=poweroff to stonith-action=off, and for LHA agents, also assume pcmk_off_action=poweroff if not otherwise set. (This is easiest for all users.)
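Option 2, as proposed at this point in the discussion, can be sketched roughly as follows. This is only an illustration in Python, not Pacemaker code: the function name `remap_poweroff`, the `is_lha_agent` flag, and the plain-dict representation of device parameters are all assumptions made for the example.

```python
def remap_poweroff(stonith_action, device_params, is_lha_agent):
    """Apply option 2 and return (cluster_action, device_params).

    Hypothetical helper: the name, the arguments, and the dict-based
    device parameters are illustrative, not Pacemaker's API.
    """
    params = dict(device_params)  # copy so the caller's dict is untouched
    if stonith_action != "poweroff":
        return stonith_action, params
    if is_lha_agent:
        # Only fill in the default; an explicit user setting wins.
        params.setdefault("pcmk_off_action", "poweroff")
    # The cluster-wide action is always presented as "off".
    return "off", params
```

Under this sketch an RHCS agent simply receives "off", while an LHA agent keeps receiving "poweroff" through pcmk_off_action unless the user has set that parameter to something else.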
Suggestion for 3rd option:
3. Let pcs do some checks while executing the commands 'pcs stonith create'
and 'pcs property set stonith-action=<value>'.
Before a stonith resource is created, check whether the value of
'stonith-action' is among the actions supported by the fence agent.
Before 'stonith-action' is changed by 'pcs property', check whether the given
value is consistent with the actions of the fence agents currently in use.
(In reply to Miroslav Lisik from comment #6)
> Suggestion for 3rd option:
> 3. Let pcs do some checks while executing the commands 'pcs stonith create'
> and 'pcs property set stonith-action=<value>'.
> Before a stonith resource is created, check whether the value of
> 'stonith-action' is among the actions supported by the fence agent.
> Before 'stonith-action' is changed by 'pcs property', check whether the given
> value is consistent with the actions of the fence agents currently in use.
That's feasible, but a little more complicated than that. If stonith-action=off, and a fence agent doesn't support "off", it can still be used as long as its pcmk_off_action is set to an action it does support (and similarly for reboot). Also, users should be allowed to add devices that aren't used by the cluster (yet), so --force should override the check.
Even with such checks, we'd still need one of the other two options to handle both RH and LHA fence agents upstream.
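The refined check described in this comment might look roughly like the sketch below. It is not pcs code: `check_stonith_action`, its arguments, and the returned message format are hypothetical, and the sketch assumes stonith-action is already "off" or "reboot", so that the matching override parameter is pcmk_off_action or pcmk_reboot_action.

```python
def check_stonith_action(stonith_action, agent_actions, device_params, force=False):
    """Return a list of problems; an empty list means the device is usable.

    Hypothetical helper for illustration only.
    stonith_action: "off" or "reboot" (the cluster property value)
    agent_actions:  set of action names the fence agent advertises
    device_params:  dict of the stonith device's configured parameters
    force:          stands in for --force, which skips the check
    """
    if force:
        return []
    # A per-device override replaces the action actually sent to the agent.
    override = device_params.get(f"pcmk_{stonith_action}_action")
    effective = override or stonith_action
    if effective in agent_actions:
        return []
    return [
        f"agent does not support '{effective}' "
        f"(needed for stonith-action={stonith_action})"
    ]
```

For example, an agent that advertises only "reboot" and "shutdown" would fail the check for stonith-action=off, pass it once pcmk_off_action=shutdown is set on the device, and pass it unconditionally when force is given.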
What is the assumed target timeframe for this?
If it is Pacemaker 2+, then unless we want to drop support for external
stonith agents completely, I'd follow Ken's suggestion and handle this
implicitly on validation schema upgrade, making the necessary changes
to the configuration by means of an XSL transformation,
per option 2 from [comment 5].
(2. being justified [also] with possible needs to combine the stonith
Actually, thinking more about this in the Pacemaker 2 context, semantic
aliases are exactly the kind of burden we want to get rid of at the
_user-facing level_ (so much easier not to have to explain these gory
details to users!), so it might be more beneficial to ditch the
"poweroff" choice there once and for all, even if it would still be
applied under the hood. With this approach, 1:1 continuity for existing
configurations can also be achieved at the XSL level, as already
sketched by Ken elsewhere. From the [comment 9] standpoint, the result
should be the same at the end of the day, but the overall simplicity
would likely be better. The pain point of option 1 for LHA agents could
then be mitigated by the high-level tools (crm, pcs).
Just thinking aloud :)
After further investigation, the situation is simpler. LHA agents don't take poweroff either -- the fence_legacy wrapper accepts poweroff, and maps it to off when calling the agent.
So, Pacemaker can always map poweroff to off, as originally thought. We can log a deprecation warning, and no schema transform is required.
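The behaviour described here, always mapping poweroff to off and warning about the deprecated value, can be illustrated with a minimal Python sketch. This is not the code from the upstream commit (Pacemaker itself is written in C); the function name `effective_stonith_action` and the warning text are assumptions for the example.

```python
import warnings


def effective_stonith_action(value):
    """Map the deprecated "poweroff" alias to "off"; pass other values through.

    Hypothetical helper: the name and the warning wording are illustrative.
    """
    if value == "poweroff":
        warnings.warn(
            'stonith-action value "poweroff" is deprecated; use "off" instead',
            DeprecationWarning,
            stacklevel=2,
        )
        return "off"
    return value
```

Because the fence_legacy wrapper already translates poweroff to off for LHA agents, a single unconditional mapping like this covers both agent families without any per-agent special case.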
Fixed by upstream commit ebc8737f