Red Hat Bugzilla – Bug 1461377
Inconsistent timeout behaviour depending on whether fencing is triggered via pengine or manually using pcs
Last modified: 2018-05-17 15:49:14 EDT
Description of problem:
When pengine creates a transition with a fencing request, it includes stonith-timeout, which tengine passes on to the stonith API.
When fencing is triggered manually via pcs, the default timeout implemented in stonith_admin (120s) seems to be used instead.
Version-Release number of selected component (if applicable):
Version : 0.9.158
Release : 2.el7
Steps to Reproduce:
1. configure a fencing-device with a large and reproducible timeout (e.g. 90s)
2. 'pcs property set stonith-timeout=60s'
3. fence a node using pcs ('pcs stonith fence node3') --> success
4. pull the power-cord so that pengine triggers fencing --> timeout
Actual results:
Manually triggered fencing succeeds, while fencing triggered by pengine times out.
Expected results:
Both cases should time out if stonith-timeout is set too low.
Additional info:
A timeout can be handed over to stonith_admin.
This could be raised against pacemaker as well, but stonith_admin should stay a tool (I guess it is one at the moment) that can be used to test stonithd standalone.
When the pcmk_reboot_timeout attribute (or its equivalent for other actions) is used, the behaviour is already consistent, as it overrules the timeout passed via the stonith API.
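The mismatch can be illustrated with a small model. This is only a sketch of the behaviour described in this report, not pacemaker code; the names `effective_timeout`, `fencing_succeeds` and `STONITH_ADMIN_DEFAULT_S` are made up for illustration.

```python
from typing import Optional

# Default used by stonith_admin when no timeout is given, per this report.
STONITH_ADMIN_DEFAULT_S = 120


def effective_timeout(requested_s: int, device_timeout_s: Optional[int] = None) -> int:
    """Timeout applied to a fencing action in this model.

    A per-device pcmk_<action>_timeout, if set, overrules whatever the
    caller passed via the stonith API.
    """
    return device_timeout_s if device_timeout_s is not None else requested_s


def fencing_succeeds(device_duration_s: int, timeout_s: int) -> bool:
    """Fencing completes only if the device finishes within the timeout."""
    return device_duration_s <= timeout_s


# Scenario from the steps above: the device needs about 90s, stonith-timeout=60s.
device_duration = 90
via_pengine = effective_timeout(requested_s=60)                   # stonith-timeout from the CIB
via_pcs = effective_timeout(requested_s=STONITH_ADMIN_DEFAULT_S)  # stonith_admin default

print(fencing_succeeds(device_duration, via_pengine))  # False: times out
print(fencing_succeeds(device_duration, via_pcs))      # True: succeeds
```

With a device-level timeout set (for example `effective_timeout(60, 100)` and `effective_timeout(120, 100)`), both paths resolve to the same value, which matches the consistent behaviour noted above.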
Can you describe more precisely what you propose pcs should do?
A timeout can be set on specific stonith resources. But the user only specifies a node to be fenced in the "pcs stonith fence" command. In that case pcs should not look for a specific fencing device, take its timeout and pass it to stonith_admin, as pcs does not know which device will be used to fence the node.
For me it makes sense that pacemaker should figure out and use whatever timeout is set in the CIB unless the timeout is overridden from the command line (stonith_admin --timeout) which is functionality pcs currently does not provide.
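A minimal sketch of what a command-line override could look like, assuming pcs shells out to stonith_admin. `build_fence_command` is a hypothetical helper, not existing pcs code, and the choice of `--reboot` as the action is an assumption; `--timeout` is the stonith_admin option mentioned above and takes seconds.

```python
from typing import List, Optional


def build_fence_command(node: str, timeout_s: Optional[int] = None) -> List[str]:
    """Build a stonith_admin argv (hypothetical helper, not actual pcs code).

    Without timeout_s, no --timeout is passed and stonith_admin falls back
    to its own default (120s according to this report).
    """
    cmd = ["stonith_admin", "--reboot", node]
    if timeout_s is not None:
        if timeout_s <= 0:
            raise ValueError("timeout must be a positive number of seconds")
        cmd += ["--timeout", str(timeout_s)]
    return cmd


print(build_fence_command("node3"))
print(build_fence_command("node3", timeout_s=60))
```

The timeout is left out entirely when the user does not give one, so the decision about the default stays with pacemaker rather than pcs.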
(In reply to Tomas Jelinek from comment #2)
> Can you describe more precisely what do you propose pcs should do?
> Timeout can be set to specific stonith resources. But the user only
> specifies a node to be fenced in the "pcs stonith fence" command. In that
> case pcs should not look for a specific fencing device, take its timeout and
> pass it to stonith_admin as pcs does not know what device will be used to
> fence the node.
> For me it makes sense that pacemaker should figure out and use whatever
> timeout is set in the CIB unless the timeout is overridden from the command
> line (stonith_admin --timeout) which is functionality pcs currently does not
> provide.
Well, as described under additional info one could discuss where to solve this issue (pacemaker or pcs)...
When stonithd gets the individual timeouts from the devices, it already works as desired anyway.
Only the case where pengine uses the stonith-timeout property while stonith_admin uses the hardcoded 120s needs to be considered. Actually, no distinction has to be made, because stonithd already handles the overruling by individual timeouts.
As stonith_admin currently uses only the stonith API, it can't get the stonith-timeout property from the CIB without using other APIs, which might break its standalone capability.
Thus I would have suggested that pcs get the timeout from the CIB by default, with maybe a --timeout option to overrule it.
But I'm not familiar with the standalone capabilities of stonithd.
So I'm adding Ken.
I could imagine stonith_admin adding CIB access when compiled with pacemaker as well. Or, maybe even nicer, we could pass 0 (or -1) as the timeout via the stonith API to tell stonithd to insert the value from the CIB.
I do like the idea of using -1 to indicate "use the configured default". Stonithd already maintains a local copy of the entire CIB, so it should be easy to grab stonith-timeout from it when needed.
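The sentinel idea could be sketched roughly as follows. This is an illustration in Python, not stonithd code: `resolve_timeout`, `parse_seconds`, `USE_CONFIGURED_DEFAULT` and `FALLBACK_TIMEOUT_S` are hypothetical names, the 60s fallback is an assumption, and the parser only handles plain seconds such as "60" or "60s".

```python
from typing import Mapping

USE_CONFIGURED_DEFAULT = -1  # hypothetical sentinel: "use the configured default"
FALLBACK_TIMEOUT_S = 60      # assumption: used when the CIB has no stonith-timeout


def parse_seconds(value: str) -> int:
    """Parse '60' or '60s' into seconds (other duration units are not handled)."""
    value = value.strip()
    if value.endswith("s"):
        value = value[:-1]
    return int(value)


def resolve_timeout(requested_s: int, crm_config: Mapping[str, str]) -> int:
    """Return the caller's timeout, or stonith-timeout from the local CIB copy
    when the caller passed the sentinel."""
    if requested_s != USE_CONFIGURED_DEFAULT:
        return requested_s
    configured = crm_config.get("stonith-timeout")
    if configured is None:
        return FALLBACK_TIMEOUT_S
    return parse_seconds(configured)


cib_properties = {"stonith-timeout": "60s"}
print(resolve_timeout(USE_CONFIGURED_DEFAULT, cib_properties))  # 60
print(resolve_timeout(120, cib_properties))                     # 120
```

An explicit timeout from the caller is still honoured, so a command-line override would keep working alongside the configured default.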
(In reply to Ken Gaillot from comment #4)
> I do like the idea of using -1 to indicate "use the configured default".
> Stonithd already maintains a local copy of the entire CIB, so it should be
> easy to grab stonith-timeout from it when needed.
yes, my favourite as well - just didn't think of it till I wrote this last comment ... let's do it in pacemaker
Due to time constraints, this will not make 7.5