Bug 1461377 - inconsistent timeout behaviour whether fencing is triggered via pengine or manually using pcs
Status: ASSIGNED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 7.6
Assigned To: Klaus Wenninger
QA Contact: cluster-qe@redhat.com
Reported: 2017-06-14 06:31 EDT by Klaus Wenninger
Modified: 2017-10-09 13:45 EDT (History)
Type: Bug
Description Klaus Wenninger 2017-06-14 06:31:48 EDT
Description of problem:
When pengine creates a transition containing a fencing request, it includes the stonith-timeout value, which tengine then passes to the stonith API.
When fencing is triggered manually via pcs, it seems to use the default timeout implemented in stonith_admin (120s) instead.

Version-Release number of selected component (if applicable):
Version     : 0.9.158
Release     : 2.el7

How reproducible:
100%

Steps to Reproduce:
1. Configure a fencing device with a large and reproducible timeout (e.g. 90s).
2. 'pcs property set stonith-timeout=60s'
3. Fence a node using pcs ('pcs stonith fence node3') --> success
4. Pull the power cord so that pengine triggers fencing --> timeout

Actual results:

Manually triggered fencing succeeds, while fencing triggered by pengine times out.

Expected results:

Both cases should time out if stonith-timeout is set too low.


Additional info:
A timeout can be passed to stonith_admin.
This could be raised against pacemaker as well, but stonith_admin should remain a tool that can be used to test stonithd standalone (which I guess it is at the moment).
When the pcmk_reboot_timeout attribute (or its equivalent for other actions) is used, the behaviour is already consistent, because the timeout passed via the stonith API is overruled.
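One way a wrapper could keep manual fencing consistent with the cluster property is to read stonith-timeout from the CIB and pass it on explicitly. The following is only a sketch of that idea, not what pcs actually does. The helper names are hypothetical, the unit parser covers just a small subset of duration spellings, and `read_stonith_timeout` assumes a running cluster in which the property is explicitly set (otherwise `crm_attribute` is expected to fail).

```python
import subprocess

# Multipliers for a small, assumed subset of duration units (not exhaustive).
_UNIT_SECONDS = {"": 1, "s": 1, "sec": 1, "m": 60, "min": 60, "h": 3600, "hr": 3600}


def parse_timeout_seconds(value: str) -> int:
    """Convert a duration such as '60s', '2min' or '90' to whole seconds."""
    text = value.strip().lower()
    digits = text.rstrip("abcdefghijklmnopqrstuvwxyz")
    unit = text[len(digits):]
    if not digits.isdigit() or unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(digits) * _UNIT_SECONDS[unit]


def read_stonith_timeout() -> int:
    """Query the stonith-timeout cluster property (needs a live cluster)."""
    result = subprocess.run(
        ["crm_attribute", "--type", "crm_config",
         "--name", "stonith-timeout", "--query", "--quiet"],
        check=True, capture_output=True, text=True,
    )
    return parse_timeout_seconds(result.stdout)


def build_fence_command(node: str, timeout_s: int) -> list:
    """Build a stonith_admin reboot request with an explicit timeout in seconds."""
    return ["stonith_admin", "--reboot", node, "--timeout", str(timeout_s)]
```

With stonith-timeout=60s, `build_fence_command("node3", parse_timeout_seconds("60s"))` yields a request that should time out against a 90s device in the same way as fencing triggered by pengine.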
Comment 2 Tomas Jelinek 2017-06-14 08:06:28 EDT
Klaus,

Can you describe more precisely what you propose pcs should do?

A timeout can be set on specific stonith resources. But the user only specifies a node to be fenced in the "pcs stonith fence" command. In that case pcs should not look for a specific fencing device, take its timeout and pass it to stonith_admin, as pcs does not know which device will be used to fence the node.

To me it makes sense that pacemaker should figure out and use whatever timeout is set in the CIB, unless the timeout is overridden from the command line (stonith_admin --timeout), which is functionality pcs currently does not provide.
Comment 3 Klaus Wenninger 2017-06-14 08:30:35 EDT
(In reply to Tomas Jelinek from comment #2)
> Klaus,
> 
> Can you describe more precisely what you propose pcs should do?
> 
> Timeout can be set to specific stonith resources. But the user only
> specifies a node to be fenced in the "pcs stonith fence" command. In that
> case pcs should not look for a specific fencing device, take its timeout and
> pass it to stonith_admin as pcs does not know what device will be used to
> fence the node.
> 
> For me it makes sense that pacemaker should figure out and use whatever
> timeout is set in the CIB unless the timeout is overridden from the command
> line (stonith_admin --timeout) which is functionality pcs currently does not
> provide.

Well, as described under "Additional info", one could discuss where to solve this issue (pacemaker or pcs)...
When stonithd gets the individual timeouts from the devices, it already works as desired.
Only the case where pengine uses the stonith-timeout property while stonith_admin uses its hardcoded 120s needs to be considered. Actually, no distinction has to be made, because stonithd already handles the overruling by an individual timeout.
As stonith_admin currently uses just the stonith API, it can't get the stonith-timeout property from the CIB without using additional APIs, which might break its standalone capability.
Thus I would have suggested that pcs gets the timeout from the CIB as its default behaviour, with maybe --timeout to overrule that.

But I'm not familiar with the standalone capabilities of stonithd.
So I'm adding Ken.
I could imagine that stonith_admin adds the CIB access when compiled with pacemaker as well. Or, maybe even nicer, we could pass 0 (or -1) as the timeout via the stonith API to tell stonithd to insert the value from the CIB.
Comment 4 Ken Gaillot 2017-06-14 10:27:55 EDT
I do like the idea of using -1 to indicate "use the configured default". Stonithd already maintains a local copy of the entire CIB, so it should be easy to grab stonith-timeout from it when needed.
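The precedence being discussed can be modelled in a few lines. This is a toy model of the proposal only, not stonithd's actual code: the sentinel constant, function names and parameters are all hypothetical, and a per-device timeout (such as pcmk_reboot_timeout) is assumed to override whatever the caller requested, as described in the bug description.

```python
from typing import Optional

USE_CONFIGURED_DEFAULT = -1  # hypothetical sentinel: "use stonith-timeout from the CIB"


def effective_timeout(requested: int, cluster_default: int,
                      device_override: Optional[int] = None) -> int:
    """Pick the timeout (seconds) that would apply to one fencing request."""
    if device_override is not None:
        # A per-device timeout overrules the value passed by the caller.
        return device_override
    if requested < 0:
        # Negative request means: fall back to the configured cluster default.
        return cluster_default
    return requested


def fencing_outcome(device_duration: int, timeout: int) -> str:
    """Report whether a device needing device_duration seconds beats the timeout."""
    return "success" if device_duration <= timeout else "timeout"


if __name__ == "__main__":
    # Scenario from the report: device needs 90s, stonith-timeout is 60s.
    print("pengine:", fencing_outcome(90, effective_timeout(60, 60)))
    print("manual, 120s default:", fencing_outcome(90, effective_timeout(120, 60)))
    print("manual, sentinel:",
          fencing_outcome(90, effective_timeout(USE_CONFIGURED_DEFAULT, 60)))
```

Under this model, a client such as stonith_admin would not need CIB access of its own: it would pass the sentinel, and the daemon would substitute the configured value.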
Comment 5 Klaus Wenninger 2017-06-14 10:30:30 EDT
(In reply to Ken Gaillot from comment #4)
> I do like the idea of using -1 to indicate "use the configured default".
> Stonithd already maintains a local copy of the entire CIB, so it should be
> easy to grab stonith-timeout from it when needed.

Yes, my favourite as well. I just didn't think of it until I wrote that last comment ... let's do it in pacemaker.
Comment 7 Ken Gaillot 2017-10-09 13:45:38 EDT
Due to time constraints, this will not make 7.5.
