Bug 1386273 - stonith_admin --confirm succeeds for a nonexistent node
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-18 14:21 UTC by Tomas Jelinek
Modified: 2016-10-18 15:40 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-18 14:57:17 UTC
Target Upstream Version:



Description Tomas Jelinek 2016-10-18 14:21:15 UTC
Description of problem:
Running "stonith_admin --confirm nonexistent_node" returns 0 and does not print any warning or error.


Version-Release number of selected component (if applicable):
pacemaker-1.1.15-10.el7.x86_64


How reproducible:
always, easily


Steps to Reproduce:
stonith_admin --confirm nonexistent_node


Actual results:
Exit code 0, no error messages.


Expected results:
Non-zero exit code, error message saying it is not possible to confirm fencing of a nonexistent node.


Additional info:
corosync.log:
Oct 18 16:18:38 [12585] rh72-node1 stonith-ng:   notice: handle_request:        Received manual confirmation that nonexistent is fenced
Oct 18 16:18:38 [12585] rh72-node1 stonith-ng:   notice: initiate_remote_stonith_op:    Initiating manual confirmation for nonexistent: 6eb84017-6c8c-4041-a323-a4e0ae75e38a
Oct 18 16:18:38 [12585] rh72-node1 stonith-ng:   notice: stonith_manual_ack:    Injecting manual confirmation that nonexistent is safely off/down
Oct 18 16:18:38 [12585] rh72-node1 stonith-ng:   notice: remote_op_done:        Operation off of nonexistent by a human for stonith_admin.13024: OK
Oct 18 16:18:38 [12589] rh72-node1       crmd:   notice: tengine_stonith_notify:        Peer nonexistent was terminated (off) by a human for rh72-node1: OK (ref=6eb84017-6c8c-4041-a323-a4e0ae75e38a) by client stonith_admin.13024

Comment 1 Ken Gaillot 2016-10-18 14:57:17 UTC
This is not a bug.

First, stonithd allows the user to use node names that aren't currently known at the cluster level, whether with --confirm or something like pcmk_host_list, because the node may be added to the cluster at any time, or it may have joined a partition that stonith_admin currently can't see (but a fence device can shoot).

Second (and not widely known), stonithd is designed to be usable for fencing anything, not just cluster nodes. A user can register arbitrary node names and arbitrary fence devices that can fence those nodes, and request that stonithd perform fencing. As long as some device is capable of fencing the node, stonithd doesn't care what the node name is or whether it is part of the cluster. The stonithd regression tests even use this behavior to set up imaginary fence scenarios.

Comment 2 Tomas Jelinek 2016-10-18 15:06:29 UTC
Thanks for the quick answer and thorough explanation.

Since the "pcs stonith confirm" command is essentially a wrapper for "stonith_admin --confirm", should it check whether a node exists in the cluster? Based on your explanation, pcs should not. If pacemaker cannot see a node, then pcs, which gets its list of nodes from pacemaker, cannot see it either. The check would make it impossible to confirm that such an invisible node has been fenced.

However, it might be a good idea to explain the behavior of the command in more detail in the pcs documentation.

Comment 3 Ken Gaillot 2016-10-18 15:40:03 UTC
Agreed. At most, pcs could indicate in its success/fail message whether the fenced node was a known cluster node or not.
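
To illustrate the suggestion above, here is a minimal Python sketch of how a wrapper could report whether the confirmed node was a known cluster node without rejecting unknown names. This is not the actual pcs implementation: the function names (run_command, confirm_message, confirm_fencing), the message wording, and the idea of passing the known node list in as a parameter are all assumptions made for illustration. Only the "stonith_admin --confirm <node>" invocation is taken from this report.

```python
import subprocess
from typing import Callable, Iterable, Sequence, Tuple


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit code (hypothetical helper)."""
    return subprocess.run(list(argv)).returncode


def confirm_message(node: str, known_nodes: Iterable[str], exit_code: int) -> str:
    """Build a user-facing message that notes whether the node is known.

    The node name is never rejected; the note is purely informational.
    """
    if node in set(known_nodes):
        note = "a known cluster node"
    else:
        note = "not a known cluster node"
    if exit_code == 0:
        return "Node '{0}' confirmed fenced ({1})".format(node, note)
    return "Unable to confirm fencing of node '{0}' ({1}), exit code {2}".format(
        node, note, exit_code
    )


def confirm_fencing(
    node: str,
    known_nodes: Iterable[str],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> Tuple[int, str]:
    """Call 'stonith_admin --confirm <node>' and annotate the result.

    known_nodes is supplied by the caller (assumption: the caller has
    already obtained the node list from the cluster). runner is
    injectable so the logic can be exercised without a cluster.
    """
    exit_code = runner(["stonith_admin", "--confirm", node])
    return exit_code, confirm_message(node, known_nodes, exit_code)
```

With this shape, confirming "nonexistent_node" still succeeds whenever stonith_admin returns 0, and the message simply adds that the node is not a known cluster node, so a node in an invisible partition can still be confirmed as fenced.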

