Bug 1386273

Summary: stonith_admin --confirm succeeds for a nonexistent node
Product: Red Hat Enterprise Linux 7
Reporter: Tomas Jelinek <tojeline>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: CLOSED NOTABUG
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: abeekhof, cluster-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2016-10-18 14:57:17 UTC
Type: Bug

Description Tomas Jelinek 2016-10-18 14:21:15 UTC
Description of problem:
Running "stonith_admin --confirm nonexistent_node" returns 0 and does not print any warning or error.


Version-Release number of selected component (if applicable):
pacemaker-1.1.15-10.el7.x86_64


How reproducible:
always, easily


Steps to Reproduce:
stonith_admin --confirm nonexistent_node


Actual results:
Exit code 0, no error messages.


Expected results:
Non-zero exit code, error message saying it is not possible to confirm fencing of a nonexistent node.


Additional info:
corosync.log:
Oct 18 16:18:38 [12585] rh72-node1 stonith-ng:   notice: handle_request:        Received manual confirmation that nonexistent is fenced
Oct 18 16:18:38 [12585] rh72-node1 stonith-ng:   notice: initiate_remote_stonith_op:    Initiating manual confirmation for nonexistent: 6eb84017-6c8c-4041-a323-a4e0ae75e38a
Oct 18 16:18:38 [12585] rh72-node1 stonith-ng:   notice: stonith_manual_ack:    Injecting manual confirmation that nonexistent is safely off/down
Oct 18 16:18:38 [12585] rh72-node1 stonith-ng:   notice: remote_op_done:        Operation off of nonexistent by a human for stonith_admin.13024: OK
Oct 18 16:18:38 [12589] rh72-node1       crmd:   notice: tengine_stonith_notify:        Peer nonexistent was terminated (off) by a human for rh72-node1: OK (ref=6eb84017-6c8c-4041-a323-a4e0ae75e38a) by client stonith_admin.13024

Comment 1 Ken Gaillot 2016-10-18 14:57:17 UTC
This is not a bug.

First, stonithd allows the user to use node names that aren't currently known at the cluster level, whether with --confirm or something like pcmk_host_list, because the node may be added to the cluster at any time, or it may have joined a partition that stonith_admin currently can't see (but a fence device can shoot).

Second (and not widely known), stonithd is designed to be usable for fencing anything, not just cluster nodes. A user can register arbitrary node names and arbitrary fence devices that can fence those nodes, and request that stonithd perform fencing. As long as some device is capable of fencing the node, stonithd doesn't care what the node name is or whether it is part of the cluster. The stonithd regression tests even use this behavior to set up imaginary fence scenarios.
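
The second point can be pictured with a toy model. This is a sketch of the idea only, not stonithd's implementation: `FenceDevice`, `devices_for`, and the device and target names are invented for illustration. The lookup consults only what each registered device claims it can fence, and never checks cluster membership.

```python
from dataclasses import dataclass, field


@dataclass
class FenceDevice:
    name: str
    # Targets this device claims it can fence (similar in spirit to pcmk_host_list).
    host_list: set = field(default_factory=set)


def devices_for(target, devices):
    """Return names of devices able to fence target; cluster membership is not consulted."""
    return [d.name for d in devices if target in d.host_list]


cluster_nodes = {"rh72-node1", "rh72-node2"}  # deliberately unused by devices_for
devices = [
    FenceDevice("fence-a", {"rh72-node1", "rh72-node2"}),
    FenceDevice("fence-b", {"imaginary-target"}),  # hypothetical non-cluster target
]
```

In this model, `devices_for("imaginary-target", devices)` finds a device even though the target is not in `cluster_nodes`, which mirrors how the regression tests can set up imaginary fence scenarios.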

Comment 2 Tomas Jelinek 2016-10-18 15:06:29 UTC
Thanks for the quick answer and thorough explanation.

Considering that the "pcs stonith confirm" command is pretty much just a wrapper for "stonith_admin --confirm", do you think it should check whether a node exists in the cluster? Based on your explanation, pcs should not check it: if pacemaker cannot see a node, then pcs, which gets its list of nodes from pacemaker, cannot see it either. The check would make it impossible to confirm that such an invisible node has been fenced.

However, it might be a good idea to explain the behavior of the command in more detail in the pcs documentation.

Comment 3 Ken Gaillot 2016-10-18 15:40:03 UTC
Agreed. At most, pcs could indicate in its success/fail message whether the fenced node was a known cluster node or not.
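
The suggestion above could be sketched roughly as follows. This is only an illustration, not pcs code: `confirm_fencing`, `run_stonith_confirm`, `known_nodes`, and the message wording are hypothetical, and the only real command assumed is "stonith_admin --confirm <node>" as used in the report. An unknown node name never causes the request to be rejected; it only changes the wording of the result message.

```python
import subprocess


def run_stonith_confirm(node):
    """Call the real CLI and return its exit code."""
    return subprocess.run(["stonith_admin", "--confirm", node]).returncode


def confirm_fencing(node, known_nodes, run=run_stonith_confirm):
    """Hypothetical wrapper: confirm fencing, then report whether the node was known.

    known_nodes is whatever collection of node names the caller obtained from
    the cluster; an unknown name is NOT treated as an error.
    """
    exit_code = run(node)
    if node in known_nodes:
        known = "a known cluster node"
    else:
        known = "not a known cluster node"
    if exit_code == 0:
        return True, f"Node '{node}' confirmed fenced ({known})"
    return False, (
        f"Unable to confirm fencing of node '{node}' ({known}), "
        f"exit code {exit_code}"
    )
```

The `run` parameter exists only so the sketch can be exercised without a cluster, for example `confirm_fencing("nonexistent_node", {"rh72-node1"}, run=lambda node: 0)`.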