Bug 1327302

Summary:	Some pcs commands fail when run on Pacemaker Remote nodes
Product:	Red Hat Enterprise Linux 7	Reporter:	Ken Gaillot <kgaillot>
Component:	pcs	Assignee:	Tomas Jelinek <tojeline>
Status:	CLOSED DUPLICATE	QA Contact:	cluster-qe <cluster-qe>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	7.2	CC:	cfeist, cluster-maint, idevat, jpokorny, omular, tojeline
Target Milestone:	rc	Keywords:	Reopened
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-02-08 10:03:26 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Ken Gaillot 2016-04-14 17:41:26 UTC

Currently, Red Hat documentation recommends installing and running pcs only on cluster nodes (not Pacemaker Remote nodes). pcs commands do affect the entire cluster, including Pacemaker Remote nodes, but they must be run from the command line of a cluster node.

The reason is that certain command-line tools require the full cluster stack, which is not available (and might not even be installed) on Pacemaker Remote nodes.

Affected commands are crm_attribute, crm_master, crm_node, and stonith_admin. In most cases, only some of a command's options fail on Pacemaker Remote nodes.

Pacemaker upstream's long-term goal is to proxy the necessary functionality so that the commands "just work" on Pacemaker Remote nodes. In the meantime, it would be nice if pcs handled the situation more cleanly.

For example, those commands might not be installed at all on a Pacemaker Remote node, and pcs could detect this and report an error that the pcs command must be run from a cluster node. (Generally, this will not happen, because the pcs package depends on the pacemaker package which pulls in the entire cluster stack, but that dependency could change in the future, or someone could install upstream pcs from source.)
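A minimal sketch of that detection, assuming Python 3 (for shutil.which) and hypothetical helper names; none of this is existing pcs code, and the error wording is only a suggestion:

```python
import shutil

# Tools this report lists as needing the full cluster stack.
REQUIRED_TOOLS = ("crm_attribute", "crm_master", "crm_node", "stonith_admin")


def find_missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the names in `tools` that cannot be found on the search path.

    `path=None` makes shutil.which fall back to the PATH environment variable.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


def missing_tools_message(missing):
    """Build a user-facing error for missing tools, or None if nothing is missing."""
    if not missing:
        return None
    # Hypothetical wording; pcs would choose its own message.
    return (
        "Error: unable to locate {0}; "
        "this pcs command must be run from a cluster node".format(", ".join(missing))
    )


if __name__ == "__main__":
    message = missing_tools_message(find_missing_tools())
    if message is not None:
        print(message)
```

Checking up front lets pcs name the missing tools and tell the user where to run the command, rather than surfacing a bare "No such file or directory".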

Or, the commands might be installed, but certain usages will fail on a Pacemaker Remote node. The most prominent example here is "crm_node -l", which "pcs status" uses. Currently, "pcs status" on a Pacemaker Remote node will print:

  Cluster name: 
  Error: unable to get list of pacemaker nodes

even though crm_mon works just fine. Perhaps if "crm_node -l" fails, pcs could instead print a warning that some information is not available on Pacemaker Remote nodes, and then continue with the crm_mon output.
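Roughly, the suggested behavior could look like the sketch below. The function names and the warning text are hypothetical and are not taken from pcs; "crm_mon -1" (one-shot mode) is an assumption about how the status would be fetched, and pcs may invoke crm_mon with different options.

```python
import subprocess


def run_command(cmd):
    """Run `cmd` and return (exit_code, stdout); a missing binary yields 127."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        # The executable is missing or cannot be started.
        return 127, ""
    return proc.returncode, proc.stdout


def status_lines(runner=run_command):
    """Collect status output, downgrading a crm_node failure to a warning."""
    lines = []
    exit_code, _ = runner(["crm_node", "-l"])
    if exit_code != 0:
        # Hypothetical wording: warn and keep going instead of aborting.
        lines.append(
            "Warning: unable to get list of pacemaker nodes; some "
            "information is not available on Pacemaker Remote nodes"
        )
    exit_code, output = runner(["crm_mon", "-1"])
    if exit_code != 0:
        lines.append("Error: unable to get cluster status from crm_mon")
    else:
        lines.extend(output.splitlines())
    return lines


if __name__ == "__main__":
    print("\n".join(status_lines()))
```

Passing the runner as a parameter keeps the fallback logic testable without a cluster: a stub that fails for crm_node and succeeds for crm_mon exercises the warning path.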

Comment 2 Tomas Jelinek 2016-04-15 07:26:23 UTC


*** This bug has been marked as a duplicate of bug 1289418 ***

Comment 3 Jan Pokorný [poki] 2017-02-07 21:08:41 UTC

Reopening: oddly, it seems this very case was never tested as part of
[bug 1289418], which this bug was originally closed as a duplicate of.

With pacemaker-remote-1.1.15-11.el7_3.3.x86_64 and
pcs-0.9.152-10.el7.x86_64:

# pcs status
> Cluster name: mytest
> No such file or directory
> Error: unable to locate command: /usr/sbin/crm_node

The traceback looks like:

>  File "/usr/lib/python2.7/site-packages/pcs/app.py",
>  line 216, in main
>    cmd_map[command](argv)
>  File "/usr/lib/python2.7/site-packages/pcs/status.py",
>  line 24, in status_cmd
>    full_status()
>  File "/usr/lib/python2.7/site-packages/pcs/status.py",
>  line 99, in full_status
>    utils.corosyncPacemakerNodeCheck()
>  File "/usr/lib/python2.7/site-packages/pcs/utils.py",
>  line 1979, in corosyncPacemakerNodeCheck
>    pm_nodes = getPacemakerNodesID(allow_failure=True)

Interestingly enough:

# useradd auxuser -g haclient
# su -c 'pcs status' auxuser3
> Cluster name: mytest
> Please authenticate yourself to the local pcsd
> Username: hacluster
> Password:
> localhost: Authorized
>
> Stack: corosync
> Current DC: virt-036 (version 1.1.15-11.el7-e174ec8) - partition with quorum
> Last updated: Tue Feb  7 22:03:22 2017
> Last change: Tue Feb  7 21:38:40 2017 by hacluster via crmd on virt-036
>
> 2 nodes and 2 resources configured
>
> Online: [ virt-036 ]
> RemoteOnline: [ myremote ]
>
> Full list of resources:
>
>  myremote       (ocf::pacemaker:remote):        Started virt-036
>  mydelay2       (ocf::heartbeat:Delay): Started myremote
>
> Daemon Status:
>   corosync: inactive/disabled
>   pacemaker: inactive/disabled
>   pacemaker_remote: active/enabled
>   pcsd: active/enabled

Note that this remote node happens to have corosync installed, but it's
disabled and not running.

Comment 4 Tomas Jelinek 2017-02-08 10:03:26 UTC

This was tested and passed the tests. To reproduce this issue, one needs to remove the crm_node executable, e.g. by running "rpm -e --nodeps pacemaker". However, pcs depends on pacemaker, so this issue does not occur on standard installations. That is why it was not discovered.

Pcs relies on the crm_node executable being present on remote nodes. Therefore, I do not think this is a pcs bug. (There is a plan to move crm_node to the pacemaker-cli package, which pcs could then depend on instead, so that the full pacemaker stack would not have to be installed on remote nodes.)

*** This bug has been marked as a duplicate of bug 1374175 ***