Currently, Red Hat documentation recommends installing and running pcs only on cluster nodes (not Pacemaker Remote nodes). pcs commands do affect the entire cluster, including Pacemaker Remote nodes, but they must be run from a cluster node's command line. The reason is that certain command-line tools require the full cluster stack, which is not available (and might not even be installed) on Pacemaker Remote nodes. The affected commands are crm_attribute, crm_master, crm_node, and stonith_admin; in most cases, only some of each command's options fail on Pacemaker Remote nodes.

Pacemaker upstream's long-term goal is to proxy the necessary functionality so that these commands "just work" on Pacemaker Remote nodes. In the meantime, it would be nice if pcs handled the situation more cleanly. For example, these commands might not be installed at all on a Pacemaker Remote node; pcs could detect this and report an error saying that the pcs command must be run from a cluster node. (Generally this will not happen, because the pcs package depends on the pacemaker package, which pulls in the entire cluster stack, but that dependency could change in the future, or someone could install upstream pcs from source.)

Alternatively, the commands might be installed, but certain usages will fail on a Pacemaker Remote node. The most prominent example is "crm_node -l", which "pcs status" uses. Currently, "pcs status" on a Pacemaker Remote node prints:

> Cluster name:
> Error: unable to get list of pacemaker nodes

even though crm_mon works just fine. Perhaps if "crm_node -l" fails, pcs could instead print a warning that some information is not available on Pacemaker Remote nodes, and continue with the crm_mon output.
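The suggested fallback could look something like the following minimal sketch (the function names and the warning text are hypothetical, not pcs's actual code): if "crm_node -l" is missing or fails, print a warning and still show the crm_mon output.

```python
import shutil
import subprocess

def get_pacemaker_node_list():
    """Return the node list from `crm_node -l`, or None when the tool
    is missing or fails (as it can on a Pacemaker Remote node)."""
    # crm_node may not be installed at all on a remote node
    if shutil.which("crm_node") is None:
        return None
    try:
        out = subprocess.run(
            ["crm_node", "-l"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.splitlines()

def print_status(crm_mon_output):
    """Sketch of a `pcs status` that degrades gracefully."""
    nodes = get_pacemaker_node_list()
    if nodes is None:
        # Warn instead of aborting the whole status command
        print("Warning: pacemaker node list unavailable "
              "(this may be a Pacemaker Remote node)")
    print(crm_mon_output)
```

The key design point is that a failure to list nodes is downgraded from a fatal error to a warning, so the crm_mon portion of the output is always shown.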
*** This bug has been marked as a duplicate of bug 1289418 ***
Reopening: oddly, it seems this very case ended up untested as a result of bug 1289418, which this one was originally closed as a duplicate of. With pacemaker-remote-1.1.15-11.el7_3.3.x86_64 and pcs-0.9.152-10.el7.x86_64:

# pcs status
> Cluster name: mytest
> No such file or directory
> Error: unable to locate command: /usr/sbin/crm_node

The traceback looks like:

> File "/usr/lib/python2.7/site-packages/pcs/app.py", line 216, in main
>   cmd_map[command](argv)
> File "/usr/lib/python2.7/site-packages/pcs/status.py", line 24, in status_cmd
>   full_status()
> File "/usr/lib/python2.7/site-packages/pcs/status.py", line 99, in full_status
>   utils.corosyncPacemakerNodeCheck()
> File "/usr/lib/python2.7/site-packages/pcs/utils.py", line 1979, in corosyncPacemakerNodeCheck
>   pm_nodes = getPacemakerNodesID(allow_failure=True)

Interestingly enough:

# useradd auxuser -g haclient
# su -c 'pcs status' auxuser3
> Cluster name: mytest
> Please authenticate yourself to the local pcsd
> Username: hacluster
> Password:
> localhost: Authorized
>
> Stack: corosync
> Current DC: virt-036 (version 1.1.15-11.el7-e174ec8) - partition with quorum
> Last updated: Tue Feb 7 22:03:22 2017
> Last change: Tue Feb 7 21:38:40 2017 by hacluster via crmd on virt-036
>
> 2 nodes and 2 resources configured
>
> Online: [ virt-036 ]
> RemoteOnline: [ myremote ]
>
> Full list of resources:
>
> myremote (ocf::pacemaker:remote): Started virt-036
> mydelay2 (ocf::heartbeat:Delay): Started myremote
>
> Daemon Status:
> corosync: inactive/disabled
> pacemaker: inactive/disabled
> pacemaker_remote: active/enabled
> pcsd: active/enabled

Note that this remote node happens to have corosync installed, but it is disabled and not running.
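The "unable to locate command: /usr/sbin/crm_node" failure above comes from pcs assuming the binary exists. A minimal sketch of the kind of up-front check the original report asks for (the function names and tool list are assumptions for illustration, not pcs's actual code):

```python
import shutil

# Tools that require the full cluster stack and so may be absent on a
# Pacemaker Remote node (assumed list for illustration)
CLUSTER_STACK_TOOLS = ["crm_attribute", "crm_node", "stonith_admin"]

def missing_cluster_tools():
    """Return the subset of CLUSTER_STACK_TOOLS not found in PATH."""
    return [tool for tool in CLUSTER_STACK_TOOLS
            if shutil.which(tool) is None]

def require_cluster_node():
    """Exit with a clean, actionable error instead of a traceback
    when the required tools are missing."""
    missing = missing_cluster_tools()
    if missing:
        raise SystemExit(
            "Error: %s not found; this pcs command must be run "
            "from a cluster node" % ", ".join(missing)
        )
```

Checking with shutil.which() before shelling out would turn the raw "No such file or directory" plus traceback into a single clear error message.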
This was tested and it passed. To reproduce this issue, one needs to remove the crm_node executable, e.g. by running "rpm -e --nodeps pacemaker". However, pcs depends on pacemaker, so this issue does not happen on standard installations; that is why it was not discovered. Pcs relies on the crm_node executable being present on remote nodes, therefore I do not think this is a pcs bug. (There is a plan to move crm_node to the pacemaker-cli package, which pcs could in the future depend on so as not to install the full pacemaker stack on remote nodes.)

*** This bug has been marked as a duplicate of bug 1374175 ***