Bug 1290512 - pcs doesn't support putting Pacemaker Remote nodes into standby
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.2
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ivan Devat
QA Contact: cluster-qe@redhat.com
Keywords: FutureFeature
Depends On: 1374175
Blocks:
Reported: 2015-12-10 12:24 EST by Ken Gaillot
Modified: 2016-11-03 16:56 EDT
CC List: 5 users

See Also:
Fixed In Version: pcs-0.9.151-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: pcs did not recognize Pacemaker Remote nodes when a user tried to put one into standby mode. Consequence: pcs refused to put a Pacemaker Remote node into standby mode. Fix: Pacemaker Remote nodes are now included among the nodes known to pcs. Result: pcs can put Pacemaker Remote nodes into standby mode.
Story Points: ---
Clone Of:
Clones: 1374175 (view as bug list)
Environment:
Last Closed: 2016-11-03 16:56:06 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
proposed fix (16.10 KB, patch), 2016-02-23 01:29 EST, Ivan Devat


External Trackers
Tracker ID: Red Hat Product Errata RHSA-2016:2596
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: pcs security, bug fix, and enhancement update
Last Updated: 2016-11-03 08:11:34 EDT

Description Ken Gaillot 2015-12-10 12:24:42 EST
Description of problem: pcs doesn't support putting Pacemaker Remote nodes into standby mode


Version-Release number of selected component (if applicable): 0.9.143-15.el7


How reproducible: Consistently/easily


Steps to Reproduce:
1. Create a Pacemaker cluster.
2. Configure a Pacemaker Remote node in the cluster.
3. Try to put the Pacemaker Remote node into standby mode using "pcs cluster standby <nodename>".

Actual results: Error message: "Error: node '<nodename>' does not appear to exist in configuration"

Expected results: Node is put into standby mode


Additional info: "crm_standby --node <nodename> -v on" works, and can be used as a workaround
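
For reference, a minimal sketch of the workaround, using an illustrative node name (remote-node); the reverse direction is an assumption, not something tested in this report:

# put the remote node into standby via crm_standby, since "pcs cluster standby" rejects it
crm_standby --node remote-node -v on
# presumably the same wrapper takes it back out of standby
crm_standby --node remote-node -v off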
Comment 1 Ivan Devat 2016-02-23 01:29 EST
Created attachment 1129625 [details]
proposed fix
Comment 2 Ivan Devat 2016-02-23 02:41:03 EST
Test:

Set up the cluster and prepare the remote node.
[vm-rhel72-1 ~] $ pcs resource create remote-node ocf:pacemaker:remote server="vm-rhel72-2"

Before Fix:
[vm-rhel72-1 ~] $ pcs cluster standby remote-node                                                                      
Error: node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
 Online: remote-node 
 
After Fix:
[vm-rhel72-1 ~] $ pcs cluster standby remote-node                                                                      
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
 Standby: remote-node
 
 
Other commands suffered from the same underlying problem:


1) pcs stonith level add + pcs stonith level verify
Before Fix:
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
Error: remote-node is not currently a node (use --force to override)
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing --force
[vm-rhel72-1 ~] $ pcs stonith show
 xvm-fencing    (stonith:fence_xvm):    Started vm-rhel72-3
 Node: remote-node
  Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify
Error: remote-node is not currently a node
  
After Fix:
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith show
 xvm-fencing    (stonith:fence_xvm):    Started vm-rhel72-3
 Node: remote-node
  Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify
                                        

2) pcs node (un)maintenance
Before Fix:
[vm-rhel72-1 ~] $ pcs node maintenance remote-node 
Error: Node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node 
Error: Node 'remote-node' does not appear to exist in configuration

After Fix:
[vm-rhel72-1 ~] $ pcs node maintenance remote-node 
[vm-rhel72-1 ~] $ pcs status|grep remote-node:
RemoteNode remote-node: maintenance
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node 
[vm-rhel72-1 ~] $ pcs status|grep RemoteOnline:
RemoteOnline: [ remote-node ]
Comment 3 Mike McCune 2016-03-28 18:42:18 EDT
This bug was accidentally moved from POST to MODIFIED by an error in automation. Please contact mmccune@redhat.com with any questions.
Comment 4 Ivan Devat 2016-05-31 08:01:42 EDT
Set up the cluster and prepare the remote node. To reproduce, use a fresh CIB; once the remote node is already in the CIB, the problem does not appear.

Before all:
[vm-rhel72-1 ~] $ pcs cluster setup --name devcluster vm-rhel72-1 vm-rhel72-3 --start
[vm-rhel72-1 ~] $ pcs resource create remote-node ocf:pacemaker:remote server="vm-rhel72-2"

1) "standby"
Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64

[vm-rhel72-1 ~] $ pcs cluster standby remote-node
Error: node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
 Online: remote-node

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs 
pcs-0.9.151-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs cluster standby remote-node                                                                      
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
 Standby: remote-node

 
2) "stonith"
Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64

[vm-rhel72-1 ~] $ pcs stonith create xvm-fencing fence_xvm pcmk_host_list="vm-rhel72-1 vm-rhel72-3"
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
Error: remote-node is not currently a node (use --force to override)
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing --force
[vm-rhel72-1 ~] $ pcs stonith show
 xvm-fencing    (stonith:fence_xvm):    Started vm-rhel72-3
 Node: remote-node
  Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify
Error: remote-node is not currently a node

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs 
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith show
 xvm-fencing    (stonith:fence_xvm):    Started vm-rhel72-3
 Node: remote-node
  Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify


3) "maintenance"
Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64

Command "pcs node maintenance" does not exists in this package.

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs 
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs node maintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep remote-node:
RemoteNode remote-node: maintenance
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep RemoteOnline:
RemoteOnline: [ remote-node ]
Comment 9 Tomas Jelinek 2016-09-07 10:22:58 EDT
It took me a while to figure this out because it has been working just fine for me. This is what is going on:

Let's have three nodes with FQDNs rh72-node1, rh72-node2 and rh72-node3. The first and second are full-fledged nodes and the third one is used as a remote node.

If the remote node has been created like this:
# pcs resource create rh72-node3 ocf:pacemaker:remote
everything works.

However, it does not work if the node has been created like this:
# pcs resource create remote-node3 ocf:pacemaker:remote server=rh72-node3

This is what happens:
[root@rh72-node3:~]# pcs node standby --debug
Running: /usr/sbin/crm_standby -v on
Finished running: /usr/sbin/crm_standby -v on
Return value: 1
--Debug Output Start--
Could not map name=rh72-node3 to a UUID
--Debug Output End--
Error: Could not map name=rh72-node3 to a UUID

This works perfectly fine on a full-fledged node but obviously not on remotes. OK, so we need to fill in the node name in pcs to make it work. Let's test it manually beforehand to see if it works:
[root@rh72-node3:~]# crm_node -n
rh72-node3
[root@rh72-node3:~]# crm_standby -v on -N rh72-node3
Could not map name=rh72-node3 to a UUID
[root@rh72-node3:~]# echo $?
1
[root@rh72-node3:~]# crm_mon -1bD
Online: [ rh72-node1 rh72-node2 ]
RemoteOnline: [ remote-node3 ]

Active resources:
<snipped>

This does not seem to be the correct way to get the remote node's name. Maybe we can get the node ID and then look up the name for that ID:
[root@rh72-node3:~]# crm_node -i
[root@rh72-node3:~]# echo $?
1
No, that does not work either.

If I put the right node name, it works:
[root@rh72-node3:~]# crm_standby -v on -N remote-node3
[root@rh72-node3:~]# echo $?
0
[root@rh72-node3:~]# crm_mon -1bD
RemoteNode remote-node3: standby
Online: [ rh72-node1 rh72-node2 ]

Active resources:
<snipped>

But how do we get the right name? The only other option I can think of is to look for an ocf:pacemaker:remote resource with server=<output of crm_node -n>. That feels a bit clumsy to me, because there are other ways to create remote nodes (e.g. via the remote-node meta attribute) and we would need to check them all. I would much prefer to get the remote node name directly from pacemaker. Best of all would be if the name could be omitted entirely, as it can be with full-fledged nodes.
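
For what it is worth, a rough, untested sketch of that lookup, assuming the CIB layout produced by "pcs resource create ... ocf:pacemaker:remote server=..." and that xmllint is available:

# map this host's uname to the remote node name by finding the
# ocf:pacemaker:remote resource whose server attribute matches it
uname=$(crm_node -n)
cibadmin --query --scope resources | xmllint --xpath "string(//primitive[@provider='pacemaker' and @type='remote']/instance_attributes/nvpair[@name='server' and @value='$uname']/../../@id)" -

As noted above, this would miss remote nodes defined via the remote-node meta attribute.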


I could really use some help from the pacemaker team here.
Comment 10 Ken Gaillot 2016-09-07 17:59:21 EDT
Ah, I didn't think about the node name vs uname issue.

We've had this sort of thing come up before, and I think it will require some changes on pacemaker's side. "crm_node -n" needs to return the right name on remote nodes, and crm_attribute (which crm_standby is just a wrapper for) needs to determine the local node name properly. So I suppose we need to clone this bz for pacemaker.
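
For context, a roughly equivalent direct crm_attribute call (illustrative only, reusing the remote-node3 name from comment 9):

# set the "standby" node attribute for the named node; this is
# approximately what "crm_standby -v on -N remote-node3" expands to
crm_attribute --type nodes --node remote-node3 --name standby --update on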

By the way, full cluster nodes can have a node name different from their uname, too (via "name:" in corosync.conf). I'm guessing crm_standby and crm_node detect the correct name in that case, so pcs doesn't have the same problem there.
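
For illustration, a hypothetical corosync.conf nodelist fragment where the node name differs from the uname (address, name, and id are made up):

nodelist {
    node {
        ring0_addr: 192.168.122.11
        # "name" sets the Pacemaker node name; it can differ from the host's uname
        name: cluster-node-1
        nodeid: 1
    }
}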
Comment 11 Tomas Jelinek 2016-09-12 04:43:51 EDT
Moving to ON_QA as the bug is in pacemaker and cannot be fixed or worked around in pcs.
Comment 14 errata-xmlrpc 2016-11-03 16:56:06 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html
