Bug 1290512 - pcs doesn't support putting Pacemaker Remote nodes into standby

| Field | Value |
| --- | --- |
| Product | Red Hat Enterprise Linux 7 |
| Component | pcs |
| Version | 7.2 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | low |
| Priority | medium |
| Keywords | FutureFeature |
| Target Milestone | rc |
| Type | Bug |
| Reporter | Ken Gaillot <kgaillot> |
| Assignee | Ivan Devat <idevat> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | cfeist, cluster-maint, kgaillot, rsteiger, tojeline |
| Fixed In Version | pcs-0.9.151-1.el7 |
| Doc Type | Bug Fix |
| Cloned to | 1374175 (view as bug list) |
| Bug Depends On | 1374175 |
| Last Closed | 2016-11-03 20:56:06 UTC |

Doc Text:
Cause: pcs did not recognize Pacemaker Remote nodes when a user tried to put one into standby mode.
Consequence: pcs refused to put a Pacemaker Remote node into standby mode.
Fix: Pacemaker Remote nodes are now included among the nodes known to pcs.
Result: pcs can put Pacemaker Remote nodes into standby mode.
Description (Ken Gaillot, 2015-12-10 17:24:42 UTC)
Created attachment 1129625: proposed fix
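The patch itself lives in the attachment and is not reproduced here. Purely as an illustration of the approach (a sketch, not the actual pcs code), remote node names can be discovered from the CIB:

# Plain remote nodes are ocf:pacemaker:remote primitives; the resource
# id doubles as the node name.
cibadmin --query --xpath "//primitive[@provider='pacemaker' and @type='remote']"
# Guest nodes are declared via a remote-node meta attribute instead;
# its value is the node name.
cibadmin --query --xpath "//nvpair[@name='remote-node']"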
Test: Set up a cluster and prepare a remote node.

[vm-rhel72-1 ~] $ pcs resource create remote-node ocf:pacemaker:remote server="vm-rhel72-2"

Before fix:
[vm-rhel72-1 ~] $ pcs cluster standby remote-node
Error: node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
Online: remote-node

After fix:
[vm-rhel72-1 ~] $ pcs cluster standby remote-node
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
Standby: remote-node

There are other commands that also suffered from a similar problem:

1) pcs stonith level add + pcs stonith level verify

Before fix:
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
Error: remote-node is not currently a node (use --force to override)
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing --force
[vm-rhel72-1 ~] $ pcs stonith show
xvm-fencing (stonith:fence_xvm): Started vm-rhel72-3
Node: remote-node
 Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify
Error: remote-node is not currently a node

After fix:
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith show
xvm-fencing (stonith:fence_xvm): Started vm-rhel72-3
Node: remote-node
 Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify

2) pcs node (un)maintenance

Before fix:
[vm-rhel72-1 ~] $ pcs node maintenance remote-node
Error: Node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node
Error: Node 'remote-node' does not appear to exist in configuration

After fix:
[vm-rhel72-1 ~] $ pcs node maintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep remote-node:
RemoteNode remote-node: maintenance
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep RemoteOnline:
RemoteOnline: [ remote-node ]

This bug was accidentally moved from POST to MODIFIED by an error in automation; please see mmccune with any questions.

Set up a cluster and prepare a remote node. To reproduce, use a fresh CIB: once the remote node is already in the CIB, the problem does not appear.

Before all:
[vm-rhel72-1 ~] $ pcs cluster setup --name devcluster vm-rhel72-1 vm-rhel72-3 --start
[vm-rhel72-1 ~] $ pcs resource create remote-node ocf:pacemaker:remote server="vm-rhel72-2"

1) "standby"

Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster standby remote-node
Error: node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
Online: remote-node

After fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster standby remote-node
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
Standby: remote-node

2) "stonith"

Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64
[vm-rhel72-1 ~] $ pcs stonith create xvm-fencing fence_xvm pcmk_host_list="vm-rhel72-1 vm-rhel72-3"
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
Error: remote-node is not currently a node (use --force to override)
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing --force
[vm-rhel72-1 ~] $ pcs stonith show
xvm-fencing (stonith:fence_xvm): Started vm-rhel72-3
Node: remote-node
 Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify
Error: remote-node is not currently a node

After fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith show
xvm-fencing (stonith:fence_xvm): Started vm-rhel72-3
Node: remote-node
 Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify

3) "maintenance"

Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64
The command "pcs node maintenance" does not exist in this package.

After fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs node maintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep remote-node:
RemoteNode remote-node: maintenance
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep RemoteOnline:
RemoteOnline: [ remote-node ]

It took me a while to figure this out, because it has been working just fine for me. This is what is going on:

Let's have three nodes with FQDNs rh72-node1, rh72-node2 and rh72-node3. The first and second are full-fledged nodes and the third one is used as a remote node.

If the remote node has been created like this, everything works:
# pcs resource create rh72-node3 ocf:pacemaker:remote
However, it does not work if the node has been created like this:
# pcs resource create remote-node3 ocf:pacemaker:remote server=rh72-node3

This is what happens:
[root@rh72-node3:~]# pcs node standby --debug
Running: /usr/sbin/crm_standby -v on
Finished running: /usr/sbin/crm_standby -v on
Return value: 1
--Debug Output Start--
Could not map name=rh72-node3 to a UUID
--Debug Output End--
Error: Could not map name=rh72-node3 to a UUID

This works perfectly fine on a full-fledged node, but obviously not on remotes.
OK, so we need to fill in the node name in pcs to make it work. Let's test it manually beforehand to see if it works:

[root@rh72-node3:~]# crm_node -n
rh72-node3
[root@rh72-node3:~]# crm_standby -v on -N rh72-node3
Could not map name=rh72-node3 to a UUID
[root@rh72-node3:~]# echo $?
1
[root@rh72-node3:~]# crm_mon -1bD
Online: [ rh72-node1 rh72-node2 ]
RemoteOnline: [ remote-node3 ]
Active resources:
<snipped>

This does not seem to be the correct way to get the remote node name. Maybe we can get the node id and then the name for the id:

[root@rh72-node3:~]# crm_node -i
[root@rh72-node3:~]# echo $?
1

No, that does not work either. If I put in the right node name, it works:

[root@rh72-node3:~]# crm_standby -v on -N remote-node3
[root@rh72-node3:~]# echo $?
0
[root@rh72-node3:~]# crm_mon -1bD
RemoteNode remote-node3: standby
Online: [ rh72-node1 rh72-node2 ]
Active resources:
<snipped>

But how to get the right name? The only other option I can think of is to look for an ocf:pacemaker:remote resource with server=<output of crm_node -n>. It feels a little clumsy to me, because there are other ways to create remote nodes (e.g. by the remote-node meta attribute) and we would need to check them all. I would much prefer getting the remote node name directly from Pacemaker. Best of all would be if the name could be omitted completely, as it is with full-fledged nodes. I could really use some help from the pacemaker team here.
Ah, I didn't think about the node name vs. uname issue. We've had this sort of thing come up before, and I think it will require some changes on Pacemaker's side: "crm_node -n" needs to return the right name on remote nodes, and crm_attribute (which crm_standby is just a wrapper for) needs to determine the local node name properly. So I suppose we need to clone this bz for pacemaker.

By the way, full cluster nodes can have a node name different from their uname, too (via "name:" in corosync.conf). I'm guessing crm_standby and crm_node detect the correct name in that case, so pcs doesn't have the same problem there.

Moving to ON_QA, as the bug is in pacemaker and cannot be fixed or worked around in pcs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html