Bug 1290512 - pcs doesn't support putting Pacemaker Remote nodes into standby

| Field | Value |
| --- | --- |
| Product | Red Hat Enterprise Linux 7 |
| Component | pcs |
| Version | 7.2 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | low |
| Priority | medium |
| Keywords | FutureFeature |
| Target Milestone | rc |
| Type | Bug |
| Reporter | Ken Gaillot <kgaillot> |
| Assignee | Ivan Devat <idevat> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | cfeist, cluster-maint, kgaillot, rsteiger, tojeline |
| Fixed In Version | pcs-0.9.151-1.el7 |
| Doc Type | Bug Fix |
| Cloned to | 1374175 (view as bug list) |
| Bug Depends On | 1374175 |
| Last Closed | 2016-11-03 20:56:06 UTC |

Doc Text:
Cause: pcs did not recognize Pacemaker Remote nodes when a user tried to put one into standby mode.
Consequence: pcs refused to put a Pacemaker Remote node into standby mode.
Fix: Pacemaker Remote nodes are now included among the nodes known to pcs.
Result: pcs can put Pacemaker Remote nodes into standby mode.
Description (Ken Gaillot, 2015-12-10 17:24:42 UTC)
Created attachment 1129625: proposed fix
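The patch itself lives in the attachment and is not reproduced here. Purely as an illustration of the approach (a sketch, not the actual pcs code), remote node names can be discovered from the CIB:

# Plain remote nodes are ocf:pacemaker:remote primitives; the resource
# id doubles as the node name.
cibadmin --query --xpath "//primitive[@provider='pacemaker' and @type='remote']"
# Guest nodes are declared via a remote-node meta attribute instead;
# its value is the node name.
cibadmin --query --xpath "//nvpair[@name='remote-node']"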
Test: Set up a cluster and prepare a remote node.

[vm-rhel72-1 ~] $ pcs resource create remote-node ocf:pacemaker:remote server="vm-rhel72-2"

Before fix:
[vm-rhel72-1 ~] $ pcs cluster standby remote-node
Error: node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
Online: remote-node

After fix:
[vm-rhel72-1 ~] $ pcs cluster standby remote-node
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
Standby: remote-node

There are other commands that also suffered from a similar problem:

1) pcs stonith level add + pcs stonith level verify

Before fix:
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
Error: remote-node is not currently a node (use --force to override)
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing --force
[vm-rhel72-1 ~] $ pcs stonith show
xvm-fencing (stonith:fence_xvm): Started vm-rhel72-3
Node: remote-node
 Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify
Error: remote-node is not currently a node

After fix:
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith show
xvm-fencing (stonith:fence_xvm): Started vm-rhel72-3
Node: remote-node
 Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify

2) pcs node (un)maintenance

Before fix:
[vm-rhel72-1 ~] $ pcs node maintenance remote-node
Error: Node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node
Error: Node 'remote-node' does not appear to exist in configuration

After fix:
[vm-rhel72-1 ~] $ pcs node maintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep remote-node:
RemoteNode remote-node: maintenance
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep RemoteOnline:
RemoteOnline: [ remote-node ]

This bug was accidentally moved from POST to MODIFIED by an error in automation; please see mmccune with any questions.

Set up a cluster and prepare a remote node. To reproduce, use a fresh CIB: once the remote node is already in the CIB, the problem does not appear.

Before all:
[vm-rhel72-1 ~] $ pcs cluster setup --name devcluster vm-rhel72-1 vm-rhel72-3 --start
[vm-rhel72-1 ~] $ pcs resource create remote-node ocf:pacemaker:remote server="vm-rhel72-2"

1) "standby"

Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster standby remote-node
Error: node 'remote-node' does not appear to exist in configuration
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
Online: remote-node

After fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster standby remote-node
[vm-rhel72-1 ~] $ pcs status nodes|grep remote-node
Standby: remote-node

2) "stonith"

Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64
[vm-rhel72-1 ~] $ pcs stonith create xvm-fencing fence_xvm pcmk_host_list="vm-rhel72-1 vm-rhel72-3"
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
Error: remote-node is not currently a node (use --force to override)
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing --force
[vm-rhel72-1 ~] $ pcs stonith show
xvm-fencing (stonith:fence_xvm): Started vm-rhel72-3
Node: remote-node
 Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify
Error: remote-node is not currently a node

After fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs stonith level add 1 remote-node xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith show
xvm-fencing (stonith:fence_xvm): Started vm-rhel72-3
Node: remote-node
 Level 1 - xvm-fencing
[vm-rhel72-1 ~] $ pcs stonith level verify

3) "maintenance"

Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64
The command "pcs node maintenance" does not exist in this package.

After fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs node maintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep remote-node:
RemoteNode remote-node: maintenance
[vm-rhel72-1 ~] $ pcs node unmaintenance remote-node
[vm-rhel72-1 ~] $ pcs status|grep RemoteOnline:
RemoteOnline: [ remote-node ]

It took me a while to figure this out, because it has been working just fine for me. This is what is going on:

Let's have three nodes with FQDNs rh72-node1, rh72-node2 and rh72-node3. The first and second are full-fledged nodes and the third one is used as a remote node.

If the remote node has been created like this, everything works:
# pcs resource create rh72-node3 ocf:pacemaker:remote
However, it does not work if the node has been created like this:
# pcs resource create remote-node3 ocf:pacemaker:remote server=rh72-node3

This is what happens:
[root@rh72-node3:~]# pcs node standby --debug
Running: /usr/sbin/crm_standby -v on
Finished running: /usr/sbin/crm_standby -v on
Return value: 1
--Debug Output Start--
Could not map name=rh72-node3 to a UUID
--Debug Output End--
Error: Could not map name=rh72-node3 to a UUID

This works perfectly fine on a full-fledged node, but obviously not on remotes.
OK, so we need to fill in the node name in pcs to make it work. Let's test it manually beforehand to see if it works:

[root@rh72-node3:~]# crm_node -n
rh72-node3
[root@rh72-node3:~]# crm_standby -v on -N rh72-node3
Could not map name=rh72-node3 to a UUID
[root@rh72-node3:~]# echo $?
1
[root@rh72-node3:~]# crm_mon -1bD
Online: [ rh72-node1 rh72-node2 ]
RemoteOnline: [ remote-node3 ]
Active resources:
<snipped>

This does not seem to be the correct way to get the remote node name. Maybe we can get the node id and then the name for the id:

[root@rh72-node3:~]# crm_node -i
[root@rh72-node3:~]# echo $?
1

No, that does not work either. If I put in the right node name, it works:

[root@rh72-node3:~]# crm_standby -v on -N remote-node3
[root@rh72-node3:~]# echo $?
0
[root@rh72-node3:~]# crm_mon -1bD
RemoteNode remote-node3: standby
Online: [ rh72-node1 rh72-node2 ]
Active resources:
<snipped>

But how to get the right name? The only other option I can think of is to look for an ocf:pacemaker:remote resource with server=<output of crm_node -n>. It feels a little clumsy to me, because there are other ways to create remote nodes (e.g. by the remote-node meta attribute) and we would need to check them all. I would much prefer getting the remote node name directly from Pacemaker. Best of all would be if the name could be omitted completely, as it is with full-fledged nodes. I could really use some help from the pacemaker team here.
Ah, I didn't think about the node name vs. uname issue. We've had this sort of thing come up before, and I think it will require some changes on Pacemaker's side: "crm_node -n" needs to return the right name on remote nodes, and crm_attribute (which crm_standby is just a wrapper for) needs to determine the local node name properly. So I suppose we need to clone this bz for pacemaker.

By the way, full cluster nodes can have a node name different from their uname, too (via "name:" in corosync.conf). I'm guessing crm_standby and crm_node detect the correct name in that case, so pcs doesn't have the same problem there.

Moving to ON_QA, as the bug is in pacemaker and cannot be fixed or worked around in pcs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html