Bug 1225423 - pcs should allow to remove a dead node from a cluster
Summary: pcs should allow to remove a dead node from a cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Duplicates: 1376209
Depends On:
Blocks: 1305654 1382633
 
Reported: 2015-05-27 10:41 UTC by Tomas Jelinek
Modified: 2016-11-03 20:54 UTC (History)
CC List: 11 users

Fixed In Version: pcs-0.9.152-5.el7
Doc Type: Bug Fix
Doc Text:
Cause: The user wants to remove a powered-off node from a cluster. Consequence: pcs does not remove the node, because it cannot connect to it to remove the cluster configuration files. Fix: Skip removing configuration files from the node when the user specifies the --force flag. Result: A powered-off node can be removed from the cluster.
Clone Of:
Clones: 1382633
Environment:
Last Closed: 2016-11-03 20:54:10 UTC
Target Upstream Version:


Attachments
proposed fix (2.96 KB, patch)
2016-07-19 15:12 UTC, Tomas Jelinek
proposed fix web UI (1.06 KB, patch)
2016-07-20 08:03 UTC, Tomas Jelinek


Links
Red Hat Bugzilla 1376209 (unspecified, CLOSED): Need a way to cleanup pacemaker resources when a node crashes or is offline. Last updated 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2016:2596 (normal, SHIPPED_LIVE): Moderate: pcs security, bug fix, and enhancement update. Last updated 2016-11-03 12:11:34 UTC

Internal Links: 1376209

Description Tomas Jelinek 2015-05-27 10:41:36 UTC
Description of problem:
It is not possible to remove a node from a cluster if pcsd is not running on that node or the node itself is down.

Version-Release number of selected component (if applicable):
pcs-0.9.137-15.el7

How reproducible:
always

Steps to Reproduce:
1. create a cluster
2. shutdown a node
3. try to remove the node from the cluster using 'pcs cluster node remove <nodename>'

Actual results:
Error: pcsd is not running on <nodename>

Expected results:
Node is removed from the cluster. We probably want to warn the user first and allow removal of the node only when the --force switch is used.

Additional info:
workaround:
1. run 'pcs cluster localnode remove <nodename>' on all remaining nodes
2. run 'pcs cluster reload corosync' on one node
3. run 'crm_node -R <nodename> --force' on one node
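
The three workaround steps above can be consolidated into one script run from a surviving node. This is only a sketch: the node names and the RUN dry-run switch are hypothetical placeholders, and passwordless ssh between the surviving nodes is assumed. Leave RUN=echo to preview the commands; set RUN= (empty) to actually execute them.

```shell
#!/bin/sh
# Sketch of the workaround above. DEAD_NODE and REMAINING_NODES are
# placeholders; substitute your own cluster's node names.
RUN=echo
DEAD_NODE="rh72-node3"
REMAINING_NODES="rh72-node1 rh72-node2"

# Step 1: drop the dead node from the local corosync configuration
# on every remaining node.
for node in $REMAINING_NODES; do
    $RUN ssh "$node" pcs cluster localnode remove "$DEAD_NODE"
done

# Step 2: reload corosync, on one node only.
$RUN pcs cluster reload corosync

# Step 3: purge the node from pacemaker's membership, on one node only.
$RUN crm_node -R "$DEAD_NODE" --force
```

With RUN=echo the script only prints what it would do, which is a safe way to double-check the node names before touching a live cluster.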

Comment 3 Alexandre Maumené 2016-01-13 10:47:19 UTC
Hi,

I also hit the bug, thanks for the workaround.

Regards,

Comment 4 Tomas Jelinek 2016-07-19 15:12:18 UTC
Created attachment 1181676 [details]
proposed fix

Test:

> Let's have a three-node cluster
[root@rh72-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh72-node1 rh72-node2 rh72-node3
 Offline:
Pacemaker Nodes:
 Online: rh72-node1 rh72-node2 rh72-node3
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

> Power off one node ...
[root@rh72-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh72-node1 rh72-node2
 Offline: rh72-node3
Pacemaker Nodes:
 Online: rh72-node1 rh72-node2
 Standby:
 Maintenance:
 Offline: rh72-node3
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

> ... and remove it from the cluster
[root@rh72-node1:~]# pcs cluster node remove rh72-node3
Error: pcsd is not running on rh72-node3, use --force to override
[root@rh72-node1:~]# pcs cluster node remove rh72-node3 --force
rh72-node3: Unable to connect to rh72-node3 ([Errno 113] No route to host)
rh72-node3: Unable to connect to rh72-node3 ([Errno 113] No route to host)
Warning: unable to destroy cluster
rh72-node3: Unable to connect to rh72-node3 ([Errno 113] No route to host)
rh72-node2: Corosync updated
rh72-node1: Corosync updated
[root@rh72-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh72-node1 rh72-node2
 Offline:
Pacemaker Nodes:
 Online: rh72-node1 rh72-node2
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

Comment 5 Tomas Jelinek 2016-07-20 08:03:02 UTC
Created attachment 1181958 [details]
proposed fix web UI

fix for web UI

Comment 6 Ivan Devat 2016-07-28 13:39:26 UTC
Setup:
[vm-rhel72-1 ~] $ pcs status nodes both
Corosync Nodes:
 Online: vm-rhel72-1 vm-rhel72-2 vm-rhel72-3
 Offline:
Pacemaker Nodes:
 Online: vm-rhel72-1 vm-rhel72-2 vm-rhel72-3
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

> Power off one node ...
[vm-rhel72-1 ~] $ pcs status nodes both
Corosync Nodes:
 Online: vm-rhel72-1 vm-rhel72-3
 Offline: vm-rhel72-2
Pacemaker Nodes:
 Online: vm-rhel72-1 vm-rhel72-3
 Standby:
 Maintenance:
 Offline: vm-rhel72-2
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:


Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs           
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster node remove vm-rhel72-2
Error: pcsd is not running on vm-rhel72-2

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster node remove vm-rhel72-2
Error: pcsd is not running on vm-rhel72-2, use --force to override
[vm-rhel72-1 ~] $ pcs cluster node remove vm-rhel72-2 --force
vm-rhel72-2: Unable to connect to vm-rhel72-2 ([Errno 111] Connection refused)
vm-rhel72-2: Unable to connect to vm-rhel72-2 ([Errno 111] Connection refused)
Warning: unable to destroy cluster
vm-rhel72-2: Unable to connect to vm-rhel72-2 ([Errno 111] Connection refused)
vm-rhel72-1: Corosync updated
vm-rhel72-3: Corosync updated
[vm-rhel72-1 ~] $ pcs status nodes both 
Corosync Nodes:
 Online: vm-rhel72-1 vm-rhel72-3
 Offline:
Pacemaker Nodes:
 Online: vm-rhel72-1 vm-rhel72-3
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

Comment 10 Tomas Jelinek 2016-10-18 14:46:09 UTC
*** Bug 1376209 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2016-11-03 20:54:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html

