Bug 1225423 - pcs should allow to remove a dead node from a cluster
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
Keywords: ZStream
Duplicates: 1376209
Depends On:
Blocks: 1305654 1382633
 
Reported: 2015-05-27 06:41 EDT by Tomas Jelinek
Modified: 2016-11-03 16:54 EDT
CC List: 11 users

See Also:
Fixed In Version: pcs-0.9.152-5.el7
Doc Type: Bug Fix
Doc Text:
Cause: The user wants to remove a powered-off node from a cluster. Consequence: pcs does not remove the node, because it cannot connect to it to remove the cluster configuration files. Fix: Skip removing the configuration files from the node when the user specifies the --force flag. Result: It is possible to remove a powered-off node from the cluster.
Story Points: ---
Clone Of:
Clones: 1382633
Environment:
Last Closed: 2016-11-03 16:54:10 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
proposed fix (2.96 KB, patch)
2016-07-19 11:12 EDT, Tomas Jelinek
proposed fix web UI (1.06 KB, patch)
2016-07-20 04:03 EDT, Tomas Jelinek


External Trackers:
Red Hat Product Errata RHSA-2016:2596 (priority: normal, status: SHIPPED_LIVE) - Moderate: pcs security, bug fix, and enhancement update - Last Updated: 2016-11-03 08:11:34 EDT

Description Tomas Jelinek 2015-05-27 06:41:36 EDT
Description of problem:
It is not possible to remove a node from a cluster if pcsd is not running on the node or the node itself is not running.

Version-Release number of selected component (if applicable):
pcs-0.9.137-15.el7

How reproducible:
always

Steps to Reproduce:
1. create a cluster
2. shutdown a node
3. try to remove the node from the cluster using 'pcs cluster node remove <nodename>'

Actual results:
Error: pcsd is not running on <nodename>

Expected results:
The node is removed from the cluster. We probably want to warn the user first and allow removal of the node only when the --force switch is used.
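
For illustration only, the desired interaction could look like this (<nodename> stands for the dead node; the exact messages are an assumption at this point):

pcs cluster node remove <nodename>
  -> warns that pcsd is not running on <nodename> and asks for --force
pcs cluster node remove <nodename> --force
  -> removes the node from corosync and pacemaker on the remaining nodes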

Additional info:
workaround (a combined example of these steps follows the list):
1. run 'pcs cluster localnode remove <nodename>' on all remaining nodes
2. run 'pcs cluster reload corosync' on one node
3. run 'crm_node -R <nodename> --force' on one node
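
For illustration only, the workaround could be run from one surviving node of a three-node cluster. The node names rh72-node1/rh72-node2/rh72-node3 are hypothetical, rh72-node3 is the dead node, the commands are issued on rh72-node1, and reaching rh72-node2 over ssh is an assumption:

# 1. remove the dead node from the corosync configuration on every remaining node
pcs cluster localnode remove rh72-node3
ssh rh72-node2 'pcs cluster localnode remove rh72-node3'

# 2. reload corosync on one node so the change takes effect
pcs cluster reload corosync

# 3. remove the dead node from pacemaker
crm_node -R rh72-node3 --force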
Comment 3 Alexandre Maumené 2016-01-13 05:47:19 EST
Hi,

I also hit the bug, thanks for the workaround.

Regards,
Comment 4 Tomas Jelinek 2016-07-19 11:12 EDT
Created attachment 1181676 [details]
proposed fix

Test:

> Let's have a three node cluster
[root@rh72-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh72-node1 rh72-node2 rh72-node3
 Offline:
Pacemaker Nodes:
 Online: rh72-node1 rh72-node2 rh72-node3
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

> Power off one node ...
[root@rh72-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh72-node1 rh72-node2
 Offline: rh72-node3
Pacemaker Nodes:
 Online: rh72-node1 rh72-node2
 Standby:
 Maintenance:
 Offline: rh72-node3
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

> ... and remove it from the cluster
[root@rh72-node1:~]# pcs cluster node remove rh72-node3
Error: pcsd is not running on rh72-node3, use --force to override
[root@rh72-node1:~]# pcs cluster node remove rh72-node3 --force
rh72-node3: Unable to connect to rh72-node3 ([Errno 113] No route to host)
rh72-node3: Unable to connect to rh72-node3 ([Errno 113] No route to host)
Warning: unable to destroy cluster
rh72-node3: Unable to connect to rh72-node3 ([Errno 113] No route to host)
rh72-node2: Corosync updated
rh72-node1: Corosync updated
[root@rh72-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh72-node1 rh72-node2
 Offline:
Pacemaker Nodes:
 Online: rh72-node1 rh72-node2
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:
Comment 5 Tomas Jelinek 2016-07-20 04:03 EDT
Created attachment 1181958 [details]
proposed fix web UI

fix for web UI
Comment 6 Ivan Devat 2016-07-28 09:39:26 EDT
Setup:
[vm-rhel72-1 ~] $ pcs status nodes both
Corosync Nodes:
 Online: vm-rhel72-1 vm-rhel72-2 vm-rhel72-3
 Offline:
Pacemaker Nodes:
 Online: vm-rhel72-1 vm-rhel72-2 vm-rhel72-3
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

> Power off one node ...
[vm-rhel72-1 ~] $ pcs status nodes both
Corosync Nodes:
 Online: vm-rhel72-1 vm-rhel72-3
 Offline: vm-rhel72-2
Pacemaker Nodes:
 Online: vm-rhel72-1 vm-rhel72-3
 Standby:
 Maintenance:
 Offline: vm-rhel72-2
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:


Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs           
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster node remove vm-rhel72-2
Error: pcsd is not running on vm-rhel72-2

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster node remove vm-rhel72-2
Error: pcsd is not running on vm-rhel72-2, use --force to override
[vm-rhel72-1 ~] $ pcs cluster node remove vm-rhel72-2 --force
vm-rhel72-2: Unable to connect to vm-rhel72-2 ([Errno 111] Connection refused)
vm-rhel72-2: Unable to connect to vm-rhel72-2 ([Errno 111] Connection refused)
Warning: unable to destroy cluster
vm-rhel72-2: Unable to connect to vm-rhel72-2 ([Errno 111] Connection refused)
vm-rhel72-1: Corosync updated
vm-rhel72-3: Corosync updated
[vm-rhel72-1 ~] $ pcs status nodes both 
Corosync Nodes:
 Online: vm-rhel72-1 vm-rhel72-3
 Offline:
Pacemaker Nodes:
 Online: vm-rhel72-1 vm-rhel72-3
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:
Comment 10 Tomas Jelinek 2016-10-18 10:46:09 EDT
*** Bug 1376209 has been marked as a duplicate of this bug. ***
Comment 12 errata-xmlrpc 2016-11-03 16:54:10 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html
