Bug 1186692

Summary: cluster node removal should verify possible loss of quorum
Product: Red Hat Enterprise Linux 7
Reporter: Radek Steiger <rsteiger>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 7.1
CC: cfeist, cluster-maint, tojeline
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pcs-0.9.140-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: User removes a node from a cluster where some nodes are not running.
Consequence: Cluster loses quorum.
Fix: Detect whether removing a node will result in a loss of quorum, and do not remove the node if so.
Result: User is informed that removing the node will cause the cluster to lose quorum. User has to run the command with the --force flag in order to remove the node.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 09:34:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1180506    
Bug Blocks:    
Attachments: proposed fix (flags: none)

Description Radek Steiger 2015-01-28 10:22:25 UTC
> Description of problem:

In bug 1180506 we added a warning when stopping a node could cause a loss of quorum. We should add the same warning for removing nodes, such as in the following scenario:

1. Have a 5-node cluster with 2 of the nodes being stopped
2. Remove one of the running nodes
3. Enjoy the loss of quorum without a warning
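The quorum arithmetic behind this scenario is simple majority voting; a minimal sketch (a hypothetical helper for illustration, not pcs source code):

```python
def quorum_lost_after_removal(total_nodes: int, running_nodes: int) -> bool:
    """Return True if removing one running node would drop the cluster
    below quorum. Assumes the standard corosync majority rule:
    quorum = floor(total / 2) + 1. Illustrative only, not pcs code."""
    new_total = total_nodes - 1
    new_running = running_nodes - 1
    quorum = new_total // 2 + 1
    return new_running < quorum

# 5-node cluster, 2 nodes stopped (3 running); removing a running node
# leaves 2 of 4 nodes running while quorum is 3 -> quorum is lost:
print(quorum_lost_after_removal(5, 3))  # True
```

The same rule explains the 3-node test below: with one node already stopped, removing another leaves 1 of 2 nodes running against a quorum of 2.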


> Version-Release number of selected component (if applicable):

pcs-0.9.137-13.el7


> Actual results:

Loss of quorum.


> Expected results:

A warning message similar to the one shown when stopping a node:
"Error: Stopping the node(s) will cause a loss of the quorum, use --force to override"
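The fix presumably gates the removal on such a check unless --force is supplied; a simplified sketch of that flow (function and parameter names are hypothetical, not the actual pcs implementation):

```python
def removal_loses_quorum(total_nodes: int, running_nodes: int) -> bool:
    # Majority quorum after the node is gone: floor((total - 1) / 2) + 1.
    return (running_nodes - 1) < ((total_nodes - 1) // 2 + 1)

def remove_node(node: str, total_nodes: int, running_nodes: int,
                force: bool = False) -> str:
    """Refuse to remove a node when doing so would break quorum,
    unless the caller forces it (mirroring the pcs --force flag)."""
    if removal_loses_quorum(total_nodes, running_nodes) and not force:
        raise SystemExit(
            "Error: Removing the node will cause a loss of the quorum, "
            "use --force to override")
    return f"{node}: removed"

# 3-node cluster with one node already stopped (2 running):
# remove_node("rh70-node3", 3, 2)             -> exits with the error above
# remove_node("rh70-node3", 3, 2, force=True) -> "rh70-node3: removed"
```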

Comment 1 Tomas Jelinek 2015-03-02 14:49:27 UTC
Created attachment 997107 [details]
proposed fix

Comment 2 Tomas Jelinek 2015-03-02 14:58:34 UTC
Test:

[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1 rh70-node2 rh70-node3 
 Offline: 
Pacemaker Nodes:
 Online: rh70-node1 rh70-node2 rh70-node3 
 Standby: 
 Offline: 
[root@rh70-node1:~]# pcs cluster stop rh70-node2
rh70-node2: Stopping Cluster (pacemaker)...
rh70-node2: Stopping Cluster (corosync)...
[root@rh70-node1:~]# pcs cluster node remove rh70-node3
Error: Removing the node will cause a loss of the quorum, use --force to override
[root@rh70-node1:~]# echo $?
1
[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1 rh70-node3 
 Offline: rh70-node2 
Pacemaker Nodes:
 Online: rh70-node1 rh70-node3 
 Standby: 
 Offline: rh70-node2 
[root@rh70-node1:~]# pcs cluster node remove rh70-node3 --force
rh70-node3: Stopping Cluster (pacemaker)...
rh70-node3: Successfully destroyed cluster
rh70-node1: Corosync updated
rh70-node2: Corosync updated
[root@rh70-node1:~]# echo $?
0
[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1 
 Offline: rh70-node2 
Pacemaker Nodes:
 Online: rh70-node1 
 Standby: 
 Offline: rh70-node2

Comment 4 Tomas Jelinek 2015-06-04 14:41:49 UTC
Before Fix:
[root@rh71-node1 ~]# rpm -q pcs
pcs-0.9.137-13.el7_1.2.x86_64
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3 
 Offline: 
Pacemaker Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3 
 Standby: 
 Offline: 
[root@rh71-node1:~]# pcs cluster stop rh71-node2
rh71-node2: Stopping Cluster (pacemaker)...
rh71-node2: Stopping Cluster (corosync)...
[root@rh71-node1:~]# pcs cluster node remove rh71-node3
rh71-node3: Stopping Cluster (pacemaker)...
rh71-node3: Successfully destroyed cluster
rh71-node1: Corosync updated
rh71-node2: Corosync updated


After Fix:
[root@rh71-node1:~]# rpm -q pcs
pcs-0.9.140-1.el6.x86_64
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3 
 Offline: 
Pacemaker Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3 
 Standby: 
 Offline: 
[root@rh71-node1:~]# pcs cluster stop rh71-node2
rh71-node2: Stopping Cluster (pacemaker)...
rh71-node2: Stopping Cluster (corosync)...
[root@rh71-node1:~]# pcs cluster node remove rh71-node3
Error: Removing the node will cause a loss of the quorum, use --force to override
[root@rh71-node1:~]# echo $?
1
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node3 
 Offline: rh71-node2 
Pacemaker Nodes:
 Online: rh71-node1 rh71-node3 
 Standby: 
 Offline: rh71-node2 
[root@rh71-node1:~]# pcs cluster node remove rh71-node3 --force
rh71-node3: Stopping Cluster (pacemaker)...
rh71-node3: Successfully destroyed cluster
rh71-node1: Corosync updated
rh71-node2: Corosync updated
[root@rh71-node1:~]# echo $?
0
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 
 Offline: rh71-node2 
Pacemaker Nodes:
 Online: rh71-node1 
 Standby: 
 Offline: rh71-node2

Comment 8 errata-xmlrpc 2015-11-19 09:34:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2290.html