Bug 1186692

Summary: cluster node removal should verify possible loss of quorum
Product: Red Hat Enterprise Linux 7
Reporter: Radek Steiger <rsteiger>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 7.1
CC: cfeist, cluster-maint, tojeline
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pcs-0.9.140-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: User removes a node from a cluster where some nodes are not running.
Consequence: Cluster loses quorum.
Fix: Detect whether removing a node will result in a loss of quorum, and do not remove the node if so.
Result: User is informed that removing the node will cause the cluster to lose quorum. User has to run the command with the --force flag in order to remove the node.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 09:34:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1180506    
Bug Blocks:    
Attachments: proposed fix (flags: none)

Description Radek Steiger 2015-01-28 10:22:25 UTC
> Description of problem:

In bug 1180506 we added a warning when stopping a node could cause a loss of quorum. We should add the same warning for removing nodes, such as in the following scenario:

1. Have a 5-node cluster with 2 of the nodes being stopped
2. Remove one of the running nodes
3. Enjoy the loss of quorum without a warning
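The quorum arithmetic behind this scenario is simple majority voting; a minimal sketch (a hypothetical helper for illustration, not pcs source code):

```python
def quorum_lost_after_removal(total_nodes: int, running_nodes: int) -> bool:
    """Return True if removing one running node would drop the cluster
    below quorum. Assumes the standard corosync majority rule:
    quorum = floor(total / 2) + 1. Illustrative only, not pcs code."""
    new_total = total_nodes - 1
    new_running = running_nodes - 1
    quorum = new_total // 2 + 1
    return new_running < quorum

# 5-node cluster, 2 nodes stopped (3 running); removing a running node
# leaves 2 of 4 nodes running while quorum is 3 -> quorum is lost:
print(quorum_lost_after_removal(5, 3))  # True
```

The same rule explains the 3-node test below: with one node already stopped, removing another leaves 1 of 2 nodes running against a quorum of 2.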


> Version-Release number of selected component (if applicable):

pcs-0.9.137-13.el7


> Actual results:

Loss of quorum.


> Expected results:

A warning message similar to the one shown when stopping a node:
"Error: Stopping the node(s) will cause a loss of the quorum, use --force to override"
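The fix presumably gates the removal on such a check unless --force is supplied; a simplified sketch of that flow (function and parameter names are hypothetical, not the actual pcs implementation):

```python
def removal_loses_quorum(total_nodes: int, running_nodes: int) -> bool:
    # Majority quorum after the node is gone: floor((total - 1) / 2) + 1.
    return (running_nodes - 1) < ((total_nodes - 1) // 2 + 1)

def remove_node(node: str, total_nodes: int, running_nodes: int,
                force: bool = False) -> str:
    """Refuse to remove a node when doing so would break quorum,
    unless the caller forces it (mirroring the pcs --force flag)."""
    if removal_loses_quorum(total_nodes, running_nodes) and not force:
        raise SystemExit(
            "Error: Removing the node will cause a loss of the quorum, "
            "use --force to override")
    return f"{node}: removed"

# 3-node cluster with one node already stopped (2 running):
# remove_node("rh70-node3", 3, 2)             -> exits with the error above
# remove_node("rh70-node3", 3, 2, force=True) -> "rh70-node3: removed"
```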

Comment 1 Tomas Jelinek 2015-03-02 14:49:27 UTC
Created attachment 997107 [details]
proposed fix

Comment 2 Tomas Jelinek 2015-03-02 14:58:34 UTC
Test:

[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1 rh70-node2 rh70-node3 
 Offline: 
Pacemaker Nodes:
 Online: rh70-node1 rh70-node2 rh70-node3 
 Standby: 
 Offline: 
[root@rh70-node1:~]# pcs cluster stop rh70-node2
rh70-node2: Stopping Cluster (pacemaker)...
rh70-node2: Stopping Cluster (corosync)...
[root@rh70-node1:~]# pcs cluster node remove rh70-node3
Error: Removing the node will cause a loss of the quorum, use --force to override
[root@rh70-node1:~]# echo $?
1
[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1 rh70-node3 
 Offline: rh70-node2 
Pacemaker Nodes:
 Online: rh70-node1 rh70-node3 
 Standby: 
 Offline: rh70-node2 
[root@rh70-node1:~]# pcs cluster node remove rh70-node3 --force
rh70-node3: Stopping Cluster (pacemaker)...
rh70-node3: Successfully destroyed cluster
rh70-node1: Corosync updated
rh70-node2: Corosync updated
[root@rh70-node1:~]# echo $?
0
[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1 
 Offline: rh70-node2 
Pacemaker Nodes:
 Online: rh70-node1 
 Standby: 
 Offline: rh70-node2

Comment 4 Tomas Jelinek 2015-06-04 14:41:49 UTC
Before Fix:
[root@rh71-node1 ~]# rpm -q pcs
pcs-0.9.137-13.el7_1.2.x86_64
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3 
 Offline: 
Pacemaker Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3 
 Standby: 
 Offline: 
[root@rh71-node1:~]# pcs cluster stop rh71-node2
rh71-node2: Stopping Cluster (pacemaker)...
rh71-node2: Stopping Cluster (corosync)...
[root@rh71-node1:~]# pcs cluster node remove rh71-node3
rh71-node3: Stopping Cluster (pacemaker)...
rh71-node3: Successfully destroyed cluster
rh71-node1: Corosync updated
rh71-node2: Corosync updated


After Fix:
[root@rh71-node1:~]# rpm -q pcs
pcs-0.9.140-1.el6.x86_64
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3 
 Offline: 
Pacemaker Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3 
 Standby: 
 Offline: 
[root@rh71-node1:~]# pcs cluster stop rh71-node2
rh71-node2: Stopping Cluster (pacemaker)...
rh71-node2: Stopping Cluster (corosync)...
[root@rh71-node1:~]# pcs cluster node remove rh71-node3
Error: Removing the node will cause a loss of the quorum, use --force to override
[root@rh71-node1:~]# echo $?
1
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node3 
 Offline: rh71-node2 
Pacemaker Nodes:
 Online: rh71-node1 rh71-node3 
 Standby: 
 Offline: rh71-node2 
[root@rh71-node1:~]# pcs cluster node remove rh71-node3 --force
rh71-node3: Stopping Cluster (pacemaker)...
rh71-node3: Successfully destroyed cluster
rh71-node1: Corosync updated
rh71-node2: Corosync updated
[root@rh71-node1:~]# echo $?
0
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 
 Offline: rh71-node2 
Pacemaker Nodes:
 Online: rh71-node1 
 Standby: 
 Offline: rh71-node2

Comment 8 errata-xmlrpc 2015-11-19 09:34:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2290.html