Red Hat Bugzilla – Bug 1186692
cluster node removal should verify possible loss of quorum
Last modified: 2015-11-19 04:34:26 EST
> Description of problem:
In bug 1180506 we added a warning for when stopping a node could cause loss of quorum. We should add the same warning for removing nodes, as in the following scenario:
1. Have a 5-node cluster with 2 of the nodes stopped
2. Remove one of the running nodes
3. Enjoy the loss of quorum without a warning

> Version-Release number of selected component (if applicable):
pcs-0.9.137-13.el7

> Actual results:
Loss of quorum.

> Expected results:
A warning message similar to the one shown when stopping a node: "Error: Stopping the node(s) will cause a loss of the quorum, use --force to override"
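
The arithmetic behind the scenario above can be sketched as follows. This is only an illustration, not the check pcs actually performs: it assumes a plain majority quorum with one vote per node and ignores corosync options such as two_node, wait_for_all, or a quorum device. The function names are hypothetical.

```python
def quorum_threshold(total_nodes: int) -> int:
    """Votes needed for quorum, assuming simple majority and one vote per node."""
    return total_nodes // 2 + 1


def removal_loses_quorum(total_nodes: int, online_nodes: int,
                         removing_online: bool = True) -> bool:
    """Return True if removing one node would leave the cluster without quorum.

    total_nodes     -- nodes currently in the cluster configuration
    online_nodes    -- nodes currently running
    removing_online -- whether the node being removed is one of the running ones
    """
    remaining_total = total_nodes - 1
    remaining_online = online_nodes - (1 if removing_online else 0)
    return remaining_online < quorum_threshold(remaining_total)


if __name__ == "__main__":
    # Scenario from the description: 5 nodes, 2 stopped, remove a running node.
    # 4 nodes remain, 2 are online, and 3 votes are needed.
    print(removal_loses_quorum(total_nodes=5, online_nodes=3))
    # A healthy 5-node cluster: 4 nodes remain, all online.
    print(removal_loses_quorum(total_nodes=5, online_nodes=5))
```

Under these assumptions the first call reports a loss of quorum and the second does not. A 3-node cluster with one node stopped behaves like the first case: removing a running node leaves 2 configured nodes with only 1 online.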
Created attachment 997107
proposed fix
Test:
[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1 rh70-node2 rh70-node3
 Offline:
Pacemaker Nodes:
 Online: rh70-node1 rh70-node2 rh70-node3
 Standby:
 Offline:
[root@rh70-node1:~]# pcs cluster stop rh70-node2
rh70-node2: Stopping Cluster (pacemaker)...
rh70-node2: Stopping Cluster (corosync)...
[root@rh70-node1:~]# pcs cluster node remove rh70-node3
Error: Removing the node will cause a loss of the quorum, use --force to override
[root@rh70-node1:~]# echo $?
1
[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1 rh70-node3
 Offline: rh70-node2
Pacemaker Nodes:
 Online: rh70-node1 rh70-node3
 Standby:
 Offline: rh70-node2
[root@rh70-node1:~]# pcs cluster node remove rh70-node3 --force
rh70-node3: Stopping Cluster (pacemaker)...
rh70-node3: Successfully destroyed cluster
rh70-node1: Corosync updated
rh70-node2: Corosync updated
[root@rh70-node1:~]# echo $?
0
[root@rh70-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh70-node1
 Offline: rh70-node2
Pacemaker Nodes:
 Online: rh70-node1
 Standby:
 Offline: rh70-node2
Before Fix:
[root@rh71-node1 ~]# rpm -q pcs
pcs-0.9.137-13.el7_1.2.x86_64
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3
 Offline:
Pacemaker Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3
 Standby:
 Offline:
[root@rh71-node1:~]# pcs cluster stop rh71-node2
rh71-node2: Stopping Cluster (pacemaker)...
rh71-node2: Stopping Cluster (corosync)...
[root@rh71-node1:~]# pcs cluster node remove rh71-node3
rh71-node3: Stopping Cluster (pacemaker)...
rh71-node3: Successfully destroyed cluster
rh71-node1: Corosync updated
rh71-node2: Corosync updated

After Fix:
[root@rh71-node1:~]# rpm -q pcs
pcs-0.9.140-1.el6.x86_64
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3
 Offline:
Pacemaker Nodes:
 Online: rh71-node1 rh71-node2 rh71-node3
 Standby:
 Offline:
[root@rh71-node1:~]# pcs cluster stop rh71-node2
rh71-node2: Stopping Cluster (pacemaker)...
rh71-node2: Stopping Cluster (corosync)...
[root@rh71-node1:~]# pcs cluster node remove rh71-node3
Error: Removing the node will cause a loss of the quorum, use --force to override
[root@rh71-node1:~]# echo $?
1
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1 rh71-node3
 Offline: rh71-node2
Pacemaker Nodes:
 Online: rh71-node1 rh71-node3
 Standby:
 Offline: rh71-node2
[root@rh71-node1:~]# pcs cluster node remove rh71-node3 --force
rh71-node3: Stopping Cluster (pacemaker)...
rh71-node3: Successfully destroyed cluster
rh71-node1: Corosync updated
rh71-node2: Corosync updated
[root@rh71-node1:~]# echo $?
0
[root@rh71-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh71-node1
 Offline: rh71-node2
Pacemaker Nodes:
 Online: rh71-node1
 Standby:
 Offline: rh71-node2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2290.html