Bug 1127878 - support corosync's ability to unblock quorum
Summary: support corosync's ability to unblock quorum
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1086233 1129713
Blocks: 1245264 1264566
 
Reported: 2014-08-07 18:08 UTC by David Vossel
Modified: 2015-09-18 20:11 UTC
5 users

Fixed In Version: pcs-0.9.130-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: A customer sets up a cluster with wait_for_all=1. The whole cluster shuts down for some reason, then some nodes boot while others remain unavailable.
Consequence: The cluster does not start any resources because it is inquorate.
Fix: A new 'pcs cluster quorum unblock' command cancels wait_for_all temporarily.
Result: The cluster becomes quorate (provided enough nodes are running) and starts resources.
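For reference, wait_for_all lives in the quorum section of /etc/corosync/corosync.conf; a minimal illustrative snippet (not taken from this bug) looks like:

    quorum {
        provider: corosync_votequorum
        wait_for_all: 1
    }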
Clone Of:
Cloned to: 1245264
Environment:
Last Closed: 2015-03-05 09:20:29 UTC


Attachments
proposed fix (5.37 KB, patch)
2014-09-26 12:42 UTC, Tomas Jelinek


Links
Red Hat Product Errata RHBA-2015:0415 (normal, SHIPPED_LIVE) - pcs bug fix and enhancement update - last updated 2015-03-05 14:16:41 UTC

Description David Vossel 2014-08-07 18:08:21 UTC
Description of problem:

pcs needs to support the "unblock quorum" runtime feature that is being added to corosync in the following issue:

https://bugzilla.redhat.com/show_bug.cgi?id=1086233

This is necessary for situations where users know the cluster is inquorate but are confident that the cluster should proceed with resource management regardless.
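At the corosync level this boils down to flipping a votequorum runtime key once the administrator has confirmed the missing nodes are really down (fenced). A rough sketch, assuming the runtime key added by bz#1086233 is named quorum.cancel_wait_for_all (the key name is an assumption here, not confirmed in this report):

    # cancel wait_for_all at runtime on the surviving partition; illustrative only,
    # the pcs command requested here is expected to wrap this (plus fencing confirmation)
    corosync-cmapctl -s quorum.cancel_wait_for_all u8 1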

Comment 2 Christine Caulfield 2014-08-12 15:09:32 UTC
The feature has been added to votequorum, see bz#1086233 for details.
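For testing purposes, whether wait_for_all is in effect on a node can be seen in the votequorum flags, e.g. (output illustrative):

    # corosync-quorumtool | grep Flags
    Flags:            WaitForAll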

Comment 5 Tomas Jelinek 2014-09-26 12:42:46 UTC
Created attachment 941556 [details]
proposed fix

Test (corosync-2.3.3-3.el7 or newer is required):

1) Create a cluster with wait_for_all=1.
# pcs cluster setup rh70-node1 rh70-node2 rh70-node3 --name mycluster --enable --start --wait_for_all=1
# pcs stonith create xvm fence_xvm pcmk_host_list=rh70-node1,rh70-node2,rh70-node3
# pcs resource create dummy Dummy

2) Shut down all the nodes, then start them all again except one. The cluster won't be quorate and won't start the test resource.
[root@rh70-node1:~]# pcs cluster stop --all
rh70-node1: Stopping Cluster...
rh70-node2: Stopping Cluster...
rh70-node3: Stopping Cluster...
[root@rh70-node1:~]# pcs cluster start
Starting Cluster...
[root@rh70-node2:~]# pcs cluster start
Starting Cluster...
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition WITHOUT quorum

3) Unblock quorum
[root@rh70-node1:~]# pcs cluster quorum unblock
Node: rh70-node3 confirmed fenced
Quorum unblocked
Waiting for nodes canceled

4) Verify cluster is quorate and running services
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition with quorum
[root@rh70-node1:~]# pcs resource
 dummy  (ocf::heartbeat:Dummy): Started
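As an additional sanity check (not part of the original test; output abbreviated and illustrative), the quorum state can also be read directly from votequorum before and after step 3:

[root@rh70-node1:~]# corosync-quorumtool | grep -E 'Quorate|Flags'
Quorate:          Yes
Flags:            Quorate WaitForAll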

Comment 6 Tomas Jelinek 2014-09-26 14:58:01 UTC
Before Fix:
[root@rh70-node1 ~]# rpm -q pcs
pcs-0.9.115-32.el7.x86_64
[root@rh70-node1:~]# pcs cluster quorum unblock

Usage: pcs cluster [commands]...
Configure cluster for use with pacemaker
{output trimmed}


After Fix:
[root@rh70-node1:~]# rpm -q pcs
pcs-0.9.130-1.el7.x86_64
[root@rh70-node1:~]# pcs cluster setup rh70-node1 rh70-node2 --name cluster70 --enable --start --wait_for_all=1
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
rh70-node1: Succeeded
rh70-node2: Succeeded
Starting cluster on nodes: rh70-node1, rh70-node2...
rh70-node1: Starting Cluster...
rh70-node2: Starting Cluster...
rh70-node1: Cluster Enabled
rh70-node2: Cluster Enabled
[root@rh70-node1:~]# pcs stonith create xvm fence_xvm pcmk_host_list=rh70-node1,rh70-node2
[root@rh70-node1:~]# pcs resource create dummy Dummy
[root@rh70-node1:~]# pcs cluster stop --all
rh70-node1: Stopping Cluster...
rh70-node2: Stopping Cluster...
[root@rh70-node1:~]# pcs cluster start
Starting Cluster...
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition WITHOUT quorum
[root@rh70-node1:~]# pcs cluster quorum unblock
Node: rh70-node2 confirmed fenced
Quorum unblocked
Waiting for nodes canceled
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition with quorum
[root@rh70-node1:~]# pcs resource
 dummy  (ocf::heartbeat:Dummy): Started

Comment 10 errata-xmlrpc 2015-03-05 09:20:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0415.html

