Bug 1127878
| Field | Value |
|---|---|
| Summary | support corosync's ability to unblock quorum |
| Product | Red Hat Enterprise Linux 7 |
| Component | pcs |
| Version | 7.1 |
| Reporter | David Vossel <dvossel> |
| Assignee | Tomas Jelinek <tojeline> |
| QA Contact | cluster-qe <cluster-qe> |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| CC | ccaulfie, cluster-maint, fdinitto, rsteiger, tojeline |
| Target Milestone | rc |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | pcs-0.9.130-1.el7 |
| Doc Type | Bug Fix |
| Cloned To | 1245264 |
| Bug Depends On | 1086233, 1129713 |
| Bug Blocks | 1245264, 1264566 |
| Type | Bug |
| Last Closed | 2015-03-05 09:20:29 UTC |

Doc Text:

Cause: A customer sets up a cluster with wait_for_all=1. The whole cluster shuts down for some reason, and afterwards some nodes boot while others remain unavailable.

Consequence: The cluster does not start any resources because it is inquorate.

Fix: A new 'pcs cluster quorum unblock' command cancels wait_for_all temporarily.

Result: The cluster becomes quorate (provided enough nodes are running) and starts resources.
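For context, wait_for_all is a votequorum option set in the quorum section of /etc/corosync/corosync.conf. A minimal sketch of the relevant stanza (illustrative, not copied from this bug):

```
quorum {
    provider: corosync_votequorum
    # With wait_for_all enabled, a cluster starting from scratch stays
    # inquorate until all configured nodes have been seen at least once.
    wait_for_all: 1
}
```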
Description (David Vossel, 2014-08-07 18:08:21 UTC):

The feature has been added to votequorum, see bz#1086233 for details.

Created attachment 941556: proposed fix
Test (corosync-2.3.3-3.el7 or newer is required):

1) Create a cluster with wait_for_all=1.

```
# pcs cluster setup rh70-node1 rh70-node2 rh70-node3 --name mycluster --enable --start --wait_for_all=1
# pcs stonith create xvm fence_xvm pcmk_host_list=rh70-node1,rh70-node2,rh70-node3
# pcs resource create dummy Dummy
```
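To confirm the option took effect, it can be looked up in corosync's runtime configuration map; a hedged example (key listing and formatting may differ between corosync versions):

```
# corosync-cmapctl | grep wait_for_all
quorum.wait_for_all (u8) = 1
```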
2) Shut down all the nodes, then start all of them except one. The cluster will not be quorate and will not start the test resource.

```
[root@rh70-node1:~]# pcs cluster stop --all
rh70-node1: Stopping Cluster...
rh70-node2: Stopping Cluster...
rh70-node3: Stopping Cluster...
[root@rh70-node1:~]# pcs cluster start
Starting Cluster...
[root@rh70-node2:~]# pcs cluster start
Starting Cluster...
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition WITHOUT quorum
```
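corosync-quorumtool shows in more detail why the partition stays inquorate; abridged, illustrative output for this three-node scenario (assuming the default of one vote per node):

```
# corosync-quorumtool -s
Quorum information
------------------
Quorum provider:  corosync_votequorum
Nodes:            2
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Total votes:      2
Quorum:           2 Activity blocked
Flags:            WaitForAll
```

Two of three votes would normally satisfy a quorum of 2, but wait_for_all keeps activity blocked until every configured node has been seen.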
3) Unblock quorum.

```
[root@rh70-node1:~]# pcs cluster quorum unblock
Node: rh70-node3 confirmed fenced
Quorum unblocked
Waiting for nodes canceled
```
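Based on the pcs implementation of that era, the command roughly amounts to confirming the missing nodes as fenced and then cancelling wait_for_all at runtime through corosync's cmap interface. A manual sketch, only safe when the absent nodes are known to be down (otherwise it risks split brain):

```
# stonith_admin --confirm rh70-node3                   # acknowledge the missing node as fenced
# corosync-cmapctl -s quorum.cancel_wait_for_all u8 1  # lift the wait_for_all block at runtime
```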
4) Verify the cluster is quorate and has started resources.

```
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition with quorum
[root@rh70-node1:~]# pcs resource
 dummy  (ocf::heartbeat:Dummy): Started
```
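The same can be confirmed at the corosync level; abridged, illustrative output:

```
# corosync-quorumtool -s
Quorum information
------------------
Quorate:          Yes
{output trimmed}
```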
Before Fix:

```
[root@rh70-node1 ~]# rpm -q pcs
pcs-0.9.115-32.el7.x86_64
[root@rh70-node1:~]# pcs cluster quorum unblock

Usage: pcs cluster [commands]...
Configure cluster for use with pacemaker
{output trimmed}
```
After Fix:

```
[root@rh70-node1:~]# rpm -q pcs
pcs-0.9.130-1.el7.x86_64
[root@rh70-node1:~]# pcs cluster setup rh70-node1 rh70-node2 --name cluster70 --enable --start --wait_for_all=1
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
rh70-node1: Succeeded
rh70-node2: Succeeded
Starting cluster on nodes: rh70-node1, rh70-node2...
rh70-node1: Starting Cluster...
rh70-node2: Starting Cluster...
rh70-node1: Cluster Enabled
rh70-node2: Cluster Enabled
[root@rh70-node1:~]# pcs stonith create xvm fence_xvm pcmk_host_list=rh70-node1,rh70-node2
[root@rh70-node1:~]# pcs resource create dummy Dummy
[root@rh70-node1:~]# pcs cluster stop --all
rh70-node1: Stopping Cluster...
rh70-node2: Stopping Cluster...
[root@rh70-node1:~]# pcs cluster start
Starting Cluster...
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition WITHOUT quorum
[root@rh70-node1:~]# pcs cluster quorum unblock
Node: rh70-node2 confirmed fenced
Quorum unblocked
Waiting for nodes canceled
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition with quorum
[root@rh70-node1:~]# pcs resource
 dummy  (ocf::heartbeat:Dummy): Started
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0415.html