Bug 1127878
| Field | Value |
|---|---|
| Summary | support corosync's ability to unblock quorum |
| Product | Red Hat Enterprise Linux 7 |
| Component | pcs |
| Version | 7.1 |
| Reporter | David Vossel <dvossel> |
| Assignee | Tomas Jelinek <tojeline> |
| QA Contact | cluster-qe <cluster-qe> |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| CC | ccaulfie, cluster-maint, fdinitto, rsteiger, tojeline |
| Target Milestone | rc |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | pcs-0.9.130-1.el7 |
| Doc Type | Bug Fix |
| Cloned To | 1245264 |
| Bug Depends On | 1086233, 1129713 |
| Bug Blocks | 1245264, 1264566 |
| Type | Bug |
| Last Closed | 2015-03-05 09:20:29 UTC |

Doc Text:

Cause: A customer sets up a cluster with wait_for_all=1. The whole cluster shuts down for some reason, and afterwards some nodes boot while others remain unavailable.

Consequence: The cluster does not start any resources because it is inquorate.

Fix: A new 'pcs cluster quorum unblock' command cancels wait_for_all temporarily.

Result: The cluster becomes quorate (provided enough nodes are running) and starts resources.
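For context, wait_for_all is a votequorum option set in the quorum section of /etc/corosync/corosync.conf. A minimal sketch of the relevant stanza (illustrative, not copied from this bug):

```
quorum {
    provider: corosync_votequorum
    # With wait_for_all enabled, a cluster starting from scratch stays
    # inquorate until all configured nodes have been seen at least once.
    wait_for_all: 1
}
```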
Description (David Vossel, 2014-08-07 18:08:21 UTC):

The feature has been added to votequorum, see bz#1086233 for details.

Created attachment 941556: proposed fix
Test (corosync-2.3.3-3.el7 or newer is required):

1) Create a cluster with wait_for_all=1.

```
# pcs cluster setup rh70-node1 rh70-node2 rh70-node3 --name mycluster --enable --start --wait_for_all=1
# pcs stonith create xvm fence_xvm pcmk_host_list=rh70-node1,rh70-node2,rh70-node3
# pcs resource create dummy Dummy
```
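To confirm the option took effect, it can be looked up in corosync's runtime configuration map; a hedged example (key listing and formatting may differ between corosync versions):

```
# corosync-cmapctl | grep wait_for_all
quorum.wait_for_all (u8) = 1
```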
2) Shut down all the nodes, then start all of them except one. The cluster will not be quorate and will not start the test resource.

```
[root@rh70-node1:~]# pcs cluster stop --all
rh70-node1: Stopping Cluster...
rh70-node2: Stopping Cluster...
rh70-node3: Stopping Cluster...
[root@rh70-node1:~]# pcs cluster start
Starting Cluster...
[root@rh70-node2:~]# pcs cluster start
Starting Cluster...
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition WITHOUT quorum
```
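corosync-quorumtool shows in more detail why the partition stays inquorate; abridged, illustrative output for this three-node scenario (assuming the default of one vote per node):

```
# corosync-quorumtool -s
Quorum information
------------------
Quorum provider:  corosync_votequorum
Nodes:            2
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Total votes:      2
Quorum:           2 Activity blocked
Flags:            WaitForAll
```

Two of three votes would normally satisfy a quorum of 2, but wait_for_all keeps activity blocked until every configured node has been seen.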
3) Unblock quorum.

```
[root@rh70-node1:~]# pcs cluster quorum unblock
Node: rh70-node3 confirmed fenced
Quorum unblocked
Waiting for nodes canceled
```
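Based on the pcs implementation of that era, the command roughly amounts to confirming the missing nodes as fenced and then cancelling wait_for_all at runtime through corosync's cmap interface. A manual sketch, only safe when the absent nodes are known to be down (otherwise it risks split brain):

```
# stonith_admin --confirm rh70-node3                   # acknowledge the missing node as fenced
# corosync-cmapctl -s quorum.cancel_wait_for_all u8 1  # lift the wait_for_all block at runtime
```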
4) Verify the cluster is quorate and has started resources.

```
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition with quorum
[root@rh70-node1:~]# pcs resource
 dummy  (ocf::heartbeat:Dummy): Started
```
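The same can be confirmed at the corosync level; abridged, illustrative output:

```
# corosync-quorumtool -s
Quorum information
------------------
Quorate:          Yes
{output trimmed}
```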
Before Fix:

```
[root@rh70-node1 ~]# rpm -q pcs
pcs-0.9.115-32.el7.x86_64
[root@rh70-node1:~]# pcs cluster quorum unblock

Usage: pcs cluster [commands]...
Configure cluster for use with pacemaker
{output trimmed}
```
After Fix:

```
[root@rh70-node1:~]# rpm -q pcs
pcs-0.9.130-1.el7.x86_64
[root@rh70-node1:~]# pcs cluster setup rh70-node1 rh70-node2 --name cluster70 --enable --start --wait_for_all=1
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
rh70-node1: Succeeded
rh70-node2: Succeeded
Starting cluster on nodes: rh70-node1, rh70-node2...
rh70-node1: Starting Cluster...
rh70-node2: Starting Cluster...
rh70-node1: Cluster Enabled
rh70-node2: Cluster Enabled
[root@rh70-node1:~]# pcs stonith create xvm fence_xvm pcmk_host_list=rh70-node1,rh70-node2
[root@rh70-node1:~]# pcs resource create dummy Dummy
[root@rh70-node1:~]# pcs cluster stop --all
rh70-node1: Stopping Cluster...
rh70-node2: Stopping Cluster...
[root@rh70-node1:~]# pcs cluster start
Starting Cluster...
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition WITHOUT quorum
[root@rh70-node1:~]# pcs cluster quorum unblock
Node: rh70-node2 confirmed fenced
Quorum unblocked
Waiting for nodes canceled
[root@rh70-node1:~]# pcs status | grep quorum
Current DC: rh70-node1 (1) - partition with quorum
[root@rh70-node1:~]# pcs resource
 dummy  (ocf::heartbeat:Dummy): Started
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0415.html