Bug 1229194
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | auto_tie_breaker can create two quorate clusters | | |
| Product | Red Hat Enterprise Linux 7 | Reporter | Christine Caulfield <ccaulfie> |
| Component | corosync | Assignee | Christine Caulfield <ccaulfie> |
| Status | CLOSED ERRATA | QA Contact | cluster-qe <cluster-qe> |
| Severity | urgent | Docs Contact | |
| Priority | urgent | | |
| Version | 7.1 | CC | ccaulfie, cluster-maint, fdinitto, jfriesse, jkortus, jsvarova, mjuricek |
| Target Milestone | rc | Keywords | ZStream |
| Target Release | --- | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | corosync-2.3.4-6.el7 | Doc Type | Bug Fix |
| Doc Text | Prior to this update, in clusters with an odd number of nodes that had the auto_tie_breaker option enabled, when one of the nodes failed, the remaining nodes were split 50:50. Consequently, auto_tie_breaker was not invoked and a random half of the cluster was fenced, rather than the half that did not contain the tie breaker node. With this update, the wait_for_all option is required for clusters with an odd number of nodes. As a result, the cluster half that does not contain the tie breaker node is now fenced in the described scenario. | | |
| Story Points | --- | | |
| Clone Of | | | |
| | 1260719 | Environment | |
| Last Closed | 2015-11-19 11:41:35 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 1260719 | | |
| Attachments | | | |
Description
Christine Caulfield
2015-06-08 09:10:08 UTC
commit b9f5c290b7dedd0a677cdfc25db7dd111245a745
Author: Christine Caulfield <ccaulfie>
Date: Thu Jun 18 09:57:59 2015 +0100
votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters
Created attachment 1041008
votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters
auto_tie_breaker (ATB) can behave incorrectly in a cluster
with an odd number of nodes. One partition can
have quorum while the other side has the ATB node, and both will
continue working. (In a properly configured cluster one side
will be fenced, but which side loses becomes an indeterminate race,
which is exactly what ATB is supposed to avoid.)
This patch prevents ATB from running in a partition if the 'other'
partition might have quorum, and also mandates the use of wait_for_all
in clusters with an odd number of nodes so that a quorate partition
cannot start services or fence an existing partition with the tie
breaker node.
Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>
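The split described in this patch can be illustrated with a small model. This is only a sketch under stated assumptions, not the votequorum code: every node has one vote, quorum is `expected_votes // 2 + 1`, the tie breaker is the lowest node ID, and the tie breaker is considered whenever a partition holds `expected_votes // 2` votes. The function names and the `guard_other_side` switch are hypothetical and exist only to contrast the behaviour before and after the extra check.

```python
def quorum_threshold(expected_votes):
    # Assumed majority rule: strictly more than half of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(partition, all_nodes, tie_breaker, guard_other_side):
    """Simplified model of one partition's quorum decision (one vote per node)."""
    expected = len(all_nodes)
    votes = len(partition)
    quorum = quorum_threshold(expected)

    if votes >= quorum:
        return True  # ordinary majority

    # Assumed trigger for the tie breaker: a "half" split by integer division.
    if votes != expected // 2 or tie_breaker not in partition:
        return False

    if guard_other_side:
        # The nodes we cannot see might form a quorate partition themselves,
        # so the tie breaker must not grant quorum here.
        missing = expected - votes
        if missing >= quorum:
            return False
    return True


nodes = {1, 2, 3}          # odd-sized cluster
tie_breaker = min(nodes)   # assumption: lowest node ID breaks ties
side_a, side_b = {1}, {2, 3}

for guard in (False, True):
    print(guard,
          is_quorate(side_a, nodes, tie_breaker, guard),
          is_quorate(side_b, nodes, tie_breaker, guard))
```

Without the guard, both `{1}` and `{2, 3}` report quorum in this model, which is the two-quorate-clusters situation from the summary. With the guard, only `{2, 3}` does. An even split such as `{1, 2}` versus `{3, 4}` is unaffected, because the missing side cannot reach the threshold of 3.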
Created attachment 1041647
quorum: don't allow quorum_trackstart to be called twice
If quorum_trackstart() or votequorum_trackstart() is called twice with
CS_TRACK_CHANGES, the client is added to the notifications
list twice, effectively corrupting it. Users have reported segfaults in
corosync when they did this (by mistake!).
As there's already a tracking_enabled flag in the private data, we check
it before adding to the list again and return an error if
the process is already registered.
Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>
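The guard described in this patch can be sketched as follows. This is a simplified model, not the corosync implementation: `ClientState`, `TrackingService`, and the return strings are hypothetical stand-ins, and only the `tracking_enabled` flag name comes from the patch description. A Python list merely shows the duplicate entry; it does not reproduce the list corruption seen in the C service.

```python
from dataclasses import dataclass, field


@dataclass
class ClientState:
    # Hypothetical stand-in for the per-connection private data.
    name: str
    tracking_enabled: bool = False


@dataclass
class TrackingService:
    # Hypothetical stand-in for the service-side notifications list.
    notify_list: list = field(default_factory=list)

    def trackstart(self, client):
        # Refuse a second registration instead of adding the client again.
        if client.tracking_enabled:
            return "ERR_ALREADY_TRACKING"  # hypothetical error value
        client.tracking_enabled = True
        self.notify_list.append(client)
        return "OK"


service = TrackingService()
client = ClientState("app")

first = service.trackstart(client)
second = service.trackstart(client)
print(first, second, len(service.notify_list))
```

In this model the second call is rejected and the client appears in the list exactly once.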
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-2354.html