Bug 1260719
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | auto_tie_breaker can create two quorate clusters | | |
| Product | Red Hat Enterprise Linux 7 | Reporter | Jan Kurik <jkurik> |
| Component | corosync | Assignee | Jan Friesse <jfriesse> |
| Status | CLOSED ERRATA | QA Contact | cluster-qe <cluster-qe> |
| Severity | urgent | Docs Contact | |
| Priority | urgent | | |
| Version | 7.1 | CC | ccaulfie, cluster-maint, fdinitto, jfriesse, jkortus, jsvarova, mjuricek |
| Target Milestone | rc | Keywords | ZStream |
| Target Release | --- | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | corosync-2.3.4-4.el7_1.3 | Doc Type | Bug Fix |
| Story Points | --- | | |
| Clone Of | 1229194 | Environment | |
| Last Closed | 2015-09-15 09:22:05 UTC | Type | --- |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | 1229194 | | |
| Bug Blocks | | | |
| Attachments | | | |

Doc Text:

Prior to this update, in clusters with an odd number of nodes that had the auto_tie_breaker option enabled, when one of the nodes failed, the remaining nodes were split 50:50. Consequently, auto_tie_breaker was not invoked and a random half of the cluster was fenced, rather than the half that did not contain the tie breaker node. With this update, the wait_for_all option is required for clusters with an odd number of nodes. As a result, the cluster half that does not contain the tie breaker node is now fenced in the described scenario.
Description
Jan Kurik
2015-09-07 13:59:45 UTC
Created attachment 1071046: 7.1.z-bz1260719-1-quorum-don-t-allow-quorum_trackstart-to-be-called-tw

quorum: don't allow quorum_trackstart to be called twice

If quorum_trackstart() or votequorum_trackstart() are called twice with CS_TRACK_CHANGES then the client gets added twice to the notifications list effectively corrupting it. Users have reported segfaults in corosync when they did this (by mistake!).

As there's already a tracking_enabled flag in the private-data, we check that before adding to the list again and return an error if the process is already registered.

Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>

Created attachment 1071047: 7.1.z-bz1260719-2-votequorum-Fix-auto_tie_breaker-behaviour-in-odd-siz

votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters

auto_tie_breaker can behave incorrectly in the case of a cluster with an odd number of nodes. It's possible for a partition to have quorum while the other side has the ATB node, and both will continue working. (Of course in a properly configured cluster one side will be fenced but that becomes an indeterminate race .. just what ATB is supposed to avoid).

This patch prevents ATB from running in a partition if the 'other' partition might have quorum, and also mandates the use of wait_for_all in clusters with an odd number of nodes so that a quorate partition cannot start services or fence an existing partition with the tie breaker node.

Signed-Off-By: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1789.html