Bug 1260719

Summary: auto_tie_breaker can create two quorate clusters
Product: Red Hat Enterprise Linux 7
Reporter: Jan Kurik <jkurik>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.1
CC: ccaulfie, cluster-maint, fdinitto, jfriesse, jkortus, jsvarova, mjuricek
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: corosync-2.3.4-4.el7_1.3
Doc Type: Bug Fix
Doc Text:
Prior to this update, in clusters with an odd number of nodes that had the auto_tie_breaker option enabled, when one of the nodes failed, the remaining nodes were split 50:50. Consequently, auto_tie_breaker was not invoked and a random half of the cluster was fenced, rather than the half that did not contain the tie breaker node. With this update, the wait_for_all option is required for clusters with an odd number of nodes. As a result, the cluster half that does not contain the tie breaker node is now fenced in the described scenario.
Story Points: ---
Clone Of: 1229194
Environment:
Last Closed: 2015-09-15 09:22:05 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1229194
Bug Blocks:
Attachments:
Description Flags
7.1.z-bz1260719-1-quorum-don-t-allow-quorum_trackstart-to-be-called-tw none
7.1.z-bz1260719-2-votequorum-Fix-auto_tie_breaker-behaviour-in-odd-siz none

Description Jan Kurik 2015-09-07 13:59:45 UTC
This bug has been copied from bug #1229194 and has been proposed
to be backported to 7.1 z-stream (EUS).

Comment 4 Jan Friesse 2015-09-07 14:26:59 UTC
Created attachment 1071046
7.1.z-bz1260719-1-quorum-don-t-allow-quorum_trackstart-to-be-called-tw

quorum: don't allow quorum_trackstart to be called twice

If quorum_trackstart() or votequorum_trackstart() is called twice with
CS_TRACK_CHANGES, the client is added twice to the notifications
list, effectively corrupting it. Users have reported segfaults in
corosync when they did this (by mistake!).

As there's already a tracking_enabled flag in the private-data, we check
that before adding to the list again and return an error if
the process is already registered.

Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>
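
The guard described in this patch can be illustrated with a minimal sketch. This is a simplified Python model, not the corosync C code: the class names, the notification list layout, and the "ERR_ALREADY_REGISTERED" return value are hypothetical; only the tracking_enabled flag is taken from the commit message.

```python
class TrackingClient:
    """Hypothetical stand-in for one client's private data."""

    def __init__(self, name):
        self.name = name
        # Mirrors the tracking_enabled flag mentioned in the commit message.
        self.tracking_enabled = False


class QuorumService:
    """Hypothetical service that keeps a list of clients to notify."""

    def __init__(self):
        self.notification_list = []

    def trackstart(self, client):
        # Guard: refuse a second registration instead of adding the
        # same client to the notification list again.
        if client.tracking_enabled:
            return "ERR_ALREADY_REGISTERED"  # hypothetical error value
        client.tracking_enabled = True
        self.notification_list.append(client)
        return "OK"


service = QuorumService()
client = TrackingClient("app")
first = service.trackstart(client)   # registers the client
second = service.trackstart(client)  # rejected by the guard
```

In this model the second call leaves the notification list unchanged, so the client appears in it exactly once.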

Comment 5 Jan Friesse 2015-09-07 14:27:01 UTC
Created attachment 1071047
7.1.z-bz1260719-2-votequorum-Fix-auto_tie_breaker-behaviour-in-odd-siz

votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters

auto_tie_breaker can behave incorrectly in the case of a cluster
with an odd number of nodes. It's possible for a partition to
have quorum while the other side has the ATB node, and both will
continue working. (Of course, in a properly configured cluster one side
will be fenced, but that becomes an indeterminate race, which is just
what ATB is supposed to avoid.)

This patch prevents ATB from running in a partition if the 'other'
partition might have quorum, and also mandates the use of wait_for_all
in clusters with an odd number of nodes so that a quorate partition
cannot start services or fence an existing partition with the tie
breaker node.

Signed-Off-By: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>
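
The failure mode and the first half of the fix can be sketched with a small model. This is an illustration under stated assumptions, not the votequorum implementation: every node has one vote, all nodes were members before the split, quorum is a strict majority of expected votes, the tie breaker is the node with the lowest node ID, and ATB applies to a partition holding expected_votes // 2 votes. The function names are hypothetical, and the wait_for_all requirement is not modelled.

```python
def quorum(expected_votes):
    # Strict majority of the expected votes.
    return expected_votes // 2 + 1


def atb_applies(partition, all_nodes):
    # Assumed ATB rule: the partition holds half the votes (rounded
    # down) and contains the lowest node ID.
    expected = len(all_nodes)
    return len(partition) == expected // 2 and min(all_nodes) in partition


def quorate_before_fix(partition, all_nodes):
    expected = len(all_nodes)
    if len(partition) >= quorum(expected):
        return True
    return atb_applies(partition, all_nodes)


def quorate_after_fix(partition, all_nodes):
    expected = len(all_nodes)
    if len(partition) >= quorum(expected):
        return True
    # New check: if the nodes we cannot see could form a quorum on
    # their own, ATB must not run in this partition.
    missing = expected - len(partition)
    if missing >= quorum(expected):
        return False
    return atb_applies(partition, all_nodes)


nodes = {1, 2, 3}
small, large = {1}, {2, 3}
before = (quorate_before_fix(small, nodes), quorate_before_fix(large, nodes))
after = (quorate_after_fix(small, nodes), quorate_after_fix(large, nodes))
```

In this model, a three-node cluster split into {1} and {2, 3} leaves both partitions quorate before the fix: {2, 3} has a majority and {1} wins the tie break. With the extra check only {2, 3} stays quorate, while an even split such as {1, 2} against {3, 4} still resolves in favour of the partition holding the lowest node ID.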

Comment 11 errata-xmlrpc 2015-09-15 09:22:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1789.html