Bug 1260719 - auto_tie_breaker can create two quorate clusters
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Jan Friesse
Keywords: ZStream
Depends On: 1229194
Reported: 2015-09-07 09:59 EDT by Jan Kurik
Modified: 2015-09-16 05:39 EDT (History)
Fixed In Version: corosync-2.3.4-4.el7_1.3
Doc Type: Bug Fix
Doc Text:
Prior to this update, in clusters with an odd number of nodes that had the auto_tie_breaker option enabled, when one of the nodes failed, the remaining nodes were split 50:50. Consequently, auto_tie_breaker was not invoked and a random half of the cluster was fenced, rather than the half that did not contain the tie breaker node. With this update, the wait_for_all option is required for clusters with an odd number of nodes. As a result, the cluster half that does not contain the tie breaker node is now fenced in the described scenario.
Clone Of: 1229194
Last Closed: 2015-09-15 05:22:05 EDT

Attachments
7.1.z-bz1260719-1-quorum-don-t-allow-quorum_trackstart-to-be-called-tw (3.74 KB, patch)
2015-09-07 10:26 EDT, Jan Friesse
7.1.z-bz1260719-2-votequorum-Fix-auto_tie_breaker-behaviour-in-odd-siz (3.56 KB, patch)
2015-09-07 10:27 EDT, Jan Friesse

Description Jan Kurik 2015-09-07 09:59:45 EDT
This bug has been copied from bug #1229194 and has been proposed
to be backported to 7.1 z-stream (EUS).
Comment 4 Jan Friesse 2015-09-07 10:26:59 EDT
Created attachment 1071046

quorum: don't allow quorum_trackstart to be called twice

If quorum_trackstart() or votequorum_trackstart() is called twice with
CS_TRACK_CHANGES, the client is added to the notifications list twice,
which corrupts the list. Users have reported segfaults in corosync
after doing this by mistake.

As there is already a tracking_enabled flag in the private data, we
check it before adding the client to the list again and return an
error if the process is already registered.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
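
The guard described in this patch can be illustrated with a minimal Python sketch. This is not the corosync code, which is written in C. The names Client, TrackingList, and AlreadyTrackingError are hypothetical; only the tracking_enabled flag comes from the commit message. The sketch assumes that a second registration is rejected with an error instead of adding a duplicate list entry.

```python
class AlreadyTrackingError(Exception):
    """Hypothetical error for a second trackstart call from the same client."""


class Client:
    """Hypothetical stand-in for a connection's private data."""

    def __init__(self, name):
        self.name = name
        self.tracking_enabled = False  # mirrors the flag named in the patch text


class TrackingList:
    """Hypothetical notifications list; each client may appear at most once."""

    def __init__(self):
        self.clients = []

    def trackstart(self, client):
        # Check the flag first, so a repeated call cannot add a duplicate entry.
        if client.tracking_enabled:
            raise AlreadyTrackingError(client.name)
        client.tracking_enabled = True
        self.clients.append(client)

    def trackstop(self, client):
        # Only remove clients that are actually registered.
        if client.tracking_enabled:
            self.clients.remove(client)
            client.tracking_enabled = False
```

In this sketch, calling trackstart() twice for the same client raises the error on the second call and leaves exactly one entry in the list. After trackstop(), the client can register again.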
Comment 5 Jan Friesse 2015-09-07 10:27:01 EDT
Created attachment 1071047

votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters

auto_tie_breaker (ATB) can behave incorrectly in a cluster with an odd
number of nodes. It is possible for one partition to have quorum while
the other partition has the ATB node, and both will continue working.
(Of course, in a properly configured cluster one side will be fenced,
but which side becomes an indeterminate race, which is just what ATB
is supposed to avoid.)

This patch prevents ATB from running in a partition if the 'other'
partition might have quorum, and also mandates the use of wait_for_all
in clusters with an odd number of nodes so that a quorate partition
cannot start services or fence an existing partition with the tie
breaker node.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
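
The quorum arithmetic behind this patch can be sketched with a simplified Python model. It is not the votequorum implementation. It assumes one vote per node and a quorum threshold of expected_votes // 2 + 1, and it treats "the other partition might have quorum" as "the number of members lost since the last stable membership is at least the quorum threshold". The function names and the guard parameter are hypothetical, and wait_for_all is not modelled.

```python
def quorum_threshold(expected_votes):
    # Simple majority, assuming one vote per node.
    return expected_votes // 2 + 1


def partition_is_quorate(members, previous_count, expected_votes,
                         tie_breaker, guard=True):
    """Return True if this partition may keep running (simplified model).

    members:        set of node IDs visible in this partition
    previous_count: membership size before the split
    tie_breaker:    node ID that wins a tie (assumed to be configured)
    guard:          True models the patched check, False the old behaviour
    """
    votes = len(members)
    quorum = quorum_threshold(expected_votes)
    if votes >= quorum:
        return True  # ordinary majority, no tie breaker needed
    if votes != expected_votes // 2:
        return False  # too small for the tie breaker to apply
    if tie_breaker not in members:
        return False
    # Patched behaviour: the nodes we lost could form a quorate partition
    # on their own, so the tie breaker must not be used here.
    if guard and previous_count - votes >= quorum:
        return False
    return True
```

Under these assumptions, a five-node cluster that splits 3:2 with the tie breaker node on the smaller side ends up with two quorate partitions when guard is False and with one when guard is True. If one node has already failed and the remaining four split 2:2, only the half containing the tie breaker node stays quorate.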
Comment 11 errata-xmlrpc 2015-09-15 05:22:05 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.