Bug 544482 - token timeout should be smaller than consensus timeout
Summary: token timeout should be smaller than consensus timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 542018
Depends On:
Blocks: 567538 567539
 
Reported: 2009-12-05 00:27 UTC by Steven Dake
Modified: 2016-04-26 14:53 UTC (History)

Fixed In Version: cman-2.0.115-24.el5.src.rpm
Doc Type: Bug Fix
Doc Text:
Clone Of: 544479
Environment:
Last Closed: 2010-03-30 08:38:05 UTC
Target Upstream Version:
Embargoed:


Attachments
The RHEL5 patch (933 bytes, patch), 2009-12-09 16:35 UTC, Christine Caulfield


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2010:0266 | 0 | normal | SHIPPED_LIVE | cman bug fix and enhancement update | 2010-03-29 12:54:44 UTC

Description Steven Dake 2009-12-05 00:27:50 UTC
+++ This bug was initially created as a clone of Bug #544479 +++

Description of problem:
The token timeout should be smaller than the consensus timeout.  If this is not the case, it is possible for totem to "split-brain" itself under rare circumstances, especially with larger node counts, resulting in "disallowed node state" behavior.

During membership, nodes which have not reached consensus are added to a "failed" list when the consensus timer expires.

During the membership protocol, it is possible for a node to achieve consensus.  When this happens, there is a check to ignore new join messages with older ring sequence IDs than the newly requested ring ID.  These new join messages may contain information that is desirable to have in the new membership, and the other processors will not form consensus until the processor that is in COMMIT has attempted to form consensus by exchanging join messages.

Unfortunately, the only things that take a processor out of commit are receiving its commit token again or expiry of the token timeout.  Under extremely rare circumstances, it is possible for a processor to reject the commit token because it doesn't match its view of membership.

Before accepting a commit token, a processor verifies that the proposed membership matches its internal membership.  If, at any time after the commit token is created and originated, one of the processors in the commit membership receives a join message (while it is still in the gather state), it will also reject the commit token.
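
As a rough illustration of the check described above, here is a minimal Python sketch. It is not the totem code; the function names and the idea that a join message simply adds its sender to the local view are assumptions made for the example.

```python
# Toy model of the commit-token membership check (not the totem source).
# Assumption: a processor's "view" is just a set of node names, and a join
# message from an unknown node adds that node to the view.

def accepts_commit_token(proposed_members, local_view):
    """Accept only if the proposed membership equals the local view."""
    return set(proposed_members) == set(local_view)

def on_join_message(local_view, sender):
    """Return the local view after hearing a join message from `sender`."""
    return set(local_view) | {sender}

proposed = {"node1", "node2", "node3"}   # membership carried by the commit token
view = {"node1", "node2", "node3"}       # this processor's view while in gather

print(accepts_commit_token(proposed, view))   # True: views agree

# A join message arrives after the commit token was originated.
view = on_join_message(view, "node4")
print(accepts_commit_token(proposed, view))   # False: token is rejected
```

In this model the rejected token has nowhere to go, which corresponds to the commit token getting stuck at the rejecting processor.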

When the token timeout period is 10 seconds (the Fedora default), a processor in the commit state rejects membership messages until the token timeout expires (because the commit token gets stuck at some processor that rejects it).  Unfortunately, by that time some other processor has already expired its consensus timer, which is 4.8 seconds.  Since the processors that stayed in the commit state for 10 seconds didn't participate in consensus gathering, they are determined "failed", and a failed processor confchg is delivered for every node that accepted the commit token.  It then detects a new processor and forms a proper configuration.
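
The arithmetic behind this can be sketched in a few lines of Python. This is only a simplified model: it assumes both timers start at the same instant, and the function names are invented for the example. The 10 second and 4.8 second values are the ones quoted above.

```python
# Simplified timer comparison (assumes both timers start at the same moment).
TOKEN_TIMEOUT_MS = 10_000      # token timeout quoted in this report
CONSENSUS_TIMEOUT_MS = 4_800   # consensus timeout quoted in this report

def first_timer_to_expire(token_ms, consensus_ms):
    """Name the timer that fires first in this simplified model."""
    if consensus_ms < token_ms:
        return "consensus"
    if token_ms < consensus_ms:
        return "token"
    return "tie"

def stuck_window_ms(token_ms, consensus_ms):
    """Time a processor can remain in commit after others gave up on it."""
    return max(0, token_ms - consensus_ms)

print(first_timer_to_expire(TOKEN_TIMEOUT_MS, CONSENSUS_TIMEOUT_MS))  # consensus
print(stuck_window_ms(TOKEN_TIMEOUT_MS, CONSENSUS_TIMEOUT_MS))        # 5200
```

With a consensus timeout larger than the token timeout, the window is zero: the stuck processor leaves commit before anyone declares it failed.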

In testing 32 nodes with default parameters I could trigger this case about 1 in 30 times.  To test, I used cman_tool join; fenced; fence_tool join on each of the 32 nodes, then killed 7 nodes in the cluster, and repeated.

Comment 1 Steven Dake 2009-12-05 00:29:45 UTC
Recommend increasing the consensus timeout period to something like 12-15 seconds.

Comment 2 Steven Dake 2009-12-05 00:34:42 UTC
*** Bug 542018 has been marked as a duplicate of this bug. ***

Comment 3 Christine Caulfield 2009-12-09 16:35:34 UTC
Created attachment 377235
The RHEL5 patch

This is the RHEL5 version of the patch. Waiting for a 5.5 ack.

Comment 4 Christine Caulfield 2009-12-16 13:09:34 UTC
Committed to RHEL55 branch:

commit c3fd533042a15a684206439e51a5377528e8b709
Author: Christine Caulfield <ccaulfie>
Date:   Wed Dec 16 13:07:11 2009 +0000

    cman: Make consensus 2*token timeout

Comment 10 Jaroslav Kortus 2010-03-08 11:49:22 UTC
According to the log files, consensus is now 2*token, and the ratio is kept after the token value is changed. Consensus can still be set to a value lower than 2*token, but this seems to be intentional, to allow an override.
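
The observed behaviour can be summarised with a short Python sketch. This is not the cman patch itself; the function and parameter names are hypothetical, and only the 2*token default and the possibility of an explicit override come from this report.

```python
# Sketch of the observed defaulting rule (hypothetical names, not cman code).

def effective_consensus_ms(token_ms, consensus_override_ms=None):
    """Use an explicit consensus value if given, otherwise 2 * token."""
    if consensus_override_ms is not None:
        return consensus_override_ms
    return 2 * token_ms

def token_below_consensus(token_ms, consensus_ms):
    """True when the token timeout is smaller than the consensus timeout."""
    return token_ms < consensus_ms

print(effective_consensus_ms(10_000))          # 20000
print(effective_consensus_ms(10_000, 4_800))   # 4800 (override is honoured)
print(token_below_consensus(10_000, effective_consensus_ms(10_000)))  # True
```

An override below the token timeout reproduces the ordering this bug is about, so a check like `token_below_consensus` is one way to flag such a configuration.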

Comment 12 errata-xmlrpc 2010-03-30 08:38:05 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0266.html

Comment 13 Binbin Wang 2010-06-28 09:21:15 UTC
Actually, for a two-node cluster, the consensus value can be very small.

