Bug 544479 - token timeout should be smaller than consensus timeout
Summary: token timeout should be smaller than consensus timeout
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: cman
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-12-05 00:24 UTC by Steven Dake
Modified: 2016-04-26 13:23 UTC (History)

Fixed In Version:
Clone Of:
Clones: 544482
Environment:
Last Closed: 2010-03-02 14:41:46 UTC
Type: ---
Embargoed:



Description Steven Dake 2009-12-05 00:24:29 UTC
Description of problem:
The token timeout should be smaller than the consensus timeout. If it is not, totem can "split-brain" itself under rare circumstances, especially with larger node counts, resulting in "disallowed node state" behavior.

During membership formation, nodes that have not reached consensus are added to a "failed" list when the consensus timer expires.

During the membership protocol, it is possible for a node to achieve consensus. When this happens, there is a check to ignore new join messages with older ring sequence ids than the newly requested ring id. These new join messages may contain information that is desirable to have in the new membership, and the other processors will not form consensus until the processor that is in COMMIT has attempted to form consensus by exchanging join messages.

Unfortunately, only two things take a processor out of commit: receiving its commit token again, or the token timeout expiring. Under extremely rare circumstances, it is possible for a processor to reject the commit token because it doesn't match its view of membership.

Before accepting a commit token, a processor verifies that the proposed membership matches its internal membership. If, at any time after the commit token is created and originated, one of the processors in the commit membership receives a join message (while it is still in the gather state), it will likewise reject the commit token.

When the token timeout period is 10 seconds (the Fedora default), a processor in the commit state rejects membership messages until the token timeout expires (because the commit token gets stuck at some processor rejecting it). Unfortunately, in the meantime some other processor's consensus timer, which is 4.8 seconds, has already expired. Because the processors held in the commit state for 10 seconds did not participate in consensus gathering, they are determined to be "failed", and a failed-processor confchg is delivered for every node that accepted the commit token. It then detects a new processor and forms a proper configuration.
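
The timing in this scenario can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not totem code: the function name and the millisecond units are assumptions, and it simplifies by assuming both timers start at the same moment. Only the 10 second and 4.8 second figures come from this report.

```python
def stuck_commit_declared_failed(token_timeout_ms: int,
                                 consensus_timeout_ms: int) -> bool:
    """Hypothetical helper, not part of corosync or cman.

    A processor stuck in COMMIT ignores join messages until its token
    timeout expires. A peer that is gathering consensus puts it on the
    "failed" list once the peer's consensus timer expires. Assuming both
    timers start together, the stuck processor is wrongly declared failed
    whenever the token timeout is not smaller than the consensus timeout.
    """
    return token_timeout_ms >= consensus_timeout_ms


# Values from this report: token = 10 s, consensus = 4.8 s.
print(stuck_commit_declared_failed(10_000, 4_800))  # True
```

With these values the consensus timer fires 5.2 seconds before the stuck processor can leave the commit state, which is the window in which the spurious "failed" confchg is delivered.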

In testing 32 nodes with default parameters I could trigger this case about 1 in 30 times.  To test, I used cman_tool join; fenced; fence_tool join on each of the 32 nodes, then killed 7 nodes in the cluster, and repeated.

Comment 1 Christine Caulfield 2009-12-07 09:49:53 UTC
commit 02a8b8872f59ac4933233aed31b3cfa39cda9db5
Author: Christine Caulfield <ccaulfie>
Date:   Mon Dec 7 09:46:05 2009 +0000

    cman: Make consensus twice token timeout
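
As a rough illustration of what the commit subject describes, the sketch below assumes the consensus timeout is derived as exactly twice the token timeout. The patch itself is not shown in this report, so the function name and the millisecond units are hypothetical.

```python
def consensus_from_token(token_timeout_ms: int) -> int:
    """Hypothetical helper: consensus timeout as twice the token timeout."""
    return 2 * token_timeout_ms


token_ms = 10_000  # the 10 second token timeout mentioned in the description
consensus_ms = consensus_from_token(token_ms)

# The requirement from the bug title: token must be smaller than consensus.
assert token_ms < consensus_ms
print(consensus_ms)  # 20000
```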

