Description of problem:

cman currently sets the corosync parameter totem.token_retransmits_before_loss_const to 20. The reasons for this seem to be lost in the mists of time, but recent testing of openais timeouts suggests that it is unhelpful at best, and may actually impede operation of larger clusters.

With the normal cman defaults I could get a 32-node cluster up and running with no trouble. But reducing the token timeout much below that would cause failures. By removing this rogue value from cman and letting corosync calculate it, the token value could be reduced quite significantly while still keeping the cluster stable.

That might not sound very useful in itself (though it is always handy to reduce node failure detection times), but it could have implications for running normally configured clusters on busy LANs or systems.

So I recommend we remove this configuration value.
So, Steve thinks this may be related to bug 623176. What did you mean by "reducing token timeout much below that" -- below what, 10,000 ms?
*** This bug has been marked as a duplicate of bug 623176 ***