Bug 1870449 - Increase TOTEM value in corosync.conf from 1 sec to 3 sec [NEEDINFO]
Summary: Increase TOTEM value in corosync.conf from 1 sec to 3 sec
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: corosync
Version: 8.0
Hardware: ppc64le
OS: Unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Jan Friesse
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
Depends On:
Reported: 2020-08-20 06:59 UTC by Shang Wu
Modified: 2021-05-18 15:26 UTC

Fixed In Version: corosync-3.1.0-1.el8
Doc Type: Bug Fix
Doc Text:
.Default token timeout value in `corosync.conf` file increased from 1 second to 3 seconds

Previously, the TOTEM token timeout value in the `corosync.conf` file was set to 1 second. This short timeout makes the cluster react quickly, but in the case of network delays it may result in premature failover. The default value is now set to 3 seconds to provide a better trade-off between quick response and broader applicability. For information on modifying the token timeout value, see link:https://access.redhat.com/solutions/221263[How to change totem token timeout value in a RHEL 5, 6, 7, or 8 High Availability cluster?]
Clone Of:
Last Closed: 2021-05-18 15:26:09 UTC
Type: Bug
Target Upstream Version:
jfriesse: needinfo? (shwu)
jfriesse: needinfo?


Description Shang Wu 2020-08-20 06:59:00 UTC
Description of problem:
Currently, the totem token timeout in the corosync.conf file defaults to 1 second. This causes a problem whenever the network switches over: the cluster node using a virtual I/O Shared Ethernet Adapter (SEA) gets fenced.

The testing was done on IBM Power E950 servers running an SAP HANA workload.

The user has to edit the config manually and increase the value to 5 seconds to avoid the immediate failover.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 1 Jan Friesse 2020-08-20 07:18:48 UTC
Choosing the right token timeout is always a trade-off between failover speed and the ability to survive a slower network/higher load/...

The token timeout is computed as "token + (number_of_nodes - 2) * token_coefficient", where token is by default 1000 ms and token_coefficient is by default 650 ms. So the final token timeout is not always 1 sec; it depends on the number of nodes.
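As a quick check of the formula above, here is a minimal Python sketch. The function name `effective_token_timeout` and its parameter names are made up for illustration; the defaults (1000 ms and 650 ms) are the ones quoted above. Clamping at two nodes is an assumption, so that smaller clusters never end up below the base value.

```python
def effective_token_timeout(num_nodes, token_ms=1000, token_coefficient_ms=650):
    """Return the runtime token timeout in ms, per the formula quoted above."""
    # Assumption: clusters with fewer than 3 nodes get no additional time.
    extra_nodes = max(num_nodes - 2, 0)
    return token_ms + extra_nodes * token_coefficient_ms


if __name__ == "__main__":
    # Print the resulting timeout for a few cluster sizes.
    for nodes in (2, 3, 5, 8):
        print(nodes, effective_token_timeout(nodes))
```

With the defaults, a 2-node cluster stays at 1000 ms and every additional node adds 650 ms, so a 3-node cluster ends up at 1650 ms.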

No matter what, I'm not against setting a different token timeout (actually, it would be something like a 2-line patch), but we need much deeper analysis. Do you have some hard numbers (like X% of customers have these problems with an N-node cluster when token is not set to Y, ...)? Or is it happening just for this specific workload with this specific hw? If so, then a kbase article is probably a better solution.

Also I think it needs deeper discussion, so I've added needinfo for other people (Shane, Reid, Chrissie, Feist).

@Shane, @Reid: You may have some real-life numbers. How many customers need to increase the token timeout manually?

@Chrissie: Just would like to know your opinion there

@Feist: You may also have some opinion there + if you are aware of other people to ask, please set needinfo for them

btw. AFAIK the need to hand-edit corosync.conf should be solved quite soon in pcs, which should allow editing the token timeout (and many other options) directly.

Comment 2 Christine Caulfield 2020-08-20 08:02:00 UTC
It would be interesting/essential to hear tales from the field to see what user experience is actually like before making a definite decision on this.
If it's causing a problem for a reasonable number of people then we should change it (bearing in mind not to break things for existing customers), otherwise we're just making life difficult for ourselves.

The default used to be 5 seconds all the way up to (and including) the openais days, but I would hope that hardware has increased in speed enough to make the current 1 second (plus adjustments) reasonable for modern use - surely we should be aiming to increase responsiveness where possible.

Maybe there's an argument for an architecture-specific default - or would that end up being a support nightmare?

Comment 6 Jan Friesse 2020-08-24 07:28:29 UTC
Ok, I think the message (and evidence) here is quite clear - increase the default token timeout - so let's do it.

But we must carefully consider the value, because of upgrades. If we increase it too much, then the token "gets lost" from the other nodes' point of view (a node with a higher token timeout holds the token for a longer time - longer than the other nodes' token timeout).

I've run some tests and the results are:
6000 - Token loss
5000 - Token is not lost, but corosync with a 1 sec timeout displays a warning ("Token has not been received in 750 ms") - simply because the token is resent from the node with the 5 sec timeout after roughly 900 ms, so very close to the 1 sec node's token timeout
4000 - Similar to 5000
3000 - No token loss, no warning

Also, initial membership creation time increases slightly because of how the knet ping/pong_count mechanism works.

So I would suggest setting the default to 3 sec so upgrades go smoothly. It is 3 times more than today, so hopefully much better, and it allows a smooth upgrade. And if we find it's still not enough, we can increase it in a later release.

Or is 5 sec the magic bullet and 3 sec wouldn't be enough?

Comment 7 Jan Friesse 2020-08-24 14:44:13 UTC
Adding Klaus so he is aware.

Comment 19 Jan Friesse 2020-10-12 12:42:57 UTC
Upstream PR: https://github.com/corosync/corosync/pull/600

Comment 20 Jan Friesse 2020-10-12 13:04:31 UTC
For QA: I've tested two main areas:
- Default token timeout is really changed (check runtime.config.totem.token + reaction time when a node stops)
- Clusters with mixed versions (so different token timeouts) work reasonably well (nodes are not fenced because of token loss)
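The first check could be scripted roughly as follows. This is only a sketch: it assumes the `key (type) = value` line format that `corosync-cmapctl` prints, the helper name `parse_cmapctl` is made up for illustration, and the sample text mirrors output posted later in this bug rather than being captured live.

```python
import re

# Matches lines such as "runtime.config.totem.token (u32) = 3650".
CMAP_LINE = re.compile(r"^(?P<key>\S+) \((?P<type>\w+)\) = (?P<value>.*)$")


def parse_cmapctl(text):
    """Parse corosync-cmapctl style output into a {key: value} dict."""
    values = {}
    for line in text.splitlines():
        # Tolerate quoted terminal output ("> ") and surrounding whitespace.
        match = CMAP_LINE.match(line.lstrip("> ").strip())
        if not match:
            continue  # skip shell prompts and anything that is not a key line
        raw = match.group("value")
        # Integer types (u8, i32, u64, ...) become int; others stay strings.
        is_int = re.fullmatch(r"[ui]\d+", match.group("type")) is not None
        values[match.group("key")] = int(raw) if is_int else raw
    return values


# Sample input in the same format as the listings in this bug.
SAMPLE = """\
runtime.config.totem.token (u32) = 3650
runtime.config.totem.token_retransmit (u32) = 869
"""

if __name__ == "__main__":
    print(parse_cmapctl(SAMPLE)["runtime.config.totem.token"])
```

On a real node the text would come from running `corosync-cmapctl runtime.config.totem.token` and passing its standard output to the parser.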

Comment 32 Patrik Hagara 2020-12-04 16:14:31 UTC
Tested on a 3-node cluster (relevant due to token_coefficient, see corosync.conf manual page).

before (corosync-3.0.3-4.el8)

> [root@virt-145 ~]# rpm -q corosync
> corosync-3.0.3-4.el8.x86_64
> [root@virt-145 ~]# corosync-cmapctl runtime.config.totem.token
> runtime.config.totem.token (u32) = 1650
> runtime.config.totem.token_retransmit (u32) = 392
> runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
> runtime.config.totem.token_warning (u32) = 75

Result: the totem token timeout is 1000 ms (listed as 1650 ms at runtime on a 3-node cluster due to token_coefficient).

after (corosync-3.1.0-3.el8)

> [root@virt-492 ~]# rpm -q corosync
> corosync-3.1.0-3.el8.x86_64
> [root@virt-492 ~]# corosync-cmapctl runtime.config.totem.token
> runtime.config.totem.token (u32) = 3650
> runtime.config.totem.token_retransmit (u32) = 869
> runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
> runtime.config.totem.token_warning (u32) = 75

Result: the totem token timeout now defaults to 3000 ms (listed as 3650 ms at runtime on a 3-node cluster due to token_coefficient).

Rolling upgrade test from 8.3 to 8.4 passed. Basic network failure recovery tests passed.
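The other runtime values in the two listings above follow from the token timeout. The sketch below reproduces them under two assumptions that are not stated in this bug: token_retransmit is taken to be the token timeout divided by (token_retransmits_before_loss_const + 0.2), truncated to an integer, and token_warning is taken to be a percentage of the token timeout. The function name `derived_totem_values` is made up for illustration.

```python
def derived_totem_values(token_ms, retransmits_before_loss_const=4,
                         token_warning_pct=75):
    """Return (token_retransmit_ms, token_warning_ms) for a token timeout."""
    # Assumed relation; it reproduces the token_retransmit values listed above.
    token_retransmit = int(token_ms / (retransmits_before_loss_const + 0.2))
    # Assumption: token_warning is a percentage of the token timeout.
    warning_ms = token_ms * token_warning_pct // 100
    return token_retransmit, warning_ms


if __name__ == "__main__":
    # 1650 and 3650 are the runtime token values from the two listings.
    for token in (1650, 3650):
        print(token, derived_totem_values(token))
```

Under the same percentage assumption, a 1000 ms token timeout gives a 750 ms warning threshold, which lines up with the "Token has not been received in 750 ms" message quoted in comment 6.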

Comment 39 errata-xmlrpc 2021-05-18 15:26:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (corosync bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

