1870449 – Increase TOTEM value in corosync.conf from 1 sec to 3 sec



Summary: Increase TOTEM value in corosync.conf from 1 sec to 3 sec

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	corosync
Sub Component:
Version:	8.0
Hardware:	ppc64le
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	8.0
Assignee:	Jan Friesse
QA Contact:	cluster-qe@redhat.com
Docs Contact:	Steven J. Levine
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-20 06:59 UTC by Shang Wu
Modified:	2023-09-15 00:46 UTC
CC List:	10 users
Fixed In Version:	corosync-3.1.0-1.el8
Doc Type:	Bug Fix
Doc Text:	.Default token timeout value in `corosync.conf` file increased from 1 second to 3 seconds
Previously, the TOTEM token timeout value in the `corosync.conf` file was set to 1 second. This short timeout makes the cluster react quickly, but in the case of network delays it may result in premature failover. The default value is now set to 3 seconds to provide a better trade-off between quick response and broader applicability. For information on modifying the token timeout value, see link:https://access.redhat.com/solutions/221263[How to change totem token timeout value in a RHEL 5, 6, 7, or 8 High Availability cluster?]
Clone Of:
Environment:
Last Closed:	2021-05-18 15:26:09 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	pm-rhel: mirror+


Description Shang Wu 2020-08-20 06:59:00 UTC

Description of problem:
Currently, the totem token timeout in the corosync.conf file defaults to 1 second. Whenever the network switches over, this causes cluster nodes that use a virtual I/O Shared Ethernet Adapter (SEA) to get fenced.

The testing is done on the IBM Power E950 servers for SAP HANA workload.

Users have to edit the configuration manually and increase the value to 5 seconds to avoid an immediate failover.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Jan Friesse 2020-08-20 07:18:48 UTC

@Shang:
Choosing the right token timeout is always a trade-off between failover speed and the ability to survive a slower network, higher load, and so on.

The token timeout is computed as "token + (number_of_nodes - 2) * token_coefficient", where token defaults to 1000 ms and token_coefficient defaults to 650 ms. So the final token timeout is not always 1 s; it depends on the number of nodes.
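The formula above can be sketched in Python as follows. This is a minimal illustration, not corosync's own code, and the function name `effective_token_timeout` is hypothetical. It assumes, as the corosync.conf manual page describes, that `token_coefficient` is only added when the nodelist contains at least 3 nodes.

```python
def effective_token_timeout(token_ms: int, num_nodes: int,
                            token_coefficient_ms: int = 650) -> int:
    """Return the effective totem token timeout in milliseconds (sketch).

    Implements token + (number_of_nodes - 2) * token_coefficient,
    applied only when the nodelist has at least 3 nodes.
    """
    if num_nodes >= 3:
        return token_ms + (num_nodes - 2) * token_coefficient_ms
    return token_ms


if __name__ == "__main__":
    # Old default token (1000 ms) vs. new default (3000 ms) for several cluster sizes.
    for nodes in (2, 3, 5):
        print(nodes, effective_token_timeout(1000, nodes), effective_token_timeout(3000, nodes))
```

On a 3-node cluster this gives 1650 ms with a 1000 ms token and 3650 ms with a 3000 ms token, which matches the runtime values reported by corosync-cmapctl in comment 32.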

No matter what, I'm not against setting a different token timeout (actually, it would be about a 2-line patch), but we need a much deeper analysis. Do you have some hard numbers (for example, X% of customers have these problems with an N-node cluster when token is not set to Y, ...)? Or is it happening just for this specific workload on this specific hardware? If so, then a kbase article is probably a better solution.

Also, I think this needs deeper discussion, so I've set needinfo for other people (Shane, Reid, Chrissie, Feist).

@Shane, @Reid: You may have some real-life numbers. How many customers need to increase the token timeout manually?

@Chrissie: I'd just like to know your opinion on this.

@Feist: You may also have an opinion on this. If you know of other people to ask, please set needinfo for them.

BTW, AFAIK the need to hand-edit corosync.conf should be solved quite soon in pcs, which should allow editing the token timeout (and many other options) directly.

Comment 2 Christine Caulfield 2020-08-20 08:02:00 UTC

It would be interesting/essential to hear tales from the field to see what user experience is actually like before making a definite decision on this.
If it's causing a problem for a reasonable number of people then we should change it (bearing in mind not to break things for existing customers), otherwise we're just making life difficult for ourselves.

The default used to be 5 seconds all the way up to (and including) the openais days, but I would hope that hardware has increased in speed enough to make the current 1 second (plus adjustments) reasonable for modern use - surely we should be aiming to increase responsiveness where possible.


Maybe there's an argument for an architecture-specific default - or would that end up being a support nightmare?

Comment 6 Jan Friesse 2020-08-24 07:28:29 UTC

OK, I think the message (and evidence) here is quite clear: increase the default token timeout. So let's do it.

But we must consider the value carefully because of upgrades. If we increase it too much, then the token "gets lost" from the other nodes' point of view (a node with a higher token timeout holds the token for a longer time, longer than the other nodes' token timeout).

I've made some tests, and the results are:
6000 ms - Token loss
5000 ms - Token is not lost, but corosync with the 1 s timeout displays a warning ("Token has not been received in 750 ms"), simply because the token is resent from the node with the 5 s timeout after roughly 900 ms, which is very close to the 1 s node's token timeout
4000 ms - Similar to 5000 ms
3000 ms - No token loss, no warning

Also, the time needed to create the initial membership increases slightly because of how the knet ping/pong_count mechanism works.

So I would suggest setting the default to 3 s so that upgrades go smoothly. It is 3 times more than today, so hopefully much better, while still allowing a smooth upgrade. And if we find it's still not enough, we can increase it in a later release.

Or is 5 s a magic bullet, and 3 s wouldn't be enough?

Comment 7 Jan Friesse 2020-08-24 14:44:13 UTC

Adding Klaus so he is aware.

Comment 19 Jan Friesse 2020-10-12 12:42:57 UTC

Upstream PR: https://github.com/corosync/corosync/pull/600

Comment 20 Jan Friesse 2020-10-12 13:04:31 UTC

For QA: I've tested two main areas:
- The default token timeout has really changed (check runtime.totem_config + the reaction time when a node stops)
- Clusters with mixed versions (and therefore different token timeouts) work reasonably well (nodes are not fenced because of token loss)
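A check of the first item could look roughly like the Python sketch below. It assumes the "key (type) = value" line format that corosync-cmapctl prints for runtime.config.totem.token (as quoted in comment 32) and the token_coefficient formula from comment 1. The helper names `parse_cmapctl` and `check_default_token` are hypothetical; the sketch works on captured text, so on a real node you would pass it the command's output.

```python
import re

# One line of corosync-cmapctl output, e.g. "runtime.config.totem.token (u32) = 3650"
LINE_RE = re.compile(r"^(?P<key>\S+) \((?P<type>\w+)\) = (?P<value>.*)$")


def parse_cmapctl(output: str) -> dict:
    """Map each cmap key to its value (as a string); lines that don't match are skipped."""
    values = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            values[match.group("key")] = match.group("value")
    return values


def check_default_token(output: str, num_nodes: int, base_token_ms: int = 3000,
                        token_coefficient_ms: int = 650) -> bool:
    """True if the runtime token equals base_token_ms adjusted by token_coefficient."""
    # token_coefficient only contributes for nodelists with 3 or more nodes.
    expected = base_token_ms + max(num_nodes - 2, 0) * token_coefficient_ms
    actual = int(parse_cmapctl(output)["runtime.config.totem.token"])
    return actual == expected


SAMPLE = """runtime.config.totem.token (u32) = 3650
runtime.config.totem.token_retransmit (u32) = 869
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.token_warning (u32) = 75"""

if __name__ == "__main__":
    print(check_default_token(SAMPLE, num_nodes=3))
```

The reaction-time and mixed-version checks from the list above still need a live cluster; this sketch only covers the configured value.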

Comment 32 Patrik Hagara 2020-12-04 16:14:31 UTC

Tested on a 3-node cluster (relevant due to token_coefficient, see corosync.conf manual page).


before (corosync-3.0.3-4.el8)
=============================

> [root@virt-145 ~]# rpm -q corosync
> corosync-3.0.3-4.el8.x86_64
> [root@virt-145 ~]# corosync-cmapctl runtime.config.totem.token
> runtime.config.totem.token (u32) = 1650
> runtime.config.totem.token_retransmit (u32) = 392
> runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
> runtime.config.totem.token_warning (u32) = 75

Result: the totem token timeout is 1000 ms (listed as 1650 ms at runtime on a 3-node cluster due to token_coefficient).


after (corosync-3.1.0-3.el8)
============================

> [root@virt-492 ~]# rpm -q corosync
> corosync-3.1.0-3.el8.x86_64
> [root@virt-492 ~]# corosync-cmapctl runtime.config.totem.token
> runtime.config.totem.token (u32) = 3650
> runtime.config.totem.token_retransmit (u32) = 869
> runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
> runtime.config.totem.token_warning (u32) = 75

Result: the totem token timeout now defaults to 3000 ms (listed as 3650 ms at runtime on a 3-node cluster due to token_coefficient).


Rolling upgrade test from 8.3 to 8.4 passed. Basic network failure recovery tests passed.
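The other runtime values in the outputs above follow from the effective token timeout. The Python sketch below assumes token_retransmit is derived as token / (token_retransmits_before_loss_const + 0.2), truncated to an integer, and treats token_warning as a percentage of the token timeout (75% by default, per the corosync.conf manual page). The function name `derived_totem_timers` is hypothetical, and this is an approximation of corosync's behavior rather than its source code.

```python
def derived_totem_timers(token_ms: int, retransmits_before_loss_const: int = 4,
                         token_warning_pct: int = 75) -> tuple:
    """Return (token_retransmit_ms, token_warning_ms) for an effective token timeout.

    Assumed derivations (sketch):
      token_retransmit = token / (retransmits_before_loss_const + 0.2), truncated
      token_warning    = token_warning_pct percent of token
    """
    retransmit_ms = int(token_ms / (retransmits_before_loss_const + 0.2))
    warning_ms = token_ms * token_warning_pct // 100
    return retransmit_ms, warning_ms


if __name__ == "__main__":
    # 3-node runtime token values before (1650 ms) and after (3650 ms) the change.
    for token in (1650, 3650):
        print(token, derived_totem_timers(token))
```

Under these assumptions the sketch reproduces the 392 ms and 869 ms token_retransmit values shown above, and a 1000 ms token yields a 750 ms warning threshold, consistent with the "Token has not been received in 750 ms" message quoted in comment 6.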

Comment 39 errata-xmlrpc 2021-05-18 15:26:09 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (corosync bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1780

Comment 40 Red Hat Bugzilla 2023-09-15 00:46:38 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
