Bug 1870449
| Summary: | Increase TOTEM value in corosync.conf from 1 sec to 3 sec | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Shang Wu <shwu> |
| Component: | corosync | Assignee: | Jan Friesse <jfriesse> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | Steven J. Levine <slevine> |
| Priority: | unspecified | ||
| Version: | 8.0 | CC: | ccaulfie, cfeist, cluster-maint, cnewsom, jfriesse, kgaillot, kwenning, nwahl, phagara, sbradley |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.0 | ||
| Hardware: | ppc64le | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | corosync-3.1.0-1.el8 | Doc Type: | Bug Fix |
| Doc Text: |
.Default token timeout value in `corosync.conf` file increased from 1 second to 3 seconds
Previously, the TOTEM token timeout value in the `corosync.conf` file was set to 1 second. This short timeout makes the cluster react quickly but in the case of network delays it may result in premature failover. The default value is now set to 3 seconds to provide a better trade-off between quick response and broader applicability. For information on modifying the token timeout value, see link:https://access.redhat.com/solutions/221263[How to change totem token timeout value in a RHEL 5, 6, 7, or 8 High Availability cluster?]
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-18 15:26:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Shang Wu
2020-08-20 06:59:00 UTC
@Shang: Choosing the right token timeout is always a trade-off between failover speed and the ability to survive a slower network, higher load, and so on. The token timeout is computed as "token + (number_of_nodes - 2) * token_coefficient", where token defaults to 1000 ms and token_coefficient defaults to 650 ms. So the final token timeout is not always 1 sec; it depends on the number of nodes.

No matter what, I'm not against setting a different token timeout (actually, it would be like a 2-line patch), but we need much deeper analysis. Do you have some hard numbers (like X% of customers have these problems with an N-node cluster when token is not set to Y, ...)? Or is it happening just for this specific workload with this specific hw? If so, then a kbase is probably the better solution.

Also, I think it needs deeper discussion, so I've added needinfo for other people (Shane, Reid, Chrissie, Feist).

@Shane, @Reid: You may have some real-life numbers. How many customers need to increase the token timeout manually?
@Chrissie: I would just like to know your opinion here.
@Feist: You may also have an opinion here, and if you are aware of other people to ask, please set needinfo for them.

btw. AFAIK the need for hand-editing corosync.conf should be solved quite soon in pcs, which should allow editing the token timeout (and many other options) directly.

It would be interesting/essential to hear tales from the field to see what the user experience is actually like before making a definite decision on this. If it's causing a problem for a reasonable number of people then we should change it (bearing in mind not to break things for existing customers), otherwise we're just making life difficult for ourselves. The default used to be 5 seconds all the way up to (and including) the openais days, but I would hope that hardware has increased in speed enough to make the current 1 second (plus adjustments) reasonable for modern use - surely we should be aiming to increase responsiveness where possible.
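The formula quoted in the first comment above can be tried out with a minimal Python sketch. The function name and the node counts in the loop are illustrative only. The `nodes < 3` branch is an assumption based on the corosync.conf manual page, which, as far as I recall, applies token_coefficient only when the nodelist contains at least 3 nodes.

```python
def effective_token_timeout(token_ms: int, nodes: int,
                            token_coefficient_ms: int = 650) -> int:
    """Return the runtime token timeout in ms for a cluster of `nodes` nodes.

    Implements "token + (number_of_nodes - 2) * token_coefficient"
    as quoted in this bug report.
    """
    if nodes < 3:
        # Assumption: the coefficient is ignored for clusters smaller
        # than 3 nodes, so the configured token value is used as-is.
        return token_ms
    return token_ms + (nodes - 2) * token_coefficient_ms


if __name__ == "__main__":
    # Compare the 1000 ms and 3000 ms defaults for a few cluster sizes.
    for nodes in (2, 3, 5, 8):
        old = effective_token_timeout(1000, nodes)
        new = effective_token_timeout(3000, nodes)
        print(f"{nodes} nodes: token=1000 -> {old} ms, token=3000 -> {new} ms")
```

For a 3-node cluster this gives 1650 ms and 3650 ms, which are the runtime values reported in the QA verification in this bug.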
Maybe there's an argument for an architecture-specific default - or would that end up being a support nightmare?

Ok, I think the message (and evidence) here is quite clear - increase the default token timeout - so let's do it.
But we must carefully consider the value because of upgrades. If we increase it too much, the token "gets lost" from the other nodes' point of view (a node with a higher token timeout holds the token for longer than the other nodes' token timeout).
I've made some tests and results are:
6000 - Token loss
5000 - Token is not lost, but corosync with the 1 sec timeout displays a warning ("Token has not been received in 750 ms"), simply because the token is resent from the node with the 5 sec timeout after roughly 900 ms, which is very close to the 1 sec node's token timeout
4000 - Similar to 5000
3000 - No token loss, no warning
Also, initial membership creation takes slightly longer because of how the knet ping/pong_count mechanism works.
So I would suggest setting the default to 3 sec so upgrades go smoothly. It is 3 times more than today, so it is hopefully much better while still allowing a smooth upgrade. And if we find it's still not enough, we can increase it in a later release.
Or is 5 sec the magic bullet and 3 sec wouldn't be enough?
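The mixed-version observation above (a warning at 750 ms, no token loss at roughly 900 ms) can be illustrated with a small Python sketch. It assumes that token_warning is a percentage of the token timeout (75 by default, matching the value in the QA output in this bug), and the function name is hypothetical. It models only the thresholds on the node that still has the 1 sec timeout, not corosync's actual state machine.

```python
def classify_token_delay(delay_ms: int, token_ms: int = 1000,
                         token_warning_pct: int = 75) -> str:
    """Classify how long a node waited for the token.

    Assumption: a warning is logged once the wait exceeds
    token_warning_pct percent of the token timeout, and the token is
    declared lost once the wait reaches the full token timeout.
    """
    warning_ms = token_ms * token_warning_pct // 100
    if delay_ms >= token_ms:
        return "token loss"
    if delay_ms >= warning_ms:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # ~900 ms is the resend delay reported for the 5000 ms test above.
    print(classify_token_delay(900))
    # 500 ms is a made-up value, only to show the "ok" branch.
    print(classify_token_delay(500))
```

With the defaults, the warning threshold works out to 750 ms, which matches the "Token has not been received in 750 ms" message quoted above.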
Adding Klaus so he is aware.

Upstream PR: https://github.com/corosync/corosync/pull/600

For QA: I've tested two main areas:
- Default token timeout is really changed (check runtime.totem_config + reaction time when node stops)
- Clusters with mixed version (so different token timeouts) works reasonably well (they are not fenced because of token loss)

Tested on a 3-node cluster (relevant due to token_coefficient, see corosync.conf manual page).

before (corosync-3.0.3-4.el8)
=============================
> [root@virt-145 ~]# rpm -q corosync
> corosync-3.0.3-4.el8.x86_64
> [root@virt-145 ~]# corosync-cmapctl runtime.config.totem.token
> runtime.config.totem.token (u32) = 1650
> runtime.config.totem.token_retransmit (u32) = 392
> runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
> runtime.config.totem.token_warning (u32) = 75

Result: the totem token timeout is 1000 ms (listed as 1650 ms at runtime on a 3-node cluster due to token_coefficient).

after (corosync-3.1.0-3.el8)
============================
> [root@virt-492 ~]# rpm -q corosync
> corosync-3.1.0-3.el8.x86_64
> [root@virt-492 ~]# corosync-cmapctl runtime.config.totem.token
> runtime.config.totem.token (u32) = 3650
> runtime.config.totem.token_retransmit (u32) = 869
> runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
> runtime.config.totem.token_warning (u32) = 75

Result: the totem token timeout now defaults to 3000 ms (listed as 3650 ms at runtime on a 3-node cluster due to token_coefficient).

Rolling upgrade test from 8.3 to 8.4 passed.
Basic network failure recovery tests passed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (corosync bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:1780

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days |
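For anyone repeating the QA check, the following Python sketch parses `corosync-cmapctl` output of the shape shown in the verification comment and recovers the configured token value from the runtime one. The sample text is copied from this bug, and the helper names are hypothetical. The retransmit relation at the end is an assumption that happens to reproduce both values in the QA output (392 and 869); it is not taken from this report.

```python
import re

# Output copied from the "after" verification in this bug (3-node cluster).
SAMPLE_OUTPUT = """\
runtime.config.totem.token (u32) = 3650
runtime.config.totem.token_retransmit (u32) = 869
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.token_warning (u32) = 75
"""

# Matches lines of the form "<key> (<type>) = <integer>".
LINE_RE = re.compile(r"^(?P<key>\S+) \((?P<type>\w+)\) = (?P<value>\d+)$")


def parse_cmap_u32(text: str) -> dict:
    """Collect all u32 keys from cmapctl-style output into a dict."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("type") == "u32":
            values[match.group("key")] = int(match.group("value"))
    return values


def configured_token(runtime_token_ms: int, nodes: int,
                     token_coefficient_ms: int = 650) -> int:
    """Invert "token + (number_of_nodes - 2) * token_coefficient"."""
    extra = (nodes - 2) * token_coefficient_ms if nodes >= 3 else 0
    return runtime_token_ms - extra


values = parse_cmap_u32(SAMPLE_OUTPUT)
prefix = "runtime.config.totem."
runtime_token = values[prefix + "token"]
base_token = configured_token(runtime_token, nodes=3)

# Assumed relation: retransmit = token / (retransmits_before_loss_const + 0.2),
# truncated to an integer.
retransmit_guess = int(
    runtime_token / (values[prefix + "token_retransmits_before_loss_const"] + 0.2)
)

if __name__ == "__main__":
    print("configured token:", base_token, "ms")
    print("retransmit (assumed relation):", retransmit_guess, "ms")
```

On a live node the sample text would be replaced by the real command output, and `nodes` by the actual cluster size.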