Bug 1774143 - [Support RFE] Make it easier to raise corosync totem token
Summary: [Support RFE] Make it easier to raise corosync totem token
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pcs
Version: 8.3
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Ondrej Mular
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On:
Blocks: 1774149
Reported: 2019-11-19 17:09 UTC by John Ruemker
Modified: 2021-05-18 15:12 UTC (History)

Fixed In Version: pcs-0.10.7-3.el8
Doc Type: Enhancement
Doc Text:
Feature: Allow changing the corosync totem token. Reason: Users need to raise the corosync totem token to avoid fencing during temporary system unresponsiveness. Result: A new command, 'pcs cluster config update', was introduced to change the corosync configuration, including the totem token value.
Clone Of:
Environment:
Last Closed: 2021-05-18 15:12:05 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1667061 | 0 | high | CLOSED | [RFE] provide commands for changing corosync configuration of an existing cluster | 2023-02-18 04:31:53 UTC
Red Hat Knowledge Base (Solution) | 221263 | 0 | None | None | None | 2019-11-19 18:54:22 UTC

Internal Links: 1667061

Description John Ruemker 2019-11-19 17:09:04 UTC
Most customers accept the default totem token setting in their clusters.  However, it is very common for our customers to experience node fencing as a result of temporary system unresponsiveness, and usually we have to suggest raising totem token before a customer will even consider that as a possible adjustment.  

Also, many admins don't even know what totem token is, so they don't find the documentation or options on their own when they want to dial back how aggressively their cluster fences nodes.

The Red Hat Support team has long dealt with a high volume of customers asking for root-cause analysis of fence events or node-reboots.  The most common scenario for node fencing is a node becoming temporarily unresponsive longer than corosync's communication timeout, and many customers are left with the impression that RHEL HA is "unstable" when they see node fencing, when in reality a 1s timeout is often just too aggressive for many environments. 

Even though you can set token timeout at setup-time with 'pcs cluster setup totem ...', there is no way yet to do this after the fact.  Also, like I said, many customers don't know what totem is, or what token is, so even with the setup option many admins miss this.

The goal of this request is to make it easier for customers to raise the totem token value using pcs.  I would like to propose a few different ways to do that:

1) Offer a command to change totem token after setup, such as: 

   # pcs cluster update --token=10s


2) Make a command to do the same but with more obvious terminology and simpler for a novice administrator to figure out.  Maybe we could have variations of this under both 'pcs cluster' and 'pcs stonith'.   For example: 

   # pcs cluster communication-timeout (--aggressive|--moderate|--lenient|--timeout=X)

   # pcs stonith communication-timeout (--aggressive|--moderate|--lenient|--timeout=X)

and those options might set something like 1s, 10s, 60s.  


3) In bug #1774132, I proposed we add a 'pcs diagnostics' command, and if we do that, we could offer a variation of this token-adjustment as a subcommand.  For example:

   # pcs diagnostics communication-timeout (--aggressive|--moderate|--lenient|--timeout=X)

   # pcs diagnostics setup  # (which would deploy all the diagnostics, including a --moderate communication timeout).
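
To illustrate option 2, here is a minimal Python sketch of how the proposed presets could map onto a token value. It assumes the 'pcs cluster config update totem token=<ms>' syntax that eventually shipped (see the Doc Text field and comment 10) and the 1s/10s/60s values suggested above. The preset names, the PRESETS_MS table, and the token_update_command helper are hypothetical and are not part of pcs.

```python
# Hypothetical preset table: names come from the proposal above, values are
# the suggested 1s/10s/60s expressed in milliseconds (the unit of totem token).
PRESETS_MS = {"aggressive": 1000, "moderate": 10000, "lenient": 60000}


def token_update_command(preset=None, timeout_ms=None):
    """Build the argv for 'pcs cluster config update totem token=<ms>'.

    Exactly one of preset or timeout_ms must be given. This only builds
    the command line; it does not run pcs.
    """
    if (preset is None) == (timeout_ms is None):
        raise ValueError("give exactly one of preset or timeout_ms")
    if preset is not None:
        if preset not in PRESETS_MS:
            raise ValueError(f"unknown preset: {preset!r}")
        timeout_ms = PRESETS_MS[preset]
    if timeout_ms <= 0:
        raise ValueError("timeout must be a positive number of milliseconds")
    return ["pcs", "cluster", "config", "update", "totem", f"token={timeout_ms}"]


if __name__ == "__main__":
    # Print the command instead of executing it, so this is safe to run anywhere.
    print(" ".join(token_update_command(preset="moderate")))
```

The helper returns an argument list rather than a shell string so that it could be handed to subprocess.run without quoting concerns on a node where pcs is installed.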


Obviously these are all redundant.  My goal with these suggestions is to make it more obvious to admins that they can tweak how aggressive the cluster is with fencing, and to put those commands in several places so they'll be more likely to actually discover it. 

My top priority is to at least make it possible to update token through SOME command, so we don't have to guide customers through a config file adjustment whenever there is a fencing incident.  

Once we can do that, then these other suggestions seem like they'd be easy to add as alternative ways to do the same thing.

Comment 1 John Ruemker 2019-11-19 17:12:47 UTC
Related RFE filed in RHEL 7 and linked by many customers asking for ability to change arbitrary corosync values through pcs: https://bugzilla.redhat.com/show_bug.cgi?id=1173346

This request I've described here is more narrow, asking just for totem token.

Comment 10 Miroslav Lisik 2020-12-16 16:33:11 UTC
Proposed fix + tests in attachment 1739694 [details] (bz1667061 comment 12)

Test:
pcs cluster config update totem token=10000

Comment 11 Miroslav Lisik 2020-12-18 17:44:53 UTC
Test:

[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.7-3.el8.x86_64

[root@r8-node-01 ~]# grep token /etc/corosync/corosync.conf
[root@r8-node-01 ~]# pcs cluster config update totem token=3000
Sending updated corosync.conf to nodes...
r8-node-01: Succeeded
r8-node-02: Succeeded
r8-node-01: Corosync configuration reloaded
[root@r8-node-01 ~]# grep token /etc/corosync/corosync.conf
    token: 3000
[root@r8-node-01 ~]# corosync-cmapctl | grep token | head -1
runtime.config.totem.token (u32) = 3000

[root@r8-node-01 ~]# pcs cluster config update totem token=10000
Sending updated corosync.conf to nodes...
r8-node-01: Succeeded
r8-node-02: Succeeded
r8-node-01: Corosync configuration reloaded
[root@r8-node-01 ~]# grep token /etc/corosync/corosync.conf
    token: 10000
[root@r8-node-01 ~]# corosync-cmapctl | grep token | head -1
runtime.config.totem.token (u32) = 10000
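
The manual check above (grep in corosync.conf, then corosync-cmapctl) could be scripted roughly as follows. This is a sketch only: it assumes the two line formats shown in the transcript, the helper names are made up for illustration, and the corosync.conf lookup matches any 'token:' key without checking that it sits inside the totem section.

```python
import re

# Matches lines such as: runtime.config.totem.token (u32) = 10000
CMAP_LINE = re.compile(r"^(?P<key>\S+) \((?P<type>\w+)\) = (?P<value>.*)$")


def parse_cmap_line(line):
    """Split one corosync-cmapctl output line into (key, value).

    Values of integer types (u8, u32, i64, ...) are converted to int;
    everything else is returned as a string.
    """
    match = CMAP_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized cmap line: {line!r}")
    value = match["value"]
    if re.fullmatch(r"[ui]\d+", match["type"]):
        value = int(value)
    return match["key"], value


def token_from_conf(text):
    """Return the 'token:' value from corosync.conf text, or None if unset."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so token_retransmits and similar keys are ignored.
        if sep and key.strip() == "token":
            return int(value.strip())
    return None


if __name__ == "__main__":
    conf_value = token_from_conf("totem {\n    token: 10000\n}\n")
    _, runtime_value = parse_cmap_line("runtime.config.totem.token (u32) = 10000")
    print("config and runtime agree:", conf_value == runtime_value)
```

Comparing the value on disk with the runtime value mirrors what the test checks by eye: the file was updated and corosync reloaded it.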

Comment 15 Nina Hostakova 2021-01-12 16:25:38 UTC
Updating the totem token value has been tested along with other corosync configuration options within bz1667061.

Marking verified based on bz1667061 comment 19.

Comment 19 errata-xmlrpc 2021-05-18 15:12:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1737

