Bug 1940076 - Cluster fails to form membership when totem token is set to 30s or longer
Summary: Cluster fails to form membership when totem token is set to 30s or longer
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kronosnet
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Christine Caulfield
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1959113 1959114 1959115
Reported: 2021-03-17 14:46 UTC by Josef Zimek
Modified: 2023-07-10 07:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1959113 1959114 1959115
Environment:
Last Closed: 2021-05-10 17:33:07 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Links:
System: Red Hat Knowledge Base (Solution)
ID: 5889891
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2021-03-17 15:56:09 UTC

Description Josef Zimek 2021-03-17 14:46:00 UTC
Description of problem:

The cluster forms membership as long as the totem token is set to a value lower than 30000 ms. With the token set to 30000 ms, the cluster fails to form membership and remains inquorate.


Version-Release number of selected component (if applicable):
libknet1-1.10-6.el8_2.x86_64                              
libknet1-compress-bzip2-plugin-1.10-6.el8_2.x86_64        
libknet1-compress-lz4-plugin-1.10-6.el8_2.x86_64          
libknet1-compress-lzma-plugin-1.10-6.el8_2.x86_64         
libknet1-compress-lzo2-plugin-1.10-6.el8_2.x86_64     
libknet1-compress-plugins-all-1.10-6.el8_2.x86_64        
libknet1-compress-zlib-plugin-1.10-6.el8_2.x86_64        
libknet1-crypto-nss-plugin-1.10-6.el8_2.x86_64           
libknet1-crypto-openssl-plugin-1.10-6.el8_2.x86_64       
libknet1-crypto-plugins-all-1.10-6.el8_2.x86_64           
libknet1-plugins-all-1.10-6.el8_2.x86_64  



How reproducible:
always

Steps to Reproduce:

Set up a cluster and set the totem token to 30000 ms (30s) or higher

Actual results:
Setting the token to 10000 -> works
Setting the token to 20000 -> works
Setting the token to 29000 -> works
Setting the token to 30000 -> does not work. The node is inquorate and all members are offline.
Setting the token to 40000 -> does not work. The node is inquorate and all members are offline.

(Neither the presence of a qdevice nor the number of links plays a role here.)
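
As a rough illustration of the results above, the following Python sketch reads the token value from a corosync.conf-style totem section and flags values at or above the 30000 ms point where membership stopped forming in this report. The sample configuration, the cluster name, and the helper names are assumptions made for illustration. The parser is deliberately simplified and only looks at the configured value, not at any runtime value corosync derives from it.

```python
# Threshold observed in this report: 29000 ms worked, 30000 ms did not.
FAILING_TOKEN_MS = 30000

# Hypothetical corosync.conf excerpt; cluster_name is invented for the example.
SAMPLE_CONF = """
totem {
    version: 2
    cluster_name: examplecluster
    transport: knet
    token: 30000
}
"""


def totem_token_ms(conf_text):
    """Return the token value (ms) set directly in the totem section, or None."""
    depth = 0
    in_totem = False
    for raw in conf_text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.endswith("{"):
            # A new section opens; remember whether it is the top-level totem one.
            if depth == 0 and line[:-1].strip() == "totem":
                in_totem = True
            depth += 1
        elif line == "}":
            depth -= 1
            if depth == 0:
                in_totem = False
        elif in_totem and depth == 1 and ":" in line:
            # Only keys directly inside totem {}, not in nested subsections.
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "token":
                return int(value)
    return None


def hits_reported_threshold(conf_text):
    """True if the configured token is at or above the value that failed here."""
    token = totem_token_ms(conf_text)
    return token is not None and token >= FAILING_TOKEN_MS


if __name__ == "__main__":
    print(totem_token_ms(SAMPLE_CONF), hits_reported_threshold(SAMPLE_CONF))
```

On an affected system, a check like this could be run against the real configuration file before raising the token, keeping in mind that it does not inspect the installed kronosnet version.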


Expected results:
The cluster forms normally even with the totem token set to 30s or longer

Additional info:
Proven to work after this patch is applied: https://github.com/kronosnet/kronosnet/commit/4df82e5fd847423b164f4fba70e20fd0026639ce

Comment 1 Jan Friesse 2021-03-17 15:03:52 UTC
Some notes (I was working with the reporter and GSS to identify the problem):
- RHEL 8.3 (and 8.4) is fixed. This bug is present in 8.2.z and probably 8.1.z (I need to test whether the problem also exists in 8.1 and whether the patch is applicable)
- We may consider adding a few more patches, because this problem was also reported upstream quite some time ago (I had totally forgotten about that) - https://github.com/corosync/corosync/issues/559 - so also adding https://github.com/kronosnet/kronosnet/pull/281 and https://github.com/kronosnet/kronosnet/pull/283 may be worthwhile - to discuss with Fabio
- The reporter tested the testing package - https://honzaf.fedorapeople.org/knet-02889299/ - and (as Pepa said) reported it as working

Comment 3 Jan Friesse 2021-03-17 15:40:39 UTC
RHEL 8.1 is indeed also affected (that's expected - kronosnet is also version 1.10 in RHEL 8.1).

