Bug 1940076

Summary: Cluster fails to form membership when totem token is set to 30s or longer
Product: Red Hat Enterprise Linux 8
Reporter: Josef Zimek <pzimek>
Component: kronosnet
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED CURRENTRELEASE
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.4
CC: cfeist, fdinitto, jfriesse, sbradley
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1959113 1959114 1959115
Environment:
Last Closed: 2021-05-10 17:33:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1959113, 1959114, 1959115

Description Josef Zimek 2021-03-17 14:46:00 UTC
Description of problem:

Cluster membership forms as long as the totem token is set to a value lower than 30000 ms. With the token set to 30000 ms or higher, the cluster fails to form membership and remains inquorate.


Version-Release number of selected component (if applicable):
libknet1-1.10-6.el8_2.x86_64                              
libknet1-compress-bzip2-plugin-1.10-6.el8_2.x86_64        
libknet1-compress-lz4-plugin-1.10-6.el8_2.x86_64          
libknet1-compress-lzma-plugin-1.10-6.el8_2.x86_64         
libknet1-compress-lzo2-plugin-1.10-6.el8_2.x86_64     
libknet1-compress-plugins-all-1.10-6.el8_2.x86_64        
libknet1-compress-zlib-plugin-1.10-6.el8_2.x86_64        
libknet1-crypto-nss-plugin-1.10-6.el8_2.x86_64           
libknet1-crypto-openssl-plugin-1.10-6.el8_2.x86_64       
libknet1-crypto-plugins-all-1.10-6.el8_2.x86_64           
libknet1-plugins-all-1.10-6.el8_2.x86_64  



How reproducible:
always

Steps to Reproduce:

Set up a cluster and set the totem token to 30000 ms (30 s) or higher.

Actual results:
Setting the token to 10000 -> works
Setting the token to 20000 -> works
Setting the token to 29000 -> works
Setting the token to 30000 -> does not work. The node is inquorate and all members are offline.
Setting the token to 40000 -> does not work. The node is inquorate and all members are offline.

(Using a qdevice or changing the number of links makes no difference.)
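
For anyone checking whether an existing configuration falls into the range reported above, here is a minimal Python sketch. It is not part of the original report. The sample corosync.conf text, the cluster name, and the helper names are made up for illustration. The 30000 ms threshold is simply the lowest value reported as failing; the exact boundary between 29000 and 30000 was not tested here.

```python
import re

# Illustrative corosync.conf fragment; cluster_name is a made-up placeholder.
SAMPLE_CONF = """\
totem {
    version: 2
    cluster_name: demo
    transport: knet
    token: 30000
}
"""

# Lowest token value (in ms) reported as failing in this bug.
# 29000 was reported as working; values in between were not reported.
FIRST_FAILING_TOKEN_MS = 30000


def totem_token_ms(conf_text):
    """Return the configured token value in ms, or None if it is not set.

    Assumption: a plain "token:" key only appears in the totem section,
    so the whole text is searched instead of parsing nested sections.
    """
    match = re.search(r"^\s*token\s*:\s*(\d+)\s*$", conf_text, re.M)
    return int(match.group(1)) if match else None


def in_reported_failing_range(token_ms):
    """True if the token is at or above the lowest value reported as failing."""
    return token_ms is not None and token_ms >= FIRST_FAILING_TOKEN_MS


if __name__ == "__main__":
    token = totem_token_ms(SAMPLE_CONF)
    print(token, in_reported_failing_range(token))
```

On a real node the text would be read from /etc/corosync/corosync.conf instead of the sample string. A missing token key returns None, which means corosync falls back to its default value.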


Expected results:
The cluster forms normally even with the totem token set to 30000 ms or higher.

Additional info:
Proven to work after this patch is applied: https://github.com/kronosnet/kronosnet/commit/4df82e5fd847423b164f4fba70e20fd0026639ce

Comment 1 Jan Friesse 2021-03-17 15:03:52 UTC
Some notes (I was working with the reporter and GSS to identify the problem):
- RHEL 8.3 (and 8.4) is fixed. This bug exists in 8.2.z and probably 8.1.z (I need to test whether the problem also exists in 8.1 and whether the patch is applicable).
- We may consider adding a few more patches, because this problem was also reported upstream quite some time ago (I had totally forgotten about it): https://github.com/corosync/corosync/issues/559. Adding https://github.com/kronosnet/kronosnet/pull/281 and https://github.com/kronosnet/kronosnet/pull/283 as well may be worthwhile; to be discussed with Fabio.
- The reporter tested the test package, https://honzaf.fedorapeople.org/knet-02889299/, and (as Pepa said) reported it as working.

Comment 3 Jan Friesse 2021-03-17 15:40:39 UTC
RHEL 8.1 is indeed also affected (as expected, since kronosnet is also version 1.10 in RHEL 8.1).