Description of problem:
Cluster membership forms as long as the totem token is set to a value lower than 30000 ms. With the token set to 30000 ms, the cluster fails to form membership and remains inquorate.

Version-Release number of selected component (if applicable):
libknet1-1.10-6.el8_2.x86_64
libknet1-compress-bzip2-plugin-1.10-6.el8_2.x86_64
libknet1-compress-lz4-plugin-1.10-6.el8_2.x86_64
libknet1-compress-lzma-plugin-1.10-6.el8_2.x86_64
libknet1-compress-lzo2-plugin-1.10-6.el8_2.x86_64
libknet1-compress-plugins-all-1.10-6.el8_2.x86_64
libknet1-compress-zlib-plugin-1.10-6.el8_2.x86_64
libknet1-crypto-nss-plugin-1.10-6.el8_2.x86_64
libknet1-crypto-openssl-plugin-1.10-6.el8_2.x86_64
libknet1-crypto-plugins-all-1.10-6.el8_2.x86_64
libknet1-plugins-all-1.10-6.el8_2.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster and set the totem token higher than 29 s (for example, 30000 ms).

Actual results:
Setting the token to 10000 -> works
Setting the token to 20000 -> works
Setting the token to 29000 -> works
Setting the token to 30000 -> does not work. The node is inquorate and all members are offline.
Setting the token to 40000 -> does not work. The node is inquorate and all members are offline.

(Using a qdevice or changing the number of links makes no difference here.)

Expected results:
The cluster forms normally even with a totem token higher than 29 s.

Additional info:
Proven to work after this patch is applied:
https://github.com/kronosnet/kronosnet/commit/4df82e5fd847423b164f4fba70e20fd0026639ce
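For triage, the results above can be turned into a quick check on a node's configuration. The following is only a rough sketch, not part of corosync or kronosnet: the helper names and the sample configuration are hypothetical, the threshold is simply the lowest token value reported as failing (values between 29000 and 30000 were not tested), and it applies only to the affected libknet1 1.10 builds. It reads the configured `token` value from the `totem` section of corosync.conf-style text.

```python
# Lowest token value (ms) reported as failing in this bug; 29000 ms worked.
# Assumption: values between 29000 and 30000 were not tested in the report.
LOWEST_FAILING_TOKEN_MS = 30000


def read_totem_token(conf_text):
    """Return the token value (ms) from the top-level totem section, or None."""
    stack = []  # names of the sections we are currently inside
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif stack == ["totem"]:
            key, sep, value = line.partition(":")
            if sep and key.strip() == "token":
                return int(value.strip())
    return None


def is_affected(token_ms):
    """True if the configured token is in the range reported as failing."""
    return token_ms is not None and token_ms >= LOWEST_FAILING_TOKEN_MS


# Hypothetical sample configuration, for illustration only.
SAMPLE_CONF = """
totem {
    version: 2
    cluster_name: example
    transport: knet
    token: 30000
}
"""

if __name__ == "__main__":
    token = read_totem_token(SAMPLE_CONF)
    print(token, is_affected(token))
```

On a real node, the text would come from /etc/corosync/corosync.conf. The sketch inspects only the configured value, not the runtime token timeout.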
Some notes (I was working with the reporter and GSS to identify the problem):
- RHEL 8.3 (and 8.4) is fixed. The bug is present in 8.2.z and probably in 8.1.z (I need to test whether the problem also exists in 8.1 and whether the patch is applicable).
- We may consider adding a few more patches, because this problem was also reported upstream quite some time ago (I had totally forgotten about that): https://github.com/corosync/corosync/issues/559. So also adding https://github.com/kronosnet/kronosnet/pull/281 and https://github.com/kronosnet/kronosnet/pull/283 may be worthwhile. To be discussed with Fabio.
- The reporter tested the testing package (https://honzaf.fedorapeople.org/knet-02889299/) and, as Pepa said, reported it as working.
RHEL 8.1 is indeed also affected (as expected, since kronosnet is also version 1.10 in RHEL 8.1).