Description of problem:
After a node is killed, the libquorum callback gives the old ring_seq number, not the new one. This breaks the in-development version of dlm_controld.

Extracting some messages from the dlm_controld log:

- all cluster nodes join in seq 4972
  8820 cluster quorum 1 seq 4972 nodes 4
  8820 cluster node 1 added seq 4972
  8820 cluster node 2 added seq 4972
  8820 cluster node 4 added seq 4972
  8820 cluster node 5 added seq 4972

- cpg totem callbacks show the correct ring seq
  8820 dlm:controld ring 1:4972 4 memb 1 2 4 5

- node 4 killed and removed from the cluster; this should obviously be ring seq 4976, not 4972
  10233 cluster quorum 1 seq 4972 nodes 3
  10233 cluster node 4 removed seq 4972

- cpg totem callbacks show the new, correct ring seq
  10233 dlm:controld ring 1:4976 3 memb 1 2 5

(On a related note, it would be really useful if the corosync messages in /var/log/messages included the ring seq number, so that logs could be correlated among nodes and with other code. Similarly, no corosync command displays any ring seq numbers, as cman_tool did.)

Version-Release number of selected component (if applicable):
corosync-1.2.3-36.el6.x86_64
This also happens if I just shut down a node using corosync-cfgtool -H.
Created attachment 547878 (corosync.conf)
Fabio says votequorum is irrelevant. Reassigning to make sure this works properly in the new quorum module.
Either way, there is a bug in the votequorum confchg_fn: it does not send notifications back to the quorum module, which would explain why you see different ring IDs depending on the timing of events.
This is definitely a bug in votequorum but the API is not supported in RHEL. Moving to upstream.
Fixed in the topic-quorum branch
topic-quorum was merged long ago, so closing as upstream.