Bug 768144 - quorum_notification_fn_t gives wrong ring_seq
Summary: quorum_notification_fn_t gives wrong ring_seq
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Corosync Cluster Engine
Classification: Retired
Component: quorum
Version: 1.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fabio Massimo Di Nitto
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-12-15 21:15 UTC by David Teigland
Modified: 2013-06-20 14:40 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-20 14:40:54 UTC


Attachments
corosync.conf (1.42 KB, text/plain)
2011-12-16 17:05 UTC, David Teigland

Description David Teigland 2011-12-15 21:15:23 UTC
Description of problem:

After a node is killed, the libquorum callback gives the old ring_seq number, not the new one.  This breaks the in-development version of dlm_controld.

Extracting some messages from dlm_controld log:

- all cluster nodes join in seq 4972
8820 cluster quorum 1 seq 4972 nodes 4
8820 cluster node 1 added seq 4972
8820 cluster node 2 added seq 4972
8820 cluster node 4 added seq 4972
8820 cluster node 5 added seq 4972

- cpg totem callbacks show correct ring seq
8820 dlm:controld ring 1:4972 4 memb 1 2 4 5

- node 4 killed and removed from cluster
  this should obviously be ring seq 4976, not 4972
10233 cluster quorum 1 seq 4972 nodes 3
10233 cluster node 4 removed seq 4972

- cpg totem callbacks show new correct ring seq
10233 dlm:controld ring 1:4976 3 memb 1 2 5
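
The log shows what a consumer such as dlm_controld runs into. After node 4 is removed, the quorum callback still reports seq 4972, while the cpg totem callback already reports 4976. The following is a minimal sketch of a consumer-side cross-check. It assumes the consumer records the `ring_seq` argument of its `quorum_notification_fn_t` handler and the `seq` member of the ring id passed to its cpg totem confchg handler (the `1:4976` in the log is nodeid:seq). The names `ring_tracker`, `tracker_on_quorum`, `tracker_on_totem` and `tracker_quorum_lags` are hypothetical and not part of corosync. The two seq values are taken as plain integers, so the sketch builds without corosync headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer-side bookkeeping; not a corosync type. */
struct ring_tracker {
	uint64_t quorum_seq; /* ring_seq from the last quorum notification */
	uint64_t totem_seq;  /* seq from the last cpg totem confchg */
};

/* Call from the quorum notification handler with its ring_seq argument. */
static void tracker_on_quorum(struct ring_tracker *t, uint64_t ring_seq)
{
	assert(t != NULL);
	t->quorum_seq = ring_seq;
}

/* Call from the cpg totem confchg handler with the ring id's seq. */
static void tracker_on_totem(struct ring_tracker *t, uint64_t seq)
{
	assert(t != NULL);
	t->totem_seq = seq;
}

/* True while the quorum view has not caught up with the newest totem
 * ring; a consumer would defer acting on the membership until false. */
static bool tracker_quorum_lags(const struct ring_tracker *t)
{
	assert(t != NULL);
	return t->quorum_seq < t->totem_seq;
}
```

Replay the log through this sketch. After the join, both callbacks report 4972 and nothing lags. After node 4 is killed, the quorum side stays at 4972 while totem moves to 4976. With the bug, no quorum notification carrying 4976 ever arrives, so a consumer that waits for the two to agree stalls until the next membership change.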

(On a related note, it would be really useful if the corosync
messages in /var/log/messages included the ring seq number so
logs could be correlated among nodes, and with other code.
Similarly, no corosync command displays the ring seq number,
whereas cman_tool did.)

Version-Release number of selected component (if applicable):

corosync-1.2.3-36.el6.x86_64

Comment 2 David Teigland 2011-12-15 21:41:38 UTC
Also happens if I just shut down a node using corosync-cfgtool -H.

Comment 3 David Teigland 2011-12-16 17:05:02 UTC
Created attachment 547878
corosync.conf

Comment 4 David Teigland 2011-12-16 17:26:33 UTC
Fabio says votequorum is irrelevant.  Reassigning to make sure this works properly in the new quorum module.

Comment 5 Fabio Massimo Di Nitto 2011-12-17 07:28:58 UTC
Either way, there is a bug in votequorum's confchg_fn: it does not send notifications back to the quorum module, which would explain why you see different ring IDs depending on when certain things happen.
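
The effect described above can be illustrated with a simplified, hypothetical model. This is not corosync's actual votequorum or quorum service code. In the model, the quorum module caches a ring seq and reports it to library clients. If the provider's configuration-change path updates the quorate state and member count but never passes the new ring seq along, clients see the new membership paired with the old seq. The names `quorum_module`, `client_view`, `deliver`, `confchg_buggy` and `confchg_fixed` are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the quorum module's cached view. */
struct quorum_module {
	uint64_t ring_seq; /* last ring seq the module was told about */
	uint32_t quorate;
	uint32_t nodes;
};

/* What a library client's notification callback would receive. */
struct client_view {
	uint32_t quorate;
	uint64_t ring_seq;
	uint32_t nodes;
};

/* Build the notification from whatever the module has cached. */
static struct client_view deliver(const struct quorum_module *qm)
{
	assert(qm != NULL);
	struct client_view v = { qm->quorate, qm->ring_seq, qm->nodes };
	return v;
}

/* Buggy path: quorate state and member count are updated, but the new
 * ring seq never reaches the quorum module. */
static struct client_view confchg_buggy(struct quorum_module *qm,
					uint64_t new_seq, uint32_t quorate,
					uint32_t nodes)
{
	(void)new_seq; /* dropped: this is the bug being modelled */
	qm->quorate = quorate;
	qm->nodes = nodes;
	return deliver(qm);
}

/* Fixed path: the ring seq travels with the membership change. */
static struct client_view confchg_fixed(struct quorum_module *qm,
					uint64_t new_seq, uint32_t quorate,
					uint32_t nodes)
{
	qm->ring_seq = new_seq;
	qm->quorate = quorate;
	qm->nodes = nodes;
	return deliver(qm);
}
```

The model can be fed the values from the description: seq 4972 with 4 nodes, then node 4 leaving on ring 4976. The buggy path yields quorate 1, seq 4972, nodes 3, the same combination dlm_controld logged. The fixed path reports seq 4976.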

Comment 6 Fabio Massimo Di Nitto 2011-12-19 13:46:56 UTC
This is definitely a bug in votequorum but the API is not supported in RHEL. Moving to upstream.

Comment 7 Fabio Massimo Di Nitto 2012-01-06 04:05:10 UTC
Fixed in the topic-quorum branch

Comment 8 Jan Friesse 2013-06-20 14:40:54 UTC
topic-quorum has been merged for ages now, so closing as upstream.

