Bug 837289 - Sending durable messages via a federation link to a destination cluster causes cluster-desync in long term
Sending durable messages via a federation link to a destination cluster cause...
Status: NEW
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp (Show other bugs)
All Linux
high Severity high
: ---
: ---
Assigned To: Alan Conway
MRG Quality Engineering
Depends On:
  Show dependency treegraph
Reported: 2012-07-03 07:28 EDT by Pavel Moravec
Modified: 2014-06-17 15:55 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
reproducer (3.72 KB, application/x-gzip)
2013-02-13 11:41 EST, Pavel Moravec
no flags Details

  None (edit)
Description Pavel Moravec 2012-07-03 07:28:00 EDT
Description of problem:
Long usage of a federation link between two clusters where _durable_ messages are frequently sent leads to cluster-desync (of the dst.cluster). See Aditional info for reasoning.

Version-Release number of selected component (if applicable):
any (checked qpid 0.14)

How reproducible:
depending on time, but 10% for default scenario

Steps to Reproduce:
(ideally, if possible, rebuild qpid with changing cpp/src/qpid/SessionState.cpp: static const uint32_t SPONTANEOUS_REQUEST_INTERVAL = 64 to really, really significantly speedup the reproducer)
(also note, you could get false bug-like results due to bz836141)

1) Have a source broker (or cluster, this does not matter) and dest.cluster with queue state replication of just one queue from source do dest.cluster.
2) On the federation route, setup --ack to some low number (to speedup replication, I used --ack 5).
3) Randomly produce and consume _durable_ messages to the src.broker to the queue to be replicated - ideally, do the enqueues and dequeues as much alternating as possible. 
4) Either wait very long time (usually in millions of enqueues and dequeues) to get cluster desync, or monitor on whole dst.cluster these trace logs:

trace @QPID.9593a43d-0450-4bea-b510-a315e726bb63: sent cmd 131015: {MessageAcceptBody: transfers={ [5077482,5077486] }; }

(where the trace is generated by adding "log-enable=trace+:senderRecord" to /etc/qpidd.conf and the uuid is ID of the session on fed.link)

Actual results:
- After some random time (i.e. "sent cmd 98000:" or so), the session points in "transfers" will differ among cluster members (note that due to bz836141, they will differ by 3 since beginning, so observe for a _different_ difference that will randomly appear).
- After that, there is a +- 10% probability that after each SPONTANEOUS_REQUEST_INTERVAL-th message (by default after 65536, 65536*2, ..) sent, the cluster will de-sync logging something like:

critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: admin@QPID.0eb08005-17cf-4219-8904-7c13013ffc80: confirmed < (65535+0) but only sent < (1625+0) (qpid/SessionState.cpp:154) (qpid/cluster/ErrorCheck.cpp:89)

(where the difference between "confirmed" and "sent" corresponds to the difference from session points)

Expected results:
no either session point difference or cluster de-sync

Additional info:
- Replication steps use queue state replication, though the bug is relevant to more general usage of federation link. Anyway once queue replication is set up, the reproducer is easier and faster leading to the bug.

- The root cause of the bug is in async completion of received messages. In particular, the session difference started to appear when SessionState::IncompleteIngressMsgXfer::completed(bool sync) was called with sync=false for the node keeping the fed.link connection and with sync=true for the other nodes (that had the fed.link as shadow connection only).
This was caused by different asynchronous processing / storing of durable messages into the store. The asynchronous storing eventually causes that the same group of messages is stored (resp. the storing is completed) in different ordering. That completion, however, triggers async completion (see ./qpid/broker/AsyncCompletion.h file) what causes different brokers accepts (or reply to accept for shadow connections) different sets / ranges of messages.

This hidden asynchronicity could lead to the state that one broker sends a bunch of ranges in one message.accept while another broker sends (or replays for shadow connections) multiple message.accept AMQP messages. This then directly leads to de-sync in session points, what can be - with some "luck" - revealed when the dt.broker sends spontaneous sync. request (that the other brokers don't replay, having session point set differently).
Comment 1 Pavel Moravec 2013-02-13 11:41:49 EST
Created attachment 696913 [details]

1) compile attached cpp
2) run attached script and monitor if "sent cmd 117328" (for whatever number there) trace log is the same on all qpidd.500*.log - that directly indicates if the bug can occur or not (hitting the bug directly is little bit more tricky)

Note You need to log in before you can comment on or make changes to this bug.