Bug 836141
| Summary: | destination cluster de-sync when federation link used for a longer time | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise MRG | Reporter: | Pavel Moravec <pmoravec> |
| Component: | qpid-cpp | Assignee: | mick <mgoulish> |
| Status: | CLOSED ERRATA | QA Contact: | Zdenek Kraus <zkraus> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 2.1 | CC: | aconway, gsim, iboverma, jross, kgiusti, lzhaldyb, mcressma, mgoulish, sauchter, zkraus |
| Target Milestone: | 2.3 | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | qpid-cpp-0.18-4 | Doc Type: | Bug Fix |
Doc Text:
BZ 836141
---------------------------------
A bug report in iambic pentameter,
by Mick Goulish.
Cause:
The eldest cluster broker sends commands
Creates its federation link and bridge,
But leaves behind its juniors, all confused.
Not privy to these mysteries, they soon
Fall out of sync, and drag the cluster down.
Consequence:
Without agreement, how can clusters run?
If elder and its juniors disagree,
Divide against itself this fragile house,
How can it stand?
How can the perfect synchrony exist?
It can't. And therein lies the rub.
Fix:
The elder broker builds its links and bridge,
But tells the youngsters: "Do thou as I do!"
They too can build that bridge, maintaining it
Against the day when elder fails. All their
Commands are thus maintained in sync.
Result:
The cluster hums, the traffic leaps and flows.
Where once was strife and error, harmony
has come. The synchrony will no doubt be
disturbed again. But not by this BZ.
| Story Points: | --- | | |
| Clone Of: | | | |
| | 866627 (view as bug list) | Environment: | |
| Last Closed: | 2013-03-06 18:56:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 866627 | | |
| Attachments: | | | |
Description (Pavel Moravec, 2012-06-28 08:10:39 UTC)
In the Bridge::create method in ./cpp/src/qpid/broker/Bridge.cpp, these lines are not replayed on shadow connections:

```cpp
peer->getMessage().subscribe(args.i_src, args.i_dest, args.i_sync ? 0 : 1, 0, false, "", 0, options);
peer->getMessage().flow(args.i_dest, 0, 0xFFFFFFFF);
peer->getMessage().flow(args.i_dest, 1, 0xFFFFFFFF);
QPID_LOG(debug, "Activated route from queue " << args.i_src << " to " << args.i_dest);
```

That's why "shadow brokers" (i.e. those that have the federation link only as a shadow connection, not as a local one) don't execute the methods directly; so far that seems fine, I think. But the above is not executed on shadow brokers because the Link::maintenanceVisit() method in ./cpp/src/qpid/broker/Link.cpp behaves differently for them. In particular, in:

```cpp
else if (state == STATE_OPERATIONAL &&
         (!created.empty() || !cancellations.empty()) &&
         connection != 0)
    connection->requestIOProcessing(boost::bind(&Link::ioThreadProcessing, this));
```

the if statement evaluates to false (since state == STATE_PASSIVE and connection == 0), so connection->requestIOProcessing (which further calls Link::ioThreadProcessing, which calls Bridge::create) is not executed on "shadow" brokers.

The question is: which piece of the chain should multicast the three messages (from Bridge::create)?

Backtrace of creating a bridge:

```
#0  qpid::broker::Bridge::create (this=0x1aa7840, c=...) at qpid/broker/Bridge.cpp:74
#1  0x00007fabed2eee79 in qpid::broker::Link::ioThreadProcessing (this=0x1a9d8f0) at qpid/broker/Link.cpp:339
#2  0x00007fabed2b5fcc in operator() (this=0x1a9dcf0) at /usr/include/boost/function/function_template.hpp:1013
#3  qpid::broker::Connection::doIoCallbacks (this=0x1a9dcf0) at qpid/broker/Connection.cpp:341
#4  0x00007fabed2b6460 in qpid::broker::Connection::doOutput (this=0x1a9dcf0) at qpid/broker/Connection.cpp:348
#5  0x00007fabec215d01 in qpid::cluster::OutputInterceptor::deliverDoOutput (this=0x1aa03d0, limit=2048) at qpid/cluster/OutputInterceptor.cpp:89
#6  0x00007fabece5fbf2 in invoke<qpid::framing::AMQP_AllOperations::ClusterConnectionHandler> (this=0x7fabe7a05a70, body=<value optimized out>) at ./qpid/framing/ClusterConnectionDeliverDoOutputBody.h:63
#7  qpid::framing::AMQP_AllOperations::ClusterConnectionHandler::Invoker::visit (this=0x7fabe7a05a70, body=<value optimized out>) at qpid/framing/AllInvoker.cpp:1116
#8  0x00007fabec1f8876 in qpid::framing::invoke<qpid::cluster::Connection> (target=<value optimized out>, body=<value optimized out>) at ./qpid/framing/Invoker.h:80
#9  0x00007fabec1f1e09 in qpid::cluster::Connection::deliveredFrame (this=0x1aa0370, f=<value optimized out>) at qpid/cluster/Connection.cpp:248
#10 0x00007fabec1cb14e in qpid::cluster::Cluster::processFrame (this=<value optimized out>, e=..., l=<value optimized out>) at qpid/cluster/Cluster.cpp:537
#11 0x00007fabec1cf9b8 in qpid::cluster::Cluster::deliveredFrame (this=0x1a91e30, efConst=<value optimized out>) at qpid/cluster/Cluster.cpp:521
#12 0x00007fabec1d8528 in operator() (this=0x1a92448, values=std::deque with 1 elements = {...}) at /usr/include/boost/function/function_template.hpp:1013
#13 qpid::cluster::PollableQueue<qpid::cluster::EventFrame>::handleBatch (this=0x1a92448, values=std::deque with 1 elements = {...}) at ./qpid/cluster/PollableQueue.h:59
#14 0x00007fabec1d2dbc in operator() (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:162
#15 operator()<std::_Deque_iterator<qpid::cluster::EventFrame, const qpid::cluster::EventFrame&, const qpid::cluster::EventFrame*>, boost::_mfi::mf1<std::_Deque_iterator<qpid::cluster::EventFrame, const qpid::cluster::EventFrame&, const qpid::cluster::EventFrame*>, qpid::cluster::PollableQueue<qpid::cluster::EventFrame>, const std::deque<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> >&>, boost::_bi::list1<const std::deque<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> >&> > (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/bind/bind.hpp:296
#16 operator()<std::deque<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > > (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/bind/bind_template.hpp:47
#17 boost::detail::function::function_obj_invoker1<boost::_bi::bind_t<std::_Deque_iterator<qpid::cluster::EventFrame, qpid::cluster::EventFrame const&, qpid::cluster::EventFrame const*>, boost::_mfi::mf1<std::_Deque_iterator<qpid::cluster::EventFrame, qpid::cluster::EventFrame const&, qpid::cluster::EventFrame const*>, qpid::cluster::PollableQueue<qpid::cluster::EventFrame>, std::deque<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&>, boost::_bi::list2<boost::_bi::value<qpid::cluster::PollableQueue<qpid::cluster::EventFrame>*>, boost::arg<1> > >, std::_Deque_iterator<qpid::cluster::EventFrame, qpid::cluster::EventFrame const&, qpid::cluster::EventFrame const*>, std::deque<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&>::invoke (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:132
#18 0x00007fabec1dccb2 in operator() (this=0x1a92448) at /usr/include/boost/function/function_template.hpp:1013
#19 qpid::sys::PollableQueue<qpid::cluster::EventFrame>::process (this=0x1a92448) at ./qpid/sys/PollableQueue.h:152
#20 0x00007fabec1e00ac in qpid::sys::PollableQueue<qpid::cluster::EventFrame>::dispatch (this=0x1a92448, cond=...) at ./qpid/sys/PollableQueue.h:138
#21 0x00007fabece08e3d in operator() (this=<value optimized out>) at /usr/include/boost/function/function_template.hpp:1013
#22 qpid::sys::PollableConditionPrivate::dispatch (this=<value optimized out>) at qpid/sys/posix/PollableCondition.cpp:95
#23 0x00007fabecec8223 in boost::function1<void, qpid::sys::DispatchHandle&>::operator() (this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:1013
#24 0x00007fabecec5426 in qpid::sys::DispatchHandle::processEvent (this=0x1a94fd0, type=qpid::sys::Poller::READABLE) at qpid/sys/DispatchHandle.cpp:280
#25 0x00007fabece0d50d in process (this=0x1a77130) at ./qpid/sys/Poller.h:131
#26 qpid::sys::Poller::run (this=0x1a77130) at qpid/sys/epoll/EpollPoller.cpp:519
#27 0x00007fabece04e9a in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#28 0x00000037b5a077f1 in start_thread () from /lib64/libpthread.so.0
#29 0x00000037b56e592d in clone () from /lib64/libc.so.6
```

The problem is that the IO callbacks (see frame #3) are not cluster safe, because they are queued for execution in non-IO threads.

The issue is more general: it occurs regardless of queue state replication, as it depends only on long use of the federation link. (That said, the replication scenario is a valid one.)

Created attachment 603210 [details]
reproducer stuff
Content:
- reproducer_836141.sh (the main script)
- directories 4001, 5001, 5002 (for running brokers on those ports; 4001 is the source broker, 5001 and 5002 are the two destination brokers in the cluster; 5001 is the elder node, 5002 is the one being restarted)
- run_spout_drain.sh (to run spout and drain; assuming spout and drain are in $PATH)
Comment on attachment 603210 [details]
reproducer stuff
Sorry for the wrong reproducer scripts; it is quite tricky.
Proper reproducer:
1) Create queue state replication etc., with "--ack 5" in qpid-route.
2) Send 163800 messages to dataQ and consume them in parallel (i.e. to generate 163800*2 event messages on the federation link).
3) Slowly send 100 messages to the queue (i.e. 2 messages per second). Around the 70th message, the non-elder destination broker shuts down.
A proposed patch for this issue is https://reviews.apache.org/r/6493/. The patch is based on trunk r1370792. I have not yet been able to reproduce the problem, so the patch is not confirmed.

Created attachment 603869 [details]
replication script
Attached is the proper version of the reproducer script. After the 65th message is sent in the "until egrep" loop, broker 5002 (the new joiner) shuts down.
Created attachment 604098 [details]
Patch based on 0.10-8, does not work.
It is not straightforward to back-port the fix to 0.10. The attached patch is an attempt, but it fails the test because, in this early code, bridges are named with a UUID, which is not consistent across the cluster. We need to synchronize bridge names to get this to work.
Summary of status

Background: If you create a link to a cluster, only the elder broker creates a connection and sets up a bridge subscription. The other brokers do get a shadow connection, but they don't create a link or a bridge; they treat it like a regular client connection.

Problem: To subscribe the bridge, the elder sends a MessageSubscribe, MessageFlow, etc., which the other brokers know nothing about, so the command numbers differ by 3 between elder and non-elder. This doesn't show up immediately because the other end doesn't send completions unless asked, and we don't ask. The problem shows up (sporadically) when the other end hits the "spontaneous request" limit, by default 65536, and sends an unsolicited completion. It doesn't always fail there reliably, because if there are more than 3 messages in transit, the command number will be low enough to be in the range of commands that the non-elders know about.

Solution: Non-elder brokers create passive links and bridges, and associate the link with the shadow connection. The _only_ difference between an active and a passive link is that an active link opens a connection, while a passive link just waits for the shadow connection to appear. Other than that they send identical commands and so have identical numbering. The passive links go through exactly the same state changes as the active one.

There are two patches:
1. The one mentioned in comment 12 (https://reviews.apache.org/r/6493/): this is based off a trunk revision. There is a new test in the patch (cluster_tests.py:test_federation_bridges_consistent) to verify that all members of a cluster send the same bridge subscription commands; that test is passing. I wasn't able to reproduce the original problem on this trunk revision with or without the patch.
2. attachment 603869 [details]: this is 1. rebased onto 0.10-8. Unfortunately it doesn't work. In the older code bridges are assigned UUID names, and these are not synchronized in a cluster. I am able to reproduce the bug on un-patched 0.10, but the patched version fails with a "wrong session name". We need to figure out a way to sync the session names. (In the newer trunk code bridges have user-defined names that are synchronized.)

Created attachment 604397 [details]
Alan's fix ported to proper hotfix 0.10-8_gs_2
Alan's fix ported to proper hotfix 0.10-8_gs_2 (i.e. not to pure 0.10-8).
To resolve link names being generated from UUIDs, QPID-3767 (SVN revisions r1342442,
r1334139 and r1334138) needs to be backported on top of it.
Created attachment 606198 [details]
patch - current (my final) status

Attaching a patch that can be the base for the customer's hotfix (fixing BZ 826989, the adjacent BZ 844618, and this BZ 836141). What the patch joins and contains:
1) the patch from 826989
2) the patch from 844618
3) Alan's backport of QPID-4201
4) Ken's backport of QPID-3767
5) qpid.link for failover renamed to get rid of the Uuid (see https://reviews.apache.org/r/6493/, first comment of Kenneth Giusti)
6) one subtle change in Alan's backport, swapping two methods in Connection::Connection

I successfully verified the patch against both reproducers. However, this patch has two small issues:
1) In 1 of 30 cases, federation link establishment results in immediate cluster de-sync and, consequently, the brokers stop. I can provide details on request.
2) Sometimes, when abruptly stopping a broker, it writes to stderr:
pure virtual method called
terminate called without an active exception

Can you attach details on reproducing issue 1 from comment 17, and any log info you might have around the failure? I'm less worried about issue 2, since it only happens during shutdown. What was the "subtle change" mentioned in comment 17 point 5?

Created attachment 609884 [details]
qpid traces from GS hotfix 4 with cluster de-sync
Trace logs of cluster de-sync after federation route creation in GS hotfix 4.
(In reply to comment #19)
> What was the "Subtle change" mentioned in comment 17 point 5?

Originally I thought I had attached the patch with the "subtle change" that, based on some tests, had a lower probability of cluster de-sync when the federation route is created. But reviewing it, I committed the patch _without_ the change. (The change was about moving links.notifyConnection to just _after_ the addManagementObject method in https://bugzilla.redhat.com/attachment.cgi?id=606198&action=diff#a/qpid/cpp/src/qpid/broker/Connection.cpp_sec1; but as I wrote, I didn't apply the change, cf. the original https://bugzilla.redhat.com/attachment.cgi?id=604098&action=diff#a/qpid/cpp/src/qpid/broker/Connection.cpp_sec1.)

On my box - mrg26 - I am getting much more frequent failures on start-up -- close to 50%. I also see that if we get past start-up without a problem, the rest of the test runs fine for thousands of iterations.

The failure mechanism, at the bottom level, is this: amqp_0_10::SessionHandler::attached checks the incoming session name against the name it was expecting, from getState()->getId().getName(), and finds that they differ. It becomes upset. The names differ only in the UUID-looking part at the end of the name. (Is it a management ID?) If I make amqp_0_10::SessionHandler::attached not bother its little head about this check, then the rest of the test runs forever with no problems... still poking...

The problematic names come through SessionHandler::attachAs. There are two calls to SessionHandler::attachAs (should there be two?); both come from Bridge::create, from different threads. When the queue name is the same in both calls, the test succeeds. When the two names are different, the test fails. Hmmm.

Created attachment 614799 [details]
additional patch fixing failover problems
Additional patch fixing some failures in the federation link cluster failover automated tests.
While there remains a problem with the 0.10 backport, this issue is addressed (fixed on trunk, comment 12) for 2.3. -> POST

Comment 12 states that the fix "depends on" trunk r1370792. The fix itself does not yet appear to be checked into trunk. Moving back to ASSIGNED.

This patch can't be applied to trunk any more; the cluster code was removed from trunk in the Big Refactor. So we only need the backports.

The issue was successfully reproduced on qpid-cpp-0.14-22 in 100% of runs; on qpid-cpp-0.18-13 the issue did not occur at all after several tries. Thus this issue is fixed. Tested on RHEL 5.9 and 6.4, i686 and x86_64, with qpid-cpp-0.18-13. -> VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0561.html