Bug 508959 - Attempt to propagate binding info over dynamic link can crash broker if link is concurrently destroyed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.3
Assignee: Gordon Sim
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Duplicates: 509970
Depends On:
Blocks:
 
Reported: 2009-06-30 16:39 UTC by Gordon Sim
Modified: 2015-11-16 00:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, attempting to propagate binding information over a dynamic link that was concurrently destroyed may have caused the broker to terminate unexpectedly. This update ensures that dynamic bridges are not propagated over destroyed links, and the broker no longer crashes.
Clone Of:
Environment:
Last Closed: 2010-10-14 15:59:55 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2010:0773
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 15:56:44 UTC

Description Gordon Sim 2009-06-30 16:39:20 UTC
Description of problem:

Broker crashes when using dynamic federation links if the link dies concurrently with the propagation of a bind or unbind.

Version-Release number of selected component (if applicable):

Since 1.1

How reproducible:

Relatively easily

Steps to Reproduce:
1. set up dynamic federation for a particular exchange

   e.g. start brokers on 5672 and 5673

   qpid-config add exchange topic federated.topic
   qpid-config -a localhost:5673 add exchange topic federated.topic
   qpid-route dynamic add localhost:5672 localhost:5673 federated.topic
   qpid-route dynamic add localhost:5673 localhost:5672 federated.topic
   qpid-config add queue test-queue

2. bind/unbind continually on one broker

   e.g. run the following in a loop

   qpid-config bind federated.topic test-queue binding
   qpid-config unbind federated.topic test-queue binding
   
3. stop and restart the other broker

   e.g. stop and restart the broker on 5673
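
The loop in step 2 can be sketched as below. This is only an illustration: the function name bind_unbind_loop, the QPID_CONFIG override variable and the iteration count are assumptions, not part of the original report. The two qpid-config commands are the ones given in step 2.

```shell
#!/bin/sh
# Hypothetical helper: repeatedly bind and unbind the test queue so that
# binding propagation is in flight while the other broker is bounced.
# QPID_CONFIG is an assumed override so the tool path can be changed.
QPID_CONFIG=${QPID_CONFIG:-qpid-config}

bind_unbind_loop() {
    count=$1    # number of bind/unbind cycles to run
    i=0
    while [ "$i" -lt "$count" ]; do
        "$QPID_CONFIG" bind federated.topic test-queue binding
        "$QPID_CONFIG" unbind federated.topic test-queue binding
        i=$((i + 1))
    done
}
```

For example, run "bind_unbind_loop 1000" against the broker on 5672 while stopping and restarting the broker on 5673 as in step 3.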
  
Actual results:

Eventually the broker on 5672 crashes (segfaults or aborts).

Expected results:

No crashing

Additional info:

Another way to reproduce is to set up federation between clusters, with receivers on one cluster and senders on the other, and then bounce the nodes of the receivers' cluster in turn.

Comment 1 Gordon Sim 2009-06-30 16:46:51 UTC
The root of the problem is that binding information for dynamic routes is propagated by sending commands over a bridge from another connection's IO thread.

Bridge::propagateBinding() needs to record the details being propagated and request processing on the IO thread of the link to which the bridge belongs.

Comment 2 Ted Ross 2009-07-01 18:10:27 UTC
More information...

If you repeat the same test but use "amq.topic" instead of "federated.topic", the crash does not occur.

It appears that the destination broker, when propagating a binding to a source broker that doesn't have the named exchange, receives an exception that leaves the destination broker in a bad state.  Subsequent attempts to propagate bindings result in the broker crash.

Here's another reproducer that illustrates this effect in a more focused way:

#!/bin/sh
b1=localhost:5672
b2=localhost:5673

echo "Creating fed.topic exchange on brokers..."
qpid-config -a $b1 add exchange topic fed.topic
qpid-config -a $b2 add exchange topic fed.topic

echo "Creating bi-directional dynamic routes..."
qpid-route dynamic add $b1 $b2 fed.topic
qpid-route dynamic add $b2 $b1 fed.topic

echo "Create queue..."
qpid-config -a $b1 add queue test-queue

echo "Please stop and restart $b2"
sleep 10

echo "Create binding..."
qpid-config -a $b1 bind fed.topic test-queue test-key

echo "Create a second binding..."
qpid-config -a $b1 bind fed.topic test-queue test-key2

echo "Create a third binding..."
qpid-config -a $b1 bind fed.topic test-queue test-key3
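
To check the precondition suggested above (that the restarted broker no longer has the named exchange), a small helper along the following lines could be run after the restart and before the bindings are created. This is a sketch under assumptions: the function name has_exchange and the QPID_CONFIG override are made up for illustration, and it assumes the "exchanges" listing of qpid-config prints each exchange name in its output.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original reproducer.
# QPID_CONFIG is an assumed override so the tool path can be changed.
QPID_CONFIG=${QPID_CONFIG:-qpid-config}

# Succeeds if the broker at address $1 lists an exchange whose name
# contains the fixed string $2; fails otherwise.
has_exchange() {
    "$QPID_CONFIG" -a "$1" exchanges | grep -qF -- "$2"
}
```

For example, 'has_exchange localhost:5673 fed.topic || echo "fed.topic missing on b2"' would report the missing exchange before the first bind is attempted.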

Comment 3 Gordon Sim 2009-07-07 08:15:58 UTC
*** Bug 509970 has been marked as a duplicate of this bug. ***

Comment 4 Gordon Sim 2009-07-07 08:18:06 UTC
Similar crash observed from run of failover_soak (see https://bugzilla.redhat.com/show_bug.cgi?id=509970).

Comment 7 Gordon Sim 2010-06-08 18:43:45 UTC
#0  0x009a9424 in __kernel_vsyscall ()
#1  0x0034c781 in raise () from /lib/libc.so.6
#2  0x0034e04a in abort () from /lib/libc.so.6
#3  0x07d2d44f in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#4  0x07d2b385 in ?? () from /usr/lib/libstdc++.so.6
#5  0x07d2b3c2 in std::terminate() () from /usr/lib/libstdc++.so.6
#6  0x07d2c075 in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#7  0x00b543f0 in qpid::framing::Proxy::send (this=0x83fc044, b=@0xbf8360ec) at ../../src/qpid/framing/Proxy.cpp:37
#8  0x00ab0f19 in qpid::framing::AMQP_ServerProxy::Exchange::bind (this=0x83fc044, queue="bridge_queue_1_0cdf920a-1cae-422e-954c-8fd0a37929e5", 
    exchange="federated.topic", bindingKey="binding", arguments=@0xbf8361a0) at qpid/framing/AMQP_ServerProxy.cpp:328
#9  0x0064a974 in qpid::broker::Bridge::ioThreadPropagateBinding (this=0xb5502788, queue="bridge_queue_1_0cdf920a-1cae-422e-954c-8fd0a37929e5", 
    exchange="federated.topic", key="binding", args={values = std::map with 3 elements = {...}}) at ../../src/qpid/broker/Bridge.cpp:316
#10 0x0064f13a in boost::_mfi::mf4<void, qpid::broker::Bridge, std::string const&, std::string const&, std::string const&, qpid::framing::FieldTable>::operator() (a4=<value optimized out>, a3=<value optimized out>, p=<value optimized out>, this=<value optimized out>, a2=<value optimized out>, 
    a1=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:494
#11 operator()<boost::_mfi::mf4<void, qpid::broker::Bridge, const std::string&, const std::string&, const std::string&, qpid::framing::FieldTable>, boost::_bi::list0> (a4=<value optimized out>, a3=<value optimized out>, p=<value optimized out>, this=<value optimized out>, a2=<value optimized out>, 
    a1=<value optimized out>) at /usr/include/boost/bind.hpp:504
#12 boost::_bi::bind_t<void, boost::_mfi::mf4<void, qpid::broker::Bridge, std::string const&, std::string const&, std::string const&, qpid::framing::FieldTable>, boost::_bi::list5<boost::_bi::value<qpid::broker::Bridge*>, boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<qpid::framing::FieldTable> > >::operator() (a4=<value optimized out>, a3=<value optimized out>, 
    p=<value optimized out>, this=<value optimized out>, a2=<value optimized out>, a1=<value optimized out>)
    at /usr/include/boost/bind/bind_template.hpp:20
#13 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf4<void, qpid::broker::Bridge, std::string const&, std::string const&, std::string const&, qpid::framing::FieldTable>, boost::_bi::list5<boost::_bi::value<qpid::broker::Bridge*>, boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<qpid::framing::FieldTable> > >, void>::invoke (
    a4=<value optimized out>, a3=<value optimized out>, p=<value optimized out>, this=<value optimized out>, a2=<value optimized out>, 
    a1=<value optimized out>) at /usr/include/boost/function/function_template.hpp:152
#14 0x0066f38c in boost::function0<void>::operator() (this=0xbf836290) at /usr/include/boost/function/function_template.hpp:989
#15 0x0066b924 in qpid::broker::Connection::doOutput (this=0xb554bc18) at ../../src/qpid/broker/Connection.cpp:276
#16 0x002815b9 in qpid::cluster::OutputInterceptor::deliverDoOutput (this=0x83bcc00, limit=2048) at ../../src/qpid/cluster/OutputInterceptor.cpp:86
#17 0x00256f27 in qpid::cluster::Connection::deliverDoOutput (this=0x83bcbc8, limit=2048) at ../../src/qpid/cluster/Connection.cpp:242
#18 0x00aee44d in qpid::framing::ClusterConnectionDeliverDoOutputBody::invoke<qpid::framing::AMQP_AllOperations::ClusterConnectionHandler> (
    invocable=<value optimized out>, this=<value optimized out>) at ./qpid/framing/ClusterConnectionDeliverDoOutputBody.h:63
#19 qpid::framing::AMQP_AllOperations::ClusterConnectionHandler::Invoker::visit (invocable=<value optimized out>, this=<value optimized out>)
    at qpid/framing/AllInvoker.cpp:1100
#20 0x00afa3ab in qpid::framing::ClusterConnectionDeliverDoOutputBody::accept (this=0x83fb3d8, v=@0xbf8363ac)
    at ./qpid/framing/ClusterConnectionDeliverDoOutputBody.h:67
#21 0x00262193 in qpid::framing::invoke<qpid::cluster::Connection> (target=@0x83bcbc8, body=@0x83fb3d8) at ../../src/qpid/framing/Invoker.h:80
#22 0x0025c7fc in qpid::cluster::Connection::deliveredFrame (this=0x83bcbc8, f=@0xbf836890) at ../../src/qpid/cluster/Connection.cpp:250
#23 0x0022fdfa in qpid::cluster::Cluster::processFrame (this=0x83b51b8, e=@0xbf836890, l=@0xbf8368d8) at ../../src/qpid/cluster/Cluster.cpp:522
#24 0x0023bc08 in qpid::cluster::Cluster::deliveredFrame (this=0x83b51b8, efConst=@0x83bd2a0) at ../../src/qpid/cluster/Cluster.cpp:506
#25 0x0023e674 in boost::_mfi::mf1<void, qpid::cluster::Cluster, qpid::cluster::EventFrame const&>::operator() (a1=<value optimized out>, 
    p=<value optimized out>, this=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:162
#26 operator()<boost::_mfi::mf1<void, qpid::cluster::Cluster, const qpid::cluster::EventFrame&>, boost::_bi::list1<const qpid::cluster::EventFrame&> >
    (a1=<value optimized out>, p=<value optimized out>, this=<value optimized out>) at /usr/include/boost/bind.hpp:292
#27 operator()<qpid::cluster::EventFrame> (a1=<value optimized out>, p=<value optimized out>, this=<value optimized out>)
    at /usr/include/boost/bind/bind_template.hpp:47
#28 boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::cluster::Cluster, qpid::cluster::EventFrame const&>, boost::_bi::list2<boost::_bi::value<qpid::cluster::Cluster*>, boost::arg<1> > >, void, qpid::cluster::EventFrame const&>::invoke (
    a1=<value optimized out>, p=<value optimized out>, this=<value optimized out>) at /usr/include/boost/function/function_template.hpp:152
#29 0x00242773 in boost::function1<void, qpid::cluster::EventFrame const&>::operator() (this=0x83b5580, a0=@0x83bd2a0)
    at /usr/include/boost/function/function_template.hpp:989
#30 0x00247a1a in qpid::cluster::PollableQueue<qpid::cluster::EventFrame>::handleBatch (this=0x83b54e8, 
    values=std::vector of length 1, capacity 8 = {...}) at ../../src/qpid/cluster/PollableQueue.h:59
#31 0x0023e97a in boost::_mfi::mf1<__gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, qpid::cluster::PollableQueue<qpid::cluster::EventFrame>, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&>::operator() (a1=<value optimized out>, p=<value optimized out>, this=<value optimized out>)
    at /usr/include/boost/bind/mem_fn_template.hpp:162
#32 operator()<__gnu_cxx::__normal_iterator<const qpid::cluster::EventFrame*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, boost::_mfi::mf1<__gnu_cxx::__normal_iterator<const qpid::cluster::EventFrame*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, qpid::cluster::PollableQueue<qpid::cluster::EventFrame>, const std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> >&>, boost::_bi::list1<const std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> >&> > (
    a1=<value optimized out>, p=<value optimized out>, this=<value optimized out>) at /usr/include/boost/bind.hpp:282
#33 operator()<std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > > (a1=<value optimized out>, 
    p=<value optimized out>, this=<value optimized out>) at /usr/include/boost/bind/bind_template.hpp:47
#34 boost::detail::function::function_obj_invoker1<boost::_bi::bind_t<__gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, boost::_mfi::mf1<__gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, qpid::cluster::PollableQueue<qpid::cluster::EventFrame>, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&>, boost::_bi::list2<boost::_bi::value<qpid::cluster::PollableQueue<qpid::cluster::EventFrame>*>, boost::arg<1> > >, __gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&>::invoke (
    a1=<value optimized out>, p=<value optimized out>, this=<value optimized out>) at /usr/include/boost/function/function_template.hpp:131
#35 0x0024550a in boost::function1<__gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&>::operator() (
    this=0x83b5530, a0=std::vector of length 1, capacity 8 = {...}) at /usr/include/boost/function/function_template.hpp:989
#36 0x00246c08 in qpid::sys::PollableQueue<qpid::cluster::EventFrame>::process (this=0x83b54e8) at ../../src/qpid/sys/PollableQueue.h:151
#37 0x00247619 in qpid::sys::PollableQueue<qpid::cluster::EventFrame>::dispatch (this=0x83b54e8, cond=@0x83b5540)
    at ../../src/qpid/sys/PollableQueue.h:137
#38 0x0023e9c4 in boost::_mfi::mf1<void, qpid::sys::PollableQueue<qpid::cluster::EventFrame>, qpid::sys::PollableCondition&>::operator() (
    a1=<value optimized out>, p=<value optimized out>, this=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:162
#39 operator()<boost::_mfi::mf1<void, qpid::sys::PollableQueue<qpid::cluster::EventFrame>, qpid::sys::PollableCondition&>, boost::_bi::list1<qpid::sys:

Comment 8 Gordon Sim 2010-06-09 09:02:54 UTC
The test case in the description still reproduces the problem easily. As pointed out in comment #2, the current cause of the crash is use of the session in the bridge after an exception has occurred.

Another example of a similar crash is to create a queue route, then delete the source queue, then remove the original queue route. E.g. for two brokers on localhost using ports 5672 and 5673:

qpid-config add queue test-queue
qpid-route queue add localhost:5673 localhost:5672 amq.fanout test-queue
qpid-config -a localhost:5673 add queue test-queue
qpid-config -a localhost:5673 bind amq.fanout test-queue
echo msg | sender --send-eos 1
receiver --port 5673
qpid-config del queue test-queue --force
qpid-route queue del localhost:5673 localhost:5672 amq.fanout test-queue

Comment 9 Gordon Sim 2010-06-09 09:23:50 UTC
(Just for reference, the issue mentioned in comment #1 was addressed in r790698; with that in place, the cluster-based reproducer in the description's 'Additional info' section no longer fails.)

Comment 10 Gordon Sim 2010-06-09 12:19:18 UTC
Fixed on trunk (r952942) and in release branch (http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=d6ead34fe2802092c0dd6490df11a6cc763506c1).

Comment 12 Jaromir Hradilek 2010-10-07 17:33:49 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, attempting to propagate binding information over a dynamic link that was concurrently destroyed may have caused the broker to terminate unexpectedly. This update ensures that dynamic bridges are not propagated over destroyed links, and the broker no longer crashes.

Comment 13 Frantisek Reznicek 2010-10-07 18:00:13 UTC
The issues have been fixed:
 - description issue A
   - proved by failover_soak (bug 509970) and by semi-automated qpid_stress_test
 - comment 8 issue B
   - reproduced on qpid-cpp-*-0.7.939184-1.el5 using above repro.
 both verified on RHEL 4.8 / 5.5 i386 / x86_64 on packages:
python-qmf-0.7.946106-13.el5
python-qpid-0.7.946106-14.el5
qmf-*0.7.946106-17.el5
qpid-cpp-*-0.7.946106-17.el5
qpid-dotnet-0.4.738274-2.el5
qpid-java-client-0.7.946106-10.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
ruby-qmf-0.7.946106-17.el5
ruby-qpid-0.7.946106-2.el5

-> VERIFIED

Comment 15 errata-xmlrpc 2010-10-14 15:59:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

