Description of problem:
Federation routes that annotate a message with a trace id can cause segfaults if that message is concurrently delivered from another queue.

Version-Release number of selected component (if applicable):
qpidd-0.5.752581-34.el5

How reproducible:
Readily

Steps to Reproduce:
1. start two brokers (I used ports 5672 and 5673 in this example)
2. create several queues and bind them to an exchange with a given key (I only managed to trigger this for durable queues, though the segfault was not actually related to durability, may just be the timing). E.g.
   for q in `seq 1 10`; do qpid-config add queue queue-$q --durable; qpid-config bind amq.topic queue-$q my-key; done
3. create a federation route from the broker those queues are on to the other broker, using the same exchange and key. E.g.
   qpid-route route add localhost:5673 localhost:5672 amq.topic my-key
4. start receivers for all these queues and then send some durable messages to the exchange with the appropriate routing key

Actual results:
Segfault

Expected results:
No segfault

Additional info:
Core was generated by `/usr/sbin/qpidd'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000003d067ae09b in qpid::framing::FieldTable::encodedSize (this=0x2aaab0186810) at /usr/include/c++/4.1.2/memory:286
warning: Source file is more recent than executable.
286 return _M_ptr;
(gdb) bt
#0 0x0000003d067ae09b in qpid::framing::FieldTable::encodedSize (this=0x2aaab0186810) at /usr/include/c++/4.1.2/memory:286
#1 0x0000003d066f5b51 in qpid::framing::MessageProperties::bodySize (this=0x2aaab018d518) at gen/qpid/framing/MessageProperties.cpp:191
#2 0x0000003d066f5bb9 in qpid::framing::MessageProperties::encode (this=0x2aaab0186810, buffer=...) at gen/qpid/framing/MessageProperties.cpp:135
#3 0x0000003d067a271e in encode (this=<value optimized out>, buffer=...) at qpid/framing/AMQHeaderBody.h:50
#4 qpid::framing::AMQHeaderBody::encode (this=<value optimized out>, buffer=...) at qpid/framing/AMQHeaderBody.cpp:30
#5 0x0000003d070c551c in qpid::amqp_0_10::Connection::encode (this=0x2aaaac048070, buffer=0x2aaaac79f900 "\v\001", size=<value optimized out>) at qpid/amqp_0_10/Connection.cpp:87
#6 0x0000003d067d4164 in qpid::sys::cyrus::CyrusSecurityLayer::encode (this=0x2aaaac78c000, buffer=0x2aaaac75bb30 "", size=65536) at qpid/sys/cyrus/CyrusSecurityLayer.cpp:76
#7 0x0000003d067c64b1 in qpid::sys::AsynchIOHandler::idle (this=0x2aaaac047fe0) at qpid/sys/AsynchIOHandler.cpp:206
#8 0x0000003d06772fca in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() (this=0x0, a0=...) at /usr/include/boost/function/function_template.hpp:576
#9 0x0000003d0677171f in qpid::sys::posix::AsynchIO::writeable (this=0x2aaaac048530, h=...) at qpid/sys/posix/AsynchIO.cpp:562
#10 0x0000003d067cc8c7 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() (this=0x0, a0=...) at /usr/include/boost/function/function_template.hpp:576
#11 0x0000003d067ca5b9 in qpid::sys::DispatchHandle::processEvent (this=0x2aaaac048538, type=WRITABLE) at qpid/sys/DispatchHandle.cpp:439
#12 0x0000003d06780b93 in process (this=0x1b809da0) at qpid/sys/Poller.h:122
#13 qpid::sys::Poller::run (this=0x1b809da0) at qpid/sys/epoll/EpollPoller.cpp:409
#14 0x0000003d06776bca in qpid::sys::(anonymous namespace)::runRunnable (p=0x2aaab0186810) at qpid/sys/posix/Thread.cpp:35
#15 0x0000003d05a0673d in start_thread () from /lib64/libpthread.so.0
#16 0x0000003d04ed3d1d in clone () from /lib64/libc.so.6
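For readers unfamiliar with this class of bug, here is a minimal C++ sketch of the hazard described above and of one common way to avoid it. It is not the qpid code and not necessarily what the fix in r954933 does. `Message`, `Headers`, `encodedSize`, `annotateWithTrace` and the `trace-id` key are hypothetical stand-ins invented for illustration. The idea: if one thread adds a header to a message instance while another thread is walking the same headers to encode them, the walk can touch freed or half-updated memory. Annotating a private copy leaves the shared instance untouched.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT qpid classes.
using Headers = std::map<std::string, std::string>;

struct Message {
    Headers headers;
    std::string body;
};

// Walks the headers much as an encoder would. Running this while another
// thread inserts into the same map is a data race (undefined behaviour).
std::size_t encodedSize(const Message& m) {
    std::size_t n = m.body.size();
    for (const auto& kv : m.headers) {
        n += kv.first.size() + kv.second.size();
    }
    return n;
}

// Copy-before-annotate: the shared instance is only read, never written,
// so concurrent readers of `shared` keep seeing an unchanged header map.
// The "trace-id" key is an assumed name, not the key qpid uses.
std::shared_ptr<const Message> annotateWithTrace(
        const std::shared_ptr<const Message>& shared,
        const std::string& traceId) {
    auto copy = std::make_shared<Message>(*shared);  // private deep copy
    copy->headers["trace-id"] = traceId;             // mutate the copy only
    return copy;
}
```

The trade-off in this sketch is an extra copy per annotated message in exchange for needing no lock on the delivery path. Guarding both the annotation and the encode with the same mutex would be the other obvious option.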
Fixed on trunk (r954933) and in release repo (http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=a1cdf640e11415c3376c3e420d40113d2bcc723a).
Created attachment 424245: Backported fix for 1.2 tree
FYI: I could only reproduce this easily on an 8-core box.
Tested:
On 752581 the problem shows up, but it is important to use an 8-core machine and to send big batches of messages; sending them one by one does not help to reproduce it.
On 946106-11 it is fixed.
Validated on RHEL5.5/RHEL4, i386 / x86_64.

Packages:
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u
openais-0.80.6-16.el5_5.2
openais-devel-0.80.6-16.el5_5.2
python-qpid-0.7.946106-11.el5
qpid-cpp-client-0.7.946106-11.el5
qpid-cpp-client-devel-0.7.946106-11.el5
qpid-cpp-client-devel-docs-0.7.946106-11.el5
qpid-cpp-client-ssl-0.7.946106-11.el5
qpid-cpp-mrg-debuginfo-0.7.946106-8.el5
qpid-cpp-server-0.7.946106-11.el5
qpid-cpp-server-cluster-0.7.946106-11.el5
qpid-cpp-server-devel-0.7.946106-11.el5
qpid-cpp-server-ssl-0.7.946106-11.el5
qpid-cpp-server-store-0.7.946106-11.el5
qpid-cpp-server-xml-0.7.946106-11.el5
qpid-java-client-0.7.946106-7.el5
qpid-java-common-0.7.946106-7.el5
qpid-tools-0.7.946106-8.el5
rhm-docs-0.7.946106-4.el5
rh-tests-distribution-MRG-Messaging-qpid_common-1.6-52

->VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, federation routes that annotate a message with a trace ID could cause a segmentation fault if that message was concurrently delivered from another queue. With this update, a segmentation fault no longer occur in the aforementioned case.
Technical note updated.

Diffed Contents:
@@ -1 +1 @@
-Previously, federation routes that annotate a message with a trace ID could cause a segmentation fault if that message was concurrently delivered from another queue. With this update, a segmentation fault no longer occur in the aforementioned case.
+Previously, federation routes that annotated a message with a trace ID could have caused a segmentation fault if that message was concurrently delivered from another queue. This situation has been fixed so that it no longer causes a segmentation fault if it arises.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2010-0773.html