Description of problem: When broker is deleting a queue while performing a QMF queue method (i.e. reroute, but I guess any can be used), there is a race condition the actions interfere each other causing segfault. One possible backtrace: Version-Release number of selected component (if applicable): qpid-cpp-server 0.34-10 How reproducible: 100% in 30 minutes Steps to Reproduce: run script: for i in $(seq 1 100); do while true; do /root/Cpp_MRG/QMF_examples/create_queue q_${i} > /dev/null /root/Cpp_MRG/QMF_examples/reroute_queue q_${i} 0 amq.fanout > /dev/null & /root/Cpp_MRG/QMF_examples/delete_queue q_${i} > /dev/null & wait sleep 0.1 done & done wait programs [create|reroute|delete]_queue to be attached. Actual results: segfault within several minutes Expected results: no segfault Additional info: one possible backtrace: (gdb) bt #0 0x000000326110a1c4 in qmf::org::apache::qpid::broker::Queue::doMethod (this=0x7ff23ba39530, methodName=<value optimized out>, inMap=std::map with 3 elements = {...}, outMap= std::map with 0 elements, userId="anonymous") at /usr/src/debug/qpid-cpp-0.34/src/qmf/org/apache/qpid/broker/Queue.cpp:1161 #1 0x0000003261313d06 in qpid::management::ManagementAgent::handleMethodRequest (this=0x119f9f0, body=<value optimized out>, rte="", rtk= "6e080197-b362-4bf7-840f-9802e9f745dd#QueueRerouterResponseQueue", cid="", userId="anonymous", viaLocal=true) at /usr/src/debug/qpid-cpp-0.34/src/qpid/management/ManagementAgent.cpp:1447 #2 0x0000003261320fd5 in qpid::management::ManagementAgent::dispatchAgentCommand (this=0x119f9f0, msg=..., viaLocal=true) at /usr/src/debug/qpid-cpp-0.34/src/qpid/management/ManagementAgent.cpp:2347 #3 0x00000032613218b8 in qpid::management::ManagementAgent::dispatchCommand (this=0x119f9f0, deliverable=<value optimized out>, routingKey="broker", topic=false, qmfVersion=2) at /usr/src/debug/qpid-cpp-0.34/src/qpid/management/ManagementAgent.cpp:1255 #4 0x0000003261334179 in qpid::broker::ManagementDirectExchange::route (this=0x11a06c0, msg=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/management/ManagementDirectExchange.cpp:48 #5 0x00000032612cdcf4 in qpid::broker::SemanticState::route (this=0x7ff260832428, msg=..., strategy=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SemanticState.cpp:506 #6 0x00000032612eafda in qpid::broker::SessionState::handleContent (this=0x7ff260832260, frame=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SessionState.cpp:233 #7 0x00000032612eb5a1 in qpid::broker::SessionState::handleIn (this=0x7ff260832260, frame=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SessionState.cpp:293 #8 0x0000003260790af1 in qpid::amqp_0_10::SessionHandler::handleIn (this=0x7ff1b30e30a0, f=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/amqp_0_10/SessionHandler.cpp:93 #9 0x000000326125bc9b in operator() (this=0x7ff271c410d0, frame=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/framing/Handler.h:39 #10 qpid::broker::ConnectionHandler::handle (this=0x7ff271c410d0, frame=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/ConnectionHandler.cpp:93 #11 0x0000003261256a58 in qpid::broker::amqp_0_10::Connection::received (this=0x7ff271c40ef0, frame=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/amqp_0_10/Connection.cpp:198 #12 0x00000032611e4563 in qpid::amqp_0_10::Connection::decode (this=0x7ff271c3fcd0, buffer=<value optimized out>, size=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/amqp_0_10/Connection.cpp:59 #13 0x00000032607b9ce0 in qpid::sys::AsynchIOHandler::readbuff (this=0x7ff1d160a790, buff=0x7ff271c3f370) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/AsynchIOHandler.cpp:138 #14 0x0000003260737ea9 in operator() (this=0x7ff271c40110, h=...) at /usr/include/boost/function/function_template.hpp:1013 #15 qpid::sys::posix::AsynchIO::readable (this=0x7ff271c40110, h=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/posix/AsynchIO.cpp:453 #16 0x00000032607be5f3 in boost::function1<void, qpid::sys::DispatchHandle&>::operator() (this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:1013 #17 0x00000032607bcf52 in qpid::sys::DispatchHandle::processEvent (this=0x7ff271c40118, type=qpid::sys::Poller::READ_WRITABLE) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/DispatchHandle.cpp:286 #18 0x000000326075de5d in process (this=0x118f9a0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/Poller.h:131 #19 qpid::sys::Poller::run (this=0x118f9a0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/epoll/EpollPoller.cpp:522 #20 0x000000326075213a in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/posix/Thread.cpp:35 #21 0x0000003af8807a51 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003af84e896d in clone () from /lib64/libc.so.6 (gdb) (here broker deleted Queue object but coreObject = mgmtObject of the Queue was attempted to be accessed) another segfault: (gdb) bt #0 0x0000003f6f033d64 in abort () from /lib64/libc.so.6 #1 0x0000003f710bea7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6 #2 0x0000003f710bcbd6 in ?? () from /usr/lib64/libstdc++.so.6 #3 0x0000003f710bcc03 in std::terminate() () from /usr/lib64/libstdc++.so.6 #4 0x0000003f710bd55f in __cxa_pure_virtual () from /usr/lib64/libstdc++.so.6 #5 0x00007f3c16971d99 in qpid::broker::Queue::remove (this=0x164a8a0, maxCount=0, p=..., f=..., type=qpid::broker::CONSUMER, triggerAutoDelete=false, maxTests=0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/Queue.cpp:750 #6 0x00007f3c169731f1 in qpid::broker::Queue::purge (this=0x164a8a0, qty=0, dest=..., filter=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/Queue.cpp:796 #7 0x00007f3c16973f1e in qpid::broker::Queue::ManagementMethod (this=0x164a8a0, methodId=<value optimized out>, args=..., etext=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/Queue.cpp:1471 #8 0x00007f3c168491f3 in qmf::org::apache::qpid::broker::Queue::doMethod (this=0x18f3140, methodName=<value optimized out>, inMap=std::map with 3 elements = {...}, outMap= std::map with 0 elements, userId="anonymous@QPID") at /usr/src/debug/qpid-cpp-0.34/src/qmf/org/apache/qpid/broker/Queue.cpp:1163 #9 0x00007f3c16a52d06 in qpid::management::ManagementAgent::handleMethodRequest (this=0x14fcb90, body=<value optimized out>, rte="", rtk= "7d78a4cf-f2fa-4bcf-858e-5662372ce32a#reply-queue", cid="", userId="anonymous@QPID", viaLocal=true) at /usr/src/debug/qpid-cpp-0.34/src/qpid/management/ManagementAgent.cpp:1447 #10 0x00007f3c16a5ffd5 in qpid::management::ManagementAgent::dispatchAgentCommand (this=0x14fcb90, msg=..., viaLocal=true) at /usr/src/debug/qpid-cpp-0.34/src/qpid/management/ManagementAgent.cpp:2347 #11 0x00007f3c16a608b8 in qpid::management::ManagementAgent::dispatchCommand (this=0x14fcb90, deliverable=<value optimized out>, routingKey="broker", topic=false, qmfVersion=2) at /usr/src/debug/qpid-cpp-0.34/src/qpid/management/ManagementAgent.cpp:1255 #12 0x00007f3c16a73179 in qpid::broker::ManagementDirectExchange::route (this=0x1511340, msg=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/management/ManagementDirectExchange.cpp:48 #13 0x00007f3c16a0ccf4 in qpid::broker::SemanticState::route (this=0x7f3bfc4d8288, msg=..., strategy=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SemanticState.cpp:506 #14 0x00007f3c16a29fda in qpid::broker::SessionState::handleContent (this=0x7f3bfc4d80c0, frame=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SessionState.cpp:233 #15 0x00007f3c16a2a5a1 in qpid::broker::SessionState::handleIn (this=0x7f3bfc4d80c0, frame=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/SessionState.cpp:293 #16 0x00007f3c16446af1 in qpid::amqp_0_10::SessionHandler::handleIn (this=0x7f3bfc302e00, f=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/amqp_0_10/SessionHandler.cpp:93 #17 0x00007f3c1699ac9b in operator() (this=0x7f3bfc27b0f0, frame=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/framing/Handler.h:39 #18 qpid::broker::ConnectionHandler::handle (this=0x7f3bfc27b0f0, frame=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/ConnectionHandler.cpp:93 #19 0x00007f3c16995a58 in qpid::broker::amqp_0_10::Connection::received (this=0x7f3bfc27af10, frame=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/broker/amqp_0_10/Connection.cpp:198 #20 0x00007f3c16923563 in qpid::amqp_0_10::Connection::decode (this=0x7f3bfc4f69e0, buffer=<value optimized out>, size=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/amqp_0_10/Connection.cpp:59 #21 0x00007f3c1646fce0 in qpid::sys::AsynchIOHandler::readbuff (this=0x7f3bfc531660, buff=0x7f3bfc79e7a0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/AsynchIOHandler.cpp:138 #22 0x00007f3c163edea9 in operator() (this=0x7f3bfc079690, h=...) at /usr/include/boost/function/function_template.hpp:1013 #23 qpid::sys::posix::AsynchIO::readable (this=0x7f3bfc079690, h=...) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/posix/AsynchIO.cpp:453 #24 0x00007f3c164745f3 in boost::function1<void, qpid::sys::DispatchHandle&>::operator() (this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:1013 #25 0x00007f3c16473286 in qpid::sys::DispatchHandle::processEvent (this=0x7f3bfc079698, type=qpid::sys::Poller::READABLE) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/DispatchHandle.cpp:280 #26 0x00007f3c16413e5d in process (this=0x14fb7c0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/Poller.h:131 #27 qpid::sys::Poller::run (this=0x14fb7c0) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/epoll/EpollPoller.cpp:522 #28 0x00007f3c1640813a in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>) at /usr/src/debug/qpid-cpp-0.34/src/qpid/sys/posix/Thread.cpp:35 #29 0x0000003f6f407a51 in start_thread () from /lib64/libpthread.so.0 #30 0x0000003f6f0e896d in clone () from /lib64/libc.so.6 (gdb) (here, from the other side, Queue class being deleted while accessed from QMF method execution)
Created attachment 1154210 [details] create_queue.cpp
Created attachment 1154211 [details] delete_queue.cpp
Created attachment 1154212 [details] reroute_queue.cpp
The following commits are possible fixes but unconfirmed as yet. On branch http://git.app.eng.bos.redhat.com/git/rh-qpid.git/log/?h=0.34-mrg-meteorcomm-aconway 664350d Bug 1333437 - Conditional compile mismatch in broker and common libs. f7f133f Bug 1333767 - Use shared_ptr for Link::destroy instead of weak_ptr c322c49 Bug 1333767 - Memory management error in Link/Bridge commit 664350ddc7c074dd96a5ffd021b1fdf1046c1485 Author: Alan Conway <aconway> Commit: Alan Conway <aconway> Bug 1333437 - Conditional compile mismatch in broker and common libs. Removed _IN_QPID_BROKER compile definition: - inconsistently set for libqpidcommon (compiled .cpp files) and libqpidbroker (uses .h) files - The broker has a binary incompatible notion of what is in the library. - It sort-of works by accident: - shared_ptr<T> only contains a T*, the mismatch is effectively doing reinterpret_cast<T*> - plain T* works for op->, but doesn't guarantee no concurrent deletes. - we create and destroy shared_ptr in libraries with _IN_QPID_BROKER set so we get cleanup, and no cores if we're lucky but there is limited protection from races. - only used by management: - I think exposing non-shared ptr GetObject was a feature that never materialized, - if not we need a separate function or class for non-shared-ptr users. commit f7f133f6651a3938654e7e8d14c511c16015f8b5 Author: Alan Conway <aconway> Commit: Alan Conway <aconway> Bug 1333767 - Use shared_ptr for Link::destroy instead of weak_ptr The Bridge and Link::ioCallback functions can safely be dropped if it is too late but Link::destroy needs to be called, so use a shared_ptr for that. commit c322c49f97f489b6a32d69790bb7d34dcfb6639e Author: Alan Conway <aconway> Commit: Alan Conway <aconway> Bug 1333767 - Memory management error in Link/Bridge Fixed a memory bug consistent with causing the stack in https://bugzilla.redhat.com/show_bug.cgi?id=1333767 and several other customer and internally-generated stack traces. The fault is hard to reproduce so the fix is not definitely confirmed. qpid::broker Link and Bridge use Connection::requestIOProcessing() to register callbacks in the connection thread. They were binding a plain "this" pointer to the callback, but the classes are managed by boost::shared_ptr so if all the shared_ptr were released, the callback could happen on a dangling pointer. This fix uses boost::weak_ptr in the callbacks, so if all shared_ptr instances are released, we don't use the dead pointer. [aconway@wallace rh-qpid (0.34-mrg-meteorcomm-aconway=)]$ git log -3 --pretty=short commit 664350ddc7c074dd96a5ffd021b1fdf1046c1485 Author: Alan Conway <aconway> Bug 1333437 - Conditional compile mismatch in broker and common libs. commit f7f133f6651a3938654e7e8d14c511c16015f8b5 Author: Alan Conway <aconway> Bug 1333767 - Use shared_ptr for Link::destroy instead of weak_ptr commit c322c49f97f489b6a32d69790bb7d34dcfb6639e Author: Alan Conway <aconway> Bug 1333767 - Memory management error in Link/Bridge [aconway@wallace rh-qpid (0.34-mrg-meteorcomm-aconway=)]$ git log -3 --pretty=medium commit 664350ddc7c074dd96a5ffd021b1fdf1046c1485 Author: Alan Conway <aconway> Date: Wed May 11 10:18:32 2016 -0400 Bug 1333437 - Conditional compile mismatch in broker and common libs. Removed _IN_QPID_BROKER compile definition: - inconsistently set for libqpidcommon (compiled .cpp files) and libqpidbroker (uses .h) files - The broker has a binary incompatible notion of what is in the library. - It sort-of works by accident: - shared_ptr<T> only contains a T*, the mismatch is effectively doing reinterpret_cast<T*> - plain T* works for op->, but doesn't guarantee no concurrent deletes. - we create and destroy shared_ptr in libraries with _IN_QPID_BROKER set so we get cleanup, and no cores if we're lucky but there is limited protection from races. - only used by management: - I think exposing non-shared ptr GetObject was a feature that never materialized, - if not we need a separate function or class for non-shared-ptr users. commit f7f133f6651a3938654e7e8d14c511c16015f8b5 Author: Alan Conway <aconway> Date: Wed May 11 10:30:31 2016 -0400 Bug 1333767 - Use shared_ptr for Link::destroy instead of weak_ptr The Bridge and Link::ioCallback functions can safely be dropped if it is too late but Link::destroy needs to be called, so use a shared_ptr for that. commit c322c49f97f489b6a32d69790bb7d34dcfb6639e Author: Alan Conway <aconway> Date: Tue May 10 17:33:49 2016 -0400 Bug 1333767 - Memory management error in Link/Bridge Fixed a memory bug consistent with causing the stack in https://bugzilla.redhat.com/show_bug.cgi?id=1333767 and several other customer and internally-generated stack traces. The fault is hard to reproduce so the fix is not definitely confirmed. qpid::broker Link and Bridge use Connection::requestIOProcessing() to register callbacks in the connection thread. They were binding a plain "this" pointer to the callback, but the classes are managed by boost::shared_ptr so if all the shared_ptr were released, the callback could happen on a dangling pointer. This fix uses boost::weak_ptr in the callbacks, so if all shared_ptr instances are released, we don't use the dead pointer.
Created attachment 1156315 [details] Faster reproducer Fast reproducer, generates similar cores to the ones reported above in < 1 minute. Compile with: g++ -std=c++11 -g -L/usr/local/lib64 -lqpidmessaging -lqpidtypes -lpthread stress.cpp -o stress` Start a broker and run. Runs one thread in a create/delete loop and 4 threads doing management methods concurrently. Threads don't lose time reconnecting each time around so they keep the broker in the dangerous code more continuously. This confirms that the previous fix is not complete but I think I know what remains to be done (unsafe back-pointer ManagementObject::coreObject)
Added a fix to http://git.app.eng.bos.redhat.com/git/rh-qpid.git/log/?h=0.34-mrg-meteorcomm-aconway With this fix the stress.cpp reproducer is not crashing, and the output shows "resource-deleted" exceptions where previously we would have crashed. Need further confirmation but I think this may be fixed. commit 0cca6c5fcc5944c8af26ef1a843d1db7bd0e96da Author: Alan Conway <aconway> Date: Wed May 11 18:24:07 2016 -0400 Bug 1333767 - Locking around Manageable* coreObject in ManagedObject. ManagedObject has a back-pointer to the Manageable object it belongs to. The Manageable calls ManagedObject::resourceDestroy() when it is deleted, but there were no locks so this was thread-unsafe and caused dangling pointers. Added a locked wrapper class around coreObject so destroy is atomic with respect to calls on coreObject.
Created attachment 1156734 [details] Fixed bug in fast reproducer The reproducer was not accepting messages from the response queue so eventually it timed out with "queue to long" messages on the broker. Fixed it so it should run indefinitely.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2049.html