Description of problem: Deadlock in broker. Version-Release number of selected component (if applicable): qpid-cpp-server-0.7.935473-1.el5 How reproducible: Not sure Steps to Reproduce: 1. run durable perftest with severl pubs/subs Actual results: Broker hangs: Thread 5 (Thread 0x410c9940 (LWP 19752)): #0 0x00000038fa40af70 in pthread_cond_timedwait@@GLIBC_2.3.2 () #1 0x00000039d24074d8 in qpid::sys::Timer::run () #2 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #3 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #4 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x4200b940 (LWP 19753)): #0 0x00000038fa40d2e4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000038fa408c3a in _L_lock_1034 () from /lib64/libpthread.so.0 #2 0x00000038fa408afc in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002b85ee95b65a in mrg::journal::slock::slock () #4 0x00002b85ee96742c in mrg::journal::jcntl::flush () #5 0x00002b85ee918b0b in mrg::msgstore::JournalImpl::flush () #6 0x00002b85ee918dc6 in mrg::msgstore::JournalImpl::flushFire () #7 0x00002b85ee918e4c in mrg::msgstore::InactivityFireEvent::fire () #8 0x00000039d2407123 in qpid::sys::Timer::run () #9 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #10 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #11 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x42a0c940 (LWP 19754)): #0 0x00000038fa40d2e4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000038fa40b16b in pthread_cond_signal@@GLIBC_2.3.2 () #2 0x00000039d29581fd in qpid::broker::IncompleteMessageList::enqueueComplete #3 0x00000039d29596df in boost::function1<void, boost::intrusive_ptr<qpid::broker::Message> const&, std::allocator<void> >::operator() () #4 0x00000039d2975425 in qpid::broker::Message::allEnqueuesComplete () #5 0x00000039d297c339 in qpid::broker::PersistableMessage::enqueueComplete () #6 0x00002b85ee916d86 in mrg::msgstore::JournalImpl::wr_aio_cb () #7 0x00002b85ee9894a7 in mrg::journal::wmgr::get_events () #8 0x00002b85ee98bd7f in mrg::journal::wmgr::write_flush () #9 0x00002b85ee98c14d in mrg::journal::wmgr::flush_check () #10 0x00002b85ee98d0bc in mrg::journal::wmgr::dequeue () #11 0x00002b85ee968064 in mrg::journal::jcntl::dequeue_data_record () #12 0x00002b85ee91c973 in mrg::msgstore::JournalImpl::dequeue_data_record () #13 0x00002b85ee9277ea in mrg::msgstore::MessageStoreImpl::async_dequeue () #14 0x00002b85ee93761f in mrg::msgstore::MessageStoreImpl::dequeue () #15 0x00000039d29794eb in qpid::broker::MessageStoreModule::dequeue () #16 0x00000039d2983db7 in qpid::broker::Queue::dequeue () #17 0x00000039d29b2674 in qpid::broker::SemanticState::ConsumerImpl::deliver () #18 0x00000039d2985eac in qpid::broker::Queue::dispatch () #19 0x00000039d29b29c4 in qpid::broker::SemanticState::ConsumerImpl::doOutput #20 0x00000039d23f9a87 in qpid::sys::AggregateOutput::doOutput () #21 0x00000039d29201e1 in qpid::broker::Connection::doOutput () #22 0x00000039d28fe616 in qpid::amqp_0_10::Connection::canEncode () #23 0x00000039d29b0bda in qpid::broker::SecureConnection::canEncode () #24 0x00000039d23fd7a5 in qpid::sys::AsynchIOHandler::idle () #25 0x00000039d231e14f in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() () from /usr/lib64/libqpidcommon.so.2 #26 0x00000039d231bb2f in qpid::sys::posix::AsynchIO::writeable () #27 0x00000039d2404e97 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() () #28 0x00000039d23ffde7 in qpid::sys::DispatchHandle::processEvent () #29 0x00000039d2329fff in qpid::sys::Poller::run () #30 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #31 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #32 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x4480f940 (LWP 19757)): #0 0x00000038fa40d2e4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000038fa408c3a in _L_lock_1034 () from /lib64/libpthread.so.0 #2 0x00000038fa408afc in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002b85ee95b65a in mrg::journal::slock::slock () #4 0x00002b85ee968052 in mrg::journal::jcntl::dequeue_data_record () #5 0x00002b85ee91c973 in mrg::msgstore::JournalImpl::dequeue_data_record () #6 0x00002b85ee9277ea in mrg::msgstore::MessageStoreImpl::async_dequeue () #7 0x00002b85ee93761f in mrg::msgstore::MessageStoreImpl::dequeue () #8 0x00000039d29794eb in qpid::broker::MessageStoreModule::dequeue () #9 0x00000039d2983db7 in qpid::broker::Queue::dequeue () #10 0x00000039d298418c in qpid::broker::Queue::popAndDequeue () #11 0x00000039d29846ba in qpid::broker::Queue::purge () #12 0x00000039d29c884f in qpid::broker::SessionAdapter::QueueHandlerImpl::purge #13 0x00000039d23467b2 in qpid::framing::AMQP_ServerOperations::QueueHandler::Invoker::visit () from /usr/lib64/libqpidcommon.so.2 #14 0x00000039d234cf17 in qpid::framing::AMQP_ServerOperations::Invoker::visit #15 0x00000039d29e16e3 in qpid::framing::invoke<qpid::broker::SessionAdapter> #16 0x00000039d29dff7d in qpid::broker::SessionState::handleCommand () #17 0x00000039d29e0ef9 in qpid::broker::SessionState::handleIn () #18 0x00000039d23b9fa9 in qpid::amqp_0_10::SessionHandler::handleIn () #19 0x00000039d291dee2 in qpid::broker::Connection::received () #20 0x00000039d28fdd84 in qpid::amqp_0_10::Connection::decode () #21 0x00000039d23fde11 in qpid::sys::AsynchIOHandler::readbuff () #22 0x00000039d231de4f in boost::function2<void, qpid::sys::AsynchIO&, qpid::sys::AsynchIOBufferBase*, std::allocator<boost::function_base> >::operator() () #23 0x00000039d231c3b3 in qpid::sys::posix::AsynchIO::readable () #24 0x00000039d2404e97 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() () #25 0x00000039d23ffd2f in qpid::sys::DispatchHandle::processEvent () #26 0x00000039d2329fff in qpid::sys::Poller::run () #27 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #28 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #29 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2b85ec35f830 (LWP 19751)): #0 0x00000038fa407955 in pthread_join () from /lib64/libpthread.so.0 #1 0x00000039d23203cd in qpid::sys::Thread::join () #2 0x00000039d290a311 in qpid::broker::Broker::run () #3 0x0000000000406ae6 in ?? () #4 0x00000000004055af in __cxa_pure_virtual () #5 0x00000038f9c1d994 in __libc_start_main () from /lib64/libc.so.6 #6 0x0000000000405019 in __cxa_pure_virtual () #7 0x00007fffabd2a678 in ?? () #8 0x0000000000000000 in ?? ()
Another occurence, it appears relatively easy (I got it well within 50 loops) to trigger this via: while perftest --qt 2 --nsubs 4 --npubs 2 --sub-ack 10 --durable true --count 50000 --summary; do true; done Thread 7 (Thread 0x40d70940 (LWP 6789)): #0 0x00000038fa40af70 in pthread_cond_timedwait@@GLIBC_2.3.2 () #1 0x00000039d24074d8 in qpid::sys::Timer::run () #2 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #3 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #4 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x41771940 (LWP 6790)): #0 0x00000038fa40d2e4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000038fa408c3a in _L_lock_1034 () from /lib64/libpthread.so.0 #2 0x00000038fa408afc in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002b761173865a in mrg::journal::slock::slock () #4 0x00002b761174442c in mrg::journal::jcntl::flush () #5 0x00002b76116f5b0b in mrg::msgstore::JournalImpl::flush () #6 0x00002b76116f5dc6 in mrg::msgstore::JournalImpl::flushFire () #7 0x00002b76116f5e4c in mrg::msgstore::InactivityFireEvent::fire () #8 0x00000039d2407123 in qpid::sys::Timer::run () #9 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #10 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #11 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x429a5940 (LWP 6791)): #0 0x00000038f9cd4018 in epoll_wait () from /lib64/libc.so.6 #1 0x00000039d23295af in qpid::sys::Poller::wait () #2 0x00000039d2329fd2 in qpid::sys::Poller::run () #3 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #4 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #5 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x433a6940 (LWP 6792)): #0 0x00000038fa40d2e4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000038fa408c3a in _L_lock_1034 () from /lib64/libpthread.so.0 #2 0x00000038fa408afc in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002b761173865a in mrg::journal::slock::slock () #4 0x00002b7611745052 in mrg::journal::jcntl::dequeue_data_record () #5 0x00002b76116f9973 in mrg::msgstore::JournalImpl::dequeue_data_record () #6 0x00002b76117047ea in mrg::msgstore::MessageStoreImpl::async_dequeue () #7 0x00002b761171461f in mrg::msgstore::MessageStoreImpl::dequeue () #8 0x00000039d29794eb in qpid::broker::MessageStoreModule::dequeue () #9 0x00000039d2983db7 in qpid::broker::Queue::dequeue () #10 0x00000039d292ed34 in qpid::broker::DeliveryRecord::accept () #11 0x00000039d29c1853 in std::__find_if<std::_Deque_iterator<qpid::broker::DeliveryRecord, qpid::broker::DeliveryRecord&, qpid::broker::DeliveryRecord*>, qpid::broker::IsInSequenceSetAnd<boost::_bi::bind_t<bool, boost::_mfi::mf1<bool, qpid::broker::DeliveryRecord, qpid::broker::TransactionContext*>, boost::_bi::list2<boost::arg<1>, boost::_bi::value<qpid::broker::TransactionContext*> > > > > () #12 0x00000039d29c1ed9 in std::remove_if<std::_Deque_iterator<qpid::broker::DeliveryRecord, qpid::broker::DeliveryRecord&, qpid::broker::DeliveryRecord*>, qpid::broker::IsInSequenceSetAnd<boost::_bi::bind_t<bool, boost::_mfi::mf1<bool, qpid::broker::DeliveryRecord, qpid::broker::TransactionContext*>, boost::_bi::list2<boost::arg<1>, boost::_bi::value<qpid::broker::TransactionContext*> > > > > () #13 0x00000039d29b70e3 in qpid::broker::SemanticState::accepted () #14 0x00000039d23470c2 in qpid::framing::AMQP_ServerOperations::MessageHandler::Invoker::visit () from /usr/lib64/libqpidcommon.so.2 #15 0x00000039d234abd7 in qpid::framing::AMQP_ServerOperations::Invoker::visit #16 0x00000039d29e16e3 in qpid::framing::invoke<qpid::broker::SessionAdapter> #17 0x00000039d29dff7d in qpid::broker::SessionState::handleCommand () #18 0x00000039d29e0ef9 in qpid::broker::SessionState::handleIn () #19 0x00000039d23b9fa9 in qpid::amqp_0_10::SessionHandler::handleIn () #20 0x00000039d291dee2 in qpid::broker::Connection::received () #21 0x00000039d28fdd84 in qpid::amqp_0_10::Connection::decode () #22 0x00000039d23fde11 in qpid::sys::AsynchIOHandler::readbuff () #23 0x00000039d231de4f in boost::function2<void, qpid::sys::AsynchIO&, qpid::sys::AsynchIOBufferBase*, std::allocator<boost::function_base> >::operator() () #24 0x00000039d231c3b3 in qpid::sys::posix::AsynchIO::readable () #25 0x00000039d2404e97 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() () #26 0x00000039d23ffd2f in qpid::sys::DispatchHandle::processEvent () #27 0x00000039d2329fff in qpid::sys::Poller::run () #28 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #29 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #30 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x43da7940 (LWP 6793)): #0 0x00000038fa40d2e4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000038fa40b16b in pthread_cond_signal@@GLIBC_2.3.2 () #2 0x00000039d29581fd in qpid::broker::IncompleteMessageList::enqueueComplete #3 0x00000039d29596df in boost::function1<void, boost::intrusive_ptr<qpid::broker::Message> const&, std::allocator<void> >::operator() () #4 0x00000039d2975425 in qpid::broker::Message::allEnqueuesComplete () #5 0x00000039d297c339 in qpid::broker::PersistableMessage::enqueueComplete () #6 0x00002b76116f3d86 in mrg::msgstore::JournalImpl::wr_aio_cb () #7 0x00002b76117664a7 in mrg::journal::wmgr::get_events () #8 0x00002b7611768d7f in mrg::journal::wmgr::write_flush () #9 0x00002b761176914d in mrg::journal::wmgr::flush_check () #10 0x00002b761176a921 in mrg::journal::wmgr::enqueue () #11 0x00002b76117453be in mrg::journal::jcntl::enqueue_data_record () #12 0x00002b76116fa283 in mrg::msgstore::JournalImpl::enqueue_data_record () #13 0x00002b76117051a7 in mrg::msgstore::MessageStoreImpl::store () #14 0x00002b761171301b in mrg::msgstore::MessageStoreImpl::enqueue () #15 0x00000039d29795cb in qpid::broker::MessageStoreModule::enqueue () #16 0x00000039d298860c in qpid::broker::Queue::enqueue () #17 0x00000039d2989dc6 in qpid::broker::Queue::deliver () #18 0x00000039d292c7e3 in qpid::broker::DeliverableMessage::deliverTo () #19 0x00000039d2944065 in qpid::broker::Exchange::doRoute () #20 0x00000039d2930ed3 in qpid::broker::DirectExchange::route () #21 0x00000039d29babf3 in qpid::broker::SemanticState::route () #22 0x00000039d29bbb4d in qpid::broker::SemanticState::handle () #23 0x00000039d29e090e in qpid::broker::SessionState::handleContent () #24 0x00000039d29e0eb0 in qpid::broker::SessionState::handleIn () #25 0x00000039d23b9fa9 in qpid::amqp_0_10::SessionHandler::handleIn () #26 0x00000039d291dee2 in qpid::broker::Connection::received () #27 0x00000039d28fdd84 in qpid::amqp_0_10::Connection::decode () #28 0x00000039d23fde11 in qpid::sys::AsynchIOHandler::readbuff () #29 0x00000039d231de4f in boost::function2<void, qpid::sys::AsynchIO&, qpid::sys::AsynchIOBufferBase*, std::allocator<boost::function_base> >::operator() () #30 0x00000039d231c3b3 in qpid::sys::posix::AsynchIO::readable () #31 0x00000039d2404e97 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() () #32 0x00000039d23ffd2f in qpid::sys::DispatchHandle::processEvent () #33 0x00000039d2329fff in qpid::sys::Poller::run () #34 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #35 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #36 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x447a8940 (LWP 6794)): #0 0x00000038fa40d2e4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000038fa408c3a in _L_lock_1034 () from /lib64/libpthread.so.0 #2 0x00000038fa408afc in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002b761173865a in mrg::journal::slock::slock () #4 0x00002b7611745052 in mrg::journal::jcntl::dequeue_data_record () #5 0x00002b76116f9973 in mrg::msgstore::JournalImpl::dequeue_data_record () #6 0x00002b76117047ea in mrg::msgstore::MessageStoreImpl::async_dequeue () #7 0x00002b761171461f in mrg::msgstore::MessageStoreImpl::dequeue () #8 0x00000039d29794eb in qpid::broker::MessageStoreModule::dequeue () #9 0x00000039d2983db7 in qpid::broker::Queue::dequeue () #10 0x00000039d292ed34 in qpid::broker::DeliveryRecord::accept () #11 0x00000039d29c1853 in std::__find_if<std::_Deque_iterator<qpid::broker::DeliveryRecord, qpid::broker::DeliveryRecord&, qpid::broker::DeliveryRecord*>, qpid::broker::IsInSequenceSetAnd<boost::_bi::bind_t<bool, boost::_mfi::mf1<bool, qpid::broker::DeliveryRecord, qpid::broker::TransactionContext*>, boost::_bi::list2<boost::arg<1>, boost::_bi::value<qpid::broker::TransactionContext*> > > > > () #12 0x00000039d29c1ed9 in std::remove_if<std::_Deque_iterator<qpid::broker::DeliveryRecord, qpid::broker::DeliveryRecord&, qpid::broker::DeliveryRecord*>, qpid::broker::IsInSequenceSetAnd<boost::_bi::bind_t<bool, boost::_mfi::mf1<bool, qpid::broker::DeliveryRecord, qpid::broker::TransactionContext*>, boost::_bi::list2<boost::arg<1>, boost::_bi::value<qpid::broker::TransactionContext*> > > > > () #13 0x00000039d29b70e3 in qpid::broker::SemanticState::accepted () #14 0x00000039d23470c2 in qpid::framing::AMQP_ServerOperations::MessageHandler::Invoker::visit () from /usr/lib64/libqpidcommon.so.2 #15 0x00000039d234abd7 in qpid::framing::AMQP_ServerOperations::Invoker::visit #16 0x00000039d29e16e3 in qpid::framing::invoke<qpid::broker::SessionAdapter> #17 0x00000039d29dff7d in qpid::broker::SessionState::handleCommand () #18 0x00000039d29e0ef9 in qpid::broker::SessionState::handleIn () #19 0x00000039d23b9fa9 in qpid::amqp_0_10::SessionHandler::handleIn () #20 0x00000039d291dee2 in qpid::broker::Connection::received () #21 0x00000039d28fdd84 in qpid::amqp_0_10::Connection::decode () #22 0x00000039d23fde11 in qpid::sys::AsynchIOHandler::readbuff () #23 0x00000039d231de4f in boost::function2<void, qpid::sys::AsynchIO&, qpid::sys::AsynchIOBufferBase*, std::allocator<boost::function_base> >::operator() () #24 0x00000039d231c3b3 in qpid::sys::posix::AsynchIO::readable () #25 0x00000039d2404e97 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() () #26 0x00000039d23ffd2f in qpid::sys::DispatchHandle::processEvent () #27 0x00000039d2329fff in qpid::sys::Poller::run () #28 0x00000039d23201ca in ?? () from /usr/lib64/libqpidcommon.so.2 #29 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0 #30 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2b760f13c830 (LWP 6788)): #0 0x00000038f9cd4018 in epoll_wait () from /lib64/libc.so.6 #1 0x00000039d23295af in qpid::sys::Poller::wait () #2 0x00000039d2329fd2 in qpid::sys::Poller::run () #3 0x00000039d290a2f2 in qpid::broker::Broker::run () #4 0x0000000000406ae6 in ?? () #5 0x00000000004055af in __cxa_pure_virtual () #6 0x00000038f9c1d994 in __libc_start_main () from /lib64/libc.so.6 #7 0x0000000000405019 in __cxa_pure_virtual () #8 0x00007fffb4671038 in ?? () #9 0x0000000000000000 in ?? () [gordon@mrg15 ~]$
In both these cases, it seems that something is holding the lock in qpid::broker::IncompleteMessageList::enqueueComplete() which is blocking the thread. I cannot find another thread that is doing so, however. In the first case, thread 3 is holding the store lock, but is blocked in the broker callback in IncompleteMessageList::enqueueComplete(). Threads 2 and 4 (which contians the flush call from the timer) are simply waiting for this thread to complete and release the store lock. The second case is similar. Thread 3 is stuck in the broker callback (aparrently with no thread holding the lock), and threads 2, 4 and 6 (the timer flush call) are waiting for thread 3 to return. I cannot see what is holding the lock in IncompleteMessageList::enqueueComplete() in either of these cases. None of the other threads have passed through the IncompleteMessageList class, and so cannot be holding the lock.
I believe this may have been caused by the heap corruption fixed in r944329. Certainly I can't reproduce it now, so closing.