Description of problem: Sometimes qpidd segfaults with coredump when perftest is run against it in shared mode, durable with async. Broker: rm -rf /var/lib/qpid/rhm/* # to make sure we start fresh usr/sbin/qpidd --load-module=libbdbstore.so.0 --auth no --log-output qpid.log --log-enable info+ --log-enable debug+ --num-jfile 16 --jfile-size-pgs 256 --store-async yes Perftest: perftest --mode shared --durable yes --size 256 --count 50000 -b <ip address to broker> The core dump do not come too often, unfortunately. And it do not seem to crash if it is much more or less logging enabled, only with info+, debug+ and trace+. Steps to Reproduce: Run perftest and broker with the given commands Additional info: ** One run gave this back trace #0 0x002d00c7 in rhm::bdbstore::BdbMessageStore::dequeue () from /usr/lib/libbdbstore.so.0 (gdb) bt #0 0x002d00c7 in rhm::bdbstore::BdbMessageStore::dequeue () from /usr/lib/libbdbstore.so.0 #1 0x007b09e0 in qpid::broker::MessageStoreModule::dequeue () from /usr/lib/libqpidbroker.so.0 #2 0x0074f279 in qpid::broker::Queue::dequeue () from /usr/lib/libqpidbroker.so.0 #3 0x007c90fc in qpid::broker::SemanticState::ConsumerImpl::deliver () from /usr/lib/libqpidbroker.so.0 #4 0x0075209b in qpid::broker::Queue::dispatch () from /usr/lib/libqpidbroker.so.0 #5 0x007c206b in qpid::broker::SemanticState::ConsumerImpl::doOutput () from /usr/lib/libqpidbroker.so.0 #6 0x00a3105a in ?? () #7 0x0a07a148 in ?? () #8 0xb5ff4d80 in ?? () #9 0x00010000 in ?? () #10 0x00000000 in ?? () ** Another run gave this backtrace: #0 0x00232bf6 in qpid::framing::BodyHolder::~BodyHolder$delete () from /usr/lib/libqpidbroker.so.0 (gdb) bt #0 0x00232bf6 in qpid::framing::BodyHolder::~BodyHolder$delete () from /usr/lib/libqpidbroker.so.0 #1 0x03f977db in qpid::framing::AMQFrame::~AMQFrame () from /usr/lib/libqpidcommon.so.0 #2 0x0022f499 in qpid::broker::Message::~Message$delete () from /usr/lib/libqpidbroker.so.0 #3 0x006f4338 in rhm::bdbstore::DataTokenImpl::~DataTokenImpl$delete () from /usr/lib/libbdbstore.so.0 #4 0x006f53f5 in rhm::bdbstore::JournalImpl::aio_wr_callback () from /usr/lib/libbdbstore.so.0 #5 0x0073fffa in rhm::journal::wmgr::get_events () from /usr/lib/libbdbstore.so.0 #6 0x0073b24b in rhm::journal::wmgr::write_flush () from /usr/lib/libbdbstore.so.0 #7 0x0073dca5 in rhm::journal::wmgr::dequeue () from /usr/lib/libbdbstore.so.0 #8 0x00711429 in rhm::journal::jcntl::dequeue_data_record () from /usr/lib/libbdbstore.so.0 #9 0x006fcfa7 in rhm::bdbstore::JournalImpl::dequeue_data_record () from /usr/lib/libbdbstore.so.0 #10 0x006c03e9 in rhm::bdbstore::BdbMessageStore::async_dequeue () from /usr/lib/libbdbstore.so.0 #11 0x006c91a6 in rhm::bdbstore::BdbMessageStore::dequeue () from /usr/lib/libbdbstore.so.0 #12 0x002369e0 in qpid::broker::MessageStoreModule::dequeue () from /usr/lib/libqpidbroker.so.0 #13 0x001d5279 in qpid::broker::Queue::dequeue () from /usr/lib/libqpidbroker.so.0 #14 0x0024f0fc in qpid::broker::SemanticState::ConsumerImpl::deliver () from /usr/lib/libqpidbroker.so.0 #15 0x001d809b in qpid::broker::Queue::dispatch () from /usr/lib/libqpidbroker.so.0 #16 0x0024806b in qpid::broker::SemanticState::ConsumerImpl::doOutput () from /usr/lib/libqpidbroker.so.0 #17 0x03fc2eba in qpid::sys::AggregateOutput::doOutput () from /usr/lib/libqpidcommon.so.0 #18 0x00254a6d in qpid::broker::SemanticState::doOutput () from /usr/lib/libqpidbroker.so.0 #19 0x03fc2eba in qpid::sys::AggregateOutput::doOutput () from /usr/lib/libqpidcommon.so.0 #20 0x001e95ae in qpid::broker::Connection::doOutput () from /usr/lib/libqpidbroker.so.0 #21 0x001ad60c in qpid::amqp_0_10::Connection::canEncode () from /usr/lib/libqpidbroker.so.0 #22 0x03fc4302 in qpid::sys::AsynchIOHandler::idle () from /usr/lib/libqpidcommon.so.0 #23 0x0029fb08 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::sys::AsynchIOHandler, qpid::sys::AsynchIO&>, boost::_bi::list2<boost::_bi::value<qpid::sys::AsynchIOHandler*>, boost::arg<1> > >, void, qpid::sys::AsynchIO&>::invoke () from /usr/lib/libqpidbroker.so.0 #24 0x03f89cde in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() () from /usr/lib/libqpidcommon.so.0 #25 0x03f88780 in qpid::sys::AsynchIO::writeable () from /usr/lib/libqpidcommon.so.0 #26 0x03f8a8b4 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::sys::AsynchIO, qpid::sys::DispatchHandle&>, boost::_bi::list2<boost::_bi::value<qpid::sys::AsynchIO*>, boost::arg<1> > >, void, qpid::sys::DispatchHandle&>::invoke () from /usr/lib/libqpidcommon.so.0 #27 0x03fc963e in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() () from /usr/lib/libqpidcommon.so.0 #28 0x03fc5ff9 in qpid::sys::DispatchHandle::dispatchCallbacks () from /usr/lib/libqpidcommon.so.0 #29 0x03fc673b in qpid::sys::Dispatcher::run () from /usr/lib/libqpidcommon.so.0 #30 0x03f8b01b in qpid::sys::Thread::runRunnable () from /usr/lib/libqpidcommon.so.0 #31 0x0063e3cc in start_thread () from /lib/tls/libpthread.so.0 #32 0x0049e1ae in clone () from /lib/tls/libc.so.6 ** A third run didn't crash the main qpidd process, but probably just a thread ... anyway, this came in the log file to Qpidd: 2008-may-19 12:41:53 debug Exception: Device or resource busy (./qpid/sys/posix/Mutex.h:155) 2008-may-19 12:41:53 debug Exception: Exception: Device or resource busy (./qpid/sys/posix/Mutex.h:155) The perftest program hangs doing sub_done. Doing a new test (without restarting qpidd), the qpidd writes this to stderr: Unexpected I/O response (RHM_IORES_BUSY) on queue perftest0". The qpid.log file says: 2008-may-19 12:55:55 debug Exception: Unexpected I/O response (RHM_IORES_BUSY) on queue perftest0". (JournalImpl.cpp:477) 2008-may-19 12:55:55 debug Exception: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 12:55:55 error Session detached due to: not-attached: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 12:55:55 debug Exception: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 12:55:55 error Session detached due to: not-attached: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) A fourth attempt, also causing thread crash, this time with : 2008-may-19 13:03:21 debug Exception: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 trace RECV [10.34.33.204:49427]: Frame[Bbe; channel=1; {MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }] 2008-may-19 13:03:21 debug Exception: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 trace RECV [10.34.33.204:49427]: Frame[be; channel=1; header (35 bytes); properties={{MessageProperties: content-length=256; }{DeliveryProperties: delivery-mode=2; routing-key=perftest0; }}] 2008-may-19 13:03:21 debug Exception: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 trace SENT [10.34.33.204:49427]: Frame[BEbe; channel=0; {ConnectionCloseBody: reply-code=501; reply-text=Exception: Exception: Device or resource busy (./qpid/sys/posix/Mutex.h:155); }] 2008-may-19 13:03:21 debug DISCONNECTED [10.34.33.204:49429] [root@dhcp-lab-167 ~]# tail qpid.log 2008-may-19 13:03:21 debug Exception: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 trace RECV [10.34.33.204:49427]: Frame[Bbe; channel=1; {MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }] 2008-may-19 13:03:21 debug Exception: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 trace RECV [10.34.33.204:49427]: Frame[be; channel=1; header (35 bytes); properties={{MessageProperties: content-length=256; }{DeliveryProperties: delivery-mode=2; routing-key=perftest0; }}] 2008-may-19 13:03:21 debug Exception: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is not attached (qpid/broker/SessionHandler.cpp:72) 2008-may-19 13:03:21 trace SENT [10.34.33.204:49427]: Frame[BEbe; channel=0; {ConnectionCloseBody: reply-code=501; reply-text=Exception: Exception: Device or resource busy (./qpid/sys/posix/Mutex.h:155); }] 2008-may-19 13:03:21 debug DISCONNECTED [10.34.33.204:49429] The perftest program prints a "SubscribeThread exception: Closed: " message Hope this make some sense to somebody ...
This crash seems to happen only with --store-async yes ... with --store-async no the tests seems to go better.
About 4,000,000 packets later, I haven't been able to reproduce this with a broker running on a RHEL4 i386 installed system booted with mem=1G and perftest running on a RHEL5 x86_64 system.
Closing this bug now. Most probably it's a local environment error in a desktop VM which caused Qpidd to core dump. Tested on physical hardware with RHEL4 i386 and x86_64 extensively withouth triggering the same issue. Even run over night without any obvious problems on a i386 box.
I ran a RHEL5.1 client against the RHEL4 broker overnight, and achieved 11,300 iterations using the broker and client parameters in comment #1. No problems were found.
I also ran a RHEL4 client against a RHEL4 broker running on a 1G xen guest to more closely simulate the original environment. Over 40,000,000 messages passed without issue. David's original test wouldn't get to more than 500,000 or so before dumping core.