Bug 447291 - RHEL4: Qpidd sometimes core dumps on perftest (shared, durable, async)
Summary: RHEL4: Qpidd sometimes core dumps on perftest (shared, durable, async)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: i386
OS: Linux
medium
urgent
Target Milestone: ---
: ---
Assignee: messaging-bugs
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-05-19 11:05 UTC by David Sommerseth
Modified: 2016-05-22 23:27 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-20 07:05:52 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description David Sommerseth 2008-05-19 11:05:45 UTC
Description of problem:
Sometimes qpidd segfaults with coredump when perftest is run against it in
shared mode, durable with async.


Broker:
rm -rf /var/lib/qpid/rhm/*     # to make sure we start fresh
usr/sbin/qpidd --load-module=libbdbstore.so.0 --auth no --log-output qpid.log
--log-enable info+ --log-enable debug+ --num-jfile 16 --jfile-size-pgs 256
--store-async yes

Perftest:
perftest --mode shared --durable yes --size 256 --count 50000 -b <ip address to
broker>

The core dump do not come too often, unfortunately.  And it do not seem to crash
if it is much more or less logging enabled, only with info+, debug+ and trace+. 


Steps to Reproduce:
Run perftest and broker with the given commands
  

Additional info:

** One run gave this back trace 
#0  0x002d00c7 in rhm::bdbstore::BdbMessageStore::dequeue () from
/usr/lib/libbdbstore.so.0
(gdb) bt
#0  0x002d00c7 in rhm::bdbstore::BdbMessageStore::dequeue () from
/usr/lib/libbdbstore.so.0
#1  0x007b09e0 in qpid::broker::MessageStoreModule::dequeue () from
/usr/lib/libqpidbroker.so.0
#2  0x0074f279 in qpid::broker::Queue::dequeue () from /usr/lib/libqpidbroker.so.0
#3  0x007c90fc in qpid::broker::SemanticState::ConsumerImpl::deliver () from
/usr/lib/libqpidbroker.so.0
#4  0x0075209b in qpid::broker::Queue::dispatch () from /usr/lib/libqpidbroker.so.0
#5  0x007c206b in qpid::broker::SemanticState::ConsumerImpl::doOutput () from
/usr/lib/libqpidbroker.so.0
#6  0x00a3105a in ?? ()
#7  0x0a07a148 in ?? ()
#8  0xb5ff4d80 in ?? ()
#9  0x00010000 in ?? ()
#10 0x00000000 in ?? ()


** Another run gave this backtrace:
#0  0x00232bf6 in qpid::framing::BodyHolder::~BodyHolder$delete () from
/usr/lib/libqpidbroker.so.0
(gdb) bt
#0  0x00232bf6 in qpid::framing::BodyHolder::~BodyHolder$delete () from
/usr/lib/libqpidbroker.so.0
#1  0x03f977db in qpid::framing::AMQFrame::~AMQFrame () from
/usr/lib/libqpidcommon.so.0
#2  0x0022f499 in qpid::broker::Message::~Message$delete () from
/usr/lib/libqpidbroker.so.0
#3  0x006f4338 in rhm::bdbstore::DataTokenImpl::~DataTokenImpl$delete () from
/usr/lib/libbdbstore.so.0
#4  0x006f53f5 in rhm::bdbstore::JournalImpl::aio_wr_callback () from
/usr/lib/libbdbstore.so.0
#5  0x0073fffa in rhm::journal::wmgr::get_events () from /usr/lib/libbdbstore.so.0
#6  0x0073b24b in rhm::journal::wmgr::write_flush () from /usr/lib/libbdbstore.so.0
#7  0x0073dca5 in rhm::journal::wmgr::dequeue () from /usr/lib/libbdbstore.so.0
#8  0x00711429 in rhm::journal::jcntl::dequeue_data_record () from
/usr/lib/libbdbstore.so.0
#9  0x006fcfa7 in rhm::bdbstore::JournalImpl::dequeue_data_record () from
/usr/lib/libbdbstore.so.0
#10 0x006c03e9 in rhm::bdbstore::BdbMessageStore::async_dequeue () from
/usr/lib/libbdbstore.so.0
#11 0x006c91a6 in rhm::bdbstore::BdbMessageStore::dequeue () from
/usr/lib/libbdbstore.so.0
#12 0x002369e0 in qpid::broker::MessageStoreModule::dequeue () from
/usr/lib/libqpidbroker.so.0
#13 0x001d5279 in qpid::broker::Queue::dequeue () from /usr/lib/libqpidbroker.so.0
#14 0x0024f0fc in qpid::broker::SemanticState::ConsumerImpl::deliver () from
/usr/lib/libqpidbroker.so.0
#15 0x001d809b in qpid::broker::Queue::dispatch () from /usr/lib/libqpidbroker.so.0
#16 0x0024806b in qpid::broker::SemanticState::ConsumerImpl::doOutput () from
/usr/lib/libqpidbroker.so.0
#17 0x03fc2eba in qpid::sys::AggregateOutput::doOutput () from
/usr/lib/libqpidcommon.so.0
#18 0x00254a6d in qpid::broker::SemanticState::doOutput () from
/usr/lib/libqpidbroker.so.0
#19 0x03fc2eba in qpid::sys::AggregateOutput::doOutput () from
/usr/lib/libqpidcommon.so.0
#20 0x001e95ae in qpid::broker::Connection::doOutput () from
/usr/lib/libqpidbroker.so.0
#21 0x001ad60c in qpid::amqp_0_10::Connection::canEncode () from
/usr/lib/libqpidbroker.so.0
#22 0x03fc4302 in qpid::sys::AsynchIOHandler::idle () from
/usr/lib/libqpidcommon.so.0
#23 0x0029fb08 in
boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void,
boost::_mfi::mf1<void, qpid::sys::AsynchIOHandler, qpid::sys::AsynchIO&>,
boost::_bi::list2<boost::_bi::value<qpid::sys::AsynchIOHandler*>, boost::arg<1>
> >, void, qpid::sys::AsynchIO&>::invoke () from /usr/lib/libqpidbroker.so.0
#24 0x03f89cde in boost::function1<void, qpid::sys::AsynchIO&,
std::allocator<boost::function_base> >::operator() () from
/usr/lib/libqpidcommon.so.0
#25 0x03f88780 in qpid::sys::AsynchIO::writeable () from /usr/lib/libqpidcommon.so.0
#26 0x03f8a8b4 in
boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void,
boost::_mfi::mf1<void, qpid::sys::AsynchIO, qpid::sys::DispatchHandle&>,
boost::_bi::list2<boost::_bi::value<qpid::sys::AsynchIO*>, boost::arg<1> > >,
void, qpid::sys::DispatchHandle&>::invoke () from /usr/lib/libqpidcommon.so.0
#27 0x03fc963e in boost::function1<void, qpid::sys::DispatchHandle&,
std::allocator<boost::function_base> >::operator() () from
/usr/lib/libqpidcommon.so.0
#28 0x03fc5ff9 in qpid::sys::DispatchHandle::dispatchCallbacks () from
/usr/lib/libqpidcommon.so.0
#29 0x03fc673b in qpid::sys::Dispatcher::run () from /usr/lib/libqpidcommon.so.0
#30 0x03f8b01b in qpid::sys::Thread::runRunnable () from /usr/lib/libqpidcommon.so.0
#31 0x0063e3cc in start_thread () from /lib/tls/libpthread.so.0
#32 0x0049e1ae in clone () from /lib/tls/libc.so.6

** A third run didn't crash the main qpidd process, but probably just a thread
... anyway, this came in the log file to Qpidd:

2008-may-19 12:41:53 debug Exception: Device or resource busy
(./qpid/sys/posix/Mutex.h:155)
2008-may-19 12:41:53 debug Exception: Exception: Device or resource busy
(./qpid/sys/posix/Mutex.h:155)

The perftest program hangs doing sub_done.

Doing a new test (without restarting qpidd), the qpidd writes this to stderr:
Unexpected I/O response (RHM_IORES_BUSY) on queue perftest0".

The qpid.log file says:
2008-may-19 12:55:55 debug Exception: Unexpected I/O response (RHM_IORES_BUSY)
on queue perftest0". (JournalImpl.cpp:477)
2008-may-19 12:55:55 debug Exception: Channel 1 is not attached
(qpid/broker/SessionHandler.cpp:72)
2008-may-19 12:55:55 error Session detached due to: not-attached: Channel 1 is
not attached (qpid/broker/SessionHandler.cpp:72)
2008-may-19 12:55:55 debug Exception: Channel 1 is not attached
(qpid/broker/SessionHandler.cpp:72)
2008-may-19 12:55:55 error Session detached due to: not-attached: Channel 1 is
not attached (qpid/broker/SessionHandler.cpp:72)

A fourth attempt, also causing thread crash, this time with : 
2008-may-19 13:03:21 debug Exception: Channel 1 is not attached
(qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is
not attached (qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 trace RECV [10.34.33.204:49427]: Frame[Bbe; channel=1;
{MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }]
2008-may-19 13:03:21 debug Exception: Channel 1 is not attached
(qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is
not attached (qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 trace RECV [10.34.33.204:49427]: Frame[be; channel=1;
header (35 bytes); properties={{MessageProperties: content-length=256;
}{DeliveryProperties: delivery-mode=2; routing-key=perftest0; }}]
2008-may-19 13:03:21 debug Exception: Channel 1 is not attached
(qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is
not attached (qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 trace SENT [10.34.33.204:49427]: Frame[BEbe; channel=0;
{ConnectionCloseBody: reply-code=501; reply-text=Exception: Exception: Device or
resource busy (./qpid/sys/posix/Mutex.h:155); }]
2008-may-19 13:03:21 debug DISCONNECTED [10.34.33.204:49429]
[root@dhcp-lab-167 ~]# tail qpid.log 
2008-may-19 13:03:21 debug Exception: Channel 1 is not attached
(qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is
not attached (qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 trace RECV [10.34.33.204:49427]: Frame[Bbe; channel=1;
{MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }]
2008-may-19 13:03:21 debug Exception: Channel 1 is not attached
(qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is
not attached (qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 trace RECV [10.34.33.204:49427]: Frame[be; channel=1;
header (35 bytes); properties={{MessageProperties: content-length=256;
}{DeliveryProperties: delivery-mode=2; routing-key=perftest0; }}]
2008-may-19 13:03:21 debug Exception: Channel 1 is not attached
(qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 error Session detached due to: not-attached: Channel 1 is
not attached (qpid/broker/SessionHandler.cpp:72)
2008-may-19 13:03:21 trace SENT [10.34.33.204:49427]: Frame[BEbe; channel=0;
{ConnectionCloseBody: reply-code=501; reply-text=Exception: Exception: Device or
resource busy (./qpid/sys/posix/Mutex.h:155); }]
2008-may-19 13:03:21 debug DISCONNECTED [10.34.33.204:49429]

The perftest program prints a "SubscribeThread exception: Closed: " message

Hope this make some sense to somebody ...

Comment 1 David Sommerseth 2008-05-19 13:46:01 UTC
This crash seems to happen only with --store-async yes ... with --store-async no
the tests seems to go better.

Comment 2 Jeff Needle 2008-05-19 19:02:39 UTC
About 4,000,000 packets later, I haven't been able to reproduce this with a
broker running on a RHEL4 i386 installed system booted with mem=1G and perftest
running on a RHEL5 x86_64 system.

Comment 3 David Sommerseth 2008-05-20 07:05:52 UTC
Closing this bug now.  Most probably it's a local environment error in a desktop
VM which caused Qpidd to core dump.  Tested on physical hardware with RHEL4 i386
and x86_64 extensively withouth triggering the same issue.  Even run over night
without any obvious problems on a i386 box.


Comment 4 Kim van der Riet 2008-05-20 11:36:01 UTC
I ran a RHEL5.1 client against the RHEL4 broker overnight, and achieved 11,300
iterations using the broker and client parameters in comment #1. No problems
were found.

Comment 5 Jeff Needle 2008-05-21 15:39:51 UTC
I also ran a RHEL4 client against a RHEL4 broker running on a 1G xen guest to
more closely simulate the original environment.  Over 40,000,000 messages passed
without issue.  David's original test wouldn't get to more than 500,000 or so
before dumping core.


Note You need to log in before you can comment on or make changes to this bug.