Bug 450519

Summary: broker appears to hang on detaching session
Product: Red Hat Enterprise MRG Reporter: Gordon Sim <gsim>
Component: qpid-cppAssignee: Alan Conway <aconway>
Status: CLOSED CURRENTRELEASE QA Contact: Kim van der Riet <kim.vdriet>
Severity: medium Docs Contact:
Priority: urgent    
Version: beta   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-12-02 16:06:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gordon Sim 2008-06-09 11:57:07 UTC
Original mail from Kim:

I ran a durable soak test over the weekend, but it froze after 7 1/2
hours on Friday night. The test was run from a local svn build, not from
rpms.

Build data:
qpid:  663874 2008-06-06 05:56:45 -0400 (Fri, 06 Jun 2008)
store:   2137 2008-06-05 18:07:06 -0400 (Thu, 05 Jun 2008)

The broker and perftest were run on the same box (et-virt07).The broker
was started as follows:
./qpidd --load-module /home/kpvdr/store/cpp/lib/.libs/libbdbstore.so --data-dir
/home/kpvdr/rhm --num-jfiles 16 --jfile-size-pgs 96 --auth no

Perftest was run in an infinite loop as follows:
./perftest --durable yes --count 50000 --nsub 2 --npub 2 --qt 4 --summary

The pstacks are as follows:
Broker:
Thread 13 (Thread 1084496192 (LWP 30235)):
#0  0x00002b3be6e83687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x00002b3be61205b2 in qpid::broker::Timer::run ()
#2  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#3  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 12 (Thread 1094986048 (LWP 30236)):
#0  0x00002b3be6e83496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00002b3be612054c in qpid::broker::Timer::run ()
#2  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#3  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 11 (Thread 1105475904 (LWP 30237)):
#0  0x00002b3be6e83687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x00002b3be61205b2 in qpid::broker::Timer::run ()
#2  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#3  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 10 (Thread 1115965760 (LWP 30238)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b3be64c7a19 in qpid::sys::Poller::wait ()
#2  0x00002b3be65063a0 in qpid::sys::Dispatcher::run ()
#3  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#4  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 1126455616 (LWP 30239)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b3be64c7a19 in qpid::sys::Poller::wait ()
#2  0x00002b3be65063a0 in qpid::sys::Dispatcher::run ()
#3  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#4  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 1136945472 (LWP 30240)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b3be64c7a19 in qpid::sys::Poller::wait ()
#2  0x00002b3be65063a0 in qpid::sys::Dispatcher::run ()
#3  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#4  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 1147435328 (LWP 30241)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b3be64c7a19 in qpid::sys::Poller::wait ()
#2  0x00002b3be65063a0 in qpid::sys::Dispatcher::run ()
#3  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#4  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 1157925184 (LWP 30242)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b3be64c7a19 in qpid::sys::Poller::wait ()
#2  0x00002b3be65063a0 in qpid::sys::Dispatcher::run ()
#3  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#4  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1168415040 (LWP 30243)):
#0  0x0000003abf264c90 in std::_Rb_tree_increment ()
#1  0x00002b3be611d295 in std::_Rb_tree<qpid::SessionId, qpid::SessionId,
std::_Identity<qpid::SessionId>, std::less<qpid::SessionId>,
std::allocator<qpid::SessionId> >::erase () from
/home/kpvdr/qpid/cpp/src/.libs/libqpidbroker.so.0
#2  0x00002b3be6115be8 in qpid::broker::SessionState::~SessionState$delete ()
#3  0x00002b3be611d5de in qpid::broker::SessionHandler::handleDetach ()
#4  0x00002b3be64ac1f2 in
qpid::framing::AMQP_AllOperations::SessionHandler::Invoker::visit () from
/home/kpvdr/qpid/cpp/src/.libs/libqpidcommon.so.0
#5  0x00002b3be64d86fe in qpid::framing::invoke<qpid::amqp_0_10::SessionHandler>
() from /home/kpvdr/qpid/cpp/src/.libs/libqpidcommon.so.0
#6  0x00002b3be64d4e23 in qpid::amqp_0_10::SessionHandler::invoke ()
#7  0x00002b3be64d6b52 in qpid::amqp_0_10::SessionHandler::handleIn ()
#8  0x00002b3be60beb8d in qpid::broker::Connection::received ()
#9  0x00002b3be609b6a0 in qpid::amqp_0_10::Connection::decode ()
#10 0x00002b3be6503a7f in qpid::sys::AsynchIOHandler::readbuff ()
#11 0x00002b3be64d392f in boost::function2<void, qpid::sys::AsynchIO&,
qpid::sys::AsynchIO::BufferBase*, std::allocator<boost::function_base>
>::operator() ()
#12 0x00002b3be64d0dc9 in qpid::sys::AsynchIO::readable ()
#13 0x00002b3be650675f in boost::function1<void, qpid::sys::DispatchHandle&,
std::allocator<boost::function_base> >::operator() ()
#14 0x00002b3be65062e9 in qpid::sys::DispatchHandle::dispatchCallbacks ()
#15 0x00002b3be65063e5 in qpid::sys::Dispatcher::run ()
#16 0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#17 0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#18 0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1178904896 (LWP 30244)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b3be64c7a19 in qpid::sys::Poller::wait ()
#2  0x00002b3be65063a0 in qpid::sys::Dispatcher::run ()
#3  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#4  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1189394752 (LWP 30245)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b3be64c7a19 in qpid::sys::Poller::wait ()
#2  0x00002b3be65063a0 in qpid::sys::Dispatcher::run ()
#3  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#4  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1199884608 (LWP 30264)):
#0  0x00002b3be6e83687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x00002b3be61205b2 in qpid::broker::Timer::run ()
#2  0x00002b3be64d3dda in qpid::sys::Thread::runRunnable ()
#3  0x00002b3be6e7f2f7 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47536279213856 (LWP 30221)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b3be64c7a19 in qpid::sys::Poller::wait ()
#2  0x00002b3be65063a0 in qpid::sys::Dispatcher::run ()
#3  0x00002b3be609da76 in qpid::broker::Broker::run ()
#4  0x00000000004077cf in main ()
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6

Perftest: (see thread 14 - it is the only one doing anything)
Thread 18 (Thread 1087953216 (LWP 32270)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 17 (Thread 1108932928 (LWP 32272)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 16 (Thread 1129912640 (LWP 32274)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 15 (Thread 1150892352 (LWP 32276)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 14 (Thread 1161382208 (LWP 32277)):
#0  0x00002af72a145496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00002af7294262c6 in qpid::client::SessionImpl::close ()
#2  0x0000000000419dfa in SubscribeThread::run ()
#3  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#4  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 13 (Thread 1171872064 (LWP 32278)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 12 (Thread 1192851776 (LWP 32280)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 11 (Thread 1213831488 (LWP 32282)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 10 (Thread 1234811200 (LWP 32284)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 1255790912 (LWP 32286)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 1276770624 (LWP 32288)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 1297750336 (LWP 32290)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 1318730048 (LWP 32292)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1339709760 (LWP 32294)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1360689472 (LWP 32296)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1381669184 (LWP 32298)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1402648896 (LWP 32300)):
#0  0x0000003ab00d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00002af729789a19 in qpid::sys::Poller::wait ()
#2  0x00002af7297c83a0 in qpid::sys::Dispatcher::run ()
#3  0x00002af7293ff1d9 in qpid::client::Connector::run ()
#4  0x00002af729795dda in qpid::sys::Thread::runRunnable ()
#5  0x00002af72a1412f7 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ab00d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47241053435712 (LWP 32255)):
#0  0x00002af72a1425b5 in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000040d925 in main ()
#0  0x00002af72a1425b5 in pthread_join () from /lib64/libpthread.so.0

Kim

Comment 1 Gordon Sim 2008-06-09 11:59:14 UTC
Broker continues using CPU when this occurs. Suspect this is a result of
SessionManager::forget() not protecting the Attached set from concurrent access
resulting in an infinite loop if its modified during an attmpted erase.

Comment 2 Alan Conway 2008-06-09 13:09:49 UTC
Fixed lock revision 664695, need to re-run soak tests.

Comment 3 Mike Bonnet 2008-06-20 03:53:27 UTC
qpidc-0.2.667603-1.el5, qpidc-perftest-0.2.667603-1.el5, qpidd-0.2.667603-1.el5, and rhm-0.2.2153-1.el5 have been pushed to the staging repo for testing