Description of problem: Let's have two nodes cluster which is under perftest load, each perftest client is run with unique --base-name=<ID> switch,some messages are dure. The messages are On top of that there are qmf queries running (qpid-stat -b and qpid-cluster). The test launches multiple perftest clients in about one moment and waits for them until they finish. There is then huge stress when clients try to get connection to the broker and first minutes are really stressy for broker and for the machine as well. There are observed perftest clients hang /locked up following way: [root@mrg-qe-01 ~]# pstack 27976 Thread 6 (Thread 0xb7efeb90 (LWP 27977)): #0 0x0023d410 in __kernel_vsyscall () #1 0x00985486 in epoll_wait () from /lib/libc.so.6 #2 0x004ccafa in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.2 #3 0x004cd666 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.2 #4 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2 #5 0x003a3832 in start_thread () from /lib/libpthread.so.0 #6 0x00984e0e in clone () from /lib/libc.so.6 Thread 5 (Thread 0xb72ffb90 (LWP 27978)): #0 0x0023d410 in __kernel_vsyscall () #1 0x00985486 in epoll_wait () from /lib/libc.so.6 #2 0x004ccafa in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.2 #3 0x004cd666 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.2 #4 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2 #5 0x003a3832 in start_thread () from /lib/libpthread.so.0 #6 0x00984e0e in clone () from /lib/libc.so.6 Thread 4 (Thread 0xb5efdb90 (LWP 27980)): #0 0x0023d410 in __kernel_vsyscall () #1 0x00985486 in epoll_wait () from /lib/libc.so.6 #2 0x004ccafa in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.2 #3 0x004cd666 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.2 #4 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2 #5 0x003a3832 in start_thread () from /lib/libpthread.so.0 #6 0x00984e0e in clone () from /lib/libc.so.6 Thread 3 (Thread 0xb54fcb90 (LWP 27988)): #0 0x0023d410 in __kernel_vsyscall () #1 0x003a7bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x009914dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x007eebf1 in qpid::sys::Waitable::wait() () from /usr/lib/libqpidclient.so.2 #4 0x007ec4ee in qpid::client::LocalQueueImpl::get(qpid::client::Message&, qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2 #5 0x007ed0be in qpid::client::LocalQueueImpl::get(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2 #6 0x007ed2a7 in qpid::client::LocalQueueImpl::pop(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2 #7 0x007e9d79 in qpid::client::LocalQueue::pop(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2 #8 0x0805dde6 in ?? () #9 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2 #10 0x003a3832 in start_thread () from /lib/libpthread.so.0 #11 0x00984e0e in clone () from /lib/libc.so.6 Thread 2 (Thread 0xb4afbb90 (LWP 27989)): #0 0x0023d410 in __kernel_vsyscall () #1 0x00985486 in epoll_wait () from /lib/libc.so.6 #2 0x004ccafa in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.2 #3 0x004cd666 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.2 #4 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2 #5 0x003a3832 in start_thread () from /lib/libpthread.so.0 #6 0x00984e0e in clone () from /lib/libc.so.6 Thread 1 (Thread 0xb7eff720 (LWP 27976)): #0 0x0023d410 in __kernel_vsyscall () #1 0x003a7bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x009914dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x007eebf1 in qpid::sys::Waitable::wait() () from /usr/lib/libqpidclient.so.2 #4 0x007ec4ee in qpid::client::LocalQueueImpl::get(qpid::client::Message&, qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2 #5 0x007ed0be in qpid::client::LocalQueueImpl::get(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2 #6 0x007ed2a7 in qpid::client::LocalQueueImpl::pop(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2 #7 0x007e9d79 in qpid::client::LocalQueue::pop(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2 #8 0x0805f222 in ?? () #9 0x0805f841 in ?? () #10 0x0805247c in std::ios_base::Init::~Init() () #11 0x008c8e9c in __libc_start_main () from /lib/libc.so.6 #12 0x08051111 in std::ios_base::Init::~Init() () [root@mrg-qe-01 ~]# pstack 27976 [root@mrg-qe-01 ~]# ps 27976 PID TTY STAT TIME COMMAND 27976 pts/0 Sl+ 0:12 perftest --unique-data yes --base-name=mrg-qe-01.lab.eng.brq.redhat.com_perftest_002_ -s --mechanism ANONYMOUS --user [root@mrg-qe-01 ~]# gdb GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. (gdb) attach 27976 Attaching to process 27976 Reading symbols from /usr/bin/perftest...Reading symbols from /usr/lib/debug/usr/bin/perftest.debug...warning: section .gnu.liblist not found in /usr/lib/debug/usr/bin/perftest.debug warning: section .dynbss not found in /usr/lib/debug/usr/bin/perftest.debug warning: section .gnu.conflict not found in /usr/lib/debug/usr/bin/perftest.debug done. done. Reading symbols from /usr/lib/libqpidclient.so.2...Reading symbols from /usr/lib/debug/usr/lib/libqpidclient.so.2.0.0.debug...done. done. ... Reading symbols from /usr/lib/sasl2/liblogin.so.2...(no debugging symbols found)...done. Loaded symbols for /usr/lib/sasl2/liblogin.so.2 warning: section .gnu.liblist not found in /usr/lib/debug/usr/bin/perftest.debug warning: section .dynbss not found in /usr/lib/debug/usr/bin/perftest.debug warning: section .gnu.conflict not found in /usr/lib/debug/usr/bin/perftest.debug 0x0023d410 in __kernel_vsyscall () (gdb) info threads 6 Thread 0xb7efeb90 (LWP 27977) 0x0023d410 in __kernel_vsyscall () 5 Thread 0xb72ffb90 (LWP 27978) 0x0023d410 in __kernel_vsyscall () 4 Thread 0xb5efdb90 (LWP 27980) 0x0023d410 in __kernel_vsyscall () 3 Thread 0xb54fcb90 (LWP 27988) 0x0023d410 in __kernel_vsyscall () 2 Thread 0xb4afbb90 (LWP 27989) 0x0023d410 in __kernel_vsyscall () * 1 Thread 0xb7eff720 (LWP 27976) 0x0023d410 in __kernel_vsyscall () (gdb) thread apply all bt Thread 6 (Thread 0xb7efeb90 (LWP 27977)): #0 0x0023d410 in __kernel_vsyscall () #1 0x00985486 in epoll_wait () from /lib/libc.so.6 #2 0x004ccafa in qpid::sys::Poller::wait (this=0x96bd0b0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:524 #3 0x004cd666 in qpid::sys::Poller::run (this=0x96bd0b0) at qpid/sys/epoll/EpollPoller.cpp:479 #4 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0x96bd0b0) at qpid/sys/posix/Thread.cpp:35 #5 0x003a3832 in start_thread () from /lib/libpthread.so.0 #6 0x00984e0e in clone () from /lib/libc.so.6 Thread 5 (Thread 0xb72ffb90 (LWP 27978)): #0 0x0023d410 in __kernel_vsyscall () #1 0x00985486 in epoll_wait () from /lib/libc.so.6 #2 0x004ccafa in qpid::sys::Poller::wait (this=0x96bd0b0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:524 #3 0x004cd666 in qpid::sys::Poller::run (this=0x96bd0b0) at qpid/sys/epoll/EpollPoller.cpp:479 #4 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0x96bd0b0) at qpid/sys/posix/Thread.cpp:35 #5 0x003a3832 in start_thread () from /lib/libpthread.so.0 #6 0x00984e0e in clone () from /lib/libc.so.6 Thread 4 (Thread 0xb5efdb90 (LWP 27980)): #0 0x0023d410 in __kernel_vsyscall () #1 0x00985486 in epoll_wait () from /lib/libc.so.6 #2 0x004ccafa in qpid::sys::Poller::wait (this=0x96bd0b0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:524 #3 0x004cd666 in qpid::sys::Poller::run (this=0x96bd0b0) at qpid/sys/epoll/EpollPoller.cpp:479 #4 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0x96bd0b0) at qpid/sys/posix/Thread.cpp:35 #5 0x003a3832 in start_thread () from /lib/libpthread.so.0 #6 0x00984e0e in clone () from /lib/libc.so.6 Thread 3 (Thread 0xb54fcb90 (LWP 27988)): #0 0x0023d410 in __kernel_vsyscall () #1 0x003a7bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x009914dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x007eebf1 in wait (this=0x96bf100) at ../include/qpid/sys/posix/Condition.h:63 #4 wait (this=0x96bf100) at ../include/qpid/sys/Monitor.h:41 #5 qpid::sys::Waitable::wait (this=0x96bf100) at qpid/sys/Waitable.h:88 #6 0x007ec4ee in pop (this=0x98c5be0, result=..., timeout=...) at qpid/sys/BlockingQueue.h:55 #7 qpid::client::LocalQueueImpl::get (this=0x98c5be0, result=..., timeout=...) at qpid/client/LocalQueueImpl.cpp:49 #8 0x007ed0be in qpid::client::LocalQueueImpl::get (this=0x98c5be0, timeout=...) at qpid/client/LocalQueueImpl.cpp:40 #9 0x007ed2a7 in qpid::client::LocalQueueImpl::pop (this=0x98c5be0, timeout=...) at qpid/client/LocalQueueImpl.cpp:36 ---Type <return> to continue, or q <return> to quit--- #10 0x007e9d79 in qpid::client::LocalQueue::pop (this=0xb54fc260, timeout=...) at qpid/client/LocalQueue.cpp:43 #11 0x0805dde6 in qpid::tests::SubscribeThread::run (this=0xb7301530) at perftest.cpp:626 #12 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0xb7301530) at qpid/sys/posix/Thread.cpp:35 #13 0x003a3832 in start_thread () from /lib/libpthread.so.0 #14 0x00984e0e in clone () from /lib/libc.so.6 Thread 2 (Thread 0xb4afbb90 (LWP 27989)): #0 0x0023d410 in __kernel_vsyscall () #1 0x00985486 in epoll_wait () from /lib/libc.so.6 #2 0x004ccafa in qpid::sys::Poller::wait (this=0x96bd0b0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:524 #3 0x004cd666 in qpid::sys::Poller::run (this=0x96bd0b0) at qpid/sys/epoll/EpollPoller.cpp:479 #4 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0x96bd0b0) at qpid/sys/posix/Thread.cpp:35 #5 0x003a3832 in start_thread () from /lib/libpthread.so.0 #6 0x00984e0e in clone () from /lib/libc.so.6 Thread 1 (Thread 0xb7eff720 (LWP 27976)): #0 0x0023d410 in __kernel_vsyscall () #1 0x003a7bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x009914dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x007eebf1 in wait (this=0x98c7718) at ../include/qpid/sys/posix/Condition.h:63 #4 wait (this=0x98c7718) at ../include/qpid/sys/Monitor.h:41 #5 qpid::sys::Waitable::wait (this=0x98c7718) at qpid/sys/Waitable.h:88 #6 0x007ec4ee in pop (this=0x98c7c48, result=..., timeout=...) at qpid/sys/BlockingQueue.h:55 #7 qpid::client::LocalQueueImpl::get (this=0x98c7c48, result=..., timeout=...) at qpid/client/LocalQueueImpl.cpp:49 #8 0x007ed0be in qpid::client::LocalQueueImpl::get (this=0x98c7c48, timeout=...) at qpid/client/LocalQueueImpl.cpp:40 #9 0x007ed2a7 in qpid::client::LocalQueueImpl::pop (this=0x98c7c48, timeout=...) at qpid/client/LocalQueueImpl.cpp:36 #10 0x007e9d79 in qpid::client::LocalQueue::pop (this=0xbf876b9c, timeout=...) at qpid/client/LocalQueue.cpp:43 #11 0x0805f222 in qpid::tests::Controller::process(size_t, qpid::client::LocalQueue, std::string, boost::function<void ()(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&),std::allocator<void> >) (this=0xbf876cc8, n=1, lq=..., queue=..., msgFn=...) at perftest.cpp:382 #12 0x0805f841 in qpid::tests::Controller::run (this=0xbf876cc8) at perftest.cpp:422 #13 0x0805247c in main (argc=Cannot access memory at address 0x80 ) at perftest.cpp:719 (gdb) quit A debugging session is active. Inferior 1 [process 27976] will be detached. Quit anyway? (y or n) n Not confirmed. (gdb) detach Version-Release number of selected component (if applicable): qpid-cpp-server-0.7.935473-1.el5 python-qpid-0.7.934605-1.el5 python-qmf-0.7.934605-1.el5 qpid-tools-0.7.934605-2.el5 How reproducible: >80% Steps to Reproduce: follow https://bugzilla.redhat.com/show_bug.cgi?id=558526#c5 Actual results: Perftest client hangs locks up. Expected results: Perftest client should not hang / lock up. Additional info:
So far seen on RHEL 5.5 i386 / x86_64 and also apart of the cluster on RHEL4.8 (one occurrecnce, I do not have pstack yet) Another RHEL 5.5 x86_64 case: [root@mrg-qe-02 ~]# gdb GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. (gdb) attach 14717 Attaching to process 14717 Reading symbols from /usr/bin/perftest...Reading symbols from /usr/lib/debug/usr/bin/perftest.debug...warning: section .gnu.liblist not found in /usr/lib/debug/usr/bin/perftest.debug warning: section .dynbss not found in /usr/lib/debug/usr/bin/perftest.debug warning: section .gnu.conflict not found in /usr/lib/debug/usr/bin/perftest.debug done. ... Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] [New Thread 0x443ec940 (LWP 14807)] [New Thread 0x40efd940 (LWP 14806)] [New Thread 0x439eb940 (LWP 14779)] [New Thread 0x425e9940 (LWP 14754)] [New Thread 0x41be8940 (LWP 14718)] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done. ... Loaded symbols for /usr/lib64/sasl2/liblogin.so.2 warning: section .gnu.liblist not found in /usr/lib/debug/usr/bin/perftest.debug warning: section .dynbss not found in /usr/lib/debug/usr/bin/perftest.debug warning: section .gnu.conflict not found in /usr/lib/debug/usr/bin/perftest.debug 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) info threads 6 Thread 0x41be8940 (LWP 14718) 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 5 Thread 0x425e9940 (LWP 14754) 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 4 Thread 0x439eb940 (LWP 14779) 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 3 Thread 0x40efd940 (LWP 14806) 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 0x443ec940 (LWP 14807) 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 * 1 Thread 0x2b6050505860 (LWP 14717) 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) thread apply all bt Thread 6 (Thread 0x41be8940 (LWP 14718)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x0000003c563295af in qpid::sys::Poller::wait (this=0xdf82dc0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524 #2 0x0000003c56329fd2 in qpid::sys::Poller::run (this=0xdf82dc0) at qpid/sys/epoll/EpollPoller.cpp:479 #3 0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x425e9940 (LWP 14754)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x0000003c563295af in qpid::sys::Poller::wait (this=0xdf82dc0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524 #2 0x0000003c56329fd2 in qpid::sys::Poller::run (this=0xdf82dc0) at qpid/sys/epoll/EpollPoller.cpp:479 #3 0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x439eb940 (LWP 14779)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x0000003c563295af in qpid::sys::Poller::wait (this=0xdf82dc0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524 #2 0x0000003c56329fd2 in qpid::sys::Poller::run (this=0xdf82dc0) at qpid/sys/epoll/EpollPoller.cpp:479 #3 0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x40efd940 (LWP 14806)): #0 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000003c568aa629 in wait (this=0xe3944f0) at ../include/qpid/sys/posix/Condition.h:63 #2 wait (this=0xe3944f0) at ../include/qpid/sys/Monitor.h:41 #3 qpid::sys::Waitable::wait (this=0xe3944f0) at qpid/sys/Waitable.h:88 #4 0x0000003c568a9860 in pop (this=0xe393c90, result=..., timeout=<value optimized out>) at qpid/sys/BlockingQueue.h:55 #5 qpid::client::LocalQueueImpl::get (this=0xe393c90, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49 #6 0x0000003c568aa069 in qpid::client::LocalQueueImpl::get (this=0xe393c90, timeout=...) at qpid/client/LocalQueueImpl.cpp:40 #7 0x0000003c568aa209 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36 #8 0x0000003c568a696c in qpid::client::LocalQueue::pop (this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43 #9 0x000000000041c752 in qpid::tests::SubscribeThread::run (this=0xdf8b260) at perftest.cpp:626 #10 0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0xe39451c) at qpid/sys/posix/Thread.cpp:35 #11 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #12 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 ---Type <return> to continue, or q <return> to quit--- Thread 2 (Thread 0x443ec940 (LWP 14807)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x0000003c563295af in qpid::sys::Poller::wait (this=0xdf82dc0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524 #2 0x0000003c56329fd2 in qpid::sys::Poller::run (this=0xdf82dc0) at qpid/sys/epoll/EpollPoller.cpp:479 #3 0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2b6050505860 (LWP 14717)): #0 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000003c568aa629 in wait (this=0x2aaaac204310) at ../include/qpid/sys/posix/Condition.h:63 #2 wait (this=0x2aaaac204310) at ../include/qpid/sys/Monitor.h:41 #3 qpid::sys::Waitable::wait (this=0x2aaaac204310) at qpid/sys/Waitable.h:88 #4 0x0000003c568a9860 in pop (this=0x2aaaac2020a0, result=..., timeout=<value optimized out>) at qpid/sys/BlockingQueue.h:55 #5 qpid::client::LocalQueueImpl::get (this=0x2aaaac2020a0, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49 #6 0x0000003c568aa069 in qpid::client::LocalQueueImpl::get (this=0x2aaaac2020a0, timeout=...) at qpid/client/LocalQueueImpl.cpp:40 #7 0x0000003c568aa209 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36 #8 0x0000003c568a696c in qpid::client::LocalQueue::pop (this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43 #9 0x000000000040fb47 in qpid::tests::Controller::process(size_t, qpid::client::LocalQueue, std::string, boost::function<void ()(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&),std::allocator<void> >) (this=<value optimized out>, n=1, lq=..., queue=<value optimized out>, msgFn=...) at perftest.cpp:382 #10 0x0000000000412a76 in qpid::tests::Controller::run (this=0x7fff43f3eb30) at perftest.cpp:422 #11 0x000000000040cc1b in main (argc=234371408, argv=<value optimized out>) at perftest.cpp:719 (gdb) quit
The backtrace and behavior seems identical to bug 501668. Going to prove that an exception was thrown to prove that.
I took some time to retest the issue and although I first believed that there is no issue I triggered cases when perftest (c++ client library) hangs without notification that queue policy was violated: One case I saw: perftest log: [root@mrg-qe-02 bz587190]# cat perftest_001.log 2010-05-24 10:52:53 info Connecting to tcp:localhost:5672 2010-05-24 10:52:53 debug Created IO thread: 0 2010-05-24 10:52:53 debug TCPConnector created for \x00- 2010-05-24 10:52:53 debug RECV [40796 localhost:5672] INIT(0-10) 2010-05-24 10:52:53 debug CyrusSasl::start(ANONYMOUS) 2010-05-24 10:52:53 debug min_ssf: 0, max_ssf: 256 2010-05-24 10:52:53 debug getUserFromSettings(): guest 2010-05-24 10:52:53 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'guest.eng.brq.redhat.com' 2010-05-24 10:52:53 debug Known-brokers for connection: amqp:tcp:10.34.33.55:5672,tcp:10.34.45.2:5672,tcp:192.168.122.1:5672 2010-05-24 10:52:53 debug No security layer in place 2010-05-24 10:52:53 debug SessionState::SessionState .: 0x7661098 2010-05-24 10:52:54 debug Exception constructed: Closed by client 2010-05-24 10:52:54 info Connecting to tcp:localhost:5672 2010-05-24 10:52:54 debug TCPConnector created for \x00- 2010-05-24 10:52:54 debug RECV [40797 localhost:5672] INIT(0-10) 2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS) 2010-05-24 10:52:54 debug min_ssf: 0, max_ssf: 256 2010-05-24 10:52:54 debug getUserFromSettings(): guest 2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'guest.eng.brq.redhat.com' 2010-05-24 10:52:54 debug Known-brokers for connection: amqp:tcp:10.34.33.55:5672,tcp:10.34.45.2:5672,tcp:192.168.122.1:5672 2010-05-24 10:52:54 debug No security layer in place 2010-05-24 10:52:54 debug SessionState::SessionState .: 0x7457568 2010-05-24 10:52:54 info Connecting to tcp:localhost:5672 2010-05-24 10:52:54 debug Created IO thread: 1 2010-05-24 10:52:54 debug TCPConnector created for \x00- 2010-05-24 10:52:54 debug RECV [40798 localhost:5672] INIT(0-10) 2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS) 2010-05-24 10:52:54 debug min_ssf: 0, max_ssf: 256 2010-05-24 10:52:54 debug getUserFromSettings(): guest 2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'guest.eng.brq.redhat.com' 2010-05-24 10:52:54 debug Known-brokers for connection: amqp:tcp:10.34.33.55:5672,tcp:10.34.45.2:5672,tcp:192.168.122.1:5672 2010-05-24 10:52:54 debug No security layer in place 2010-05-24 10:52:54 debug SessionState::SessionState .: 0x7662a58 2010-05-24 10:52:54 info Connecting to tcp:localhost:5672 2010-05-24 10:52:54 debug Created IO thread: 2 2010-05-24 10:52:54 debug TCPConnector created for \x00- 2010-05-24 10:52:54 debug RECV [40799 localhost:5672] INIT(0-10) 2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS) 2010-05-24 10:52:54 debug min_ssf: 0, max_ssf: 256 2010-05-24 10:52:54 debug getUserFromSettings(): guest 2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'guest.eng.brq.redhat.com' 2010-05-24 10:52:54 debug Known-brokers for connection: amqp:tcp:10.34.33.55:5672,tcp:10.34.45.2:5672,tcp:192.168.122.1:5672 2010-05-24 10:52:54 debug No security layer in place 2010-05-24 10:52:54 debug SessionState::SessionState .: 0x7866088 2010-05-24 10:52:54 warning Connection closed 2010-05-24 10:52:54 debug Exception constructed: Connection closed 2010-05-24 10:52:54 debug Exception constructed: Connection closed PublishThread exception: Connection closed perftest hang state: Thread 5 (Thread 0x42503940 (LWP 2958)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x000000313872bfef in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.2 #2 0x000000313872c9d2 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.2 #3 0x000000313872358a in ?? () from /usr/lib64/libqpidcommon.so.2 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x43905940 (LWP 2960)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x000000313872bfef in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.2 #2 0x000000313872c9d2 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.2 #3 0x000000313872358a in ?? () from /usr/lib64/libqpidcommon.so.2 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x4145b940 (LWP 2961)): #0 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00000031382873db in qpid::client::LocalQueueImpl::get(qpid::client::Message&, qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2 #2 0x0000003138287c49 in qpid::client::LocalQueueImpl::get(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2 #3 0x0000003138287de9 in qpid::client::LocalQueueImpl::pop(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2 #4 0x0000003138284dfc in qpid::client::LocalQueue::pop(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2 #5 0x000000000041b0e2 in ?? () #6 0x000000313872358a in ?? () from /usr/lib64/libqpidcommon.so.2 #7 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #8 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x44306940 (LWP 2962)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x000000313872bfef in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.2 #2 0x000000313872c9d2 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.2 #3 0x000000313872358a in ?? () from /usr/lib64/libqpidcommon.so.2 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2b8b9ba6c870 (LWP 2957)): #0 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00000031382873db in qpid::client::LocalQueueImpl::get(qpid::client::Message&, qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2 #2 0x0000003138287c49 in qpid::client::LocalQueueImpl::get(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2 #3 0x0000003138287de9 in qpid::client::LocalQueueImpl::pop(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2 #4 0x0000003138284dfc in qpid::client::LocalQueue::pop(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2 #5 0x000000000040f7c7 in ?? () #6 0x0000000000413f66 in ?? () #7 0x000000000040ccdb in std::ios_base::Init::~Init() () #8 0x00000036a101d994 in __libc_start_main () from /lib64/libc.so.6 #9 0x000000000040aa79 in std::ios_base::Init::~Init() () #10 0x00007fffe2fdbc48 in ?? () #11 0x0000000000000000 in ?? () (gdb) info threads 5 Thread 0x42503940 (LWP 2958) 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 4 Thread 0x43905940 (LWP 2960) 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 3 Thread 0x4145b940 (LWP 2961) 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 0x44306940 (LWP 2962) 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 * 1 Thread 0x2b8b9ba6c870 (LWP 2957) 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) thread apply all bt Thread 5 (Thread 0x42503940 (LWP 2958)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x000000313872bfef in qpid::sys::Poller::wait (this=0x74601f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524 #2 0x000000313872c9d2 in qpid::sys::Poller::run (this=0x74601f0) at qpid/sys/epoll/EpollPoller.cpp:479 #3 0x000000313872358a in qpid::sys::(anonymous namespace)::runRunnable (p=0x5) at qpid/sys/posix/Thread.cpp:35 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x43905940 (LWP 2960)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x000000313872bfef in qpid::sys::Poller::wait (this=0x74601f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524 #2 0x000000313872c9d2 in qpid::sys::Poller::run (this=0x74601f0) at qpid/sys/epoll/EpollPoller.cpp:479 #3 0x000000313872358a in qpid::sys::(anonymous namespace)::runRunnable (p=0x5) at qpid/sys/posix/Thread.cpp:35 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x4145b940 (LWP 2961)): #0 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00000031382873db in pop (this=0x7662d90, result=..., timeout=<value optimized out>) at ../include/qpid/sys/posix/Condition.h:63 #2 qpid::client::LocalQueueImpl::get (this=0x7662d90, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49 #3 0x0000003138287c49 in qpid::client::LocalQueueImpl::get (this=0x7662d90, timeout=...) at qpid/client/LocalQueueImpl.cpp:40 #4 0x0000003138287de9 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36 #5 0x0000003138284dfc in qpid::client::LocalQueue::pop (this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43 #6 0x000000000041b0e2 in qpid::tests::SubscribeThread::run (this=0x74535a0) at perftest.cpp:626 #7 0x000000313872358a in qpid::sys::(anonymous namespace)::runRunnable (p=0x2aaab00018ec) at qpid/sys/posix/Thread.cpp:35 #8 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #9 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x44306940 (LWP 2962)): #0 0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x000000313872bfef in qpid::sys::Poller::wait (this=0x74601f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524 #2 0x000000313872c9d2 in qpid::sys::Poller::run (this=0x74601f0) at qpid/sys/epoll/EpollPoller.cpp:479 #3 0x000000313872358a in qpid::sys::(anonymous namespace)::runRunnable (p=0x5) at qpid/sys/posix/Thread.cpp:35 #4 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000036a10d3d1d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2b8b9ba6c870 (LWP 2957)): #0 0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #1 0x00000031382873db in pop (this=0x2aaab0002000, result=..., timeout=<value optimized out>) at ../include/qpid/sys/posix/Condition.h:63 #2 qpid::client::LocalQueueImpl::get (this=0x2aaab0002000, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49 #3 0x0000003138287c49 in qpid::client::LocalQueueImpl::get (this=0x2aaab0002000, timeout=...) at qpid/client/LocalQueueImpl.cpp:40 #4 0x0000003138287de9 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36 #5 0x0000003138284dfc in qpid::client::LocalQueue::pop (this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43 #6 0x000000000040f7c7 in qpid::tests::Controller::process(size_t, qpid::client::LocalQueue, std::string, boost::function<void ()(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&),std::allocator<void> >) (this=<value optimized out>, n=1, lq=..., queue=<value optimized out>, msgFn=...) at perftest.cpp:382 #7 0x0000000000413f66 in qpid::tests::Controller::run (this=0x7fffe2fdb980) at perftest.cpp:422 #8 0x000000000040ccdb in main (argc=121976304, argv=<value optimized out>) at perftest.cpp:719 broker log: ... 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-180(local) 2010-05-24 10:52:53 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-180 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-180(local) 2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-181(local) 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-181(local) ssf=0 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-181(local) 2010-05-24 10:52:53 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-181 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-181(local) 2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-182(local) 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-182(local) ssf=0 2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-183(local) 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-183(local) ssf=0 2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-184(local) 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-184(local) ssf=0 2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-185(local) 2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-185(local) ssf=0 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-185(local) 2010-05-24 10:52:54 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-185 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-185(local) 2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-186(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-186(local) ssf=0 2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-187(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-187(local) ssf=0 2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-188(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-188(local) ssf=0 2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-189(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-189(local) ssf=0 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-186(local) 2010-05-24 10:52:54 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-186 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-186(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-189(local) 2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-190(local) 2010-05-24 10:52:54 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-189 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-189(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-190(local) ssf=0 2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-191(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-191(local) ssf=0 2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-192(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-192(local) ssf=0 2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-193(local) 2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-193(local) ssf=0 2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-193(local) 2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-194(local) 2010-05-24 10:52:55 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-193 2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-193(local) 2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-194(local) ssf=0 2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-195(local) 2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-195(local) ssf=0 2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-196(local) 2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-197(local) 2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-196(local) ssf=0 2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-197(local) ssf=0 2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-198(local) 2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-198(local) ssf=0 2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-197(local) 2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-199(local) 2010-05-24 10:52:56 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-197 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-197(local) 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-199(local) ssf=0 2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-200(local) 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-200(local) ssf=0 2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-201(local) 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-201(local) ssf=0 2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-202(local) 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-198(local) 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-202(local) ssf=0 2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-203(local) 2010-05-24 10:52:56 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-198 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-198(local) 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-203(local) ssf=0 2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-204(local) 2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-204(local) ssf=0 2010-05-24 10:52:57 info new client connection 10.34.45.2:1903-205(local) 2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-205(local) ssf=0 2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-201(local) 2010-05-24 10:52:57 info new client connection 10.34.45.2:1903-206(local) 2010-05-24 10:52:57 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-201 2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-201(local) 2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-206(local) ssf=0 2010-05-24 10:52:57 info new client connection 10.34.45.2:1903-207(local) 2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-207(local) ssf=0 2010-05-24 10:52:57 info new client connection 10.34.45.2:1903-208(local) 2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-208(local) ssf=0 2010-05-24 10:53:18 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-205(local) 2010-05-24 10:53:19 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-205 2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-205(local) 2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-203(local) 2010-05-24 10:53:19 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-203 2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-203(local) 2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-204(local) 2010-05-24 10:53:19 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-204 2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-204(local) 2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-208(local) 2010-05-24 10:53:24 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-208 2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-208(local) 2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-206(local) 2010-05-24 10:53:24 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-206 2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-206(local) 2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-207(local) 2010-05-24 10:53:24 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-207 2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-207(local) 2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-192(local) 2010-05-24 10:53:26 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-192 2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-192(local) 2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-190(local) 2010-05-24 10:53:26 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-190 2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-190(local) 2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-191(local) 2010-05-24 10:53:26 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-191 2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-191(local) 2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-184(local) 2010-05-24 10:53:36 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-184 2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-184(local) 2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-182(local) 2010-05-24 10:53:36 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-182 2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-182(local) 2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-183(local) 2010-05-24 10:53:36 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-183 2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-183(local) 2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-202(local) 2010-05-24 10:53:40 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-202 2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-202(local) 2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-199(local) 2010-05-24 10:53:40 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-199 2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-199(local) 2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-200(local) 2010-05-24 10:53:40 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-200 2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-200(local) 2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-196(local) 2010-05-24 10:53:42 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-196 2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-196(local) 2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-194(local) 2010-05-24 10:53:42 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-194 2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-194(local) 2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-195(local) 2010-05-24 10:53:42 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-195 2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-195(local) The bug is seen when below attached reproducer is used. All broker modules used clustering enabled (can be even one node). Packages: condor-qmf-7.4.3-0.14.el5 openais-0.80.6-16.el5_5.1 openais-debuginfo-0.80.6-16.el5_5.1 openais-devel-0.80.6-16.el5_5.1 python-qmf-0.7.946106-3.el5 python-qpid-0.7.946106-1.el5 qmf-0.7.946106-1.el5 qmf-devel-0.7.946106-1.el5 qpid-cpp-client-0.7.946106-1.el5 qpid-cpp-client-devel-0.7.946106-1.el5 qpid-cpp-client-devel-docs-0.7.946106-1.el5 qpid-cpp-client-ssl-0.7.946106-1.el5 qpid-cpp-mrg-debuginfo-0.7.946106-1.el5 qpid-cpp-server-0.7.946106-1.el5 qpid-cpp-server-cluster-0.7.946106-1.el5 qpid-cpp-server-devel-0.7.946106-1.el5 qpid-cpp-server-ssl-0.7.946106-1.el5 qpid-cpp-server-store-0.7.946106-1.el5 qpid-cpp-server-xml-0.7.946106-1.el5 qpid-java-client-0.7.946106-1.el5 qpid-java-common-0.7.946106-1.el5 qpid-tools-0.7.946106-2.el5 [root@mrg-qe-02 tmp]# cat /etc/ais/openais.conf # Please read the openais.conf.5 manual page totem { version: 2 secauth: off threads: 0 interface { ringnumber: 0 bindnetaddr: 10.34.45.0 mcastaddr: 225.0.10.11 mcastport: 54001 } } logging { debug: on timestamp: on to_file: yes logfile: /tmp/openais.log } amf { mode: disabled } [root@mrg-qe-02 tmp]# cat /etc/qpidd.conf # # Licensed to the Apache Software Foundation (ASF) under one # or more contributor license agreements. See the NOTICE file # distributed with this work for additional information # regarding copyright ownership. The ASF licenses this file # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY # KIND, either express or implied. See the License for the # specific language governing permissions and limitations # under the License. # # Configuration file for qpidd. Entries are of the form: # name=value # # (Note: no spaces on either side of '='). Using default settings: # "qpidd --help" or "man qpidd" for more details. cluster-mechanism=ANONYMOUS cluster-name=X auth=no log-enable=info+ log-enable=debug+:cluster log-to-file=/tmp/qpidd.conf truncate=yes
Created attachment 416157 [details] The reproducer How to run: 1] installqpidd and openais 2] configure qpidd and openais as detailed above 3] start openais 4] killall perftest qpidd ; ./bz587190.sh 20 1 0 ANONYMOUS; while true; do ./bz587190.sh $(( $((${RANDOM}%4)) + 4)) 0 0 ANONYMOUS|| break; done You should see hang like this: start (7 0 0 ANONYMOUS) perftest: no process killed tcp 0 0 0.0.0.0:5672 0.0.0.0:* LISTEN 1903/qpidd Brokers broker cluster uptime conn sess exch queue ============================================================== 10.34.33.55:5672 X(ACTIVE) 7m 59s 1 1 8 103 Launching the clients: ....... 7|0|7 > 10:52:56 up 11 days, 18 min, 2 users, load average: 4.43, 3.81, 2.40< 7|0|7 > 10:53:06 up 11 days, 19 min, 2 users, load average: 5.14, 3.98, 2.47< 7|0|7 > 10:53:16 up 11 days, 19 min, 2 users, load average: 4.88, 3.96, 2.48< 5|2|7 > 10:53:26 up 11 days, 19 min, 2 users, load average: 5.33, 4.10, 2.54< 3|4|7 > 10:53:36 up 11 days, 19 min, 2 users, load average: 5.95, 4.27, 2.61< 1|6|7 > 10:53:46 up 11 days, 19 min, 2 users, load average: 5.48, 4.23, 2.61< 1|6|7 > 10:53:56 up 11 days, 19 min, 2 users, load average: 4.64, 4.09, 2.59< 1|6|7 > 10:54:07 up 11 days, 20 min, 2 users, load average: 3.92, 3.95, 2.56< 1|6|7 > 10:54:17 up 11 days, 20 min, 2 users, load average: 3.32, 3.82, 2.53< 1|6|7 > 10:54:27 up 11 days, 20 min, 2 users, load average: 2.81, 3.70, 2.50< 1|6|7 > 10:54:37 up 11 days, 20 min, 2 users, load average: 2.38, 3.57, 2.47< 1|6|7 > 10:54:47 up 11 days, 20 min, 2 users, load average: 2.01, 3.46, 2.45< ... 1|6|7 > 11:17:29 up 11 days, 43 min, 2 users, load average: 0.00, 0.05, 0.56<
Small clarification: When cluster is not enabled, no such behavior is seen (tested as is twice with and without clustering enabled).
There was found case on RHEL4.8 i386 (non clustered environment) with same hang, so the issue is not dependent on cluster module. The real reproducibility in cluster environment is much more rapid. Interesting might be Thread#1 qpid::tests::Controller::process( queue=...) call with queue parameter value. Following perftest state trace is from RHEL 4.8 i386: Attaching to process 27355 Reading symbols from /usr/bin/perftest...Reading symbols from /usr/lib/debug/usr/bin/perftest.debug...done. Using host libthread_db library "/lib/tls/libthread_db.so.1". done. Reading symbols from /usr/lib/libqpidclient.so.2...Reading symbols from /usr/lib/debug/usr/lib/libqpidclient.so.2.0.0.debug...done. done. Loaded symbols for /usr/lib/libqpidclient.so.2 Reading symbols from /usr/lib/libqpidcommon.so.2...Reading symbols from /usr/lib/debug/usr/lib/libqpidcommon.so.2.0.0.debug...done. done. Loaded symbols for /usr/lib/libqpidcommon.so.2 Reading symbols from /usr/lib/libboost_program_options.so.1...done. ... [Thread debugging using libthread_db enabled] [New Thread -1209071840 (LWP 27355)] [New Thread -1251034208 (LWP 27360)] [New Thread -1240544352 (LWP 27359)] [New Thread -1230054496 (LWP 27358)] [New Thread -1219564640 (LWP 27357)] [New Thread -1209074784 (LWP 27356)] Loaded symbols for /lib/tls/libpthread.so.0 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /usr/lib/qpid/client/sslconnector.so...Reading symbols from /usr/lib/debug/usr/lib/qpid/client/sslconnector.so.debug...done. done. Loaded symbols for /usr/lib/qpid/client/sslconnector.so Reading symbols from /usr/lib/libsslcommon.so.2...Reading symbols from /usr/lib/debug/usr/lib/libsslcommon.so.2.0.0.debug...done. done. ... 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 (gdb) info threads 6 Thread -1209074784 (LWP 27356) 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 5 Thread -1219564640 (LWP 27357) 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 4 Thread -1230054496 (LWP 27358) 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 3 Thread -1240544352 (LWP 27359) 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 2 Thread -1251034208 (LWP 27360) 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 1 Thread -1209071840 (LWP 27355) 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 (gdb) thread apply all bt Thread 6 (Thread -1209074784 (LWP 27356)): #0 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x00bb862e in epoll_wait () from /lib/tls/libc.so.6 #2 0x00421020 in qpid::sys::Poller::wait (this=0x8bf2ca8, timeout={nanosecs = 9223372036854775807}) at qpid/sys/epoll/EpollPoller.cpp:524 #3 0x00421f5c in qpid::sys::Poller::run (this=0x8bf2ca8) at qpid/sys/epoll/EpollPoller.cpp:479 #4 0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35 #5 0x00d515cc in start_thread () from /lib/tls/libpthread.so.0 #6 0x00bb7fae in clone () from /lib/tls/libc.so.6 Thread 5 (Thread -1219564640 (LWP 27357)): #0 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x00bb862e in epoll_wait () from /lib/tls/libc.so.6 #2 0x00421020 in qpid::sys::Poller::wait (this=0x8bf2ca8, timeout={nanosecs = 9223372036854775807}) at qpid/sys/epoll/EpollPoller.cpp:524 #3 0x00421f5c in qpid::sys::Poller::run (this=0x8bf2ca8) at qpid/sys/epoll/EpollPoller.cpp:479 #4 0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35 #5 0x00d515cc in start_thread () from /lib/tls/libpthread.so.0 #6 0x00bb7fae in clone () from /lib/tls/libc.so.6 Thread 4 (Thread -1230054496 (LWP 27358)): #0 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x00d53ef6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0 #2 0x00bc4ad7 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libc.so.6 #3 0x00177b90 in qpid::client::Bounds::expand (this=0x8dfa940, sizeRequired=1036, block=true) at ../include/qpid/sys/posix/Condition.h:63 #4 0x001d5a1a in qpid::client::SessionImpl::sendFrame (this=0x8bf3400, frame=@0x40c, canBlock=false) at /usr/include/boost/shared_ptr.hpp:252 #5 0x001d5a80 in qpid::client::SessionImpl::handleOut (this=0xfffffffc, frame=@0xfffffffc) at qpid/client/SessionImpl.cpp:500 #6 0x001d6013 in qpid::client::SessionImpl::sendContent (this=0x8bf3400, content=@0x8dfc158) at qpid/client/SessionImpl.cpp:428 #7 0x001dc30b in qpid::client::SessionImpl::sendCommand (this=0x8bf3400, command=@0xb6aed0f0, content=0x8dfc158) at qpid/client/SessionImpl.cpp:398 #8 0x001dc9db in qpid::client::SessionImpl::send (this=0xfffffffc, command=@0xfffffffc, content=@0xfffffffc) at qpid/client/SessionImpl.cpp:306 #9 0x0016503d in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer (this=0x8bf2bc8, destination=@0x8bf2bd8, acceptMode=1 '\001', acquireMode=0 '\0', content=@0xb6aed2e0, sync=false) at /usr/include/boost/shared_ptr.hpp:252 #10 0x08061155 in qpid::tests::PublishThread::run (this=0x8bf2ba8) at ../../src/boost/parameter/aux_/arg_list.hpp:73 #11 0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35 #12 0x00d515cc in start_thread () from /lib/tls/libpthread.so.0 #13 0x00bb7fae in clone () from /lib/tls/libc.so.6 Thread 3 (Thread -1240544352 (LWP 27359)): #0 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x00d53ef6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0 #2 0x00bc4ad7 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libc.so.6 #3 0x001c1fcc in qpid::client::LocalQueueImpl::get (this=0x92006d8, result=@0xb60ec280, timeout={nanosecs = 9223372036854775807}) at ../include/qpid/sys/posix/Condition.h:63 #4 0x001c427a in qpid::client::LocalQueueImpl::get (this=0x92006d8, timeout={nanosecs = 9223372036854775804}) at qpid/client/LocalQueueImpl.cpp:40 #5 0x001c447d in qpid::client::LocalQueueImpl::pop (this=0xfffffffc, timeout={nanosecs = 9223372032559808512}) at qpid/client/LocalQueueImpl.cpp:36 #6 0x001c0e4d in qpid::client::LocalQueue::pop (this=0xfffffffc, timeout={nanosecs = 9223372036854775804}) at qpid/client/LocalQueue.cpp:43 #7 0x08061b14 in qpid::tests::SubscribeThread::run (this=0x8dfb368) at perftest.cpp:626 #8 0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35 #9 0x00d515cc in start_thread () from /lib/tls/libpthread.so.0 #10 0x00bb7fae in clone () from /lib/tls/libc.so.6 Thread 2 (Thread -1251034208 (LWP 27360)): #0 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x00bb862e in epoll_wait () from /lib/tls/libc.so.6 #2 0x00421020 in qpid::sys::Poller::wait (this=0x8bf2ca8, timeout={nanosecs = 9223372036854775807}) at qpid/sys/epoll/EpollPoller.cpp:524 #3 0x00421f5c in qpid::sys::Poller::run (this=0x8bf2ca8) at qpid/sys/epoll/EpollPoller.cpp:479 #4 0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35 #5 0x00d515cc in start_thread () from /lib/tls/libpthread.so.0 #6 0x00bb7fae in clone () from /lib/tls/libc.so.6 Thread 1 (Thread -1209071840 (LWP 27355)): #0 0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x00d53ef6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0 #2 0x00bc4ad7 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libc.so.6 #3 0x001c1fcc in qpid::client::LocalQueueImpl::get (this=0x92008b8, result=@0xbfe15660, timeout={nanosecs = 9223372036854775807}) at ../include/qpid/sys/posix/Condition.h:63 #4 0x001c427a in qpid::client::LocalQueueImpl::get (this=0x92008b8, timeout={nanosecs = 9223372036854775804}) at qpid/client/LocalQueueImpl.cpp:40 #5 0x001c447d in qpid::client::LocalQueueImpl::pop (this=0xfffffffc, timeout={nanosecs = 9223372032559808512}) at qpid/client/LocalQueueImpl.cpp:36 #6 0x001c0e4d in qpid::client::LocalQueue::pop (this=0xfffffffc, timeout={nanosecs = 9223372036854775804}) at qpid/client/LocalQueue.cpp:43 #7 0x08057576 in qpid::tests::Controller::process (this=0xbfe15a50, n=1, lq= {<qpid::client::Handle<qpid::client::LocalQueueImpl>> = {impl = 0x92008b8}, <No data fields>}, queue=Cannot access memory at address 0x0 ) at perftest.cpp:382 #8 0x08062af7 in qpid::tests::Controller::run (this=0xbfe15a50) at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/ext/new_allocator.h:62 #9 0x080500d0 in main (argc=15, argv=0xbfe15c34) at ../../include/qpid/client/Session_0_10.h:553 ---Type <return> to continue, or q <return> to quit--- #10 0x00b00e93 in __libc_start_main () from /lib/tls/libc.so.6 #11 0x0804f281 in _start () (gdb) detach Detaching from program: /usr/bin/perftest, process 27355
I cannot reproduce this on mrg32 (ran the script for 40 minutes) but by inspection there is a race condition in the use of Bounds in SessionImpl::sendFrame. This function sends the frame first, then calls Bounds::expand(). But it's possible the network thread calls Bounds::reduce() between sending the frame and calling expand. If the Bounds::current value is 0 that reduce() is lost. If enough reduce() calls are lost in this way eventually we will deadlock. The patch below should correct the problem: diff --git a/qpid/cpp/src/qpid/client/SessionImpl.cpp b/qpid/cpp/src/qpid/client/SessionImpl.cpp index b7ff430..b507625 100644 --- a/qpid/cpp/src/qpid/client/SessionImpl.cpp +++ b/qpid/cpp/src/qpid/client/SessionImpl.cpp @@ -510,8 +510,8 @@ void SessionImpl::proxyOut(AMQFrame& frame) // network thread void SessionImpl::sendFrame(AMQFrame& frame, bool canBlock) { - channel.handle(frame); connection->expand(frame.encodedSize(), canBlock); + channel.handle(frame); } void SessionImpl::deliver(AMQFrame& frame) // network thread
I agree with Alan's assessment.
Note that though Alan's analysis of the bug is sound, the suggested change would introduce a different hang (very reliably)... solution still sought.
Fixed on trunk as r948936 and by commit d040ae874743e188779161c54d9e460f5b9ab9fe in release repo.
The issue has been fixed, tested on RHEL 4.8 / 5.5 i386 / x86_64 on packages: python-qmf-0.7.946106-11.el5 python-qpid-0.7.946106-13.el5 qmf-*0.7.946106-12.el5 qpid-cpp-*-0.7.946106-12.el5 qpid-java-*0.7.946106-7.el5 qpid-tools-0.7.946106-9.el5 -> VERIFIED