Bug 587190 - qpid c++ (perftest) clients hang when ran concurrently in the batches agains the clustered broker
Summary: qpid c++ (perftest) clients hang when ran concurrently in the batches agains ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: All
OS: Linux
urgent
medium
Target Milestone: 1.3
: ---
Assignee: Gordon Sim
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-04-29 09:56 UTC by Frantisek Reznicek
Modified: 2015-11-16 01:12 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
The reproducer (1.91 KB, application/x-shellscript)
2010-05-24 15:19 UTC, Frantisek Reznicek
no flags Details

Description Frantisek Reznicek 2010-04-29 09:56:53 UTC
Description of problem:

Let's have two nodes cluster which is under perftest load, each perftest client is run with unique --base-name=<ID> switch,some messages are dure. The messages are On top of that there are qmf queries running (qpid-stat -b and qpid-cluster).

The test launches multiple perftest clients in about one moment and waits for them until they finish. There is then huge stress when clients try to get connection to the broker and first minutes are really stressy for broker and for the machine as well.

There are observed perftest clients hang /locked up following way:

  [root@mrg-qe-01 ~]# pstack 27976
  Thread 6 (Thread 0xb7efeb90 (LWP 27977)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x00985486 in epoll_wait () from /lib/libc.so.6
  #2  0x004ccafa in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.2
  #3  0x004cd666 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.2
  #4  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2
  #5  0x003a3832 in start_thread () from /lib/libpthread.so.0
  #6  0x00984e0e in clone () from /lib/libc.so.6
  Thread 5 (Thread 0xb72ffb90 (LWP 27978)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x00985486 in epoll_wait () from /lib/libc.so.6
  #2  0x004ccafa in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.2
  #3  0x004cd666 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.2
  #4  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2
  #5  0x003a3832 in start_thread () from /lib/libpthread.so.0
  #6  0x00984e0e in clone () from /lib/libc.so.6
  Thread 4 (Thread 0xb5efdb90 (LWP 27980)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x00985486 in epoll_wait () from /lib/libc.so.6
  #2  0x004ccafa in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.2
  #3  0x004cd666 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.2
  #4  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2
  #5  0x003a3832 in start_thread () from /lib/libpthread.so.0
  #6  0x00984e0e in clone () from /lib/libc.so.6
  Thread 3 (Thread 0xb54fcb90 (LWP 27988)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x003a7bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  #2  0x009914dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
  #3  0x007eebf1 in qpid::sys::Waitable::wait() () from /usr/lib/libqpidclient.so.2
  #4  0x007ec4ee in qpid::client::LocalQueueImpl::get(qpid::client::Message&, qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2
  #5  0x007ed0be in qpid::client::LocalQueueImpl::get(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2
  #6  0x007ed2a7 in qpid::client::LocalQueueImpl::pop(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2
  #7  0x007e9d79 in qpid::client::LocalQueue::pop(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2
  #8  0x0805dde6 in ?? ()
  #9  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2
  #10 0x003a3832 in start_thread () from /lib/libpthread.so.0
  #11 0x00984e0e in clone () from /lib/libc.so.6
  Thread 2 (Thread 0xb4afbb90 (LWP 27989)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x00985486 in epoll_wait () from /lib/libc.so.6
  #2  0x004ccafa in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.2
  #3  0x004cd666 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.2
  #4  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable(void*) () from /usr/lib/libqpidcommon.so.2
  #5  0x003a3832 in start_thread () from /lib/libpthread.so.0
  #6  0x00984e0e in clone () from /lib/libc.so.6
  Thread 1 (Thread 0xb7eff720 (LWP 27976)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x003a7bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  #2  0x009914dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
  #3  0x007eebf1 in qpid::sys::Waitable::wait() () from /usr/lib/libqpidclient.so.2
  #4  0x007ec4ee in qpid::client::LocalQueueImpl::get(qpid::client::Message&, qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2
  #5  0x007ed0be in qpid::client::LocalQueueImpl::get(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2
  #6  0x007ed2a7 in qpid::client::LocalQueueImpl::pop(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2
  #7  0x007e9d79 in qpid::client::LocalQueue::pop(qpid::sys::Duration) () from /usr/lib/libqpidclient.so.2
  #8  0x0805f222 in ?? ()
  #9  0x0805f841 in ?? ()
  #10 0x0805247c in std::ios_base::Init::~Init() ()
  #11 0x008c8e9c in __libc_start_main () from /lib/libc.so.6
  #12 0x08051111 in std::ios_base::Init::~Init() ()
  [root@mrg-qe-01 ~]# pstack 27976
  [root@mrg-qe-01 ~]# ps 27976
    PID TTY      STAT   TIME COMMAND
  27976 pts/0    Sl+    0:12 perftest --unique-data yes --base-name=mrg-qe-01.lab.eng.brq.redhat.com_perftest_002_ -s --mechanism ANONYMOUS --user
  [root@mrg-qe-01 ~]# gdb
  GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
  Copyright (C) 2009 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "i386-redhat-linux-gnu".
  For bug reporting instructions, please see:
  <http://www.gnu.org/software/gdb/bugs/>.
  (gdb) attach 27976
  Attaching to process 27976
  Reading symbols from /usr/bin/perftest...Reading symbols from /usr/lib/debug/usr/bin/perftest.debug...warning: section .gnu.liblist not found in /usr/lib/debug/usr/bin/perftest.debug
  warning: section .dynbss not found in /usr/lib/debug/usr/bin/perftest.debug
  warning: section .gnu.conflict not found in /usr/lib/debug/usr/bin/perftest.debug
  done.
  done.
  Reading symbols from /usr/lib/libqpidclient.so.2...Reading symbols from /usr/lib/debug/usr/lib/libqpidclient.so.2.0.0.debug...done.
  done.
  ...
  Reading symbols from /usr/lib/sasl2/liblogin.so.2...(no debugging symbols found)...done.
  Loaded symbols for /usr/lib/sasl2/liblogin.so.2
  warning: section .gnu.liblist not found in /usr/lib/debug/usr/bin/perftest.debug
  warning: section .dynbss not found in /usr/lib/debug/usr/bin/perftest.debug
  warning: section .gnu.conflict not found in /usr/lib/debug/usr/bin/perftest.debug
  0x0023d410 in __kernel_vsyscall ()
  (gdb) info threads
    6 Thread 0xb7efeb90 (LWP 27977)  0x0023d410 in __kernel_vsyscall ()
    5 Thread 0xb72ffb90 (LWP 27978)  0x0023d410 in __kernel_vsyscall ()
    4 Thread 0xb5efdb90 (LWP 27980)  0x0023d410 in __kernel_vsyscall ()
    3 Thread 0xb54fcb90 (LWP 27988)  0x0023d410 in __kernel_vsyscall ()
    2 Thread 0xb4afbb90 (LWP 27989)  0x0023d410 in __kernel_vsyscall ()
  * 1 Thread 0xb7eff720 (LWP 27976)  0x0023d410 in __kernel_vsyscall ()
  (gdb) thread apply all bt
  
  Thread 6 (Thread 0xb7efeb90 (LWP 27977)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x00985486 in epoll_wait () from /lib/libc.so.6
  #2  0x004ccafa in qpid::sys::Poller::wait (this=0x96bd0b0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:524
  #3  0x004cd666 in qpid::sys::Poller::run (this=0x96bd0b0) at qpid/sys/epoll/EpollPoller.cpp:479
  #4  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0x96bd0b0) at qpid/sys/posix/Thread.cpp:35
  #5  0x003a3832 in start_thread () from /lib/libpthread.so.0
  #6  0x00984e0e in clone () from /lib/libc.so.6
  
  Thread 5 (Thread 0xb72ffb90 (LWP 27978)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x00985486 in epoll_wait () from /lib/libc.so.6
  #2  0x004ccafa in qpid::sys::Poller::wait (this=0x96bd0b0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:524
  #3  0x004cd666 in qpid::sys::Poller::run (this=0x96bd0b0) at qpid/sys/epoll/EpollPoller.cpp:479
  #4  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0x96bd0b0) at qpid/sys/posix/Thread.cpp:35
  #5  0x003a3832 in start_thread () from /lib/libpthread.so.0
  #6  0x00984e0e in clone () from /lib/libc.so.6
  
  Thread 4 (Thread 0xb5efdb90 (LWP 27980)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x00985486 in epoll_wait () from /lib/libc.so.6
  #2  0x004ccafa in qpid::sys::Poller::wait (this=0x96bd0b0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:524
  #3  0x004cd666 in qpid::sys::Poller::run (this=0x96bd0b0) at qpid/sys/epoll/EpollPoller.cpp:479
  #4  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0x96bd0b0) at qpid/sys/posix/Thread.cpp:35
  #5  0x003a3832 in start_thread () from /lib/libpthread.so.0
  #6  0x00984e0e in clone () from /lib/libc.so.6
  
  Thread 3 (Thread 0xb54fcb90 (LWP 27988)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x003a7bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  #2  0x009914dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
  #3  0x007eebf1 in wait (this=0x96bf100) at ../include/qpid/sys/posix/Condition.h:63
  #4  wait (this=0x96bf100) at ../include/qpid/sys/Monitor.h:41
  #5  qpid::sys::Waitable::wait (this=0x96bf100) at qpid/sys/Waitable.h:88
  #6  0x007ec4ee in pop (this=0x98c5be0, result=..., timeout=...) at qpid/sys/BlockingQueue.h:55
  #7  qpid::client::LocalQueueImpl::get (this=0x98c5be0, result=..., timeout=...) at qpid/client/LocalQueueImpl.cpp:49
  #8  0x007ed0be in qpid::client::LocalQueueImpl::get (this=0x98c5be0, timeout=...) at qpid/client/LocalQueueImpl.cpp:40
  #9  0x007ed2a7 in qpid::client::LocalQueueImpl::pop (this=0x98c5be0, timeout=...) at qpid/client/LocalQueueImpl.cpp:36
  ---Type <return> to continue, or q <return> to quit---
  #10 0x007e9d79 in qpid::client::LocalQueue::pop (this=0xb54fc260, timeout=...) at qpid/client/LocalQueue.cpp:43
  #11 0x0805dde6 in qpid::tests::SubscribeThread::run (this=0xb7301530) at perftest.cpp:626
  #12 0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0xb7301530) at qpid/sys/posix/Thread.cpp:35
  #13 0x003a3832 in start_thread () from /lib/libpthread.so.0
  #14 0x00984e0e in clone () from /lib/libc.so.6
  
  Thread 2 (Thread 0xb4afbb90 (LWP 27989)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x00985486 in epoll_wait () from /lib/libc.so.6
  #2  0x004ccafa in qpid::sys::Poller::wait (this=0x96bd0b0, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:524
  #3  0x004cd666 in qpid::sys::Poller::run (this=0x96bd0b0) at qpid/sys/epoll/EpollPoller.cpp:479
  #4  0x004c2311 in qpid::sys::(anonymous namespace)::runRunnable (p=0x96bd0b0) at qpid/sys/posix/Thread.cpp:35
  #5  0x003a3832 in start_thread () from /lib/libpthread.so.0
  #6  0x00984e0e in clone () from /lib/libc.so.6
  
  Thread 1 (Thread 0xb7eff720 (LWP 27976)):
  #0  0x0023d410 in __kernel_vsyscall ()
  #1  0x003a7bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  #2  0x009914dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
  #3  0x007eebf1 in wait (this=0x98c7718) at ../include/qpid/sys/posix/Condition.h:63
  #4  wait (this=0x98c7718) at ../include/qpid/sys/Monitor.h:41
  #5  qpid::sys::Waitable::wait (this=0x98c7718) at qpid/sys/Waitable.h:88
  #6  0x007ec4ee in pop (this=0x98c7c48, result=..., timeout=...) at qpid/sys/BlockingQueue.h:55
  #7  qpid::client::LocalQueueImpl::get (this=0x98c7c48, result=..., timeout=...) at qpid/client/LocalQueueImpl.cpp:49
  #8  0x007ed0be in qpid::client::LocalQueueImpl::get (this=0x98c7c48, timeout=...) at qpid/client/LocalQueueImpl.cpp:40
  #9  0x007ed2a7 in qpid::client::LocalQueueImpl::pop (this=0x98c7c48, timeout=...) at qpid/client/LocalQueueImpl.cpp:36
  #10 0x007e9d79 in qpid::client::LocalQueue::pop (this=0xbf876b9c, timeout=...) at qpid/client/LocalQueue.cpp:43
  #11 0x0805f222 in qpid::tests::Controller::process(size_t, qpid::client::LocalQueue, std::string, boost::function<void ()(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&),std::allocator<void> >) (this=0xbf876cc8, n=1, lq=..., queue=..., msgFn=...)
      at perftest.cpp:382
  #12 0x0805f841 in qpid::tests::Controller::run (this=0xbf876cc8) at perftest.cpp:422
  #13 0x0805247c in main (argc=Cannot access memory at address 0x80
  ) at perftest.cpp:719
  (gdb) quit
  A debugging session is active.
  
          Inferior 1 [process 27976] will be detached.
  
  Quit anyway? (y or n) n
  Not confirmed.
  (gdb) detach



Version-Release number of selected component (if applicable):
qpid-cpp-server-0.7.935473-1.el5
python-qpid-0.7.934605-1.el5
python-qmf-0.7.934605-1.el5
qpid-tools-0.7.934605-2.el5

How reproducible:
>80%

Steps to Reproduce:
follow https://bugzilla.redhat.com/show_bug.cgi?id=558526#c5
  
Actual results:
Perftest client hangs locks up.

Expected results:
Perftest client should not hang / lock up.


Additional info:

Comment 1 Frantisek Reznicek 2010-04-29 09:59:45 UTC
So far seen on RHEL 5.5 i386 / x86_64 and also apart of the cluster on RHEL4.8 (one occurrecnce, I do not have pstack yet)


Another RHEL 5.5 x86_64 case:

  [root@mrg-qe-02 ~]# gdb
  GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
  Copyright (C) 2009 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-redhat-linux-gnu".
  For bug reporting instructions, please see:
  <http://www.gnu.org/software/gdb/bugs/>.
  (gdb) attach 14717
  Attaching to process 14717
  Reading symbols from /usr/bin/perftest...Reading symbols from /usr/lib/debug/usr/bin/perftest.debug...warning: section .gnu.liblist not found in /usr/lib/debug/usr/bin/perftest.debug
  warning: section .dynbss not found in /usr/lib/debug/usr/bin/perftest.debug
  warning: section .gnu.conflict not found in /usr/lib/debug/usr/bin/perftest.debug
  done.
  ...
  Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
  [Thread debugging using libthread_db enabled]
  [New Thread 0x443ec940 (LWP 14807)]
  [New Thread 0x40efd940 (LWP 14806)]
  [New Thread 0x439eb940 (LWP 14779)]
  [New Thread 0x425e9940 (LWP 14754)]
  [New Thread 0x41be8940 (LWP 14718)]
  Loaded symbols for /lib64/libpthread.so.0
  Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
  ...
  Loaded symbols for /usr/lib64/sasl2/liblogin.so.2
  warning: section .gnu.liblist not found in /usr/lib/debug/usr/bin/perftest.debug
  warning: section .dynbss not found in /usr/lib/debug/usr/bin/perftest.debug
  warning: section .gnu.conflict not found in /usr/lib/debug/usr/bin/perftest.debug
  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  (gdb) info threads
    6 Thread 0x41be8940 (LWP 14718)  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
    5 Thread 0x425e9940 (LWP 14754)  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
    4 Thread 0x439eb940 (LWP 14779)  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
    3 Thread 0x40efd940 (LWP 14806)  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    2 Thread 0x443ec940 (LWP 14807)  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
  * 1 Thread 0x2b6050505860 (LWP 14717)  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  (gdb) thread apply all bt
  
  Thread 6 (Thread 0x41be8940 (LWP 14718)):
  #0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
  #1  0x0000003c563295af in qpid::sys::Poller::wait (this=0xdf82dc0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524
  #2  0x0000003c56329fd2 in qpid::sys::Poller::run (this=0xdf82dc0) at qpid/sys/epoll/EpollPoller.cpp:479
  #3  0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35
  #4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
  #5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6
  
  Thread 5 (Thread 0x425e9940 (LWP 14754)):
  #0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
  #1  0x0000003c563295af in qpid::sys::Poller::wait (this=0xdf82dc0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524
  #2  0x0000003c56329fd2 in qpid::sys::Poller::run (this=0xdf82dc0) at qpid/sys/epoll/EpollPoller.cpp:479
  #3  0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35
  #4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
  #5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6
  
  Thread 4 (Thread 0x439eb940 (LWP 14779)):
  #0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
  #1  0x0000003c563295af in qpid::sys::Poller::wait (this=0xdf82dc0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524
  #2  0x0000003c56329fd2 in qpid::sys::Poller::run (this=0xdf82dc0) at qpid/sys/epoll/EpollPoller.cpp:479
  #3  0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35
  #4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
  #5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6
  
  Thread 3 (Thread 0x40efd940 (LWP 14806)):
  #0  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  #1  0x0000003c568aa629 in wait (this=0xe3944f0) at ../include/qpid/sys/posix/Condition.h:63
  #2  wait (this=0xe3944f0) at ../include/qpid/sys/Monitor.h:41
  #3  qpid::sys::Waitable::wait (this=0xe3944f0) at qpid/sys/Waitable.h:88
  #4  0x0000003c568a9860 in pop (this=0xe393c90, result=..., timeout=<value optimized out>) at qpid/sys/BlockingQueue.h:55
  #5  qpid::client::LocalQueueImpl::get (this=0xe393c90, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49
  #6  0x0000003c568aa069 in qpid::client::LocalQueueImpl::get (this=0xe393c90, timeout=...) at qpid/client/LocalQueueImpl.cpp:40
  #7  0x0000003c568aa209 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36
  #8  0x0000003c568a696c in qpid::client::LocalQueue::pop (this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43
  #9  0x000000000041c752 in qpid::tests::SubscribeThread::run (this=0xdf8b260) at perftest.cpp:626
  #10 0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0xe39451c) at qpid/sys/posix/Thread.cpp:35
  #11 0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
  #12 0x00000036a10d3d1d in clone () from /lib64/libc.so.6
  ---Type <return> to continue, or q <return> to quit---
  
  Thread 2 (Thread 0x443ec940 (LWP 14807)):
  #0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
  #1  0x0000003c563295af in qpid::sys::Poller::wait (this=0xdf82dc0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524
  #2  0x0000003c56329fd2 in qpid::sys::Poller::run (this=0xdf82dc0) at qpid/sys/epoll/EpollPoller.cpp:479
  #3  0x0000003c563201ca in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35
  #4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
  #5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6
  
  Thread 1 (Thread 0x2b6050505860 (LWP 14717)):
  #0  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  #1  0x0000003c568aa629 in wait (this=0x2aaaac204310) at ../include/qpid/sys/posix/Condition.h:63
  #2  wait (this=0x2aaaac204310) at ../include/qpid/sys/Monitor.h:41
  #3  qpid::sys::Waitable::wait (this=0x2aaaac204310) at qpid/sys/Waitable.h:88
  #4  0x0000003c568a9860 in pop (this=0x2aaaac2020a0, result=..., timeout=<value optimized out>) at qpid/sys/BlockingQueue.h:55
  #5  qpid::client::LocalQueueImpl::get (this=0x2aaaac2020a0, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49
  #6  0x0000003c568aa069 in qpid::client::LocalQueueImpl::get (this=0x2aaaac2020a0, timeout=...) at qpid/client/LocalQueueImpl.cpp:40
  #7  0x0000003c568aa209 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36
  #8  0x0000003c568a696c in qpid::client::LocalQueue::pop (this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43
  #9  0x000000000040fb47 in qpid::tests::Controller::process(size_t, qpid::client::LocalQueue, std::string, boost::function<void ()(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&),std::allocator<void> >) (this=<value optimized out>, n=1, lq=...,
      queue=<value optimized out>, msgFn=...) at perftest.cpp:382
  #10 0x0000000000412a76 in qpid::tests::Controller::run (this=0x7fff43f3eb30) at perftest.cpp:422
  #11 0x000000000040cc1b in main (argc=234371408, argv=<value optimized out>) at perftest.cpp:719
  (gdb) quit

Comment 2 Frantisek Reznicek 2010-04-30 10:02:31 UTC
The backtrace and behavior seems identical to bug 501668. Going to prove that an exception was thrown to prove that.

Comment 3 Frantisek Reznicek 2010-05-24 15:13:05 UTC
I took some time to retest the issue and although I first believed that there is no issue I triggered cases when perftest (c++ client library) hangs without notification that queue policy was violated:

One case I saw:

perftest log:
[root@mrg-qe-02 bz587190]# cat perftest_001.log
2010-05-24 10:52:53 info Connecting to tcp:localhost:5672
2010-05-24 10:52:53 debug Created IO thread: 0
2010-05-24 10:52:53 debug TCPConnector created for \x00-

2010-05-24 10:52:53 debug RECV [40796 localhost:5672] INIT(0-10)
2010-05-24 10:52:53 debug CyrusSasl::start(ANONYMOUS)
2010-05-24 10:52:53 debug min_ssf: 0, max_ssf: 256
2010-05-24 10:52:53 debug getUserFromSettings(): guest
2010-05-24 10:52:53 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'guest.eng.brq.redhat.com'
2010-05-24 10:52:53 debug Known-brokers for connection: amqp:tcp:10.34.33.55:5672,tcp:10.34.45.2:5672,tcp:192.168.122.1:5672
2010-05-24 10:52:53 debug No security layer in place
2010-05-24 10:52:53 debug SessionState::SessionState .: 0x7661098
2010-05-24 10:52:54 debug Exception constructed: Closed by client
2010-05-24 10:52:54 info Connecting to tcp:localhost:5672
2010-05-24 10:52:54 debug TCPConnector created for \x00-

2010-05-24 10:52:54 debug RECV [40797 localhost:5672] INIT(0-10)
2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS)
2010-05-24 10:52:54 debug min_ssf: 0, max_ssf: 256
2010-05-24 10:52:54 debug getUserFromSettings(): guest
2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'guest.eng.brq.redhat.com'
2010-05-24 10:52:54 debug Known-brokers for connection: amqp:tcp:10.34.33.55:5672,tcp:10.34.45.2:5672,tcp:192.168.122.1:5672
2010-05-24 10:52:54 debug No security layer in place
2010-05-24 10:52:54 debug SessionState::SessionState .: 0x7457568
2010-05-24 10:52:54 info Connecting to tcp:localhost:5672
2010-05-24 10:52:54 debug Created IO thread: 1
2010-05-24 10:52:54 debug TCPConnector created for \x00-

2010-05-24 10:52:54 debug RECV [40798 localhost:5672] INIT(0-10)
2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS)
2010-05-24 10:52:54 debug min_ssf: 0, max_ssf: 256
2010-05-24 10:52:54 debug getUserFromSettings(): guest
2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'guest.eng.brq.redhat.com'
2010-05-24 10:52:54 debug Known-brokers for connection: amqp:tcp:10.34.33.55:5672,tcp:10.34.45.2:5672,tcp:192.168.122.1:5672
2010-05-24 10:52:54 debug No security layer in place
2010-05-24 10:52:54 debug SessionState::SessionState .: 0x7662a58
2010-05-24 10:52:54 info Connecting to tcp:localhost:5672
2010-05-24 10:52:54 debug Created IO thread: 2
2010-05-24 10:52:54 debug TCPConnector created for \x00-

2010-05-24 10:52:54 debug RECV [40799 localhost:5672] INIT(0-10)
2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS)
2010-05-24 10:52:54 debug min_ssf: 0, max_ssf: 256
2010-05-24 10:52:54 debug getUserFromSettings(): guest
2010-05-24 10:52:54 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'guest.eng.brq.redhat.com'
2010-05-24 10:52:54 debug Known-brokers for connection: amqp:tcp:10.34.33.55:5672,tcp:10.34.45.2:5672,tcp:192.168.122.1:5672
2010-05-24 10:52:54 debug No security layer in place
2010-05-24 10:52:54 debug SessionState::SessionState .: 0x7866088
2010-05-24 10:52:54 warning Connection closed
2010-05-24 10:52:54 debug Exception constructed: Connection closed
2010-05-24 10:52:54 debug Exception constructed: Connection closed
PublishThread exception: Connection closed

perftest hang state:
Thread 5 (Thread 0x42503940 (LWP 2958)):
#0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x000000313872bfef in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.2
#2  0x000000313872c9d2 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.2
#3  0x000000313872358a in ?? () from /usr/lib64/libqpidcommon.so.2
#4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x43905940 (LWP 2960)):
#0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x000000313872bfef in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.2
#2  0x000000313872c9d2 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.2
#3  0x000000313872358a in ?? () from /usr/lib64/libqpidcommon.so.2
#4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x4145b940 (LWP 2961)):
#0  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000031382873db in qpid::client::LocalQueueImpl::get(qpid::client::Message&, qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2
#2  0x0000003138287c49 in qpid::client::LocalQueueImpl::get(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2
#3  0x0000003138287de9 in qpid::client::LocalQueueImpl::pop(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2
#4  0x0000003138284dfc in qpid::client::LocalQueue::pop(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2
#5  0x000000000041b0e2 in ?? ()
#6  0x000000313872358a in ?? () from /usr/lib64/libqpidcommon.so.2
#7  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
#8  0x00000036a10d3d1d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x44306940 (LWP 2962)):
#0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x000000313872bfef in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib64/libqpidcommon.so.2
#2  0x000000313872c9d2 in qpid::sys::Poller::run() () from /usr/lib64/libqpidcommon.so.2
#3  0x000000313872358a in ?? () from /usr/lib64/libqpidcommon.so.2
#4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b8b9ba6c870 (LWP 2957)):
#0  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000031382873db in qpid::client::LocalQueueImpl::get(qpid::client::Message&, qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2
#2  0x0000003138287c49 in qpid::client::LocalQueueImpl::get(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2
#3  0x0000003138287de9 in qpid::client::LocalQueueImpl::pop(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2
#4  0x0000003138284dfc in qpid::client::LocalQueue::pop(qpid::sys::Duration) () from /usr/lib64/libqpidclient.so.2
#5  0x000000000040f7c7 in ?? ()
#6  0x0000000000413f66 in ?? ()
#7  0x000000000040ccdb in std::ios_base::Init::~Init() ()
#8  0x00000036a101d994 in __libc_start_main () from /lib64/libc.so.6
#9  0x000000000040aa79 in std::ios_base::Init::~Init() ()
#10 0x00007fffe2fdbc48 in ?? ()
#11 0x0000000000000000 in ?? ()


(gdb) info threads
  5 Thread 0x42503940 (LWP 2958)  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
  4 Thread 0x43905940 (LWP 2960)  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
  3 Thread 0x4145b940 (LWP 2961)  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x44306940 (LWP 2962)  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 0x2b8b9ba6c870 (LWP 2957)  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) thread apply all bt

Thread 5 (Thread 0x42503940 (LWP 2958)):
#0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x000000313872bfef in qpid::sys::Poller::wait (this=0x74601f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524
#2  0x000000313872c9d2 in qpid::sys::Poller::run (this=0x74601f0) at qpid/sys/epoll/EpollPoller.cpp:479
#3  0x000000313872358a in qpid::sys::(anonymous namespace)::runRunnable (p=0x5) at qpid/sys/posix/Thread.cpp:35
#4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x43905940 (LWP 2960)):
#0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x000000313872bfef in qpid::sys::Poller::wait (this=0x74601f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524
#2  0x000000313872c9d2 in qpid::sys::Poller::run (this=0x74601f0) at qpid/sys/epoll/EpollPoller.cpp:479
#3  0x000000313872358a in qpid::sys::(anonymous namespace)::runRunnable (p=0x5) at qpid/sys/posix/Thread.cpp:35
#4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x4145b940 (LWP 2961)):
#0  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000031382873db in pop (this=0x7662d90, result=..., timeout=<value optimized out>) at ../include/qpid/sys/posix/Condition.h:63
#2  qpid::client::LocalQueueImpl::get (this=0x7662d90, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49
#3  0x0000003138287c49 in qpid::client::LocalQueueImpl::get (this=0x7662d90, timeout=...) at qpid/client/LocalQueueImpl.cpp:40
#4  0x0000003138287de9 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36
#5  0x0000003138284dfc in qpid::client::LocalQueue::pop (this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43
#6  0x000000000041b0e2 in qpid::tests::SubscribeThread::run (this=0x74535a0) at perftest.cpp:626
#7  0x000000313872358a in qpid::sys::(anonymous namespace)::runRunnable (p=0x2aaab00018ec) at qpid/sys/posix/Thread.cpp:35
#8  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
#9  0x00000036a10d3d1d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x44306940 (LWP 2962)):
#0  0x00000036a10d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x000000313872bfef in qpid::sys::Poller::wait (this=0x74601f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:524
#2  0x000000313872c9d2 in qpid::sys::Poller::run (this=0x74601f0) at qpid/sys/epoll/EpollPoller.cpp:479
#3  0x000000313872358a in qpid::sys::(anonymous namespace)::runRunnable (p=0x5) at qpid/sys/posix/Thread.cpp:35
#4  0x00000036a1c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036a10d3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2b8b9ba6c870 (LWP 2957)):
#0  0x00000036a1c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#1  0x00000031382873db in pop (this=0x2aaab0002000, result=..., timeout=<value optimized out>) at ../include/qpid/sys/posix/Condition.h:63
#2  qpid::client::LocalQueueImpl::get (this=0x2aaab0002000, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49
#3  0x0000003138287c49 in qpid::client::LocalQueueImpl::get (this=0x2aaab0002000, timeout=...) at qpid/client/LocalQueueImpl.cpp:40
#4  0x0000003138287de9 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36
#5  0x0000003138284dfc in qpid::client::LocalQueue::pop (this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43
#6  0x000000000040f7c7 in qpid::tests::Controller::process(size_t, qpid::client::LocalQueue, std::string, boost::function<void ()(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&),std::allocator<void> >) (this=<value optimized out>, n=1, lq=...,
    queue=<value optimized out>, msgFn=...) at perftest.cpp:382
#7  0x0000000000413f66 in qpid::tests::Controller::run (this=0x7fffe2fdb980) at perftest.cpp:422
#8  0x000000000040ccdb in main (argc=121976304, argv=<value optimized out>) at perftest.cpp:719




broker log:
...
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-180(local)
2010-05-24 10:52:53 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-180
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-180(local)
2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-181(local)
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-181(local) ssf=0
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-181(local)
2010-05-24 10:52:53 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-181
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-181(local)
2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-182(local)
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-182(local) ssf=0
2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-183(local)
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-183(local) ssf=0
2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-184(local)
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-184(local) ssf=0
2010-05-24 10:52:53 info new client connection 10.34.45.2:1903-185(local)
2010-05-24 10:52:53 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-185(local) ssf=0
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-185(local)
2010-05-24 10:52:54 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-185
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-185(local)
2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-186(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-186(local) ssf=0
2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-187(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-187(local) ssf=0
2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-188(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-188(local) ssf=0
2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-189(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-189(local) ssf=0
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-186(local)
2010-05-24 10:52:54 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-186
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-186(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-189(local)
2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-190(local)
2010-05-24 10:52:54 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-189
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-189(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-190(local) ssf=0
2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-191(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-191(local) ssf=0
2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-192(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-192(local) ssf=0
2010-05-24 10:52:54 info new client connection 10.34.45.2:1903-193(local)
2010-05-24 10:52:54 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-193(local) ssf=0
2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-193(local)
2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-194(local)
2010-05-24 10:52:55 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-193
2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-193(local)
2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-194(local) ssf=0
2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-195(local)
2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-195(local) ssf=0
2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-196(local)
2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-197(local)
2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-196(local) ssf=0
2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-197(local) ssf=0
2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-198(local)
2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-198(local) ssf=0
2010-05-24 10:52:55 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-197(local)
2010-05-24 10:52:55 info new client connection 10.34.45.2:1903-199(local)
2010-05-24 10:52:56 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-197
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-197(local)
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-199(local) ssf=0
2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-200(local)
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-200(local) ssf=0
2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-201(local)
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-201(local) ssf=0
2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-202(local)
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-198(local)
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-202(local) ssf=0
2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-203(local)
2010-05-24 10:52:56 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-198
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-198(local)
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-203(local) ssf=0
2010-05-24 10:52:56 info new client connection 10.34.45.2:1903-204(local)
2010-05-24 10:52:56 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-204(local) ssf=0
2010-05-24 10:52:57 info new client connection 10.34.45.2:1903-205(local)
2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-205(local) ssf=0
2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-201(local)
2010-05-24 10:52:57 info new client connection 10.34.45.2:1903-206(local)
2010-05-24 10:52:57 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-201
2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-201(local)
2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-206(local) ssf=0
2010-05-24 10:52:57 info new client connection 10.34.45.2:1903-207(local)
2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-207(local) ssf=0
2010-05-24 10:52:57 info new client connection 10.34.45.2:1903-208(local)
2010-05-24 10:52:57 debug cluster(10.34.45.2:1903 READY) initialized connection: 10.34.45.2:1903-208(local) ssf=0
2010-05-24 10:53:18 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-205(local)
2010-05-24 10:53:19 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-205
2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-205(local)
2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-203(local)
2010-05-24 10:53:19 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-203
2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-203(local)
2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-204(local)
2010-05-24 10:53:19 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-204
2010-05-24 10:53:19 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-204(local)
2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-208(local)
2010-05-24 10:53:24 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-208
2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-208(local)
2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-206(local)
2010-05-24 10:53:24 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-206
2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-206(local)
2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-207(local)
2010-05-24 10:53:24 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-207
2010-05-24 10:53:24 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-207(local)
2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-192(local)
2010-05-24 10:53:26 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-192
2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-192(local)
2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-190(local)
2010-05-24 10:53:26 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-190
2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-190(local)
2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-191(local)
2010-05-24 10:53:26 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-191
2010-05-24 10:53:26 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-191(local)
2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-184(local)
2010-05-24 10:53:36 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-184
2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-184(local)
2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-182(local)
2010-05-24 10:53:36 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-182
2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-182(local)
2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-183(local)
2010-05-24 10:53:36 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-183
2010-05-24 10:53:36 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-183(local)
2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-202(local)
2010-05-24 10:53:40 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-202
2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-202(local)
2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-199(local)
2010-05-24 10:53:40 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-199
2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-199(local)
2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-200(local)
2010-05-24 10:53:40 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-200
2010-05-24 10:53:40 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-200(local)
2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-196(local)
2010-05-24 10:53:42 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-196
2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-196(local)
2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-194(local)
2010-05-24 10:53:42 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-194
2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-194(local)
2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) local close of replicated connection 10.34.45.2:1903-195(local)
2010-05-24 10:53:42 info cluster(10.34.45.2:1903 READY) connection closed 10.34.45.2:1903-195
2010-05-24 10:53:42 debug cluster(10.34.45.2:1903 READY) deleted connection: 10.34.45.2:1903-195(local)


The bug is seen when below attached reproducer is used.
All broker modules used clustering enabled (can be even one node).
Packages:
condor-qmf-7.4.3-0.14.el5
openais-0.80.6-16.el5_5.1
openais-debuginfo-0.80.6-16.el5_5.1
openais-devel-0.80.6-16.el5_5.1
python-qmf-0.7.946106-3.el5
python-qpid-0.7.946106-1.el5
qmf-0.7.946106-1.el5
qmf-devel-0.7.946106-1.el5
qpid-cpp-client-0.7.946106-1.el5
qpid-cpp-client-devel-0.7.946106-1.el5
qpid-cpp-client-devel-docs-0.7.946106-1.el5
qpid-cpp-client-ssl-0.7.946106-1.el5
qpid-cpp-mrg-debuginfo-0.7.946106-1.el5
qpid-cpp-server-0.7.946106-1.el5
qpid-cpp-server-cluster-0.7.946106-1.el5
qpid-cpp-server-devel-0.7.946106-1.el5
qpid-cpp-server-ssl-0.7.946106-1.el5
qpid-cpp-server-store-0.7.946106-1.el5
qpid-cpp-server-xml-0.7.946106-1.el5
qpid-java-client-0.7.946106-1.el5
qpid-java-common-0.7.946106-1.el5
qpid-tools-0.7.946106-2.el5

[root@mrg-qe-02 tmp]# cat /etc/ais/openais.conf
# Please read the openais.conf.5 manual page

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
bindnetaddr: 10.34.45.0
mcastaddr: 225.0.10.11
mcastport: 54001
        }
}

logging {
debug: on
timestamp: on
to_file: yes
logfile: /tmp/openais.log
}

amf {
        mode: disabled
}
[root@mrg-qe-02 tmp]# cat /etc/qpidd.conf
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Configuration file for qpidd. Entries are of the form:
#   name=value
#
# (Note: no spaces on either side of '='). Using default settings:
# "qpidd --help" or "man qpidd" for more details.
cluster-mechanism=ANONYMOUS
cluster-name=X
auth=no
log-enable=info+
log-enable=debug+:cluster
log-to-file=/tmp/qpidd.conf
truncate=yes

Comment 4 Frantisek Reznicek 2010-05-24 15:19:36 UTC
Created attachment 416157 [details]
The reproducer

How to run:
1] installqpidd and openais
2] configure qpidd and openais as detailed above
3] start openais
4] killall perftest qpidd ; ./bz587190.sh 20 1 0 ANONYMOUS; while true; do ./bz587190.sh $(( $((${RANDOM}%4)) + 4)) 0 0 ANONYMOUS|| break; done

You should see hang like this:
start (7 0 0 ANONYMOUS)
perftest: no process killed
tcp        0      0 0.0.0.0:5672                0.0.0.0:*                   LISTEN      1903/qpidd
Brokers
  broker            cluster    uptime  conn  sess  exch  queue
  ==============================================================
  10.34.33.55:5672  X(ACTIVE)  7m 59s     1     1     8   103
Launching the clients: .......
7|0|7 > 10:52:56 up 11 days, 18 min,  2 users,  load average: 4.43, 3.81, 2.40<
7|0|7 > 10:53:06 up 11 days, 19 min,  2 users,  load average: 5.14, 3.98, 2.47<
7|0|7 > 10:53:16 up 11 days, 19 min,  2 users,  load average: 4.88, 3.96, 2.48<
5|2|7 > 10:53:26 up 11 days, 19 min,  2 users,  load average: 5.33, 4.10, 2.54<
3|4|7 > 10:53:36 up 11 days, 19 min,  2 users,  load average: 5.95, 4.27, 2.61<
1|6|7 > 10:53:46 up 11 days, 19 min,  2 users,  load average: 5.48, 4.23, 2.61<
1|6|7 > 10:53:56 up 11 days, 19 min,  2 users,  load average: 4.64, 4.09, 2.59<
1|6|7 > 10:54:07 up 11 days, 20 min,  2 users,  load average: 3.92, 3.95, 2.56<
1|6|7 > 10:54:17 up 11 days, 20 min,  2 users,  load average: 3.32, 3.82, 2.53<
1|6|7 > 10:54:27 up 11 days, 20 min,  2 users,  load average: 2.81, 3.70, 2.50<
1|6|7 > 10:54:37 up 11 days, 20 min,  2 users,  load average: 2.38, 3.57, 2.47<
1|6|7 > 10:54:47 up 11 days, 20 min,  2 users,  load average: 2.01, 3.46, 2.45<
...
1|6|7 > 11:17:29 up 11 days, 43 min,  2 users,  load average: 0.00, 0.05, 0.56<

Comment 5 Frantisek Reznicek 2010-05-24 15:21:56 UTC
Small clarification: When cluster is not enabled, no such behavior is seen (tested as is twice with and without clustering enabled).

Comment 6 Frantisek Reznicek 2010-05-25 07:26:01 UTC
There was found case on RHEL4.8 i386 (non clustered environment) with same hang, so the issue is not dependent on cluster module. The real reproducibility in cluster environment is much more rapid.

Interesting might be Thread#1 qpid::tests::Controller::process( queue=...) call with queue parameter value.

Following perftest state trace is from RHEL 4.8 i386:

Attaching to process 27355
Reading symbols from /usr/bin/perftest...Reading symbols from /usr/lib/debug/usr/bin/perftest.debug...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
done.
Reading symbols from /usr/lib/libqpidclient.so.2...Reading symbols from /usr/lib/debug/usr/lib/libqpidclient.so.2.0.0.debug...done.
done.
Loaded symbols for /usr/lib/libqpidclient.so.2
Reading symbols from /usr/lib/libqpidcommon.so.2...Reading symbols from /usr/lib/debug/usr/lib/libqpidcommon.so.2.0.0.debug...done.
done.
Loaded symbols for /usr/lib/libqpidcommon.so.2
Reading symbols from /usr/lib/libboost_program_options.so.1...done.
...
[Thread debugging using libthread_db enabled]
[New Thread -1209071840 (LWP 27355)]
[New Thread -1251034208 (LWP 27360)]
[New Thread -1240544352 (LWP 27359)]
[New Thread -1230054496 (LWP 27358)]
[New Thread -1219564640 (LWP 27357)]
[New Thread -1209074784 (LWP 27356)]
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/qpid/client/sslconnector.so...Reading symbols from /usr/lib/debug/usr/lib/qpid/client/sslconnector.so.debug...done.
done.
Loaded symbols for /usr/lib/qpid/client/sslconnector.so
Reading symbols from /usr/lib/libsslcommon.so.2...Reading symbols from /usr/lib/debug/usr/lib/libsslcommon.so.2.0.0.debug...done.
done.
...
0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) info threads
  6 Thread -1209074784 (LWP 27356)  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  5 Thread -1219564640 (LWP 27357)  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  4 Thread -1230054496 (LWP 27358)  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  3 Thread -1240544352 (LWP 27359)  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  2 Thread -1251034208 (LWP 27360)  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  1 Thread -1209071840 (LWP 27355)  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) thread apply all bt

Thread 6 (Thread -1209074784 (LWP 27356)):
#0  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00bb862e in epoll_wait () from /lib/tls/libc.so.6
#2  0x00421020 in qpid::sys::Poller::wait (this=0x8bf2ca8, timeout={nanosecs = 9223372036854775807}) at qpid/sys/epoll/EpollPoller.cpp:524
#3  0x00421f5c in qpid::sys::Poller::run (this=0x8bf2ca8) at qpid/sys/epoll/EpollPoller.cpp:479
#4  0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35
#5  0x00d515cc in start_thread () from /lib/tls/libpthread.so.0
#6  0x00bb7fae in clone () from /lib/tls/libc.so.6

Thread 5 (Thread -1219564640 (LWP 27357)):
#0  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00bb862e in epoll_wait () from /lib/tls/libc.so.6
#2  0x00421020 in qpid::sys::Poller::wait (this=0x8bf2ca8, timeout={nanosecs = 9223372036854775807}) at qpid/sys/epoll/EpollPoller.cpp:524
#3  0x00421f5c in qpid::sys::Poller::run (this=0x8bf2ca8) at qpid/sys/epoll/EpollPoller.cpp:479
#4  0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35
#5  0x00d515cc in start_thread () from /lib/tls/libpthread.so.0
#6  0x00bb7fae in clone () from /lib/tls/libc.so.6

Thread 4 (Thread -1230054496 (LWP 27358)):
#0  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00d53ef6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x00bc4ad7 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libc.so.6
#3  0x00177b90 in qpid::client::Bounds::expand (this=0x8dfa940, sizeRequired=1036, block=true) at ../include/qpid/sys/posix/Condition.h:63
#4  0x001d5a1a in qpid::client::SessionImpl::sendFrame (this=0x8bf3400, frame=@0x40c, canBlock=false) at /usr/include/boost/shared_ptr.hpp:252
#5  0x001d5a80 in qpid::client::SessionImpl::handleOut (this=0xfffffffc, frame=@0xfffffffc) at qpid/client/SessionImpl.cpp:500
#6  0x001d6013 in qpid::client::SessionImpl::sendContent (this=0x8bf3400, content=@0x8dfc158) at qpid/client/SessionImpl.cpp:428
#7  0x001dc30b in qpid::client::SessionImpl::sendCommand (this=0x8bf3400, command=@0xb6aed0f0, content=0x8dfc158)
    at qpid/client/SessionImpl.cpp:398
#8  0x001dc9db in qpid::client::SessionImpl::send (this=0xfffffffc, command=@0xfffffffc, content=@0xfffffffc) at qpid/client/SessionImpl.cpp:306
#9  0x0016503d in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer (this=0x8bf2bc8, destination=@0x8bf2bd8, acceptMode=1 '\001',
    acquireMode=0 '\0', content=@0xb6aed2e0, sync=false) at /usr/include/boost/shared_ptr.hpp:252
#10 0x08061155 in qpid::tests::PublishThread::run (this=0x8bf2ba8) at ../../src/boost/parameter/aux_/arg_list.hpp:73
#11 0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35
#12 0x00d515cc in start_thread () from /lib/tls/libpthread.so.0
#13 0x00bb7fae in clone () from /lib/tls/libc.so.6

Thread 3 (Thread -1240544352 (LWP 27359)):
#0  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00d53ef6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x00bc4ad7 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libc.so.6
#3  0x001c1fcc in qpid::client::LocalQueueImpl::get (this=0x92006d8, result=@0xb60ec280, timeout={nanosecs = 9223372036854775807})
    at ../include/qpid/sys/posix/Condition.h:63
#4  0x001c427a in qpid::client::LocalQueueImpl::get (this=0x92006d8, timeout={nanosecs = 9223372036854775804})
    at qpid/client/LocalQueueImpl.cpp:40
#5  0x001c447d in qpid::client::LocalQueueImpl::pop (this=0xfffffffc, timeout={nanosecs = 9223372032559808512})
    at qpid/client/LocalQueueImpl.cpp:36
#6  0x001c0e4d in qpid::client::LocalQueue::pop (this=0xfffffffc, timeout={nanosecs = 9223372036854775804}) at qpid/client/LocalQueue.cpp:43
#7  0x08061b14 in qpid::tests::SubscribeThread::run (this=0x8dfb368) at perftest.cpp:626
#8  0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35
#9  0x00d515cc in start_thread () from /lib/tls/libpthread.so.0
#10 0x00bb7fae in clone () from /lib/tls/libc.so.6

Thread 2 (Thread -1251034208 (LWP 27360)):
#0  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00bb862e in epoll_wait () from /lib/tls/libc.so.6
#2  0x00421020 in qpid::sys::Poller::wait (this=0x8bf2ca8, timeout={nanosecs = 9223372036854775807}) at qpid/sys/epoll/EpollPoller.cpp:524
#3  0x00421f5c in qpid::sys::Poller::run (this=0x8bf2ca8) at qpid/sys/epoll/EpollPoller.cpp:479
#4  0x004154f1 in qpid::sys::(anonymous namespace)::runRunnable (p=0xfffffffc) at qpid/sys/posix/Thread.cpp:35
#5  0x00d515cc in start_thread () from /lib/tls/libpthread.so.0
#6  0x00bb7fae in clone () from /lib/tls/libc.so.6

Thread 1 (Thread -1209071840 (LWP 27355)):
#0  0x00ad27a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00d53ef6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x00bc4ad7 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libc.so.6
#3  0x001c1fcc in qpid::client::LocalQueueImpl::get (this=0x92008b8, result=@0xbfe15660, timeout={nanosecs = 9223372036854775807})
    at ../include/qpid/sys/posix/Condition.h:63
#4  0x001c427a in qpid::client::LocalQueueImpl::get (this=0x92008b8, timeout={nanosecs = 9223372036854775804})
    at qpid/client/LocalQueueImpl.cpp:40
#5  0x001c447d in qpid::client::LocalQueueImpl::pop (this=0xfffffffc, timeout={nanosecs = 9223372032559808512})
    at qpid/client/LocalQueueImpl.cpp:36
#6  0x001c0e4d in qpid::client::LocalQueue::pop (this=0xfffffffc, timeout={nanosecs = 9223372036854775804}) at qpid/client/LocalQueue.cpp:43
#7  0x08057576 in qpid::tests::Controller::process (this=0xbfe15a50, n=1, lq=
        {<qpid::client::Handle<qpid::client::LocalQueueImpl>> = {impl = 0x92008b8}, <No data fields>}, queue=Cannot access memory at address 0x0
) at perftest.cpp:382
#8  0x08062af7 in qpid::tests::Controller::run (this=0xbfe15a50)
    at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/ext/new_allocator.h:62
#9  0x080500d0 in main (argc=15, argv=0xbfe15c34) at ../../include/qpid/client/Session_0_10.h:553
---Type <return> to continue, or q <return> to quit---
#10 0x00b00e93 in __libc_start_main () from /lib/tls/libc.so.6
#11 0x0804f281 in _start ()
(gdb) detach
Detaching from program: /usr/bin/perftest, process 27355

Comment 7 Alan Conway 2010-05-25 19:51:51 UTC
I cannot reproduce this on mrg32 (ran the script for 40 minutes) but by inspection there is a race condition in the use of Bounds in SessionImpl::sendFrame. This function sends the frame first, then calls Bounds::expand(). But it's possible the network thread calls Bounds::reduce() between sending the frame and calling expand. If the Bounds::current value is 0 that reduce() is lost. If enough reduce() calls are lost in this way eventually we will deadlock.

The patch below should correct the problem:

diff --git a/qpid/cpp/src/qpid/client/SessionImpl.cpp b/qpid/cpp/src/qpid/client/SessionImpl.cpp
index b7ff430..b507625 100644
--- a/qpid/cpp/src/qpid/client/SessionImpl.cpp
+++ b/qpid/cpp/src/qpid/client/SessionImpl.cpp
@@ -510,8 +510,8 @@ void SessionImpl::proxyOut(AMQFrame& frame) // network thread
 
 void SessionImpl::sendFrame(AMQFrame& frame, bool canBlock)
 {
-    channel.handle(frame);
     connection->expand(frame.encodedSize(), canBlock);
+    channel.handle(frame);
 }
 
 void SessionImpl::deliver(AMQFrame& frame) // network thread

Comment 8 Gordon Sim 2010-05-26 07:30:48 UTC
I agree with Alan's assessment.

Comment 10 Gordon Sim 2010-05-26 11:33:27 UTC
Note that though Alan's analysis of the bug is sound, the suggested change would introduce a different hang (very reliably)... solution still sought.

Comment 11 Gordon Sim 2010-05-27 18:26:02 UTC
Fixed on trunk as r948936 and by commit d040ae874743e188779161c54d9e460f5b9ab9fe in release repo.

Comment 12 Frantisek Reznicek 2010-08-25 15:21:56 UTC
The issue has been fixed, tested on RHEL 4.8 / 5.5  i386 / x86_64 on packages:
python-qmf-0.7.946106-11.el5
python-qpid-0.7.946106-13.el5
qmf-*0.7.946106-12.el5
qpid-cpp-*-0.7.946106-12.el5
qpid-java-*0.7.946106-7.el5
qpid-tools-0.7.946106-9.el5

-> VERIFIED


Note You need to log in before you can comment on or make changes to this bug.