Description of problem:
When running one of the verification tests for bug 604688, qpid-perftest very occasionally hangs before exiting after the broker abruptly disconnects.

How reproducible:
Rarely, perhaps 1 in 100 to 200 exits.

Steps to Reproduce:
1. Run qpid broker:
   while [ ! -f core.* ]; do date; src/qpidd --auth no& sleep 1; kill %%; sleep 1; done
2. Run qpid-perftest (in another window at the same time):
   while [ ! -f core.* ]; do date; src/tests/qpid-perftest -Prdma -b 20.0.40.14 --qt 4 --count 10; done

Actual results:
Very rarely qpid-perftest hangs before exiting.

Expected results:
Should continue forever.

Additional info:
When hung, the stack traces look like:

(gdb) thread apply all bt

Thread 2 (Thread 0x418c4940 (LWP 19023)):
#0  0x0000003f8d0d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x00002acf9c11d291 in qpid::sys::Poller::wait (this=0xa8536b0, timeout=<value optimized out>) at ../../src/qpid/cpp/src/qpid/sys/epoll/EpollPoller.cpp:563
#2  0x00002acf9c11dd97 in qpid::sys::Poller::run (this=0xa8536b0) at ../../src/qpid/cpp/src/qpid/sys/epoll/EpollPoller.cpp:515
#3  0x00002acf9c1160ba in qpid::sys::(anonymous namespace)::runRunnable (p=0x5) at ../../src/qpid/cpp/src/qpid/sys/posix/Thread.cpp:35
#4  0x0000003f8d80673d in start_thread (arg=<value optimized out>) at pthread_create.c:301
#5  0x0000003f8d0d3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2acf9c6bbf70 (LWP 19003)):
#0  0x0000003f8d80aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002acf9bda86e6 in wait (this=0xa864a40, id=...) at ../../src/qpid/cpp/include/qpid/sys/posix/Condition.h:63
#2  qpid::client::SessionImpl::waitForCompletionImpl (this=0xa864a40, id=...) at ../../src/qpid/cpp/src/qpid/client/SessionImpl.cpp:180
#3  0x00002acf9bda8790 in qpid::client::SessionImpl::waitForCompletion (this=0xa864a40, id=...) at ../../src/qpid/cpp/src/qpid/client/SessionImpl.cpp:173
#4  0x00002acf9bd90108 in qpid::client::Future::wait (this=0x7fff75b08350, session=...) at ../../src/qpid/cpp/src/qpid/client/Future.cpp:31
#5  0x00002acf9bda0ea8 in qpid::client::SessionBase_0_10::sync (this=<value optimized out>) at ../../src/qpid/cpp/src/qpid/client/SessionBase_0_10.cpp:50
#6  0x0000000000414672 in qpid::tests::Setup::queueInit (this=<value optimized out>, name=..., durable=<value optimized out>, settings=<value optimized out>) at ../../../src/qpid/cpp/src/tests/qpid-perftest.cpp:268
#7  0x0000000000414be2 in qpid::tests::Setup::run (this=0x7fff75b08a40) at ../../../src/qpid/cpp/src/tests/qpid-perftest.cpp:284
#8  0x000000000040be92 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../../src/qpid/cpp/src/tests/qpid-perftest.cpp:694
It appears that in this rare case something prevents the shutdownHandler from being called in ConnectionImpl so that it doesn't know that the connection has gone away.
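To make the suspected failure mode concrete, here is a minimal sketch of a thread blocked on a condition variable that can only be released by either a completion or a connection-lost notification. It is not qpid code: PendingCommand, connectionLost and simulateDisconnect are hypothetical names invented for this illustration, and the 200 ms timeout exists only so the sketch terminates. In the trace above, Thread 1 sits in an untimed pthread_cond_wait, so a notification that never arrives means it waits forever.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the waiting session state; NOT qpid's SessionImpl.
class PendingCommand {
public:
    enum class Result { Completed, ConnectionLost, TimedOut };

    // Would be called by the I/O thread when the broker confirms the command.
    void complete() { set(Result::Completed); }

    // Would be called from the shutdown handler when the connection drops.
    void connectionLost() { set(Result::ConnectionLost); }

    // Blocks until one of the notifications arrives or the timeout expires.
    Result wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        bool signalled = cond_.wait_for(lock, timeout, [this] { return done_; });
        return signalled ? result_ : Result::TimedOut;
    }

private:
    void set(Result r) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (done_) return;  // first notification wins
            done_ = true;
            result_ = r;
        }
        cond_.notify_all();
    }

    std::mutex mutex_;
    std::condition_variable cond_;
    bool done_ = false;
    Result result_ = Result::TimedOut;
};

// Simulates a broker disconnect. With deliverShutdown == false the
// notification is lost, which models the case described in this comment.
PendingCommand::Result simulateDisconnect(bool deliverShutdown) {
    PendingCommand cmd;
    std::thread io([&cmd, deliverShutdown] {
        if (deliverShutdown) cmd.connectionLost();
    });
    PendingCommand::Result r = cmd.wait(std::chrono::milliseconds(200));
    io.join();
    return r;
}
```

Calling simulateDisconnect(true) returns ConnectionLost as soon as the notification is delivered, while simulateDisconnect(false) only returns because of the timeout; without the timeout it would hang in the same way as the client.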
It seems that at least one possibility for this hang is when the RDMA disconnect happens very early in the lifetime of the connection, perhaps even before connection establishment.
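As an illustration of that class of race only, the sketch below shows how a disconnect reported before any handler is registered can be silently dropped, and how latching the event avoids that. This is an assumption about the general pattern, not a description of the actual qpid code or of any fix: DisconnectNotifier, setHandler and handlerRunsAfterEarlyDisconnect are hypothetical names.

```cpp
#include <functional>
#include <utility>

// Hypothetical event source; names are illustrative, NOT qpid's API.
class DisconnectNotifier {
public:
    explicit DisconnectNotifier(bool latchEarlyEvents) : latch_(latchEarlyEvents) {}

    // The transport reports that the peer has gone away.
    void disconnected() {
        if (handler_) {
            handler_();
        } else if (latch_) {
            pending_ = true;  // remember the event for a late handler
        }
        // Otherwise the event is dropped and nobody ever hears about it.
    }

    // The connection layer registers its shutdown handler.
    void setHandler(std::function<void()> h) {
        handler_ = std::move(h);
        if (pending_) {
            pending_ = false;
            handler_();  // replay the disconnect that arrived early
        }
    }

private:
    bool latch_;
    bool pending_ = false;
    std::function<void()> handler_;
};

// Fires the disconnect BEFORE the handler is registered and reports
// whether the handler still ran.
bool handlerRunsAfterEarlyDisconnect(bool latchEarlyEvents) {
    DisconnectNotifier notifier(latchEarlyEvents);
    bool notified = false;
    notifier.disconnected();
    notifier.setHandler([&notified] { notified = true; });
    return notified;
}
```

With latching disabled the handler never runs, which would leave a waiting thread blocked; with latching enabled the early disconnect is replayed when the handler is registered.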
This hang seems to have been fixed by the bug fixing work on BZ631973. It no longer reproduces. Fixed on trunk.
Tested fix without hang for 24 hours. Checked into 1.3 branch.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Rarely, clients using the RDMA transport could hang if the broker abruptly disconnects.
It's not clear to me whether this is the same bug or a different one. However, I believe that this issue should also be fixed by the same set of changes. It looks to me that you were reproducing the bug rather than establishing its absence, so if you see this problem in the item under test please raise it as a new bug.
(In reply to comment #8)
> It looks to me that you were reproducing the bug rather than establishing its
> absence, so if you see this problem in the item under test please raise it as a
> new bug.

Yes, exactly. This happened when I tried to reproduce this issue.
Continuously tested for 3 days, verified with:

# rpm -qa | grep -Ei '(qpid)' | sort -u
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-26.el5
qpid-cpp-client-devel-0.7.946106-26.el5
qpid-cpp-client-devel-docs-0.7.946106-26.el5
qpid-cpp-client-rdma-0.7.946106-26.el5
qpid-cpp-client-ssl-0.7.946106-26.el5
qpid-cpp-mrg-debuginfo-0.7.946106-26.el5
qpid-cpp-server-0.7.946106-26.el5
qpid-cpp-server-cluster-0.7.946106-26.el5
qpid-cpp-server-devel-0.7.946106-26.el5
qpid-cpp-server-rdma-0.7.946106-26.el5
qpid-cpp-server-ssl-0.7.946106-26.el5
qpid-cpp-server-store-0.7.946106-26.el5
qpid-cpp-server-xml-0.7.946106-26.el5
qpid-java-client-0.7.946106-14.el5
qpid-java-common-0.7.946106-14.el5
qpid-java-example-0.7.946106-14.el5
qpid-tools-0.7.946106-11.el5

--> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0217.html