Bug 631969 - Rdma client can hang on exit under rare circumstances
Summary: Rdma client can hang on exit under rare circumstances
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 1.3.2-RC1
Assignee: Andrew Stitcher
QA Contact: ppecka
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-09-08 19:03 UTC by Andrew Stitcher
Modified: 2011-08-12 16:22 UTC (History)

Fixed In Version: qpid-cpp-mrg-0.7.946106-26
Doc Type: Bug Fix
Doc Text:
In rare cases, clients using the RDMA transport could hang if the broker disconnected abruptly.
Clone Of:
Clones: 631973
Environment:
Last Closed: 2011-02-15 12:13:39 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 604688 | 0 | urgent | CLOSED | rdma stability issues including client crashing when broker is killed from underneath it and broker likewise | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 631973 | 0 | medium | CLOSED | Rdma client can crash under rare circumstances | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2011:0217 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise MRG Messaging and Grid bug fix and enhancement update | 2011-02-15 12:10:15 UTC

Internal Links: 631973

Description Andrew Stitcher 2010-09-08 19:03:34 UTC
Description of problem:

When running one of the verification tests for bug 604688, qpid-perftest very occasionally hangs before exiting after the broker abruptly disconnects.

How reproducible:

Rarely, perhaps 1 in 100 to 200 exits.

Steps to Reproduce:
1. Run the qpid broker in a start/kill loop:

while [ ! -f core.* ]; do date; src/qpidd --auth no& sleep 1; kill %%; sleep 1; done

2. Run qpid-perftest in a loop (in another window, at the same time):

while [ ! -f core.* ]; do date; src/tests/qpid-perftest -Prdma -b 20.0.40.14 --qt 4 --count 10; done

 
Actual results:

Very rarely, qpid-perftest hangs before exiting.

Expected results:

qpid-perftest should always exit, so both loops should continue indefinitely.

Additional info:

When hung, the stack traces look like:

(gdb) thread apply all bt

Thread 2 (Thread 0x418c4940 (LWP 19023)):
#0  0x0000003f8d0d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x00002acf9c11d291 in qpid::sys::Poller::wait (this=0xa8536b0, timeout=<value optimized out>) at ../../src/qpid/cpp/src/qpid/sys/epoll/EpollPoller.cpp:563
#2  0x00002acf9c11dd97 in qpid::sys::Poller::run (this=0xa8536b0) at ../../src/qpid/cpp/src/qpid/sys/epoll/EpollPoller.cpp:515
#3  0x00002acf9c1160ba in qpid::sys::(anonymous namespace)::runRunnable (p=0x5) at ../../src/qpid/cpp/src/qpid/sys/posix/Thread.cpp:35
#4  0x0000003f8d80673d in start_thread (arg=<value optimized out>) at pthread_create.c:301
#5  0x0000003f8d0d3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2acf9c6bbf70 (LWP 19003)):
#0  0x0000003f8d80aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002acf9bda86e6 in wait (this=0xa864a40, id=...) at ../../src/qpid/cpp/include/qpid/sys/posix/Condition.h:63
#2  qpid::client::SessionImpl::waitForCompletionImpl (this=0xa864a40, id=...) at ../../src/qpid/cpp/src/qpid/client/SessionImpl.cpp:180
#3  0x00002acf9bda8790 in qpid::client::SessionImpl::waitForCompletion (this=0xa864a40, id=...) at ../../src/qpid/cpp/src/qpid/client/SessionImpl.cpp:173
#4  0x00002acf9bd90108 in qpid::client::Future::wait (this=0x7fff75b08350, session=...) at ../../src/qpid/cpp/src/qpid/client/Future.cpp:31
#5  0x00002acf9bda0ea8 in qpid::client::SessionBase_0_10::sync (this=<value optimized out>) at ../../src/qpid/cpp/src/qpid/client/SessionBase_0_10.cpp:50
#6  0x0000000000414672 in qpid::tests::Setup::queueInit (this=<value optimized out>, name=..., durable=<value optimized out>, settings=<value optimized out>) at ../../../src/qpid/cpp/src/tests/qpid-perftest.cpp:268
#7  0x0000000000414be2 in qpid::tests::Setup::run (this=0x7fff75b08a40) at ../../../src/qpid/cpp/src/tests/qpid-perftest.cpp:284
#8  0x000000000040be92 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../../src/qpid/cpp/src/tests/qpid-perftest.cpp:694

Comment 1 Andrew Stitcher 2010-09-08 19:08:18 UTC
It appears that in this rare case something prevents the shutdownHandler from being called in ConnectionImpl, so it never learns that the connection has gone away.
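
To illustrate why a missed shutdown notification leaves Thread 1 parked in pthread_cond_wait, here is a minimal sketch. It is not the qpid code: CompletionWaiter, connectionClosed and waitAfterDisconnect are hypothetical names invented for this example, and the bounded wait is only there so the sketch terminates (the stack trace above shows an unbounded wait). In a real client the close notification would arrive from the I/O thread; here it is called inline to keep the example single-threaded.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a session that blocks until a command completes.
// This is NOT qpid's SessionImpl; all names here are illustrative assumptions.
class CompletionWaiter {
public:
    enum class Result { Completed, ConnectionClosed, TimedOut };

    // Wait until the command completes or the connection is reported closed.
    // The timeout only bounds the sketch; an unbounded wait would block
    // forever if neither flag is ever set.
    Result wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool woke = cond_.wait_for(
            lock, timeout, [this] { return completed_ || closed_; });
        if (!woke) return Result::TimedOut;
        return completed_ ? Result::Completed : Result::ConnectionClosed;
    }

    // Called when the broker acknowledges the command.
    void markCompleted() {
        { std::lock_guard<std::mutex> lock(mutex_); completed_ = true; }
        cond_.notify_all();
    }

    // Plays the role of a shutdown handler: records that the connection is
    // gone and wakes any waiter.
    void connectionClosed() {
        { std::lock_guard<std::mutex> lock(mutex_); closed_ = true; }
        cond_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool completed_ = false;
    bool closed_ = false;
};

// Simulates a broker disconnect, with or without the close notification
// actually reaching the waiter.
CompletionWaiter::Result waitAfterDisconnect(bool shutdownDelivered) {
    CompletionWaiter waiter;
    if (shutdownDelivered) waiter.connectionClosed();
    return waiter.wait(std::chrono::milliseconds(100));
}
```

With shutdownDelivered set to true the waiter returns ConnectionClosed at once. With false, nothing ever sets either flag, so only the timeout rescues the caller. Without that timeout the wait would never return, which matches the symptom reported here.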

Comment 2 Andrew Stitcher 2010-09-22 21:40:54 UTC
It seems that at least one possible cause of this hang is an RDMA disconnect that happens very early in the lifetime of the connection, perhaps even before connection establishment.
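
As a generic illustration of how an early disconnect can be lost, the sketch below contrasts a transport that drops a disconnect arriving before a handler is registered with one that latches the event and replays it. This is an assumption about the general pattern only. LossyTransport, LatchingTransport and handlerRunsAfterEarlyDisconnect are hypothetical names, and this is not the qpid RDMA code or the actual fix.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical transport that forwards a disconnect only if a handler is
// already registered. An event that arrives earlier is silently dropped.
class LossyTransport {
public:
    void setDisconnectHandler(std::function<void()> handler) {
        handler_ = std::move(handler);
    }
    void disconnect() {
        if (handler_) handler_();  // no handler yet: the event is lost
    }

private:
    std::function<void()> handler_;
};

// Hypothetical transport that remembers the disconnect and replays it to a
// handler registered afterwards.
class LatchingTransport {
public:
    void setDisconnectHandler(std::function<void()> handler) {
        handler_ = std::move(handler);
        if (disconnected_ && handler_) handler_();  // replay the early event
    }
    void disconnect() {
        disconnected_ = true;
        if (handler_) handler_();
    }

private:
    bool disconnected_ = false;
    std::function<void()> handler_;
};

// Fires the disconnect before the handler exists, then reports whether the
// handler was ever invoked.
template <typename Transport>
bool handlerRunsAfterEarlyDisconnect() {
    Transport transport;
    bool notified = false;
    transport.disconnect();
    transport.setDisconnectHandler([&notified] { notified = true; });
    return notified;
}
```

In the lossy variant the upper layer never hears about the disconnect, so anything waiting on it would keep waiting. The latching variant delivers the notification as soon as the handler is registered.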

Comment 3 Andrew Stitcher 2010-10-12 16:59:34 UTC
This hang seems to have been fixed by the bug-fixing work on BZ631973. It no longer reproduces.

Fixed on trunk.

Comment 4 Andrew Stitcher 2010-10-13 16:40:53 UTC
Tested the fix for 24 hours without a hang. Checked into the 1.3 branch.

Comment 5 Andrew Stitcher 2010-10-13 16:44:03 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In rare cases, clients using the RDMA transport could hang if the broker disconnected abruptly.

Comment 8 Andrew Stitcher 2011-01-19 16:37:47 UTC
It's not clear to me whether this is the same bug or a different one. However, I believe that this issue should also be fixed by the same set of changes.

It looks to me that you were reproducing the bug rather than establishing its absence, so if you see this problem in the item under test please raise it as a new bug.

Comment 9 ppecka 2011-01-24 10:31:43 UTC
(In reply to comment #8)

> It looks to me that you were reproducing the bug rather than establishing its
> absence, so if you see this problem in the item under test please raise it as a
> new bug.
Yes, exactly. This happened when I tried to reproduce this issue.

Comment 10 ppecka 2011-01-24 11:00:27 UTC
Continuously tested for 3 days; verified with:

# rpm -qa | grep -Ei '(qpid)' | sort -u
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-26.el5
qpid-cpp-client-devel-0.7.946106-26.el5
qpid-cpp-client-devel-docs-0.7.946106-26.el5
qpid-cpp-client-rdma-0.7.946106-26.el5
qpid-cpp-client-ssl-0.7.946106-26.el5
qpid-cpp-mrg-debuginfo-0.7.946106-26.el5
qpid-cpp-server-0.7.946106-26.el5
qpid-cpp-server-cluster-0.7.946106-26.el5
qpid-cpp-server-devel-0.7.946106-26.el5
qpid-cpp-server-rdma-0.7.946106-26.el5
qpid-cpp-server-ssl-0.7.946106-26.el5
qpid-cpp-server-store-0.7.946106-26.el5
qpid-cpp-server-xml-0.7.946106-26.el5
qpid-java-client-0.7.946106-14.el5
qpid-java-common-0.7.946106-14.el5
qpid-java-example-0.7.946106-14.el5
qpid-tools-0.7.946106-11.el5

--> VERIFIED

Comment 11 errata-xmlrpc 2011-02-15 12:13:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0217.html

