Hide Forgot
Description of problem: While verifying bz484691 following errors appeared in qpidd log. Used iWARP over Chelsio S310-CR. Both qpid-perftest and qpid-latency-test were running simultaneously over rdma against qpidd on the other side. While qpid-latency-test keeps working, qpid-perftest hangs with: 2011-01-31 08:37:49 error Caught exception in state: 3 with event: 1: Success 2011-01-31 08:37:49 warning Connection [192.168.1.4:44014 192.168.1.5:5672] closed PublishThread exception: Connection [192.168.1.4:44014 192.168.1.5:5672] closed (gdb) bt #0 0x000000362320aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000003edf08520b in pop (this=0x1443cf90, result=..., timeout=<value optimized out>) at ../include/qpid/sys/posix/Condition.h:63 #2 qpid::client::LocalQueueImpl::get (this=0x1443cf90, result=..., timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49 #3 0x0000003edf085a79 in qpid::client::LocalQueueImpl::get (this=0x1443cf90, timeout=...) at qpid/client/LocalQueueImpl.cpp:40 #4 0x0000003edf085c19 in qpid::client::LocalQueueImpl::pop (this=0x80, timeout=...) at qpid/client/LocalQueueImpl.cpp:36 #5 0x0000003edf082c2c in qpid::client::LocalQueue::pop ( this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43 #6 0x000000000040f7d7 in qpid::tests::Controller::process(size_t, qpid::client::LocalQueue, std::string, boost::function<void ()(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&),std::allocator<void> >) ( this=<value optimized out>, n=4, lq=..., queue=<value optimized out>, msgFn=...) at qpid-perftest.cpp:382 #7 0x0000000000413f76 in qpid::tests::Controller::run (this=0x7fff1400c2a0) at qpid-perftest.cpp:422 #8 0x000000000040cceb in main (argc=331444048, argv=<value optimized out>) at qpid-perftest.cpp:719 Version-Release number of selected component (if applicable): #rpm -qa | grep qpid | sort -u python-qpid-0.7.946106-15.el5 qpid-cpp-client-0.7.946106-27.el5 qpid-cpp-client-devel-0.7.946106-27.el5 qpid-cpp-client-devel-docs-0.7.946106-27.el5 qpid-cpp-client-rdma-0.7.946106-27.el5 qpid-cpp-client-ssl-0.7.946106-27.el5 qpid-cpp-server-0.7.946106-27.el5 qpid-cpp-server-cluster-0.7.946106-27.el5 qpid-cpp-server-devel-0.7.946106-27.el5 qpid-cpp-server-rdma-0.7.946106-27.el5 qpid-cpp-server-ssl-0.7.946106-27.el5 qpid-cpp-server-store-0.7.946106-27.el5 qpid-cpp-server-xml-0.7.946106-27.el5 qpid-java-client-0.7.946106-14.el5 qpid-java-common-0.7.946106-14.el5 qpid-java-example-0.7.946106-14.el5 qpid-tools-0.7.946106-12.el5 libcxgb3-1.2.5-2.el5 kernel-2.6.18-238.el5 How reproducible: 100% Steps to Reproduce: HostA 1.qpidd --auth no --mgmt-enable no --log-to-file /tmp/qpidd.log -d --log-enable info+ HostB( point 2. and 3. in autonomous shell) 2. while true; do date; qpid-perftest -b 192.168.1.5 --size 161803 --npubs 1 --count 10 --protocol rdma --log-to-file /tmp/qpid-perftest.log --log-to-stderr no --base-name "perf.$(date +%s%N)" --log-enable notice+ 2>&1 ; sleep 0.5; done>>/tmp/qpid-perftest.log 3.while true; do date; qpid-latency-test -b 192.168.1.5 --count 10 --protocol rdma --log-to-file /tmp/qpid-latency-test.log --log-to-stderr no --queue-base-name "latency.$(date +%s%N)" --log-enable notice+ 2>&1 ; sleep 0.5; done>>/tmp/qpid-latency-test.log Actual results: QPIDD LOG>>> 2011-01-31 08:16:05 error Connection exception: framing-error: Unexpected command start frame. (qpid/SessionState.cpp:57) 2011-01-31 08:16:05 error Connection 192.168.1.4:35680 closed by error: Unexpected command start frame. (qpid/SessionState.cpp:57)(501) 2011-01-31 08:16:05 error Caught exception in state: 3 with event: 1: Success 2011-01-31 08:16:05 error Caught exception in state: 3 with event: 1: Success QPID-PERFTEST LOG>>> Mon Jan 31 08:16:04 EST 2011 Processing 1 messages from perf.1296479764214104000_sub_ready . done. Sending start 1 times to perf.1296479764214104000_pub_start Processing 1 messages from perf.1296479764214104000_pub_done .2011-01-31 08 :16:05 error Caught exception in state: 3 with event: 1: Success 2011-01-31 08:16:05 error RDMA: qp=0x2636c50: Deleting queue before all wri te buffers finished 2011-01-31 08:16:05 warning Connection [192.168.1.4:35680 192.168.1.5:5672] closed PublishThread exception: Connection [192.168.1.4:35680 192.168.1.5:5672] closed Expected results: qpid-perftest not failing Additional info:
this also appeared in qpid-perftest.log tested as described above except qpid-perftest option changed --npubs 2 2011-01-31 08:47:38 error RDMA: qp=0x2db8dc0: Deleting queue before all write buffers finished 2011-01-31 08:47:38 error Caught exception in state: 3 with event: 7: Success 2011-01-31 08:47:38 error Caught exception in state: 3 with event: 7: Success Additional info: after qpidd restart not reproducible again
This may be the same underlying bug as bug 674011, it seems to have the same actual symptom (qpid-perftest hanging on exit)
Created attachment 476550 [details] The issue reproducer More details on reproducibility: quite easy with attached script, triggers the issue within 30 minutes. How to run: on one machine [root@mrg4 bz484691]# ./run.sh 192.168.1.5 0 +Hit enter after 10 seconds (once the other is up) on second machine [root@mrg5 bz484691]# ./run.sh 192.168.1.4 0 +Hit enter after 10 seconds (once the other is up)
I think this is very likely to have the same underlying cause as bug 681313 *** This bug has been marked as a duplicate of bug 681313 ***