Bug 674056 - QPID Unexpected command start frame, iWARP configuration qpid-perftest hangs
Summary: QPID Unexpected command start frame, iWARP configuration qpid-perftest hangs
Keywords:
Status: CLOSED DUPLICATE of bug 681313
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.3
Hardware: Unspecified
OS: Unspecified
urgent
medium
Target Milestone: 2.0
: ---
Assignee: Andrew Stitcher
QA Contact: ppecka
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-01-31 13:51 UTC by ppecka
Modified: 2011-08-12 16:21 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-13 03:37:02 UTC
Target Upstream Version:


Attachments (Terms of Use)
The issue reproducer (3.17 KB, text/plain)
2011-02-02 12:35 UTC, Frantisek Reznicek
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 724968 0 medium NEW iWARP configuration qpid-perftest hangs "Connection closed" when --qt arg is supplied 2021-02-22 00:41:40 UTC

Internal Links: 724968

Description ppecka 2011-01-31 13:51:46 UTC
Description of problem:
While verifying bz484691 following errors appeared in qpidd log. Used iWARP
over Chelsio S310-CR.
Both qpid-perftest and qpid-latency-test were running simultaneously over rdma
against qpidd on the other side. While qpid-latency-test keeps working, qpid-perftest hangs with:


2011-01-31 08:37:49 error Caught exception in state: 3 with event: 1: Success
2011-01-31 08:37:49 warning Connection [192.168.1.4:44014 192.168.1.5:5672] closed
PublishThread exception: Connection [192.168.1.4:44014 192.168.1.5:5672] closed



(gdb) bt 
#0  0x000000362320aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003edf08520b in pop (this=0x1443cf90, result=..., 
    timeout=<value optimized out>) at ../include/qpid/sys/posix/Condition.h:63
#2  qpid::client::LocalQueueImpl::get (this=0x1443cf90, result=..., 
    timeout=<value optimized out>) at qpid/client/LocalQueueImpl.cpp:49
#3  0x0000003edf085a79 in qpid::client::LocalQueueImpl::get (this=0x1443cf90, 
    timeout=...) at qpid/client/LocalQueueImpl.cpp:40
#4  0x0000003edf085c19 in qpid::client::LocalQueueImpl::pop (this=0x80, 
    timeout=...) at qpid/client/LocalQueueImpl.cpp:36
#5  0x0000003edf082c2c in qpid::client::LocalQueue::pop (
    this=<value optimized out>, timeout=...) at qpid/client/LocalQueue.cpp:43
#6  0x000000000040f7d7 in qpid::tests::Controller::process(size_t, qpid::client::LocalQueue, std::string, boost::function<void ()(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&),std::allocator<void> >) (
    this=<value optimized out>, n=4, lq=..., queue=<value optimized out>, 
    msgFn=...) at qpid-perftest.cpp:382
#7  0x0000000000413f76 in qpid::tests::Controller::run (this=0x7fff1400c2a0)
    at qpid-perftest.cpp:422
#8  0x000000000040cceb in main (argc=331444048, argv=<value optimized out>)
    at qpid-perftest.cpp:719



Version-Release number of selected component (if applicable):
#rpm -qa | grep qpid | sort -u
python-qpid-0.7.946106-15.el5
qpid-cpp-client-0.7.946106-27.el5
qpid-cpp-client-devel-0.7.946106-27.el5
qpid-cpp-client-devel-docs-0.7.946106-27.el5
qpid-cpp-client-rdma-0.7.946106-27.el5
qpid-cpp-client-ssl-0.7.946106-27.el5
qpid-cpp-server-0.7.946106-27.el5
qpid-cpp-server-cluster-0.7.946106-27.el5
qpid-cpp-server-devel-0.7.946106-27.el5
qpid-cpp-server-rdma-0.7.946106-27.el5
qpid-cpp-server-ssl-0.7.946106-27.el5
qpid-cpp-server-store-0.7.946106-27.el5
qpid-cpp-server-xml-0.7.946106-27.el5
qpid-java-client-0.7.946106-14.el5
qpid-java-common-0.7.946106-14.el5
qpid-java-example-0.7.946106-14.el5
qpid-tools-0.7.946106-12.el5

libcxgb3-1.2.5-2.el5
kernel-2.6.18-238.el5

How reproducible:
100%

Steps to Reproduce:
HostA
1.qpidd --auth no --mgmt-enable no --log-to-file /tmp/qpidd.log -d --log-enable info+


HostB( point 2. and 3. in autonomous shell)
2. while true; do date; qpid-perftest -b 192.168.1.5 --size 161803 --npubs 1 --count 10 --protocol rdma --log-to-file /tmp/qpid-perftest.log --log-to-stderr no --base-name "perf.$(date +%s%N)" --log-enable notice+  2>&1 ; sleep 0.5; done>>/tmp/qpid-perftest.log

3.while true; do date; qpid-latency-test -b 192.168.1.5 --count 10 --protocol rdma --log-to-file /tmp/qpid-latency-test.log --log-to-stderr no --queue-base-name "latency.$(date +%s%N)" --log-enable notice+ 2>&1 ; sleep 0.5; done>>/tmp/qpid-latency-test.log

  
Actual results:
QPIDD LOG>>>

2011-01-31 08:16:05 error Connection exception: framing-error: Unexpected command start frame. (qpid/SessionState.cpp:57)
2011-01-31 08:16:05 error Connection 192.168.1.4:35680 closed by error: Unexpected command start frame. (qpid/SessionState.cpp:57)(501)
2011-01-31 08:16:05 error Caught exception in state: 3 with event: 1: Success
2011-01-31 08:16:05 error Caught exception in state: 3 with event: 1: Success




QPID-PERFTEST LOG>>>
Mon Jan 31 08:16:04 EST 2011
Processing 1 messages from perf.1296479764214104000_sub_ready . done.
Sending start 1 times to perf.1296479764214104000_pub_start
Processing 1 messages from perf.1296479764214104000_pub_done .2011-01-31 08
:16:05 error Caught exception in state: 3 with event: 1: Success
2011-01-31 08:16:05 error RDMA: qp=0x2636c50: Deleting queue before all wri
te buffers finished
2011-01-31 08:16:05 warning Connection [192.168.1.4:35680 192.168.1.5:5672]
 closed
PublishThread exception: Connection [192.168.1.4:35680 192.168.1.5:5672] closed



Expected results:
qpid-perftest not failing

Additional info:

Comment 1 ppecka 2011-01-31 14:06:12 UTC
this also appeared in qpid-perftest.log
tested as described above except qpid-perftest option changed --npubs 2

2011-01-31 08:47:38 error RDMA: qp=0x2db8dc0: Deleting queue before all write buffers finished
2011-01-31 08:47:38 error Caught exception in state: 3 with event: 7: Success
2011-01-31 08:47:38 error Caught exception in state: 3 with event: 7: Success

Additional info:
after qpidd restart not reproducible again

Comment 2 Andrew Stitcher 2011-02-01 16:11:38 UTC
This may be the same underlying bug as bug 674011, it seems to have the same actual symptom (qpid-perftest hanging on exit)

Comment 3 Frantisek Reznicek 2011-02-02 12:35:17 UTC
Created attachment 476550 [details]
The issue reproducer

More details on reproducibility:
quite easy with attached script, triggers the issue within 30 minutes.

How to run:
on one machine
[root@mrg4 bz484691]# ./run.sh 192.168.1.5 0
+Hit enter after 10 seconds (once the other is up)

on second machine
[root@mrg5 bz484691]# ./run.sh 192.168.1.4 0
+Hit enter after 10 seconds (once the other is up)

Comment 4 Andrew Stitcher 2011-05-13 03:37:02 UTC
I think this is very likely to have the same underlying cause as bug 681313

*** This bug has been marked as a duplicate of bug 681313 ***


Note You need to log in before you can comment on or make changes to this bug.