Bug 673550 - RDMA error in qpidd log Deleting queue before all write buffers finished
Summary: RDMA error in qpidd log Deleting queue before all write buffers finished
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Andrew Stitcher
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-01-28 17:09 UTC by ppecka
Modified: 2020-05-22 14:34 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-22 14:34:58 UTC
Target Upstream Version:



Description ppecka 2011-01-28 17:09:39 UTC
While verifying bz484691, the following errors appeared in the qpidd log. iWARP over a Chelsio S310-CR was used.
Both qpid-perftest and qpid-latency-test were running simultaneously over RDMA against qpidd on the other side.


Description of problem:
2011-01-28 08:41:38 error RDMA: qp=0x2aaab42d2cc0: Deleting queue before all write buffers finished
2011-01-28 08:49:38 error RDMA: qp=0x2aaab8aa1f50: Deleting queue before all write buffers finished
2011-01-28 09:33:02 error RDMA: qp=0x2aaab4964a20: Deleting queue before all write buffers finished
2011-01-28 09:36:06 error RDMA: qp=0x2aaab49e1610: Deleting queue before all write buffers finished
2011-01-28 09:42:22 error RDMA: qp=0x2aaab4b13c80: Deleting queue before all write buffers finished



Version-Release number of selected component (if applicable):
rpm -qa | grep qpid | sort -u
python-qpid-0.7.946106-15.el5
qpid-cpp-client-0.7.946106-27.el5
qpid-cpp-client-devel-0.7.946106-27.el5
qpid-cpp-client-devel-docs-0.7.946106-27.el5
qpid-cpp-client-rdma-0.7.946106-27.el5
qpid-cpp-client-ssl-0.7.946106-27.el5
qpid-cpp-server-0.7.946106-27.el5
qpid-cpp-server-cluster-0.7.946106-27.el5
qpid-cpp-server-devel-0.7.946106-27.el5
qpid-cpp-server-rdma-0.7.946106-27.el5
qpid-cpp-server-ssl-0.7.946106-27.el5
qpid-cpp-server-store-0.7.946106-27.el5
qpid-cpp-server-xml-0.7.946106-27.el5
qpid-java-client-0.7.946106-14.el5
qpid-java-common-0.7.946106-14.el5
qpid-java-example-0.7.946106-14.el5
qpid-tools-0.7.946106-12.el5

libcxgb3-1.2.5-2.el5
kernel-2.6.18-238.el5


How reproducible:


Steps to Reproduce:
HostA (192.168.1.5 )
1. qpidd --auth no --mgmt-enable no --log-to-file /tmp/qpidd.log -d

HostB (192.168.1.4)
2.
while true; do date; qpid-perftest -b 192.168.1.5 --count 100 --protocol rdma --log-to-file /tmp/qpid-perftest.log --log-to-stderr no --base-name "perf.$(date +%s%N)"  2>&1 ; sleep 0.5; done>>/tmp/qpid-perftest.log

3.
while true; do date; qpid-latency-test -b 192.168.1.5 --count 100 --protocol rdma --log-to-file /tmp/qpid-latency-test.log --log-to-stderr no --queue-base-name "latency.$(date +%s%N)" 2>&1 ; sleep 0.5; done>>/tmp/qpid-latency-test.log
  
Actual results:
Error messages in qpidd.log

Expected results:
No error messages

Additional info:

Comment 1 Gordon Sim 2011-01-28 17:23:16 UTC
Is this expected?

Comment 2 Andrew Stitcher 2011-01-28 20:32:13 UTC
These messages are expected if the peer disconnects abruptly without receiving all the buffered messages that it should have received.

If the peer does not disconnect abruptly but shuts down normally, then these messages probably indicate a problem and should be investigated.

If there is only one set of messages, they could well be appearing from the final interruption of perftest/latency-test to stop the test.

The message should probably be downgraded from error to warning, as it can happen without necessarily being an error (although it does look fishy in this case, as there should only have been orderly shutdowns here).
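
The distinction above (pending write buffers after an abrupt peer disconnect are expected; after an orderly shutdown they indicate a real problem) could be reflected in the log-level choice at queue-pair teardown. The following is a hypothetical C++ sketch, not the actual qpid-cpp RDMA wrapper; the names `QueuePair` and `levelForPendingWrites` are invented for illustration.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-ins for qpid's log levels; the real qpid-cpp
// logging macros and rdma::QueuePair class look different.
enum Level { WARNING, ERROR };

// Pick the severity for "Deleting queue before all write buffers
// finished": an abrupt peer disconnect legitimately strands pending
// writes (warning), while leftover writes after an orderly shutdown
// point at a genuine bug (error).
Level levelForPendingWrites(bool peerDisconnectedAbruptly) {
    return peerDisconnectedAbruptly ? WARNING : ERROR;
}

struct QueuePair {
    int outstandingWrites = 0;            // buffers posted but not yet completed
    bool peerDisconnectedAbruptly = false;

    ~QueuePair() {
        if (outstandingWrites > 0) {
            Level lvl = levelForPendingWrites(peerDisconnectedAbruptly);
            std::cout << (lvl == ERROR ? "error" : "warning")
                      << " RDMA: Deleting queue before all write buffers finished\n";
        }
    }
};
```

Keeping the severity decision in a small helper rather than inline in the destructor makes the policy (warning for abrupt disconnects, error otherwise) easy to test and to change in one place.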

Comment 3 Zdenek Kraus 2013-02-05 09:35:24 UTC
The same issue was observed with qpid-0.18-14 on RHEL 6.4 with Mellanox InfiniBand devices (IPoIB).
The scenario is the same. Clients report exit code 0, and as far as I observed there was no message loss.

HW: InfiniBand: Mellanox Technologies MT26428

Log messages:
2013-01-31 23:04:08 [System] error RDMA: qp=0x4bd32950: Deleting queue before all write buffers finished
2013-01-31 23:04:19 [System] error RDMA: qp=0x52e34b50: Deleting queue before all write buffers finished
2013-01-31 23:04:22 [System] error RDMA: qp=0x6b1544c0: Deleting queue before all write buffers finished

Packages:
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-14.el6.x86_64
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-client-devel-0.18-14.el6.x86_64
qpid-cpp-client-devel-docs-0.18-14.el6.noarch
qpid-cpp-client-rdma-0.18-14.el6.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
qpid-cpp-server-devel-0.18-14.el6.x86_64
qpid-cpp-server-rdma-0.18-14.el6.x86_64
qpid-cpp-server-store-0.18-14.el6.x86_64
qpid-cpp-server-xml-0.18-14.el6.x86_64
qpid-java-client-0.18-7.el6.noarch
qpid-java-common-0.18-7.el6.noarch
qpid-java-example-0.18-7.el6.noarch
qpid-qmf-0.18-14.el6.x86_64
qpid-tools-0.18-7.el6_3.noarch

libcxgb3-1.3.1-1.el6.x86_64
kernel-2.6.32-358.el6.x86_64

Comment 4 Zdenek Kraus 2013-02-11 11:35:21 UTC
This issue is also present with Chelsio devices via iWARP:
Chelsio Communications Inc T320 10GbE Dual Port Adapter

