Bug 673550

Summary: RDMA error in qpidd log: Deleting queue before all write buffers finished
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: 2.3
Reporter: ppecka <ppecka>
Assignee: Andrew Stitcher <astitcher>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: gsim, iboverma, jross, zkraus
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2020-05-22 14:34:58 UTC

Description ppecka 2011-01-28 17:09:39 UTC
While verifying bz484691, the following errors appeared in the qpidd log. The setup used iWARP over a Chelsio S310-CR adapter.
Both qpid-perftest and qpid-latency-test were running simultaneously over RDMA against qpidd on the other host.


Description of problem:
2011-01-28 08:41:38 error RDMA: qp=0x2aaab42d2cc0: Deleting queue before all write buffers finished
2011-01-28 08:49:38 error RDMA: qp=0x2aaab8aa1f50: Deleting queue before all write buffers finished
2011-01-28 09:33:02 error RDMA: qp=0x2aaab4964a20: Deleting queue before all write buffers finished
2011-01-28 09:36:06 error RDMA: qp=0x2aaab49e1610: Deleting queue before all write buffers finished
2011-01-28 09:42:22 error RDMA: qp=0x2aaab4b13c80: Deleting queue before all write buffers finished
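
For context, this message is logged when the broker tears down an RDMA queue pair while posted writes have not yet completed. A minimal C++ sketch of that pattern, with hypothetical names (QueuePair, outstandingWrites) rather than the actual qpid-cpp RDMA code:

#include <atomic>
#include <iostream>

// Illustrative only: a connection wrapper that counts writes posted to the
// hardware and complains at destruction if completions never arrived.
class QueuePair {
    std::atomic<int> outstandingWrites{0};
public:
    void postWrite() { ++outstandingWrites; }       // buffer handed to the HCA
    void writeCompleted() { --outstandingWrites; }  // completion event consumed
    ~QueuePair() {
        // If the peer disconnects abruptly, completions for already-posted
        // writes never arrive, so the counter is still non-zero here.
        if (outstandingWrites > 0)
            std::cerr << "error RDMA: qp=" << this
                      << ": Deleting queue before all write buffers finished\n";
    }
};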



Version-Release number of selected component (if applicable):
rpm -qa | grep qpid | sort -u
python-qpid-0.7.946106-15.el5
qpid-cpp-client-0.7.946106-27.el5
qpid-cpp-client-devel-0.7.946106-27.el5
qpid-cpp-client-devel-docs-0.7.946106-27.el5
qpid-cpp-client-rdma-0.7.946106-27.el5
qpid-cpp-client-ssl-0.7.946106-27.el5
qpid-cpp-server-0.7.946106-27.el5
qpid-cpp-server-cluster-0.7.946106-27.el5
qpid-cpp-server-devel-0.7.946106-27.el5
qpid-cpp-server-rdma-0.7.946106-27.el5
qpid-cpp-server-ssl-0.7.946106-27.el5
qpid-cpp-server-store-0.7.946106-27.el5
qpid-cpp-server-xml-0.7.946106-27.el5
qpid-java-client-0.7.946106-14.el5
qpid-java-common-0.7.946106-14.el5
qpid-java-example-0.7.946106-14.el5
qpid-tools-0.7.946106-12.el5

libcxgb3-1.2.5-2.el5
kernel-2.6.18-238.el5


How reproducible:


Steps to Reproduce:
HostA (192.168.1.5)
1. qpidd --auth no --mgmt-enable no --log-to-file /tmp/qpidd.log -d

HostB (192.168.1.4)
2.
while true; do date; qpid-perftest -b 192.168.1.5 --count 100 --protocol rdma --log-to-file /tmp/qpid-perftest.log --log-to-stderr no --base-name "perf.$(date +%s%N)"  2>&1 ; sleep 0.5; done>>/tmp/qpid-perftest.log

3.
while true; do date; qpid-latency-test -b 192.168.1.5 --count 100 --protocol rdma --log-to-file /tmp/qpid-latency-test.log --log-to-stderr no --queue-base-name "latency.$(date +%s%N)" 2>&1 ; sleep 0.5; done>>/tmp/qpid-latency-test.log
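
To tell orderly shutdowns apart from abrupt ones while reproducing, it can help to run a client that closes its connection cleanly and check whether the error still appears. A minimal sketch assuming the qpid::messaging C++ API with its protocol connection option; the queue name and address options are illustrative:

#include <exception>
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Session.h>

int main() {
    // Broker address is HostA from the steps above; "{protocol: rdma}"
    // selects the RDMA transport instead of TCP.
    qpid::messaging::Connection connection("192.168.1.5", "{protocol: rdma}");
    try {
        connection.open();
        qpid::messaging::Session session = connection.createSession();
        qpid::messaging::Sender sender =
            session.createSender("perf.test; {create: always}");
        sender.send(qpid::messaging::Message("ping"));
        session.sync();     // wait until the broker has settled the send
        connection.close(); // orderly shutdown: write buffers should drain
    } catch (const std::exception&) {
        return 1;           // an abrupt exit here is what strands write buffers
    }
    return 0;
}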
  
Actual results:
Error messages in qpidd.log

Expected results:
No error messages

Additional info:

Comment 1 Gordon Sim 2011-01-28 17:23:16 UTC
Is this expected?

Comment 2 Andrew Stitcher 2011-01-28 20:32:13 UTC
These messages are expected if the peer disconnects abruptly without receiving all the buffered messages it should have received.

If the peer does not disconnect abruptly but shuts down normally, then these messages probably indicate a real problem and should be investigated.

If there is only one set of messages, they could well result from the final interrupt used to stop perftest/latency-test.

The message should probably be downgraded from error to warning, as it can happen without necessarily being an error (although it does look fishy in this case, as there should only have been orderly shutdowns here).
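
A sketch of the suggested downgrade, assuming the teardown path can tell whether the peer disconnected abruptly (the flag and function names are hypothetical):

#include <iostream>

// Illustrative only: an abrupt peer disconnect legitimately strands posted
// writes, so warning is enough; after an orderly shutdown the buffers should
// have drained, so error remains appropriate.
void logPendingWrites(int pending, bool abruptDisconnect) {
    if (pending == 0) return;
    const char* severity = abruptDisconnect ? "warning" : "error";
    std::cerr << severity
              << " RDMA: Deleting queue before all write buffers finished"
              << " (pending=" << pending << ")\n";
}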

Comment 3 Zdenek Kraus 2013-02-05 09:35:24 UTC
The same issue was observed with qpid-0.18-14 on RHEL 6.4 with Mellanox InfiniBand devices (IPoIB).
The scenario is the same. The clients report exit code 0, and as far as I can observe there was no message loss.

HW: InfiniBand: Mellanox Technologies MT26428

Log messages:
2013-01-31 23:04:08 [System] error RDMA: qp=0x4bd32950: Deleting queue before all write buffers finished
2013-01-31 23:04:19 [System] error RDMA: qp=0x52e34b50: Deleting queue before all write buffers finished
2013-01-31 23:04:22 [System] error RDMA: qp=0x6b1544c0: Deleting queue before all write buffers finished

Packages:
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-14.el6.x86_64
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-client-devel-0.18-14.el6.x86_64
qpid-cpp-client-devel-docs-0.18-14.el6.noarch
qpid-cpp-client-rdma-0.18-14.el6.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
qpid-cpp-server-devel-0.18-14.el6.x86_64
qpid-cpp-server-rdma-0.18-14.el6.x86_64
qpid-cpp-server-store-0.18-14.el6.x86_64
qpid-cpp-server-xml-0.18-14.el6.x86_64
qpid-java-client-0.18-7.el6.noarch
qpid-java-common-0.18-7.el6.noarch
qpid-java-example-0.18-7.el6.noarch
qpid-qmf-0.18-14.el6.x86_64
qpid-tools-0.18-7.el6_3.noarch

libcxgb3-1.3.1-1.el6.x86_64
kernel-2.6.32-358.el6.x86_64

Comment 4 Zdenek Kraus 2013-02-11 11:35:21 UTC
This issue is also present with Chelsio devices via iWARP:
Chelsio Communications Inc T320 10GbE Dual Port Adapter