Bug 955578 - Multi-threaded fetching messages from different queues causes fetch delay
Summary: Multi-threaded fetching messages from different queues causes fetch delay
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.3
Hardware: All
OS: Linux
high
medium
Target Milestone: 3.0
: ---
Assignee: Gordon Sim
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks: 958895
TreeView+ depends on / blocked
 
Reported: 2013-04-23 10:40 UTC by Pavel Moravec
Modified: 2018-12-06 15:02 UTC (History)
5 users (show)

Fixed In Version: qpid-cpp-0.22-4.el6, qpid-cpp-0.22-4.el5
Doc Type: Bug Fix
Doc Text:
It was discovered that if there was more than one thread concurrently processing the session queue, a message of interest to one thread may be processed by another thread preventing the 'correct' thread from seeing it. The fix ensures only one thread at a time can process the incoming queue. Concurrent fetch() calls now work as expected without needing to timeout to see a message as it arrives.
Clone Of:
: 958895 (view as bug list)
Environment:
Last Closed: 2014-09-24 15:07:39 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Reproducer (4.95 KB, text/x-c++src)
2013-04-23 10:42 UTC, Pavel Moravec
no flags Details
reproducer for rhel5 (boost 1.33.1) (6.28 KB, text/x-c++src)
2013-05-20 08:39 UTC, Frantisek Reznicek
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Apache JIRA QPID-4786 0 None None None Never
Red Hat Product Errata RHEA-2014:1296 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging 3.0 Release 2014-09-24 19:00:06 UTC

Description Pavel Moravec 2013-04-23 10:40:53 UTC
Description of problem:
See attached program that:
- creates a queue (via QMF) and fetches QMF response in a response queue (response got very fast)
- runs a new thread that subscribes to a queue and fetches messages in loop
- deletes the queue (not-used at all) via QMF and fetches response in a response queue
- joins the child thread

The latest fetch takes as much time as timeout is set on it. Despite the response gets quite fast (similarly as fast the QMF response to create request).


Version-Release number of selected component (if applicable):
qpid-cpp-client-0.18-14


How reproducible:
100%


Steps to Reproduce:
1. See attached test_qmf_DeleteOnly.cpp
2. g++ test_qmf_DeleteOnly.cpp -o test_qmf_DeleteOnly.cpp -lpthread -lqpidmessaging -lqpidclient -lqpidcommon -lqpidtypes -lboost_date_time -lboost_thread-mt
3. ./test_qmf_CreateDeleteOnly 3000
(the parameter specifies fetch timeout in ms)


Actual results:
2013-Apr-23 09:20:32.299674] Fetch timeout to be set to 3000ms.
2013-Apr-23 09:20:32.299860] Connecting to amqp:127.0.0.1:5672...
2013-Apr-23 09:20:32.313633] Creating session...
2013-Apr-23 09:20:32.315619] Creating sender...
2013-Apr-23 09:20:32.316667] Declaring queue...
2013-Apr-23 09:20:32.318789]  just before QMF fetch response on queue 1reply-queue; {create:always, node:{x-declare:{auto-delete:true}}}
2013-Apr-23 09:20:32.359730]  just after QMF fetch response: _method_response
2013-Apr-23 09:20:32.360898] Creating receiver...
2013-Apr-23 09:20:32.401596] Starting thread...
2013-Apr-23 09:20:32.402169] Deleting queue...
2013-Apr-23 09:20:32.443053]  just before QMF fetch response on queue 2reply-queue; {create:always, node:{x-declare:{auto-delete:true}}}
2013-Apr-23 09:20:35.456709]  just after QMF fetch response: _method_response
2013-Apr-23 09:20:35.457935] Finalizing thread...
2013-Apr-23 09:20:35.618518] Done, press Ctrl+C to exit

- see the 3 seconds delay after "just before QMF fetch response on queue 2reply-queue".
- increasing the program parameter (fetch timeout) directly impacts the delay

Expected results:
No such delay.


Additional info:
tcpdump showed that the client gets the message quite early but it re-issue credit to the broker in order to attempt to fetch new messages.

traces of qpid client library showed second (child) thread running fetch in loop in fact gets the message, but enqueues it to internal buffer - see qpid/client/amqp0_10/IncomingMessages.cpp, method IncomingMessages::process and log:

QPID_LOG(debug, "Pushed " << *content->getMethod() << " to received queue");

The child thread did the enqueue properly, as it fetches for messages of different destination; the problem is that the parent thread fetching QMF response does not check the received queue before querying broker.

The concurrent fetching in two threads is essential for the reproducer. Moving "should_end.unlock(); thread.join();" before deleting the queue _prevents_ the delay at all.

Comment 1 Pavel Moravec 2013-04-23 10:42:24 UTC
Created attachment 738931 [details]
Reproducer

Comment 2 Justin Ross 2013-04-25 13:52:48 UTC
A jira that mentions fetch delays: https://issues.apache.org/jira/browse/QPID-4664

Comment 3 Gordon Sim 2013-04-29 14:19:51 UTC
Proposed fix available for review: https://reviews.apache.org/r/10839/.

Comment 4 Gordon Sim 2013-05-01 12:37:08 UTC
Fix committed upstream as http://svn.apache.org/viewvc?view=revision&revision=1477975

Comment 5 Frantisek Reznicek 2013-05-20 08:39:37 UTC
Created attachment 750469 [details]
reproducer for rhel5 (boost 1.33.1)

Comment 6 Frantisek Reznicek 2013-05-20 09:18:17 UTC
An testing update:
qpid-cpp-client-0.22-2.el6 comes with fix, but qpid-cpp-client-0.22-2.el5 does not.
Waiting for el5 packages with applied fix.

Comment 7 Frantisek Reznicek 2013-06-18 14:09:17 UTC
The issue has been fully fixed (long term test), tested on RHEL 5.9 / 6.4 i[36]86 / x86_64 on packages:

python-qpid-0.22-2.el5
python-qpid-proton-0.4-2.2.el5
python-qpid-proton-doc-0.4-2.2.el5
python-qpid-qmf-0.22-2.el5
qpid-cpp-client-0.22-4.el5
qpid-cpp-client-devel-0.22-4.el5
qpid-cpp-client-devel-docs-0.22-4.el5
qpid-cpp-client-rdma-0.22-4.el5
qpid-cpp-client-ssl-0.22-4.el5
qpid-cpp-mrg-debuginfo-0.22-4.el5
qpid-cpp-server-0.22-4.el5
qpid-cpp-server-devel-0.22-4.el5
qpid-cpp-server-ha-0.22-4.el5
qpid-cpp-server-rdma-0.22-4.el5
qpid-cpp-server-ssl-0.22-4.el5
qpid-cpp-server-store-0.22-4.el5
qpid-cpp-server-xml-0.22-4.el5
qpid-java-client-0.22-2.el5
qpid-java-common-0.22-2.el5
qpid-java-example-0.22-2.el5
qpid-jca-0.18-8.el5
qpid-jca-xarecovery-0.18-8.el5
qpid-jca-zip-0.18-8.el5
qpid-proton-c-0.4-2.2.el5
qpid-proton-c-devel-0.4-2.2.el5
qpid-proton-debuginfo-0.4-2.2.el5
qpid-qmf-0.22-2.el5
qpid-qmf-debuginfo-0.22-2.el5
qpid-qmf-devel-0.22-2.el5
qpid-tests-0.22-2.el5
qpid-tools-0.22-2.el5
rh-qpid-cpp-tests-0.22-4.el5
ruby-qpid-qmf-0.22-2.el5

-> VERIFIED

Comment 8 errata-xmlrpc 2014-09-24 15:07:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1296.html


Note You need to log in before you can comment on or make changes to this bug.