Bug 860691
Summary: Poor performance when a priority queue with a ring queue policy has a large backlog
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: Development
Status: CLOSED ERRATA
Severity: unspecified
Priority: high
Reporter: Jason Dillaman <jdillama>
Assignee: Alan Conway <aconway>
QA Contact: Matej Lesko <mlesko>
CC: aconway, esammons, iboverma, jross, lzhaldyb, mcressma, mgoulish, pmoravec, sauchter, zkraus
Keywords: Patch
Target Milestone: 2.5.5
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qpid-cpp-0.18-26
Doc Type: Bug Fix

Doc Text:
Poor enqueueing performance was discovered when a priority queue with a ring queue policy had a large message backlog. Performance degraded quickly when messages of mixed priority were inserted into a queue with a large backlog (more than 100,000 messages). The fix ensures that disordered messages, acquired by a consumer as part of a cluster update but not yet acknowledged at the time a new member joins the cluster, are reordered. During normal operation, new messages are appended directly to the back of the priority queues because the order of arrival is known. Enqueueing performance is improved for priority ring queues containing large numbers of messages with mixed priorities.

Cloned As: 996249
Last Closed: 2014-10-22 16:34:07 UTC
Type: Bug
Bug Blocks: 698367, 996249, 1140368, 1140369
Description (Jason Dillaman, 2012-09-26 13:53:51 UTC)
Patch reviewed by Gordon Sim and myself. Merged to the 0.18 branch:
http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=0.18-mrg

*** Bug 860415 has been marked as a duplicate of this bug. ***

The commit link in comment 1 is incorrect. The correct link is:
http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=0.18-mrg&id=c3a4dd856d8a82c4faaf50ac1a147b93334a0916

This indirectly fixes an issue when the queue was empty in the following bit of code in qpid/cpp/src/qpid/broker/QueuePolicy.cpp:

    bool RingQueuePolicy::checkLimit(boost::intrusive_ptr<Message> m) {
        ....
        do {
            QueuedMessage oldest = queue.front();
            if (oldest.queue->acquireMessageAt(oldest.position, oldest) || !strict) {
            ...
    }

If oldest is empty, qpidd segfaults with the following backtrace:

    Core was generated by `qpidd --config qpidd2.conf'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00000031c9209220 in pthread_mutex_lock () from /lib64/libpthread.so.0
    (gdb) bt full
    #0  0x00000031c9209220 in pthread_mutex_lock () from /lib64/libpthread.so.0
    No symbol table info available.
    #1  0x00007f5680cce94a in qpid::sys::Mutex::lock (this=<value optimized out>)
        at ../include/qpid/sys/posix/Mutex.h:116
            e = <value optimized out>
    #2  0x00007f5680d58022 in ScopedLock (this=0x0, position=..., message=...)
        at ../include/qpid/sys/Mutex.h:33
    No locals.
    #3  qpid::broker::Queue::acquireMessageAt (this=0x0, position=..., message=...)
        at qpid/broker/Queue.cpp:235
            locker = {mutex = @0x1a8}
            __PRETTY_FUNCTION__ = "bool qpid::broker::Queue::acquireMessageAt(const qpid::framing::SequenceNumber&, qpid::broker::QueuedMessage&)"
    #4  0x00007f5680d706b6 in qpid::broker::RingQueuePolicy::checkLimit (this=0x222f510, m=...)
        at qpid/broker/QueuePolicy.cpp:272
    ....
    (gdb) up
    #4  0x00007f5680d706b6 in qpid::broker::RingQueuePolicy::checkLimit (this=0x222f510, m=...)
        at qpid/broker/QueuePolicy.cpp:272
    272         if (oldest.queue->acquireMessageAt(oldest.position, oldest) || !strict) {
    (gdb) print oldest
    $2 = {payload = {px = 0x0},
          position = {<boost::equality_comparable<qpid::framing::SequenceNumber, ...>> = {<No data fields>},
                      value = 0},
          queue = 0x0}

Tested against MRG-2.3 and it handles this situation fine.

There was a lot of refactoring on the ring queue; 0.18 was too far behind to backport the same fix from trunk and 0.22. It had a similar problem, but I had to fix it differently in the 0.18 context. I should have put a note to that effect; will do next time.

Created attachment 910801 [details]
Results of Mick and Pavel's reproducers before/after the 0.18-mrg fix.

I ran reproducer https://bugzilla.redhat.com/attachment.cgi?id=815467 from Bug 996249 and pmoravec's reproducer (below) before and after the fix commit:

    0.18-mrg: c3a4dd8 Bug 860691 - QPID-4287: Poor performance when a priority queue with a ring queue policy has a large backlog

The fix does indeed seem to be good (detailed results attached). However, I also ran the tests at the tip of the 0.18-mrg branch, and there appears to be a regression sometime after the fix. The problem is back, although less severe :( I'm trying to find the guilty commit now.

The regression was deliberate:

    6240de58d1ae21ac8539b702e36f00a7cecebeb8 is the first bad commit
    commit 6240de58d1ae21ac8539b702e36f00a7cecebeb8
    Author: Alan Conway <aconway>
    Date:   Mon Dec 3 14:01:04 2012 -0500

        Bug 882373 - Cluster ring queue unit test is failing.

        Regression introduced by: "Bug 860691 - QPID-4287: Poor performance when
        a priority queue with a ring queue policy has a large backlog"

        Restored sorting of priority queues by message position in
        RingQueuePolicy::enqueued

It may be possible to fix both on the 0.18 branch; I can revisit if necessary.

Fixed by http://git.app.eng.bos.redhat.com/git/rh-qpid.git/commit/?h=0.18-mrg-bz996249&id=b5ece38d1d2f258c3b25a70b71b0e2b75dc1269c

    commit b5ece38d1d2f258c3b25a70b71b0e2b75dc1269c
    Author: Alan Conway <aconway>
    Commit: Alan Conway <aconway>

        Bug 996249 - Poor performance when a priority queue with a ring queue
        policy has a large backlog

        This bug has a history on the 0.18-mrg branch.
        First the performance problem was fixed by removing a binary search to
        insert messages in order by position:

            c3a4dd8 Bug 860691 - QPID-4287: Poor performance when a priority queue with a ring queue policy has a large backlog

        However, that caused a regression (a problem with cluster updates) and
        was reverted by:

            6240de5 Bug 882373 - Cluster ring queue unit test is failing.

        This current commit fixes the performance problem and the cluster
        regression.

        When inserting acquired messages as part of a cluster update, we do the
        search to put them in order, since they will be out of order. This is
        only for messages that were acquired by a consumer but not yet
        acknowledged at the time a new member joins the cluster, so it should
        be a small set.

        During normal operation we push new messages directly to the back of
        the priority queues since we know they are in order of arrival.

(In reply to Alan Conway from comment #13)
> Fixed by
> http://git.app.eng.bos.redhat.com/git/rh-qpid.git/commit/?h=0.18-mrg-bz996249&id=b5ece38d1d2f258c3b25a70b71b0e2b75dc1269c
>
> commit b5ece38d1d2f258c3b25a70b71b0e2b75dc1269c
> Author: Alan Conway <aconway>
> Commit: Alan Conway <aconway>
>
>     Bug 996249 - Poor performance when a priority queue with a ring queue
>     policy has a large backlog
>
>     This bug has a history on the 0.18-mrg branch.
>
>     First the performance problem was fixed by removing a binary search to
>     insert messages in order by position:
>
>     c3a4dd8 Bug 860691 - QPID-4287: Poor performance when a priority queue
>     with a ring queue policy has a large backlog
>
>     However that caused a regression, a problem with cluster updates, and
>     was reverted by:
>
>     6240de5 Bug 882373 - Cluster ring queue unit test is failing.
>
>     This current commit fixes the performance problem and the cluster
>     regression.
>
>     When inserting acquired messages as part of a cluster update, we do the
>     search to put them in order, since they will be out of order.
>     This is only for messages that were acquired by a consumer but not yet
>     acknowledged at the time a new member joins the cluster, so it should
>     be a small set.
>
>     During normal operation we push new messages directly to the back of
>     the priority queues since we know they are in order of arrival.

FYI I was curious how it improves performance, and the results from my tests are very promising. I built 0.18-25 locally without and with Alan's patch and ran my reproducer against that broker. The patched broker was 3 times(!) faster: 40 seconds vs. 120 seconds. Great work!

/me did some comparison:

1st test case:

    qpid-config add queue q -b train2 --limit-policy=ring --max-queue-count=50000 --flow-stop-count=0 --flow-stop-size=0
    qpid-send -a q -m 200000 -b train2 --content-size=128

0.18-25 took 343 seconds, 0.18-26 took 12 seconds.

2nd test case:

    qpid-config add queue q -b train2 --limit-policy=ring --max-queue-count=50000 --flow-stop-count=0 --flow-stop-size=0
    for i in $(seq 1 10); do
        qpid-send -a q -m 25000 -b train2 --content-size=128 &
    done
    qpid-receive -a q -m 100000 -b train2 --print-content=no -f &
    while [ $(pgrep qpid-send | wc -w) -gt 0 ]; do sleep 1; done

0.18-25 took 500 seconds, 0.18-26 took 19 seconds.

3rd test case:

    qpid-config add queue q -b train2 --limit-policy=ring --max-queue-count=50000 --flow-stop-count=0 --flow-stop-size=0 --argument=x-qpid-priorities=10
    qpid-send -a q -m 200000 -b train2 --content-size=128

0.18-25 took 336 seconds, 0.18-26 took 13 seconds.

4th test case:

    qpid-config add queue q -b train2 --limit-policy=ring --max-queue-count=50000 --flow-stop-count=0 --flow-stop-size=0 --argument=x-qpid-priorities=10
    for i in $(seq 1 10); do
        qpid-send -a q -m 25000 -b train2 --content-size=128 --priority=$i &
    done
    qpid-receive -a q -m 100000 -b train2 --print-content=no -f &
    while [ $(pgrep qpid-send | wc -w) -gt 0 ]; do sleep 1; done

This led to a broker segfault (BZ to be raised) on either broker version.
In short, 0.18-26 provides significantly better performance.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-1682.html