+++ This bug was initially created as a clone of Bug #860691 +++ Description of problem: The ring queue policy utilizes a single deque to store messages in priority and FIFO order. When inserting messages of mixed priority into a queue with a large (>100,000) message backlog, the performance degrades quickly enough to potentially block all broker worker threads if they all are publishing to the same queue. Recommend separating the ring queue data structure into separate deques based upon message priority in order to avoid the potential for reshuffling the deque with each incoming message. Version-Release number of selected component (if applicable): Qpid 0.18 How reproducible: 100% Steps to Reproduce: 1. Enqueue >100,000 high-priority messages in a priority queue with a large ring size policy (we used 256 byte messages and a 1GB ring size) 2. Enqueue low-priority messages into the queue Actual results: Enqueue rate drastically decreases due to the need to shuffle messages around, in priority order, within a deque. Expected results: Enqueue rate does not decrease as dramatically --- Additional comment from Alan Conway on 2012-10-01 17:01:50 EDT --- Patch reviewed by Gordon Sim and myself. Merged to 0.18 branch http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=0.18-mrghttp://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=0.18-mrg --- Additional comment from Alan Conway on 2012-10-11 16:01:52 EDT --- --- Additional comment from Alan Conway on 2012-11-30 15:04:38 EST --- The commit link in comment 1 is incorrect. The correct link is: http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=0.18-mrg&id=c3a4dd856d8a82c4faaf50ac1a147b93334a0916 --- Additional comment from Stuart Auchterlonie on 2013-03-18 18:23:10 EDT --- This indirectly fixes an issue if the queue was empty in the following bit of code qpid/cpp/src/qpid/broker/QueuePolicy.cpp bool RingQueuePolicy::checkLimit(boost::intrusive_ptr<Message> m) { .... do { QueuedMessage oldest = queue.front(); if (oldest.queue->acquireMessageAt(oldest.position, oldest) || !strict) { ... } If oldest is empty, then qpidd segfaults with the following Core was generated by `qpidd --config qpidd2.conf'. Program terminated with signal 11, Segmentation fault. #0 0x00000031c9209220 in pthread_mutex_lock () from /lib64/libpthread.so.0 (gdb) bt full #0 0x00000031c9209220 in pthread_mutex_lock () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007f5680cce94a in qpid::sys::Mutex::lock (this=<value optimized out>) at ../include/qpid/sys/posix/Mutex.h:116 e = <value optimized out> #2 0x00007f5680d58022 in ScopedLock (this=0x0, position=..., message=...) at ../include/qpid/sys/Mutex.h:33 No locals. #3 qpid::broker::Queue::acquireMessageAt (this=0x0, position=..., message=...) at qpid/broker/Queue.cpp:235 locker = {mutex = @0x1a8} __PRETTY_FUNCTION__ = "bool qpid::broker::Queue::acquireMessageAt(const qpid::framing::SequenceNumber&, qpid::broker::QueuedMessage&)" #4 0x00007f5680d706b6 in qpid::broker::RingQueuePolicy::checkLimit (this=0x222f510, m=...) at qpid/broker/QueuePolicy.cpp:272 .... (gdb) up #4 0x00007f5680d706b6 in qpid::broker::RingQueuePolicy::checkLimit (this=0x222f510, m=...) at qpid/broker/QueuePolicy.cpp:272 272 if (oldest.queue->acquireMessageAt(oldest.position, oldest) || !strict) { (gdb) print oldest $2 = {payload = {px = 0x0}, position = {<boost::equality_comparable<qpid::framing::SequenceNumber, boost::less_than_comparable<qpid::framing::SequenceNumber, boost::incrementable<qpid::framing::SequenceNumber, boost::decrementable<qpid::framing::SequenceNumber, boost::detail::empty_base<qpid::framing::SequenceNumber> > >, boost::detail::empty_base<qpid::framing::SequenceNumber>, boost::detail::true_t>, boost::detail::empty_base<qpid::framing::SequenceNumber>, boost::detail::true_t>> = {<boost::equality_comparable1<qpid::framing::SequenceNumber, boost::less_than_comparable<qpid::framing::SequenceNumber, boost::incrementable<qpid::framing::SequenceNumber, boost::decrementable<qpid::framing::SequenceNumber, boost::detail::empty_base<qpid::framing::SequenceNumber> > >, boost::detail::empty_base<qpid::framing::SequenceNumber>, boost::detail::true_t> >> = {<boost::less_than_comparable<qpid::framing::SequenceNumber, boost::incrementable<qpid::framing::SequenceNumber, boost::decrementable<qpid::framing::SequenceNumber, boost::detail::empty_base<qpid::framing::SequenceNumber> > >, boost::detail::empty_base<qpid::framing::SequenceNumber>, boost::detail::true_t>> = {<boost::less_than_comparable1<qpid::framing::SequenceNumber, boost::incrementable<qpid::framing::SequenceNumber, boost::decrementable<qpid::framing::SequenceNumber, boost::detail::empty_base<qpid::framing::SequenceNumber> > > >> = {<boost::incrementable<qpid::framing::SequenceNumber, boost::decrementable<qpid::framing::SequenceNumber, boost::detail::empty_base<qpid::framing::SequenceNumber> > >> = {<boost::decrementable<qpid::framing::SequenceNumber, boost::detail::empty_base<qpid::framing::SequenceNumber> >> = {<boost::detail::empty_base<qpid::framing::SequenceNumber>> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, value = 0}, queue = 0x0} Tested against MRG-2.3 and it handles this situation fine.
================= Relationship with Previous Bug ========================= This bug was originally reported against 0.18 packages, but the queue code has been substantially re-written since then. It seems likely that this is a related bug in the sense that maybe a similar design decision led to similar behavior -- but I can't call it the "same" bug since it's different code. Here is the program I used to test 3.0 code with: =================== C O D E ============================== #include <sys/types.h> #include <stdlib.h> #include <time.h> #include <sys/time.h> #include <pthread.h> #include <iostream> #include <qpid/messaging/Connection.h> #include <qpid/messaging/Message.h> #include <qpid/messaging/Receiver.h> #include <qpid/messaging/Sender.h> #include <qpid/messaging/Session.h> #include <qpid/messaging/Address.h> #include <qpid/types/Variant.h> using namespace qpid::messaging; using namespace qpid::types; int main ( ) { FILE * fp = fopen ( "./output", "w" ); int messages_per_loop = 10000; struct timeval loop_start_time, loop_stop_time, loop_duration; std::string broker = "localhost:5801"; Message m = Message("This is a message with exactly 50 bytes in it!!!!!"); std::string connectionOptions; Connection connection ( broker, connectionOptions ); connection.open(); Session session = connection.createSession(); Sender sender = session.createSender("my_queue;{create:always,node:{type:queue,x-declare:{arguments:{qpid.max_count:1000000,qpid.policy_type:ring,x-qpid-priorities:10}}}}"); for ( int j = 0; j < 1000; ++ j ) { gettimeofday ( & loop_start_time, 0 ); for ( int i = 0; i < messages_per_loop; ++ i ) { m.setPriority ( i % 10 ); sender.send(m); } gettimeofday ( & loop_stop_time, 0 ); timersub ( & loop_stop_time, & loop_start_time, & loop_duration ); fprintf ( fp, "loop %d duration: %d.%.6d for %d messages.\n", j, loop_duration.tv_sec, loop_duration.tv_usec, messages_per_loop ); fprintf ( stderr, "loop %d duration: %d.%.6d for %d messages.\n", j, loop_duration.tv_sec, loop_duration.tv_usec, messages_per_loop ); } session.close(); connection.close(); fprintf ( stderr, "main done.\n" ); fclose ( fp ); return 0; } ======================== B U I L D S C R I P T ======================== #! /bin/bash # build against installed packages clear echo "=============================================================" echo " making x860691 " echo "=============================================================" g++ -g -O2 \ -lqpidmessaging \ -lpthread \ x860691.cpp \ -o x860691 echo "Done." ================= R U N S C R I P T =================================== #! /bin/bash # This starts the broker, and then runs the program. pkill qpidd sleep 2 QPIDD=qpidd rm -rf /tmp/mick mkdir /tmp/mick $QPIDD \ --no-module-dir \ -p 5801 \ --data-dir /tmp/mick/data_1 \ --auth=no \ --mgmt-enable=yes \ --log-enable info+ \ --log-to-file /tmp/mick/qpidd_1.log \ --log-source yes \ -d echo "started broker from $QPIDD " sleep 3 echo "------------- starting x860691 -------------" ./x860691 echo "------------- x860691 done ----------------" =============== R E S U L T S ================================== # Times are in seconds. # Look what happens at loop 111. loop 0 duration: 0.252403 for 10000 messages. loop 1 duration: 0.252310 for 10000 messages. loop 2 duration: 0.257319 for 10000 messages. loop 3 duration: 0.255952 for 10000 messages. loop 4 duration: 0.253862 for 10000 messages. loop 5 duration: 0.252477 for 10000 messages. loop 6 duration: 0.253371 for 10000 messages. loop 7 duration: 0.249907 for 10000 messages. loop 8 duration: 0.248323 for 10000 messages. loop 9 duration: 0.251017 for 10000 messages. loop 10 duration: 0.227576 for 10000 messages. loop 11 duration: 0.236354 for 10000 messages. loop 12 duration: 0.230183 for 10000 messages. loop 13 duration: 0.238863 for 10000 messages. loop 14 duration: 0.216682 for 10000 messages. loop 15 duration: 0.238783 for 10000 messages. loop 16 duration: 0.239381 for 10000 messages. loop 17 duration: 0.257949 for 10000 messages. loop 18 duration: 0.258311 for 10000 messages. loop 19 duration: 0.257791 for 10000 messages. loop 20 duration: 0.257983 for 10000 messages. loop 21 duration: 0.238887 for 10000 messages. loop 22 duration: 0.238654 for 10000 messages. loop 23 duration: 0.238637 for 10000 messages. loop 24 duration: 0.257613 for 10000 messages. loop 25 duration: 0.257867 for 10000 messages. loop 26 duration: 0.258348 for 10000 messages. loop 27 duration: 0.237449 for 10000 messages. loop 28 duration: 0.238628 for 10000 messages. loop 29 duration: 0.238441 for 10000 messages. loop 30 duration: 0.243115 for 10000 messages. loop 31 duration: 0.256069 for 10000 messages. loop 32 duration: 0.258815 for 10000 messages. loop 33 duration: 0.257699 for 10000 messages. loop 34 duration: 0.241550 for 10000 messages. loop 35 duration: 0.232738 for 10000 messages. loop 36 duration: 0.231728 for 10000 messages. loop 37 duration: 0.231604 for 10000 messages. loop 38 duration: 0.259751 for 10000 messages. loop 39 duration: 0.254752 for 10000 messages. loop 40 duration: 0.251621 for 10000 messages. loop 41 duration: 0.252169 for 10000 messages. loop 42 duration: 0.259258 for 10000 messages. loop 43 duration: 0.279428 for 10000 messages. loop 44 duration: 0.282687 for 10000 messages. loop 45 duration: 0.257281 for 10000 messages. loop 46 duration: 0.225576 for 10000 messages. loop 47 duration: 0.234707 for 10000 messages. loop 48 duration: 0.255344 for 10000 messages. loop 49 duration: 0.257359 for 10000 messages. loop 50 duration: 0.257119 for 10000 messages. loop 51 duration: 0.257096 for 10000 messages. loop 52 duration: 0.240745 for 10000 messages. loop 53 duration: 0.249403 for 10000 messages. loop 54 duration: 0.245374 for 10000 messages. loop 55 duration: 0.251870 for 10000 messages. loop 56 duration: 0.227380 for 10000 messages. loop 57 duration: 0.245182 for 10000 messages. loop 58 duration: 0.246713 for 10000 messages. loop 59 duration: 0.254202 for 10000 messages. loop 60 duration: 0.256685 for 10000 messages. loop 61 duration: 0.249379 for 10000 messages. loop 62 duration: 0.257835 for 10000 messages. loop 63 duration: 0.244097 for 10000 messages. loop 64 duration: 0.246543 for 10000 messages. loop 65 duration: 0.246495 for 10000 messages. loop 66 duration: 0.249692 for 10000 messages. loop 67 duration: 0.250502 for 10000 messages. loop 68 duration: 0.249498 for 10000 messages. loop 69 duration: 0.250803 for 10000 messages. loop 70 duration: 0.248176 for 10000 messages. loop 71 duration: 0.253430 for 10000 messages. loop 72 duration: 0.250871 for 10000 messages. loop 73 duration: 0.256095 for 10000 messages. loop 74 duration: 0.253547 for 10000 messages. loop 75 duration: 0.252046 for 10000 messages. loop 76 duration: 0.246829 for 10000 messages. loop 77 duration: 0.246333 for 10000 messages. loop 78 duration: 0.227229 for 10000 messages. loop 79 duration: 0.235784 for 10000 messages. loop 80 duration: 0.236032 for 10000 messages. loop 81 duration: 0.229520 for 10000 messages. loop 82 duration: 0.237923 for 10000 messages. loop 83 duration: 0.246849 for 10000 messages. loop 84 duration: 0.240170 for 10000 messages. loop 85 duration: 0.246845 for 10000 messages. loop 86 duration: 0.252142 for 10000 messages. loop 87 duration: 0.248607 for 10000 messages. loop 88 duration: 0.235150 for 10000 messages. loop 89 duration: 0.236395 for 10000 messages. loop 90 duration: 0.231915 for 10000 messages. loop 91 duration: 0.238275 for 10000 messages. loop 92 duration: 0.237994 for 10000 messages. loop 93 duration: 0.238298 for 10000 messages. loop 94 duration: 0.237700 for 10000 messages. loop 95 duration: 0.238305 for 10000 messages. loop 96 duration: 0.241294 for 10000 messages. loop 97 duration: 0.255675 for 10000 messages. loop 98 duration: 0.256984 for 10000 messages. loop 99 duration: 0.219855 for 10000 messages. loop 100 duration: 0.271347 for 10000 messages. loop 101 duration: 0.276917 for 10000 messages. loop 102 duration: 0.262516 for 10000 messages. loop 103 duration: 0.263811 for 10000 messages. loop 104 duration: 0.268409 for 10000 messages. loop 105 duration: 0.284156 for 10000 messages. loop 106 duration: 0.308755 for 10000 messages. loop 107 duration: 0.303953 for 10000 messages. loop 108 duration: 0.316699 for 10000 messages. loop 109 duration: 0.309927 for 10000 messages. loop 110 duration: 0.311368 for 10000 messages. loop 111 duration: 637.728991 for 10000 messages. loop 112 duration: 714.973189 for 10000 messages.
Oh, and, by the bye, if you comment out this line in my code: m.setPriority ( i % 10 ); and then re-compile and re-test ... nothing bad happens. It runs to completion ( 1000 loops ) with no slowdown.
I can reproduce the problem as described by mick, am investigating the cause.
Created attachment 815467 [details] Faster version of the reproducer This is a faster version of the reproducer. The suddenness of the slow-down is noteworthy. The first 110 iterations take less than half a second, the 111th takes over 10 *minutes*.
Fixed on branch http://git.app.eng.bos.redhat.com/rh-qpid.git/log/?h=0.22-mrg-aconway-996249
I'm currently unable to reproduce the failing scenario. Tried it on qpid-cpp-*-0.22-24 and also with 0.22-29(latest), could you please check the reproducer, and also please failing version? NEEDINFO(mick)
I also can no longer reproduce this -- using these packages: qpid-qmf-0.22-24.el6.x86_64 qpid-cpp-client-devel-0.22-29.el6.x86_64 qpid-proton-c-0.5-9.el6.x86_64 qpid-cpp-server-0.22-29.el6.x86_64 python-qpid-0.22-8.el6.noarch qpid-tools-0.22-7.el6.noarch qpid-cpp-server-ha-0.22-29.el6.x86_64 qpid-cpp-client-0.22-29.el6.x86_64 python-qpid-qmf-0.22-24.el6.x86_64 It is now behaving like I saw it after the fix. I'm afraid when I said (in Comment 1) that I used "3.0 code" -- what I meant was *development* code that was going to become 3.0 . The 0.22-24 packages that I can find now no longer have the bug in them. It was 100% reproducible. Sorry for the sloppy information. I was young and fooling last August. Now I am old, and wise.
Since it is not reproducible, I propose CLOSE NOTABUG. NEEDINFO(jross)
Thanks, Mick and Zdenek.