Bug 1093996
Summary: | Purging TTL expired messages blocks all other timers, causing connection drops | |||
---|---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Pavel Moravec <pmoravec> | |
Component: | qpid-cpp | Assignee: | Gordon Sim <gsim> | |
Status: | CLOSED ERRATA | QA Contact: | Michal Toth <mtoth> | |
Severity: | medium | Docs Contact: | ||
Priority: | high | |||
Version: | 3.0 | CC: | freznice, gsim, jross, kgiusti, mcressma, mtoth | |
Target Milestone: | 3.1 | Keywords: | TestCaseProvided | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | qpid-cpp-0.30-2 | Doc Type: | Bug Fix | |
Doc Text: |
It was discovered that purging expired messages could take a long time, which prevented the timer thread from servicing other periodic tasks--such as heartbeats--in a timely manner. The timer thread is now used only to trigger the purge, with the actual purge task now offloaded to one of the broker's worker threads. This change allows the timer thread to monitor time state correctly, and allow other jobs such as heartbeat reporting to function correctly.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1142833 (view as bug list) | Environment: | ||
Last Closed: | 2015-04-14 13:48:00 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1142833 |
Description
Pavel Moravec
2014-05-04 07:36:37 UTC
There is a significant performance regression between 0.18-20 and 0.22-38 in purgeExpired method, when traversing even transient messages without TTL. Reproducer: 1) have in qpidd.conf: log-to-file=/tmp/qpidd.log log-enable=trace+:purge log-enable=info+ log-hires-timestamp=yes queue-purge-interval=60 default-queue-limit=0 2) Provision 500 queue bound to amq.fanout: for i in $(seq 1 500); do qpid-receive -a "queue_${i}; {create:always, node:{type:queue, durable:false, x-bindings:[{ exchange:'amq.fanout', queue:'queue_${i}' }] }}" done 3) send there some messages: qpid-send -a "amq.fanout" -m 50000 --durable=yes --content-size=1024 4) Wait for first "Purged" message in log and stop broker then: purged=$(grep -c Purged /tmp/qpidd.log) while [ $(grep -c Purged /tmp/qpidd.log) -eq $purged ]; do echo "$(date): waiting until next purge happens" sleep 10 done service qpidd stop 5) Count time difference between 1st and last "Purged" log On my VM, purge check traversal takes 2.929318886s on 3.0, while just 0.965582676s on 2.4. That means three times more (!). (to see "Purged XYZ expired messages from .." on 0.18-20, let apply patch: diff -rup _qpid-0.18_orig/cpp/src/qpid/broker/Queue.cpp _qpid-0.18_new/cpp/src/qpid/broker/Queue.cpp --- _qpid-0.18_orig/cpp/src/qpid/broker/Queue.cpp 2014-03-22 13:33:27.706897195 +0100 +++ _qpid-0.18_new/cpp/src/qpid/broker/Queue.cpp 2014-04-29 09:10:41.994011625 +0200 @@ -658,6 +658,7 @@ void Queue::purgeExpired(sys::Duration l if (seconds == 0 || count / seconds < 1) { std::deque<QueuedMessage> dequeued; dequeueIf(boost::bind(&isExpired, _1), dequeued); + QPID_LOG(debug, "Purged " << dequeued.size() << " expired messages from " << getName()); if (dequeued.size()) { if (mgmtObject) { mgmtObject->inc_discardsTtl(dequeued.size()); there. Or ask me for rpms with the patch applied) Some further comparison: 0.18 0.22 perf.degradation 500q, 50kM 0.965582676 2.929318886 3.0337318169 100q, 50kM 0.169874585 0.528339322 3.110172849 100q, 100kM 0.555664606 1.088116653 1.9582255937 1q, 1mM 0.056135355 0.094856374 1.6897795338 1q, 2mM(e) 0.086394264 0.177540997 2.0550090802 "500q, 50kM" means 500 queues got 50k messages via fanout exchange, (e) means messages were empty. I dont see a big correlation among #queues, #messages or message length.. :-/ Fixed upstream: https://svn.apache.org/r1594220 *** Bug 666944 has been marked as a duplicate of this bug. *** Ran on rhel 6x,i and rhel7x using qpid-cpp-server-0.30-6.el6 qpid-cpp-server-0.30-6.el7.x86_64 Bigger purge does not disconnect quick heartbeat clients. Marking as verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-0805.html |