Description of problem:
This problem was first encountered on the grid0 test bed. Whenever the queue-purge process ran on the broker, QMF agents would time out due to loss of heartbeats. On further investigation, a "stuck" console was found: this console had bound to the QMF topic exchanges, but was not retrieving messages. This caused its queues to back up on the broker, creating a large number of expired messages for the queue-purge process to clean up. When the queue-purge process ran, the broker would become CPU bound for over 30 seconds, which caused the erroneous agent timeouts.

Version-Release number of selected component (if applicable):
qpid-cpp-server-0.7.946106-25.el5

How reproducible:
Difficult - requires a large-scale deployment, and some means of creating a large number of timed-out messages (expired TTLs).

Steps to Reproduce:
1. Run a broker, and provide a constant traffic load.
2. Create a number of queues, and fill them with messages with a short TTL (say 10 secs).
3. Observe the impact of the queue-purge process on traffic flow.

Actual results:

Expected results:

Additional info:
See bz 603896 for an earlier issue in the same vein.
Created attachment 471549: testcase - loads up a set of queues with messages with short TTLs.
The queue purging and management processing happen on the same timer thread (is periodic processing in the broker's management agent connected with heartbeats at all?). Long-running tasks will delay other periodic tasks. Additionally, while a specific queue is being purged, any enqueue or dequeue attempts on that queue are held up. This could be addressed by a finer-grained locking scheme.
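To make the finer-grained locking idea concrete, here is a minimal Python sketch, not broker code. The `TtlQueue` class, its method names, and the `batch_size` parameter are all hypothetical and only illustrate the idea: the purge removes expired messages in small batches and releases the queue lock between batches, so enqueues and dequeues can interleave with a long purge. It assumes every message on a queue has the same TTL, so the oldest message always expires first.

```python
import threading
import time
from collections import deque


class TtlQueue:
    """Toy in-memory queue for illustration; NOT the qpid broker's queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._messages = deque()  # entries are (expiry_time, payload)

    def enqueue(self, payload, ttl, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            self._messages.append((now + ttl, payload))

    def depth(self):
        with self._lock:
            return len(self._messages)

    def purge_expired(self, now=None, batch_size=100):
        """Drop expired messages from the head, at most batch_size per lock hold.

        Assumes FIFO order matches expiry order (same TTL for all messages).
        Returns the total number of messages purged.
        """
        now = time.monotonic() if now is None else now
        purged = 0
        while True:
            removed = 0
            with self._lock:
                while (removed < batch_size
                       and self._messages
                       and self._messages[0][0] <= now):
                    self._messages.popleft()
                    removed += 1
            # The lock is released here, so producers and consumers
            # waiting on this queue get a chance to run between batches.
            purged += removed
            if removed < batch_size:
                return purged
```

A smaller `batch_size` shortens each lock hold at the cost of more lock acquisitions. This only addresses contention on a single queue; it does not by itself stop a long purge from delaying other tasks that share the same timer thread.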
*** Bug 521292 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of bug 1093996 ***