Bug 867030 - HA throughput issues during longevity testing
Summary: HA throughput issues during longevity testing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 2.3
: ---
Assignee: Alan Conway
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks: 698367
Reported: 2012-10-16 15:10 UTC by Jason Dillaman
Modified: 2013-03-19 16:39 UTC
CC List: 6 users

Fixed In Version: qpid-cpp-0.18-4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-19 16:39:20 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch (6.87 KB, patch)
2012-10-16 15:12 UTC, Jason Dillaman
no flags


Links
System: Apache JIRA | ID: QPID-4374 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2012-10-16 15:35:51 UTC

Description Jason Dillaman 2012-10-16 15:10:26 UTC
Description of problem:
During a high-throughput HA longevity test, throughput dropped out for several extended periods.

Version-Release number of selected component (if applicable):
Qpid 0.18

How reproducible:
Frequently

Steps to Reproduce:
1. Bi-directionally federate a chain of several HA brokers with acknowledgements enabled
2. Have client applications consume messages from a queue bound to each federation bridge's destination exchange and forward them to the bridge queue of the next federation hop
3. Use ring queue policies on all queues; disable producer flow control and queue threshold events
4. Inject messages at high throughput concurrently from both ends of the broker chain
  
Actual results:
Multiple periods, each lasting several minutes, during which system throughput dropped to zero messages/sec

Expected results:
The throughput of the system remains consistent

Comment 1 Jason Dillaman 2012-10-16 15:12:42 UTC
Created attachment 628211 [details]
Proposed patch

Limit the window size for the HA backup bridge's queue subscription.  Improve the performance of the ring queue policy from O(n) to O(1).  Reduce lock contention on Queue::messageLock caused by Queue::UsageBarrier.
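The comment does not say how the ring queue policy's O(n) cost was removed. Locating a message by scanning the queue is O(n); one common way to get O(1) enqueue, discard-oldest, and removal from an arbitrary position is to pair a linked list (arrival order) with a hash map from message id to list position. The C++ sketch below illustrates that technique only; the class and method names are hypothetical, not qpid-cpp API, and it assumes every message has a unique id such as its enqueue sequence number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <optional>
#include <unordered_map>

// Hypothetical O(1) index for a ring (bounded, discard-oldest) queue policy.
// Assumes every message id is unique (e.g. its enqueue sequence number).
class RingIndex {
public:
    explicit RingIndex(std::size_t capacity) : capacity_(capacity) {}

    // Records a newly enqueued message. If the ring is now over capacity,
    // the oldest remaining message is discarded and its id returned.
    std::optional<std::uint64_t> enqueue(std::uint64_t id) {
        order_.push_back(id);
        pos_[id] = std::prev(order_.end());
        if (order_.size() <= capacity_) return std::nullopt;
        const std::uint64_t oldest = order_.front();
        pos_.erase(oldest);
        order_.pop_front();
        return oldest;
    }

    // Records that a consumer removed message `id` from any position.
    // Average O(1): a hash lookup plus a list unlink, no scan of the queue.
    bool dequeue(std::uint64_t id) {
        auto it = pos_.find(id);
        if (it == pos_.end()) return false;  // unknown or already discarded
        order_.erase(it->second);
        pos_.erase(it);
        return true;
    }

    std::size_t size() const { return order_.size(); }

private:
    std::size_t capacity_;
    std::list<std::uint64_t> order_;  // ids in arrival order, oldest first
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> pos_;
};
```

With a capacity of 3, enqueuing ids 1, 2, 3, dequeuing 2, then enqueuing 4 and 5 discards id 1 on the last enqueue, and no operation walks the list. The trade-off is one list node and one map entry per message.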

Comment 4 Alan Conway 2012-10-18 18:20:03 UTC
Posted a fix to the mrg repo: 4 commits on branch aconway-ha-2, branched off 0.18-mrg:

http://mrg1.lab.bos.redhat.com/cgit/qpid.git/log/?h=aconway-ha-2

082dd81 Bug 867030 - QPID-4374: Use map instead of SequenceSet for QueueGuard::delayed.
acb459f Bug 867030 - QPID-4374: Use configurable credit window for HA backup subscriptions.
74ae16d Bug 867030 - QPID-4374: Improve performance of ring queue policy index.
5fc42e9 Bug 867030 - QPID-4374: Reduce contention on Queue::messageLock
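Commit acb459f makes the credit window of HA backup subscriptions configurable, and comment 1 states the goal as limiting the window size. As a rough illustration of what a bounded window means, the C++ sketch below models a subscription that may have at most `window` messages sent but not yet acknowledged; each acknowledgement returns one credit. The class, its methods, and the window value are hypothetical; this is not qpid-cpp's subscription code or its configuration option.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical model of a credit window on an HA backup subscription:
// at most `window` messages may be outstanding (sent but unacknowledged).
class CreditWindow {
public:
    explicit CreditWindow(std::uint32_t window) : credit_(window) {}

    // Primary side: send message `id` only if credit remains.
    bool trySend(std::uint64_t id) {
        if (credit_ == 0) return false;  // window exhausted: wait for acks
        --credit_;
        inFlight_.push_back(id);
        return true;
    }

    // Backup acknowledged the oldest outstanding message; one credit returns.
    bool acknowledgeOldest() {
        if (inFlight_.empty()) return false;
        inFlight_.pop_front();
        ++credit_;
        return true;
    }

    std::uint32_t credit() const { return credit_; }
    std::size_t inFlight() const { return inFlight_.size(); }

private:
    std::uint32_t credit_;                // sends still allowed right now
    std::deque<std::uint64_t> inFlight_;  // sent, not yet acknowledged
};
```

In this model credit() plus inFlight() always equals the window size, so a slow backup bounds how much unacknowledged work the primary can build up; a smaller window tightens that bound at the cost of more waiting for acknowledgements.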

Porting to trunk caused a regression there; I will post to trunk as soon as it's fixed.

Comment 5 Alan Conway 2012-10-18 19:43:41 UTC
Fixed on trunk by the following 3 commits:

r1399814 | Bug 867030 - QPID-4374: Make QueueGuard::cancel idempotent (Jason Dillaman)
r1399813 | Bug 867030 - QPID-4374: Use configurable credit window for HA backup subscriptions (Jason Dillaman)
r1399812 | Bug 867030 - QPID-4374: Reduce contention on Queue::messageLock (Jason Dillaman)
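The commits name QueueGuard::delayed (082dd81) and QueueGuard::cancel (r1399814) without showing them. The single-threaded C++ sketch below is a hypothetical stand-in, not the qpid-cpp QueueGuard: it keeps delayed completions in a std::map keyed by message sequence number, which, unlike a plain set of sequence numbers, can store a completion callback next to each id, and it makes cancel() idempotent with a flag so that a second call cannot complete any message twice. The class name, method names, and callback type are assumptions, and a real guard would also need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical guard that delays completion of messages until a backup
// acknowledges them. Single-threaded sketch: no locking.
class DelayedCompletions {
public:
    // Hold back completion of message `id`; after cancel(), complete at once.
    void delay(std::uint64_t id, std::function<void()> onComplete) {
        if (cancelled_) { onComplete(); return; }
        delayed_.emplace(id, std::move(onComplete));
    }

    // Backup acknowledged `id`: complete it. Returns false if not delayed.
    bool complete(std::uint64_t id) {
        auto it = delayed_.find(id);
        if (it == delayed_.end()) return false;
        auto callback = std::move(it->second);
        delayed_.erase(it);  // erase before calling, in case the callback re-enters
        callback();
        return true;
    }

    // Release every delayed message. Idempotent: later calls do nothing.
    void cancel() {
        if (cancelled_) return;
        cancelled_ = true;
        std::map<std::uint64_t, std::function<void()>> toRelease;
        toRelease.swap(delayed_);
        for (auto& entry : toRelease) entry.second();
    }

    std::size_t pending() const { return delayed_.size(); }

private:
    bool cancelled_ = false;
    std::map<std::uint64_t, std::function<void()>> delayed_;  // id -> completion
};
```

Individual acknowledgements cost O(log n) map lookups regardless of how scattered the delayed ids are, and calling cancel() twice releases each pending message exactly once.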

