Bug 867030

Summary: HA throughput issues during longevity testing
Product: Red Hat Enterprise MRG
Reporter: Jason Dillaman <jdillama>
Component: qpid-cpp
Assignee: Alan Conway <aconway>
Status: CLOSED CURRENTRELEASE
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high
Priority: high
Version: Development
CC: aconway, esammons, iboverma, jross, lzhaldyb, mcressma
Target Milestone: 2.3
Keywords: OtherQA
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qpid-cpp-0.18-4
Doc Type: Bug Fix
Last Closed: 2013-03-19 16:39:20 UTC
Type: Bug
Bug Blocks: 698367
Attachments:
Proposed patch (flags: none)

Description Jason Dillaman 2012-10-16 15:10:26 UTC
Description of problem:
During a high-throughput longevity test of HA, several extended periods of throughput drop-outs were recorded.  

Version-Release number of selected component (if applicable):
Qpid 0.18

How reproducible:
Frequently

Steps to Reproduce:
1. Bi-directionally federate a chain of several HA brokers with acks enabled
2. Have client apps pull messages from a queue bound to each federation bridge's destination exchange and forward them into the bridge queue of the next federation hop
3. Use ring queue policies on all queues; disable producer flow control and queue threshold events
4. Inject messages at high throughput concurrently from both ends of the broker chain

Actual results:
Observed multiple multi-minute periods in which system throughput dropped to zero messages/sec

Expected results:
The throughput of the system remains consistent

Comment 1 Jason Dillaman 2012-10-16 15:12:42 UTC
Created attachment 628211 [details]
Proposed patch

Limit the window size for the HA backup bridge's queue subscription.  Improve the performance of the ring queue policy from O(n) to O(1).  Reduce lock contention on Queue::messageLock caused by Queue::UsageBarrier.
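
For readers unfamiliar with ring queues, here is a minimal sketch of why eviction can be O(1). It is not the attached patch and not Qpid's actual ring policy code; the class name RingQueue and its methods are hypothetical, and it assumes the policy only ever needs to drop the oldest message when the queue is full.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical illustration only; not taken from the Qpid code base.
class RingQueue {
public:
    explicit RingQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Enqueue a message. When the queue is full, the oldest message is
    // dropped first. Both deque operations are O(1), so no scan over
    // the queued messages is needed.
    // Returns true if an older message was evicted.
    bool push(const std::string& msg) {
        bool evicted = false;
        if (messages_.size() == capacity_) {
            messages_.pop_front();
            evicted = true;
        }
        messages_.push_back(msg);
        return evicted;
    }

    const std::string& oldest() const { return messages_.front(); }
    std::size_t size() const { return messages_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::string> messages_;
};
```

An O(n) variant would search the whole index for a victim on every enqueue, which becomes expensive at high message rates with large queues.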

Comment 4 Alan Conway 2012-10-18 18:20:03 UTC
Posted a fix on mrg repo, 4 commits on branch aconway-ha-2, branched off 0.18-mrg:

http://mrg1.lab.bos.redhat.com/cgit/qpid.git/log/?h=aconway-ha-2

082dd81 Bug 867030 - QPID-4374: Use map instead of SequenceSet for QueueGuard::delayed.
acb459f Bug 867030 - QPID-4374: Use configurable credit window for HA backup subscriptions.
74ae16d Bug 867030 - QPID-4374: Improve performance of ring queue policy index.
5fc42e9 Bug 867030 - QPID-4374: Reduce contention on Queue::messageLock

Porting to trunk has caused a regression there; will post to trunk as soon as it's fixed.

Comment 5 Alan Conway 2012-10-18 19:43:41 UTC
Fixed on trunk by the following 3 commits:

r1399814 | Bug 867030 - QPID-4374: Make QueueGuard::cancel idempotent (Jason Dillaman)
r1399813 | Bug 867030 - QPID-4374: Use configurable credit window for HA backup subscriptions (Jason Dillaman)
r1399812 | Bug 867030 - QPID-4374: Reduce contention on Queue::messageLock (Jason Dillaman)
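
As background for the credit window commit, here is a rough sketch of window-based flow control in general. It is not the code from r1399813; the class CreditWindow and its methods are hypothetical names, and it assumes a simple model where the sender spends one credit per message and the receiver restores credit as it completes messages.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; not taken from the Qpid code base.
class CreditWindow {
public:
    explicit CreditWindow(std::size_t window)
        : window_(window), credit_(window) {
        assert(window > 0);
    }

    // Sender side: spend one credit if any is left, otherwise refuse.
    bool trySend() {
        if (credit_ == 0) return false;
        --credit_;
        return true;
    }

    // Receiver side: completing n messages restores credit,
    // capped at the configured window size.
    void complete(std::size_t n) {
        credit_ = (credit_ + n > window_) ? window_ : credit_ + n;
    }

    std::size_t available() const { return credit_; }

private:
    std::size_t window_;
    std::size_t credit_;
};
```

With a bounded window, the number of messages in flight to a slow subscriber cannot exceed the window size, which limits how much outstanding work the sending side has to track.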