Bug 867030 - HA throughput issues during longevity testing
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 2.3
Target Release: ---
Assigned To: Alan Conway
QA Contact: MRG Quality Engineering
Keywords: OtherQA
Depends On:
Blocks: 698367
Reported: 2012-10-16 11:10 EDT by Jason Dillaman
Modified: 2013-03-19 12:39 EDT

See Also:
Fixed In Version: qpid-cpp-0.18-4
Doc Type: Bug Fix
Last Closed: 2013-03-19 12:39:20 EDT
Type: Bug


Attachments
Proposed patch (6.87 KB, patch)
2012-10-16 11:12 EDT, Jason Dillaman


External Trackers
Tracker      ID         Priority  Status  Summary  Last Updated
Apache JIRA  QPID-4374  None      None    None     2012-10-16 11:35:51 EDT

Description Jason Dillaman 2012-10-16 11:10:26 EDT
Description of problem:
During a high-throughput longevity test of HA, several extended periods of throughput drop-outs were recorded.  

Version-Release number of selected component (if applicable):
Qpid 0.18

How reproducible:
Frequently

Steps to Reproduce:
1. Bi-directionally federate a chain of several HA brokers with acks enabled
2. Have client apps pull messages from a queue tied to each federation bridge's destination exchange into the bridge queue of the next federation hop
3. Use ring queue policies on all queues, and disable producer flow control and queue threshold events
4. Inject messages at high throughput concurrently from both sides of the broker chain
  
Actual results:
Witnessed multiple multi-minute periods where throughput in the system dropped to zero messages / sec

Expected results:
The throughput of the system remains consistent
Comment 1 Jason Dillaman 2012-10-16 11:12:42 EDT
Created attachment 628211
Proposed patch

Limit the window size for the HA backup bridge's queue subscription.  Improve the performance of the ring queue policy from O(n) to O(1).  Reduce lock contention on Queue::messageLock caused by Queue::UsageBarrier.
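
To illustrate why the ring queue policy change matters, here is a minimal sketch of a bounded queue that evicts its oldest message in constant time when full. It is not the code from the attached patch: the class name RingQueue and its methods are hypothetical, and it assumes messages are only ever evicted from the head of the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical bounded queue with "ring" overflow behavior: when the queue
// is full, the oldest message is dropped to make room for the new one.
class RingQueue {
public:
    explicit RingQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Enqueue in O(1). Returns true if an old message had to be evicted;
    // the evicted message is copied to *evicted when that pointer is set.
    bool push(const std::string& msg, std::string* evicted = nullptr) {
        bool dropped = false;
        if (messages_.size() == capacity_) {
            if (evicted) *evicted = messages_.front();
            messages_.pop_front();  // O(1): no scan to locate the oldest entry
            dropped = true;
        }
        messages_.push_back(msg);
        return dropped;
    }

    // Dequeue the oldest message in O(1). Returns false if the queue is empty.
    bool pop(std::string& out) {
        if (messages_.empty()) return false;
        out = messages_.front();
        messages_.pop_front();
        return true;
    }

    std::size_t size() const { return messages_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::string> messages_;
};
```

With an O(n) policy index, every enqueue on a full queue pays a cost proportional to the queue depth, which adds up quickly at the injection rates used in this test.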
Comment 4 Alan Conway 2012-10-18 14:20:03 EDT
Posted a fix on mrg repo, 4 commits on branch aconway-ha-2, branched off 0.18-mrg:

http://mrg1.lab.bos.redhat.com/cgit/qpid.git/log/?h=aconway-ha-2

082dd81 Bug 867030 - QPID-4374: Use map instead of SequenceSet for QueueGuard::delayed.
acb459f Bug 867030 - QPID-4374: Use configurable credit window for HA backup subscriptions.
74ae16d Bug 867030 - QPID-4374: Improve performance of ring queue policy index.
5fc42e9 Bug 867030 - QPID-4374: Reduce contention on Queue::messageLock

Porting to trunk has caused a regression there; will post to trunk as soon as it's fixed.
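
For readers unfamiliar with credit windows (commit acb459f above), the following sketch shows the general idea under a simplified model: a sender may have at most a fixed number of unacknowledged messages in flight, and acknowledgements return credit. The class CreditWindow and its methods are hypothetical and are not taken from the HA code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sender-side bookkeeping for a windowed subscription.
class CreditWindow {
public:
    explicit CreditWindow(std::size_t window) : window_(window), inFlight_(0) {
        assert(window > 0);
    }

    // True if another message may be sent without exceeding the window.
    bool canSend() const { return inFlight_ < window_; }

    // Record a send. Returns false, and records nothing, if the window is full.
    bool send() {
        if (!canSend()) return false;
        ++inFlight_;
        return true;
    }

    // Record acknowledgement of up to `count` messages, returning their credit.
    void acknowledge(std::size_t count) {
        inFlight_ -= (count < inFlight_) ? count : inFlight_;
    }

    std::size_t inFlight() const { return inFlight_; }

private:
    std::size_t window_;    // maximum number of unacknowledged messages
    std::size_t inFlight_;  // messages sent but not yet acknowledged
};
```

Bounding the window limits how much unacknowledged work a backup subscription can accumulate, at the cost of stalling the sender whenever acknowledgements lag.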
Comment 5 Alan Conway 2012-10-18 15:43:41 EDT
Fixed on trunk by the following 3 commits:

r1399814 | Bug 867030 - QPID-4374: Make QueueGuard::cancel idempotent (Jason Dillaman)
r1399813 | Bug 867030 - QPID-4374: Use configurable credit window for HA backup subscriptions (Jason Dillaman)
r1399812 | Bug 867030 - QPID-4374: Reduce contention on Queue::messageLock (Jason Dillaman)
