Bug 700822

Summary: Producer flow control needs to be enabled for clustering.
Product: Red Hat Enterprise MRG Reporter: Ken Giusti <kgiusti>
Component: qpid-cpp Assignee: Ken Giusti <kgiusti>
Status: CLOSED ERRATA QA Contact: Frantisek Reznicek <freznice>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.0 CC: aconway, esammons, iboverma, jneedle, jross, ppecka, tross
Target Milestone: 2.0   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qpid-cpp-mrg-0.10-6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-23 15:43:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Ken Giusti 2011-04-29 13:44:06 UTC
Description of problem:

Producer flow control is part of the qpid 0.10 release, however it was disabled when used in a clustered broker as the work needed to get replication working did not make the 0.10 release window.

A patch that enables producer flow control in the clustered environment has been made available upstream (see https://issues.apache.org/jira/browse/QPID-3076).  

Given that p/f/c is highly desired by MRG 2.0 cluster customers, and the patch itself appears very low risk, we should include it in the MRG 2.0 release.

Comment 1 Ken Giusti 2011-05-03 15:35:16 UTC
Testing notes:

The implementation of flow control in the broker leverages the broker's ability to asynchronously complete message.transfer commands. This ability is new to 0.10 and is also used by the store. [Note well: this patch does NOT modify the existing async completion implementation in any way - it is unchanged from its original 0.10 release.]

A good testing scenario would involve a configuration that has clustering + flow control + store enabled simultaneously.  Such a configuration should be tested to ensure that it remains consistent across broker remove/add events while under transient and durable message load.
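A configuration of this kind might look like the following qpidd.conf fragment. This is illustrative only: the module paths and threshold values are assumptions, and the default-flow-stop-threshold / default-flow-resume-threshold options (percent of queue capacity) are as introduced in qpid-cpp 0.10.

```
# /etc/qpidd.conf -- illustrative sketch, not a tested configuration
cluster-name=fc-test-cluster
load-module=/usr/lib64/qpid/daemon/cluster.so
load-module=/usr/lib64/qpid/daemon/msgstore.so
# flow-control thresholds, as a percentage of queue capacity
default-flow-stop-threshold=80
default-flow-resume-threshold=70
```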

From the client point of view, testing should involve multiple senders on different threads, with different capacities, sending to the same destination queue.  In addition, having the sending clients manually invoke "sync" at various points in the flow will trigger the flow control code that pends the "sync" operation should any of the outstanding transmitted messages be pending flow control release.  Example: assume a sender has a capacity of 10 msgs.  If f/c triggers on the 6th message, that sender may continue sending another 4 messages before it is blocked.  If that sender had invoked "sync" after the 6th message, the sync should cause the sender to block until flow control is released for msg #6.
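The capacity arithmetic in the example can be sketched with a toy model (this is NOT the qpid.messaging API, just the bookkeeping a test can reason about, assuming no completions arrive before flow control triggers on message 6):

```python
# Toy model of sender capacity vs. pending message completions.
class SenderModel:
    def __init__(self, capacity):
        self.capacity = capacity   # max outstanding (incomplete) sends
        self.in_flight = []        # message numbers awaiting completion

    def send(self, msg_no):
        if len(self.in_flight) >= self.capacity:
            return "blocked"       # capacity exhausted: send() would block
        self.in_flight.append(msg_no)
        return "sent"

    def sync_blocks(self):
        # a sync() pends while any outstanding send is incomplete
        return len(self.in_flight) > 0

    def complete_through(self, msg_no):
        # broker releases flow control; completions arrive
        self.in_flight = [m for m in self.in_flight if m > msg_no]

s = SenderModel(capacity=10)
for n in range(1, 7):
    assert s.send(n) == "sent"     # msgs 1-6 sent; f/c triggers on msg 6
for n in range(7, 11):
    assert s.send(n) == "sent"     # "another 4 messages" fill capacity
assert s.send(11) == "blocked"     # 11th send blocks: 10 outstanding
assert s.sync_blocks()             # a sync after msg 6 would pend too
s.complete_through(10)             # flow control released
assert s.send(11) == "sent"
```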

Receiving clients should obviously consume at a slower rate than the aggregate sender rate.  Tests should run clients receiving from the same queue in different threads in order to guarantee that the broker correctly completes messages that may be consumed on different connection threads.

Internal to the broker, flow control tracks four states that a queue may be in at any given point in time:

1) # enqueued msgs >= flow stop threshold
2) # enqueued msgs < flow stop threshold AND # enqueued msgs >= flow resume threshold AND some enqueued messages are pending flow control release
(e.g., enqueue until f/c is active, then dequeue a few without passing the resume point)
3) # enqueued msgs < flow stop threshold AND # enqueued msgs >= flow resume threshold AND no enqueued messages are pending flow control release
(e.g., enqueue up to, but not reaching, the flow stop threshold)
4) # enqueued msgs < flow resume threshold

Tests should be crafted that cause each of the above queue states, and then attempt to add a new broker to the cluster.  It is expected that the queue state should synchronise on the new broker.

The queue's QMF management object has a counter that tracks the # of transitions into the flow control state that the queue has experienced.  There is also a boolean flag that indicates the current flow control state of the queue.  Tests should verify these states (via QMF queries).
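The expected behaviour of those two attributes can be modelled as follows. This is an illustrative model of the flag and transition counter described above, not the broker's QMF implementation, and the threshold values are assumptions:

```python
# Toy model of a queue's flow-control flag and transition counter.
class QueueQmfModel:
    def __init__(self, stop, resume):
        self.stop, self.resume = stop, resume
        self.depth = 0
        self.flow_stopped = False       # current flow-control state
        self.flow_stopped_count = 0     # transitions INTO flow control

    def _update(self):
        if not self.flow_stopped and self.depth >= self.stop:
            self.flow_stopped = True
            self.flow_stopped_count += 1
        elif self.flow_stopped and self.depth < self.resume:
            self.flow_stopped = False

    def enqueue(self, n=1):
        self.depth += n
        self._update()

    def dequeue(self, n=1):
        self.depth -= n
        self._update()

q = QueueQmfModel(stop=10, resume=8)
q.enqueue(10)                           # hit the stop threshold
assert q.flow_stopped and q.flow_stopped_count == 1
q.dequeue(1)                            # still at/above resume: stays stopped
assert q.flow_stopped
q.dequeue(2)                            # drop below resume: released
assert not q.flow_stopped
q.enqueue(3)                            # re-trigger flow control
assert q.flow_stopped_count == 2
```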

Comment 3 Frantisek Reznicek 2011-05-30 12:51:34 UTC
The producer flow control is now enabled for MRG/M cluster configurations.
 - basic flow control functionality works well
 - basic flow control functionality together with cluster topology change involving failover works well


More stress tests are needed; still under test.

Comment 5 ppecka 2011-06-02 15:02:34 UTC
Unit tests launched on rhel5.6 / rhel6.1 (i686/x86_64)
cluster_tests.ShortTests.test_queue_flowlimit ............................................ pass
cluster_tests.ShortTests.test_queue_flowlimit_cluster .................................... pass
cluster_tests.ShortTests.test_queue_flowlimit_cluster_join ............................... pass

# rpm -qa | grep qpid
qpid-cpp-server-xml-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
qpid-cpp-server-0.10-7.el5
python-qpid-qmf-0.10-10.el5
qpid-java-common-0.10-6.el5
qpid-java-client-0.10-6.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-java-example-0.10-6.el5
qpid-qmf-devel-0.10-10.el5
qpid-cpp-client-devel-docs-0.10-7.el5
qpidc-debuginfo-0.5.752581-3.el5
qpid-qmf-0.10-10.el5
qpid-cpp-server-store-0.10-7.el5
qpid-cpp-client-0.10-7.el5
python-qpid-0.10-1.el5
qpid-tools-0.10-5.el5

Comment 6 Frantisek Reznicek 2011-06-03 14:18:15 UTC
Cluster flow control is enabled and functional.
Existing tests, together with newly developed tests, proved the functionality.
Tested on RHEL 5.6 / 6.1 i[36]86 / x86_64 with the packages listed above.

More stress testing needed; ETA coming Monday.

Comment 7 Frantisek Reznicek 2011-06-06 08:13:25 UTC
No issue discovered.

-> VERIFIED

Comment 8 errata-xmlrpc 2011-06-23 15:43:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html

Comment 9 ppecka 2011-07-19 13:45:35 UTC
*** Bug 486419 has been marked as a duplicate of this bug. ***