Bug 700822 - Producer flow control needs to be enabled for clustering.
Summary: Producer flow control needs to be enabled for clustering.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 2.0
Assignee: Ken Giusti
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Duplicates: 486419 (view as bug list)
Depends On:
Blocks:
 
Reported: 2011-04-29 13:44 UTC by Ken Giusti
Modified: 2015-11-16 01:13 UTC (History)
CC List: 7 users

Fixed In Version: qpid-cpp-mrg-0.10-6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-23 15:43:43 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 660291 0 high CLOSED [RFE] Producer flow control 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2011:0890 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging 2.0 Release 2011-06-23 15:42:41 UTC

Internal Links: 660291

Description Ken Giusti 2011-04-29 13:44:06 UTC
Description of problem:

Producer flow control is part of the qpid 0.10 release; however, it was disabled for clustered brokers because the work needed to get replication working did not make the 0.10 release window.

A patch that enables producer flow control in the clustered environment has been made available upstream (see https://issues.apache.org/jira/browse/QPID-3076).  

Given that p/f/c is highly desired by MRG 2.0 cluster customers, and the patch itself appears very low risk, we should include it in the MRG 2.0 release.

Comment 1 Ken Giusti 2011-05-03 15:35:16 UTC
Testing notes:

The implementation of flow control in the broker leverages the broker's ability to asynchronously complete message.transfer commands.  This ability is new to 0.10 and is also used by the store.  [Note well: this patch does NOT modify the existing async completion implementation in any way - it is unchanged from its original 0.10 release.]

A good testing scenario would involve a configuration that has clustering + flow control + store enabled simultaneously.  Such a configuration should be tested to ensure that it remains consistent across broker remove/add events while under transient and durable message load.
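One way to stand up such a configuration is sketched below.  This is an illustration only: the module paths, cluster name, queue name, and threshold values are assumptions, and the exact option spellings should be checked against the installed qpidd and qpid-config versions (`qpidd --help`, `qpid-config add queue --help`).

```shell
# Broker with clustering + store loaded and default flow-control
# thresholds set (percentages of queue capacity).  Module paths are
# assumptions for a 64-bit RHEL install; adjust as needed.
qpidd --load-module /usr/lib64/qpid/daemon/cluster.so \
      --load-module /usr/lib64/qpid/daemon/msgstore.so \
      --cluster-name test-fc-cluster \
      --default-flow-stop-threshold 80 \
      --default-flow-resume-threshold 70 --daemon

# Per-queue thresholds can also be set explicitly at declare time.
qpid-config add queue testq --flow-stop-count=1000 --flow-resume-count=800
```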

From the client point of view, testing should involve multiple senders on different threads, with different capacities, sending to the same destination queue.  In addition, having a sending client manually invoke "sync" at various points in the flow will trigger the flow control code that pends the "sync" operation should any of the outstanding transmitted messages still be held by flow control.  Example: assume a sender has a capacity of 10 msgs.  If f/c triggers on the 6th message, that sender may continue sending another 4 messages before it will be blocked.  If that sender had invoked "sync" after the 6th message, the sync should cause the sender to block until flow control is released for msg #6.
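The capacity accounting in that example can be sketched as a small simulation.  This is not the qpid messaging API - the class and method names are hypothetical - but the arithmetic (capacity 10, completions withheld by flow control) follows the example above.

```python
# Illustrative simulation only, not the qpid client API.  Models a
# sender whose capacity counts messages sent but not yet completed
# (acknowledged) by the broker.

class SimulatedSender:
    def __init__(self, capacity):
        self.capacity = capacity
        self.outstanding = set()   # sent but not yet completed by broker
        self.next_msg = 1

    def send(self):
        """Send one message; return False when capacity is exhausted."""
        if len(self.outstanding) >= self.capacity:
            return False           # a real client would block here
        self.outstanding.add(self.next_msg)
        self.next_msg += 1
        return True

    def complete(self, msg):
        """Broker asynchronously completes a message.transfer."""
        self.outstanding.discard(msg)

    def sync_pends(self):
        """'sync' pends while any sent message awaits completion."""
        return bool(self.outstanding)

s = SimulatedSender(capacity=10)
sent = 0
while s.send():        # flow control withholds all completions here,
    sent += 1          # so the sender blocks once capacity is reached
print(sent)            # 10: capacity exhausted

for m in range(1, 6):  # broker later completes msgs 1..5
    s.complete(m)
print(s.sync_pends())  # True: msg 6 onward still held by flow control
```

As in the text's example, a "sync" issued while msg #6 is still held must pend until the broker releases it.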

Receiving clients should obviously consume at a slower rate than the aggregate sender rate.  Tests should run clients receiving from the same queue in different threads in order to guarantee that the broker correctly completes messages that may be consumed on different connection threads.

Internal to the broker, flow control tracks four states that a queue may be in at any given point in time:

1) # enqueued msgs >= flow stop threshold
2) # enqueued msgs < flow stop threshold AND # enqueued msgs >= flow resume threshold AND some enqueued messages are pending flow control release
(eg: enqueue until f/c goes active, then dequeue a few without passing the resume point)
3) # enqueued msgs < flow stop threshold AND # enqueued msgs >= flow resume threshold AND no enqueued messages are pending flow control release
(eg: enqueue to just below the flow stop threshold)
4) # enqueued msgs < flow resume threshold
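The four states above can be captured in a small classifier, useful when writing test assertions.  The function name and the threshold values in the usage lines are ours, not broker internals.

```python
# Sketch: classify a queue into the four flow-control states above.

def queue_flow_state(enqueued, stop_threshold, resume_threshold,
                     pending_release):
    """Return which of the four states (1-4) the queue is in.

    pending_release: True if some enqueued messages are still held
    awaiting flow-control release.
    """
    if enqueued >= stop_threshold:
        return 1
    if enqueued >= resume_threshold:
        return 2 if pending_release else 3
    return 4

# e.g. with a stop threshold of 80 msgs and resume threshold of 60:
print(queue_flow_state(100, 80, 60, True))   # 1: at/above stop threshold
print(queue_flow_state(70, 80, 60, True))    # 2: between thresholds, msgs held
print(queue_flow_state(70, 80, 60, False))   # 3: between thresholds, none held
print(queue_flow_state(50, 80, 60, False))   # 4: below resume threshold
```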

Tests should be crafted that cause each of the above queue states, and then attempt to add a new broker to the cluster.  It is expected that the queue state should synchronise on the new broker.

The queue's QMF management object has a counter that tracks the number of transitions into the flow control state that the queue has experienced.  There is also a boolean flag that indicates the queue's current flow control state.  Tests should verify both (via QMF queries).
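The semantics of those two statistics can be sketched as a simulation.  The attribute names and the exact resume condition (depth dropping below the resume threshold) are assumptions for illustration, not the broker's QMF schema.

```python
# Simulation of the two per-queue flow-control stats described above:
# a boolean for the current state and a counter of transitions into
# the flow-stopped state.  Names and thresholds are illustrative.

class FlowStats:
    def __init__(self, stop_threshold, resume_threshold):
        self.stop = stop_threshold
        self.resume = resume_threshold
        self.depth = 0
        self.flow_stopped = False      # current flow-control state flag
        self.flow_stopped_count = 0    # transitions into flow-stopped

    def enqueue(self):
        self.depth += 1
        if not self.flow_stopped and self.depth >= self.stop:
            self.flow_stopped = True
            self.flow_stopped_count += 1

    def dequeue(self):
        self.depth -= 1
        if self.flow_stopped and self.depth < self.resume:
            self.flow_stopped = False

q = FlowStats(stop_threshold=4, resume_threshold=2)
for _ in range(4):
    q.enqueue()
print(q.flow_stopped, q.flow_stopped_count)  # True 1: first transition
for _ in range(3):
    q.dequeue()
print(q.flow_stopped)                        # False: depth below resume
for _ in range(3):
    q.enqueue()
print(q.flow_stopped_count)                  # 2: second transition counted
```

A test can drive a queue through this cycle and then check the broker's reported flag and counter against the expected values.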

Comment 3 Frantisek Reznicek 2011-05-30 12:51:34 UTC
Producer flow control is now enabled for MRG/M cluster configurations.
 - basic flow control functionality works well
 - basic flow control functionality together with cluster topology change involving failover works well


More stress tests needed, still under test.

Comment 5 ppecka 2011-06-02 15:02:34 UTC
Unit tests launched on rhel5.6 / rhel6.1 (i686/x86_64)
cluster_tests.ShortTests.test_queue_flowlimit ............................................ pass
cluster_tests.ShortTests.test_queue_flowlimit_cluster .................................... pass
cluster_tests.ShortTests.test_queue_flowlimit_cluster_join ............................... pass

# rpm -qa | grep qpid
qpid-cpp-server-xml-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
qpid-cpp-server-0.10-7.el5
python-qpid-qmf-0.10-10.el5
qpid-java-common-0.10-6.el5
qpid-java-client-0.10-6.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-java-example-0.10-6.el5
qpid-qmf-devel-0.10-10.el5
qpid-cpp-client-devel-docs-0.10-7.el5
qpidc-debuginfo-0.5.752581-3.el5
qpid-qmf-0.10-10.el5
qpid-cpp-server-store-0.10-7.el5
qpid-cpp-client-0.10-7.el5
python-qpid-0.10-1.el5
qpid-tools-0.10-5.el5

Comment 6 Frantisek Reznicek 2011-06-03 14:18:15 UTC
Cluster flow control is enabled and functional.
Existing tests together with newly developed tests proved the functionality.
Tested on RHEL 5.6 / 6.1, i[36]86 / x86_64, with the packages listed above.

More stress testing is needed; ETA coming Monday.

Comment 7 Frantisek Reznicek 2011-06-06 08:13:25 UTC
No issue discovered.

-> VERIFIED

Comment 8 errata-xmlrpc 2011-06-23 15:43:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html

Comment 9 ppecka 2011-07-19 13:45:35 UTC
*** Bug 486419 has been marked as a duplicate of this bug. ***

