Description of problem: Producer flow control is part of the qpid 0.10 release; however, it was disabled when used in a clustered broker, as the work needed to get replication working did not make the 0.10 release window. A patch that enables producer flow control in a clustered environment has been made available upstream (see https://issues.apache.org/jira/browse/QPID-3076). Given that producer flow control is highly desired by MRG 2.0 cluster customers, and the patch itself appears very low risk, we should include it in the MRG 2.0 release.
Testing notes: The implementation of flow control in the broker leverages the broker's ability to asynchronously complete message.transfer commands. This ability is new to 0.10, and is also used by the store. [Note well: this patch does NOT modify the existing async completion implementation in any way - it is unchanged from its original 0.10 release.]

A good testing scenario would involve a configuration that has clustering + flow control + store enabled simultaneously. Such a configuration should be tested to ensure that it remains consistent across broker remove/add events while under transient and durable message load.

From the client point of view, testing should involve multiple senders on different threads, with different capacities, sending to the same destination queue. In addition, having a sending client manually invoke "sync" at various points in the flow will exercise the flow control code that pends the "sync" operation should any of the outstanding transferred messages be held pending by flow control. Example: assume a sender has a capacity of 10 msgs. If flow control triggers on the 6th message, that sender may continue sending another 4 messages before it is blocked. If that sender had invoked "sync" after the 6th message, the sync should cause the sender to block until flow control is released for msg #6.

Receiving clients should obviously consume at a slower rate than the aggregated sender rate. Tests should run clients receiving from the same queue in different threads, in order to guarantee that the broker correctly completes messages that may be consumed on different connection threads.

Internal to the broker, flow control tracks four states that a queue may be in at any given point in time:
1) # enqueued msgs >= flow stop threshold
2) # enqueued msgs < flow stop threshold AND # enqueued msgs >= flow resume threshold AND some number of enqueued messages are pending flow control release (e.g. enqueue until flow control activates, then dequeue a few without passing the resume point)
3) # enqueued msgs < flow stop threshold AND # enqueued msgs >= flow resume threshold AND no enqueued messages are pending flow control release (e.g. enqueue up to, but not reaching, the flow stop threshold)
4) # enqueued msgs < flow resume threshold

Tests should be crafted that cause each of the above queue states, and then attempt to add a new broker to the cluster. It is expected that the queue state should synchronise on the new broker. The queue's QMF management object has a counter that tracks the number of transitions into the flow-controlled state that the queue has experienced. There is also a boolean flag that indicates the current flow control state of the queue. Tests should verify these states (via QMF queries).
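To make the capacity/sync scenario above concrete, here is a minimal toy model in plain Python (NOT the real qpid client API; all class and method names are invented for illustration). It assumes, as in the worst case described, that no completions for msgs 1-5 have arrived yet when flow control triggers on msg #6:

```python
class ToySender:
    """Toy model: a sender whose capacity bounds uncompleted transfers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.outstanding = []            # msg numbers awaiting completion

    def send(self, msg_no):
        if len(self.outstanding) >= self.capacity:
            return False                 # a real send() would block here
        self.outstanding.append(msg_no)
        return True

    def complete(self, msg_no):
        # the broker asynchronously completes a transfer
        self.outstanding.remove(msg_no)

    def sync_would_block(self):
        # sync() pends while any transfer is still uncompleted
        return bool(self.outstanding)


# Scenario from the notes: capacity 10, flow control triggers on msg #6,
# and no completions for msgs 1-5 have arrived yet.
s = ToySender(capacity=10)
for n in range(1, 7):                    # msgs 1..6; #6 pends under f/c
    assert s.send(n)
assert s.sync_would_block()              # sync() after msg 6 would block
for n in range(7, 11):                   # 4 more msgs (7..10) are accepted
    assert s.send(n)
assert not s.send(11)                    # 11th send blocks: window is full
for n in range(1, 11):                   # flow control released: the broker
    s.complete(n)                        # completes everything outstanding
assert not s.sync_would_block()          # sync() now returns immediately
```

The model shows why mixing manual "sync" calls into the send loop is a useful test: sync observability depends on which transfers the broker has withheld completions for, not merely on whether the sender is blocked.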
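The four queue states above can be sketched as a small state machine in plain Python (a hypothetical model, not broker code; the attribute names `flow_stopped` and `flow_stopped_count` mirror the QMF flag and counter mentioned above but are assumptions here):

```python
class ToyQueue:
    """Toy model of queue-side producer flow control state tracking."""

    def __init__(self, flow_stop, flow_resume):
        assert flow_resume < flow_stop
        self.flow_stop = flow_stop        # flow stop threshold (msgs)
        self.flow_resume = flow_resume    # flow resume threshold (msgs)
        self.depth = 0                    # currently enqueued messages
        self.flow_stopped = False         # current f/c state (QMF-style flag)
        self.flow_stopped_count = 0       # transitions into f/c (QMF-style counter)
        self.pending_release = 0          # msgs whose completion is withheld

    def enqueue(self):
        self.depth += 1
        if not self.flow_stopped and self.depth >= self.flow_stop:
            self.flow_stopped = True      # transition into flow control
            self.flow_stopped_count += 1
        if self.flow_stopped:
            self.pending_release += 1     # completion withheld under f/c

    def dequeue(self):
        self.depth -= 1
        if self.flow_stopped and self.depth < self.flow_resume:
            self.flow_stopped = False     # release: complete withheld msgs
            self.pending_release = 0

    def state(self):
        """Classify the queue into states 1-4 from the notes above."""
        if self.depth >= self.flow_stop:
            return 1
        if self.depth >= self.flow_resume:
            return 2 if self.pending_release else 3
        return 4


# Drive the queue through all four states (stop=10, resume=5).
q = ToyQueue(flow_stop=10, flow_resume=5)
for _ in range(9):
    q.enqueue()                           # depth 9: below stop, no pending
assert q.state() == 3 and not q.flow_stopped
q.enqueue()                               # depth 10: flow control engages
assert q.state() == 1 and q.flow_stopped and q.flow_stopped_count == 1
q.dequeue(); q.dequeue()                  # depth 8: below stop, msgs pending
assert q.state() == 2 and q.pending_release == 1
for _ in range(4):
    q.dequeue()                           # depth 4: below resume, released
assert q.state() == 4 and not q.flow_stopped and q.pending_release == 0
```

A cluster test would drive a queue into each of these states, add a broker, and verify (via QMF queries of the flag and counter) that the newcomer reports the same state.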
Merged to mrg_2.0.x:
http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=a3ee020d58ef0f2d83865dd6e1bf724c093b616e
http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=960b155735f8f1910c86262f8d65f5df6ab72d6b
Producer flow control is now enabled for MRG/M cluster configurations.
- basic flow control functionality works well
- basic flow control functionality together with a cluster topology change involving failover works well
More stress tests are needed; testing is still in progress.
Unit tests launched on rhel5.6 / rhel6.1 (i686/x86_64):

cluster_tests.ShortTests.test_queue_flowlimit ............................................ pass
cluster_tests.ShortTests.test_queue_flowlimit_cluster .................................... pass
cluster_tests.ShortTests.test_queue_flowlimit_cluster_join ............................... pass

# rpm -qa | grep qpid
qpid-cpp-server-xml-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
qpid-cpp-server-0.10-7.el5
python-qpid-qmf-0.10-10.el5
qpid-java-common-0.10-6.el5
qpid-java-client-0.10-6.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-java-example-0.10-6.el5
qpid-qmf-devel-0.10-10.el5
qpid-cpp-client-devel-docs-0.10-7.el5
qpidc-debuginfo-0.5.752581-3.el5
qpid-qmf-0.10-10.el5
qpid-cpp-server-store-0.10-7.el5
qpid-cpp-client-0.10-7.el5
python-qpid-0.10-1.el5
qpid-tools-0.10-5.el5
Cluster flow control is enabled and functional. Existing tests together with newly developed tests proved the functionality. Tested on RHEL 5.6 / 6.1, i[36]86 / x86_64, with the packages listed above. More stress testing is still needed; further results are expected Monday.
No issue discovered. -> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2011-0890.html
*** Bug 486419 has been marked as a duplicate of this bug. ***