Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 486419

Summary: Producer flow control could cause problems in a cluster.
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: 1.1
Reporter: Alan Conway <aconway>
Assignee: Alan Conway <aconway>
QA Contact: ppecka <ppecka>
CC: gsim, jross, ppecka
Status: CLOSED DUPLICATE
Severity: medium
Priority: urgent
Target Milestone: 1.1.1
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-07-19 13:45:35 UTC

Description Alan Conway 2009-02-19 17:29:49 UTC
Description of problem:

Producer flow control could cause problems in a cluster. 

The broker could send message.flow commands at different points in the outgoing command stream, based on time differences measured on each node. This would change the command-ids of subsequent commands, and the client uses those IDs to identify messages in acknowledgements.

There may be another ad-hoc solution here, but this issue, together with the TTL one, suggests we need a general solution for making time-based decisions cluster-predictable.

I have a notion in mind of a "cluster clock":
 - every node timestamps its outgoing multicasts
 - cluster-time is updated on each incoming multicast to the max of the incoming timestamp and the previous cluster-time.

That would impose no additional multicast traffic. It would, however, require reading the system time for every multicast; I'm not sure whether that is a significant performance cost.

Since every action in the broker is triggered by delivery of a multicast, the clock would always have a recently updated value at the point where time calculations are done.

Then we make the broker's clock pluggable and plug in the cluster time when in a cluster. This would make all time-based decisions predictable and solve the whole class of time-based problems.

Once implemented, the current ad-hoc solution for message TTL should be removed.
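To make the proposal above concrete, here is a minimal sketch of the "cluster clock" idea. The class name and method names are illustrative, not the actual qpid-cpp API; it only shows the invariant: cluster-time is the running max of all timestamps seen, so every node that has processed the same multicasts computes the same value.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch (not the real qpid-cpp interface) of a cluster clock:
// cluster-time only ever moves forward, to the max of timestamps seen on
// outgoing and incoming multicasts.
class ClusterClock {
  public:
    // Called when this node sends a multicast: fold the local timestamp
    // into cluster-time and use the result as the stamp on the wire.
    uint64_t stampOutgoing(uint64_t localNowUsec) {
        clusterTime = std::max(clusterTime, localNowUsec);
        return clusterTime;
    }
    // Called on every incoming multicast: take the max of the incoming
    // stamp and the previous cluster-time.
    void onIncoming(uint64_t stampUsec) {
        clusterTime = std::max(clusterTime, stampUsec);
    }
    // Value used by all time-based decisions instead of the system clock.
    uint64_t now() const { return clusterTime; }
  private:
    uint64_t clusterTime = 0;
};
```

Because every broker action is triggered by a multicast delivery, `now()` is recently updated at exactly the points where time-based decisions are made, and identical on all nodes.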

Comment 1 Alan Conway 2009-02-19 19:18:43 UTC
> Scenario: Broker has max publish rate set to 100 per sec. Client sends 
> 100 messages starting at time t1, then runs out of credit. Cluster 
> processes those one hundred messages updating clock on each. At the end 
> of processing those messages the clock is still less than t2 (where t2 = 
> t1 + 1 sec, the time at which new credit can be allocated to the session).
> 
> If there is no more 'real' cluster traffic, one node will have to send 
> an internally generated multicast to coincide with t2 and trigger 
> reallocation of credit.

Yes, you're right. In situations that use a timer, we need one member to register a real timer event and multicast a clock update when the timer goes off. All nodes would register the original flow-control event with the cluster timer, which runs events as cluster time advances.
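The cluster-timer behaviour described above can be sketched as follows. This is an illustrative outline, not the real qpid-cpp timer: every node registers the same event keyed by the cluster time at which it should fire, and events run only when a cluster-time update (driven by a real timer on one member and multicast to all) advances past their deadline, so all nodes fire them in the same order.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical cluster timer: tasks are keyed by the cluster time at which
// they should fire, and run only when advance() moves cluster time past them.
class ClusterTimer {
  public:
    // Every node registers the same task with the same deadline.
    void add(uint64_t fireAtClusterTime, std::function<void()> task) {
        pending.emplace(fireAtClusterTime, std::move(task));
    }
    // Called whenever cluster time advances, e.g. on a multicast clock
    // update; fires all due tasks in deadline order.
    void advance(uint64_t clusterTime) {
        while (!pending.empty() && pending.begin()->first <= clusterTime) {
            std::function<void()> task = std::move(pending.begin()->second);
            pending.erase(pending.begin());
            task();
        }
    }
  private:
    std::multimap<uint64_t, std::function<void()>> pending;
};
```

In the credit scenario above, all nodes would register the reallocation at cluster time t2; the one member with a real OS timer multicasts a clock update at t2, and the resulting `advance(t2)` triggers the reallocation identically everywhere.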

Comment 2 Alan Conway 2009-02-24 19:49:51 UTC
Fixed in revision 747528

Comment 3 Alan Conway 2009-02-24 19:56:30 UTC
Not fixed with my delirious cluster-clock solution suggested above. 

When in a cluster, only the directly connected node does the flow-control calculations and then multicasts the required commands. All nodes process the sending of those commands in cluster order.
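A minimal sketch of that fix, with illustrative names (not the actual qpid-cpp types): only the node holding the client connection decides when to grant credit and multicasts the command, while every node, including the sender, applies the command in cluster delivery order, so command-ids and credit state stay identical on all replicas.

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-in for a multicast flow-control command.
struct FlowCommand { uint32_t sessionId; uint32_t credit; };

class ClusteredFlowControl {
  public:
    ClusteredFlowControl(uint32_t session, bool directlyConnected)
        : id(session), connected(directlyConnected) {}

    // Runs on every node when local conditions would grant credit, but
    // only the directly connected node actually emits the command.
    void maybeGrantCredit(uint32_t credit, std::vector<FlowCommand>& mcast) {
        if (connected)
            mcast.push_back(FlowCommand{id, credit});
    }

    // Every node applies the command when it is delivered in cluster
    // order, keeping replicated session state identical.
    void applyFlow(const FlowCommand& cmd) {
        if (cmd.sessionId == id)
            creditGranted += cmd.credit;
    }

    uint32_t credit() const { return creditGranted; }

  private:
    uint32_t id;
    bool connected;
    uint32_t creditGranted = 0;
};
```

The design choice here is the usual replicated-state-machine pattern: decisions that depend on node-local inputs (timers, measured rates) are made once and replicated as ordered commands, rather than recomputed independently on each node.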

Comment 4 Justin Ross 2011-06-27 21:02:41 UTC
Alan, the resolution of this is a little unclear to me.  At any rate, I suspect this bug is obviated by a subsequent change.  Safe to close this one?

Comment 5 Alan Conway 2011-06-28 10:02:46 UTC
It was fixed in revision 747528 which is on the mrg_2.0.x branch.

Comment 6 Justin Ross 2011-06-28 14:16:17 UTC
Frantisek, this one is pretty old.  Did it get verified?

Comment 7 ppecka 2011-06-30 15:10:38 UTC
Is testing for this issue still meaningful, given that BZ700822 introduced a change in clustered producer flow control and is already in the verified state?

Otherwise this issue needs clarification on what should be tested here and how the mechanism works.

Comment 8 Alan Conway 2011-07-18 15:01:19 UTC
This does not need further testing; the testing for BZ700822 covers it as well.

Comment 9 ppecka 2011-07-19 13:45:35 UTC

*** This bug has been marked as a duplicate of bug 700822 ***