Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 486419

Summary: Producer flow control could cause problems in a cluster.
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: 1.1
Reporter: Alan Conway <aconway>
Assignee: Alan Conway <aconway>
QA Contact: ppecka <ppecka>
CC: gsim, jross, ppecka
Status: CLOSED DUPLICATE
Severity: medium
Priority: urgent
Target Milestone: 1.1.1
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-07-19 13:45:35 UTC

Description Alan Conway 2009-02-19 17:29:49 UTC
Description of problem:

Producer flow control could cause problems in a cluster. 

The broker could send message.flow commands at different points in the outgoing command stream, based on time differences measured on each node. This would change the command-ids of subsequent commands, and the client uses those IDs to identify messages in acknowledgements.

There may be another ad-hoc solution here, but this issue, together with the TTL one, suggests we need a general solution for making time-based decisions cluster-predictable.

I have a notion in mind of a "cluster clock":
 - every node timestamps its outgoing multicasts
 - cluster-time is updated on each incoming multicast to the max of the incoming timestamp and the previous cluster-time.

That would impose no additional multicast traffic. It would, however, require reading the system time for every multicast; I'm not sure whether that is a significant performance cost.

Since every action in the broker is triggered by delivery of a multicast, the clock would always have a recently updated value at the point where time calculations are done.

Then we make the broker's clock pluggable and plug in the cluster time when in a cluster. This would make all time-based decisions predictable and solve the whole class of time-based problems.

Once implemented, the current ad-hoc solution for message TTL should be removed.
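To make the proposal above concrete, here is a minimal sketch of the "cluster clock" idea. The class name and method names are illustrative, not the actual qpid-cpp API; it only shows the invariant: cluster-time is the running max of all timestamps seen, so every node that has processed the same multicasts computes the same value.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch (not the real qpid-cpp interface) of a cluster clock:
// cluster-time only ever moves forward, to the max of timestamps seen on
// outgoing and incoming multicasts.
class ClusterClock {
  public:
    // Called when this node sends a multicast: fold the local timestamp
    // into cluster-time and use the result as the stamp on the wire.
    uint64_t stampOutgoing(uint64_t localNowUsec) {
        clusterTime = std::max(clusterTime, localNowUsec);
        return clusterTime;
    }
    // Called on every incoming multicast: take the max of the incoming
    // stamp and the previous cluster-time.
    void onIncoming(uint64_t stampUsec) {
        clusterTime = std::max(clusterTime, stampUsec);
    }
    // Value used by all time-based decisions instead of the system clock.
    uint64_t now() const { return clusterTime; }
  private:
    uint64_t clusterTime = 0;
};
```

Because every broker action is triggered by a multicast delivery, `now()` is recently updated at exactly the points where time-based decisions are made, and identical on all nodes.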

Comment 1 Alan Conway 2009-02-19 19:18:43 UTC
> Scenario: Broker has max publish rate set to 100 per sec. Client sends 
> 100 messages starting at time t1, then runs out of credit. Cluster 
> processes those one hundred messages updating clock on each. At the end 
> of processing those messages the clock is still less than t2 (where t2 = 
> t1 + 1 sec, the time at which new credit can be allocated to the session).
> 
> If there is no more 'real' cluster traffic, one node will have to send 
> an internally generated multicast to coincide with t2 and trigger 
> reallocation of credit.

Yes, you're right. In situations that use a timer, we need one member to register a real timer event and multicast a clock update when the timer goes off. All nodes would register the original flow-control event with the cluster timer, which runs events as cluster time advances.
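The cluster-timer behaviour described above can be sketched as follows. This is an illustrative outline, not the real qpid-cpp timer: every node registers the same event keyed by the cluster time at which it should fire, and events run only when a cluster-time update (driven by a real timer on one member and multicast to all) advances past their deadline, so all nodes fire them in the same order.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical cluster timer: tasks are keyed by the cluster time at which
// they should fire, and run only when advance() moves cluster time past them.
class ClusterTimer {
  public:
    // Every node registers the same task with the same deadline.
    void add(uint64_t fireAtClusterTime, std::function<void()> task) {
        pending.emplace(fireAtClusterTime, std::move(task));
    }
    // Called whenever cluster time advances, e.g. on a multicast clock
    // update; fires all due tasks in deadline order.
    void advance(uint64_t clusterTime) {
        while (!pending.empty() && pending.begin()->first <= clusterTime) {
            std::function<void()> task = std::move(pending.begin()->second);
            pending.erase(pending.begin());
            task();
        }
    }
  private:
    std::multimap<uint64_t, std::function<void()>> pending;
};
```

In the credit scenario above, all nodes would register the reallocation at cluster time t2; the one member with a real OS timer multicasts a clock update at t2, and the resulting `advance(t2)` triggers the reallocation identically everywhere.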

Comment 2 Alan Conway 2009-02-24 19:49:51 UTC
Fixed in revision 747528

Comment 3 Alan Conway 2009-02-24 19:56:30 UTC
Not fixed with my delirious cluster-clock solution suggested above. 

When in a cluster, only the directly connected node does the flow-control calculations and then multicasts the required commands. All nodes process the sending of those commands in cluster order.
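A minimal sketch of that fix, with illustrative names (not the actual qpid-cpp types): only the node holding the client connection decides when to grant credit and multicasts the command, while every node, including the sender, applies the command in cluster delivery order, so command-ids and credit state stay identical on all replicas.

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-in for a multicast flow-control command.
struct FlowCommand { uint32_t sessionId; uint32_t credit; };

class ClusteredFlowControl {
  public:
    ClusteredFlowControl(uint32_t session, bool directlyConnected)
        : id(session), connected(directlyConnected) {}

    // Runs on every node when local conditions would grant credit, but
    // only the directly connected node actually emits the command.
    void maybeGrantCredit(uint32_t credit, std::vector<FlowCommand>& mcast) {
        if (connected)
            mcast.push_back(FlowCommand{id, credit});
    }

    // Every node applies the command when it is delivered in cluster
    // order, keeping replicated session state identical.
    void applyFlow(const FlowCommand& cmd) {
        if (cmd.sessionId == id)
            creditGranted += cmd.credit;
    }

    uint32_t credit() const { return creditGranted; }

  private:
    uint32_t id;
    bool connected;
    uint32_t creditGranted = 0;
};
```

The design choice here is the usual replicated-state-machine pattern: decisions that depend on node-local inputs (timers, measured rates) are made once and replicated as ordered commands, rather than recomputed independently on each node.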

Comment 4 Justin Ross 2011-06-27 21:02:41 UTC
Alan, the resolution of this is a little unclear to me.  At any rate, I suspect this bug is obviated by a subsequent change.  Safe to close this one?

Comment 5 Alan Conway 2011-06-28 10:02:46 UTC
It was fixed in revision 747528 which is on the mrg_2.0.x branch.

Comment 6 Justin Ross 2011-06-28 14:16:17 UTC
Frantisek, this one is pretty old.  Did it get verified?

Comment 7 ppecka 2011-06-30 15:10:38 UTC
Is testing for this issue still meaningful, given that BZ700822 introduced a change in clustered producer flow control and is already in the verified state?

Otherwise this issue needs clarification on what should be tested here and how the mechanism works.

Comment 8 Alan Conway 2011-07-18 15:01:19 UTC
This does not need further testing; the testing for BZ700822 covers it as well.

Comment 9 ppecka 2011-07-19 13:45:35 UTC

*** This bug has been marked as a duplicate of bug 700822 ***