Description of problem: The AMQP session.resume command is not fully implemented, although it is available in the API. Attempting to resume after a disconnect will fail if the disconnect occurred partway through delivery of a message. This feature will be fully implemented and delivered with the clustering features, which are currently under development.
The relevant sections of the AMQP spec are under revision; no further work is planned until the spec has solidified.
This is a use-case for session resume from a customer

On 11/06/2009 11:39 AM, Carl Trieloff wrote:
> Ted Ross wrote:
>> On 11/06/2009 09:30 AM, Alan Conway wrote:
>>> On 11/06/2009 09:10 AM, Ted Ross wrote:
>>>> On 11/05/2009 11:55 AM, Cullen Davis wrote:
>>>>> Our Qpid based product will be deployed with brokers being federated
>>>>> brokers networks that are characterized as "disconnected, interrupted,
>>>>> and low-bandwidth". We are using a dedicated hardware based network
>>>>> shaper to simulate the network conditions in order to test our
>>>>> solution.
>>>>>
>>>>> Qpid is performing very well in the tests involving high bit error
>>>>> rates, packet loss, and high latencies. However, our solution is not
>>>>> meeting threshold objectives in tests involving extended network
>>>>> outages (packet loss = 100%).
>>>>>
>>>>> Our solution utilizes Qpid 0.5 C++ brokers and clients running on
>>>>> RedHat Enterprise Linux 5.4. The brokers are utilizing direct
>>>>> exchanges and have been federate as follows:
>>>>> qpid-route --durable dynamic add brokerB brokerA fed.direct
>>>>>
>>>>> The qpid-route command created a new queue, named "bridge-queue" at
>>>>> brokerA. The new queue had queue properties of durable=False,
>>>>> exclusive=True and autoDelete=True.
>>>>>
>>>>> Our test begin with 1000 messages being published into broker A at a
>>>>> rate of 1 per second. The network connection between broker A and
>>>>> broker B is set to run at 56kbps for 5 minutes and then degrade to a
>>>>> network outage stage (100% packet loss) for 15 minutes.
>>>>>
>>>>> The test begins and broker B starts receiving the messages through the
>>>>> federated route at a frequency of 1 per second. About seven minutes
>>>>> into the network outage stage, broker A throws a timeout error:
>>>>>
>>>>> Connection timed out: closing
>>>>> DISCONNECTED 150.nnn.nnn.nnn (broker B's ip)
>>>>>
>>>>> This results in the bridge-queue on broker A being deleted. When the
>>>>> network connection is re-established, the bridge-queue is rebuilt, but
>>>>> none of the messages that were published into Broker A during the
>>>>> network outage were federated to broker B. Essentially, this means
>>>>> that broker B never receives more than half of the messages received
>>>>> by broker A.
>>>>>
>>>>> The current theory is that the federated route is backed by a
>>>>> bridge-queue with a autoDelete property of true. When the network
>>>>> outage occurs, the queue is deleted and the message counts are
>>>>> flushed. The durable flag on the route causes the bridge-queue to be
>>>>> rebuilt when the brokers reconnect, but there is no way for the
>>>>> bridge-queue to establish what messages have not been federated. Could
>>>>> setting the autoDelete property fix the problem? I am unsure of how to
>>>>> properly set this property on a "system management" queue.
>>>>>
>>>>> Any thoughts on how to properly configure a broker link / route that
>>>>> can survive extended network outages would be greatly appreciated.
>>>>>
>>>>> Cullen J. Davis
>>>>> CommIT Enterprises, Inc.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> Apache Qpid - AMQP Messaging Implementation
>>>>> Project: http://qpid.apache.org
>>>>> Use/Interact: users-subscribe.org
>>>>>
>>>> Your theory is correct. An "exchange" route causes a temporary transit
>>>> queue to be created to hold messages waiting to be sent from broker to
>>>> broker. Even though the route is durable, meaning it will be
>>>> re-established after a broker restart, the temporary queue is not
>>>> (it is exclusive/auto-delete) and any messages in the queue when a restart
>>>> occurs will be lost.
>>>>
>>>> You can use a "queue" route where rather than connecting to a remote
>>>> exchange, the destination broker subscribes to an existing queue. This
>>>> queue can be non exclusive and durable. Be sure to use the --ack N
>>>> option in qpid-route where N is a number greater than zero. This will
>>>> cause the inter-broker route to use message acknowledgement in such a
>>>> way that recovery will be clean (i.e. the source broker will not
>>>> discard messages from the queue until they are acknowledged by the
>>>> destination broker).
>>>>
>>>> The downside of the queue route solution is that you don't get the
>>>> dynamic binding behavior. It is possible (though not implemented) to
>>>> use durable transit queues when durable routes are created so that no
>>>> messages would ever be lost in the event of broker failure.
>>>>
>>> That creates a garbage collection problem though. We need a way to
>>> detect the difference between
>>> - remote target is done with queue it should be deleted
>>> - remote target is temporarily inaccessible but may reconnect
>>>
>>> <soapbox>since we don't support session resume, we don't currently
>>> have a way of distinguishing the two cases. If the lifecycle of the
>>> queue was tied to a session that could outlive a connection that
>>> would be one way to solve this problem.
>>>
>>> We could perhaps tie desctruction of the sources queue to destruction
>>> of the destinations queue, with a configurable timeout to get rid of
>>> the source queue if the destination goes away and never comes back.
>>> However that seems like a non-standard way of duplicating a standard
>>> feature that would also be useful in other situations.
>>> </soapbox>
>> This is a big part of why this hasn't been done. It's easy enough to
>> create the queues, it's quite difficult to determine when they should
>> be deleted. And worse, when they're not deleted, they accumulate
>> messages.
>>
>> -Ted
>>
>
> correct, we really need to use session timeout for this, it would solve
> this case.
>
> Can we DZ/Jira it with that in the BZ.
>
> Carl.
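
For illustration, the session-timeout idea from the thread above can be sketched as a toy model in Python. This is not Qpid code and uses no Qpid API: the class name TransitQueue, its methods, and the session_timeout parameter are all hypothetical. It only shows how tying the queue's lifetime to a session with a timeout would separate "temporarily inaccessible but may reconnect" from "gone and never coming back".

```python
class TransitQueue:
    """Toy model (hypothetical, not Qpid): a transit queue whose lifetime
    is tied to a session that can outlive its connection."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout  # seconds a detached session is kept
        self.messages = []
        self.detached_at = None  # None while the remote target is attached
        self.deleted = False

    def publish(self, msg):
        # Messages keep accumulating while the session is detached.
        if not self.deleted:
            self.messages.append(msg)

    def detach(self, now):
        # Connection lost: keep the queue, just remember when it happened.
        self.detached_at = now

    def expire(self, now):
        # Garbage-collect only once the session timeout has elapsed.
        if self.detached_at is not None and now - self.detached_at > self.session_timeout:
            self.deleted = True
            self.messages.clear()

    def resume(self, now):
        # Returns the pending messages, or None if the session has expired.
        self.expire(now)
        if self.deleted:
            return None
        self.detached_at = None
        return list(self.messages)
```

With a 30 minute timeout, a 15 minute outage like the one in the customer test would leave the queue and its backlog intact, while a target that never returns is cleaned up once the timeout passes.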
Closing this BZ as session resume is not in immediate plans.