Bug 412011 - Session resume is not functional.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Next Version
Target Release: ---
Assignee: Alan Conway
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-12-05 14:16 UTC by Alan Conway
Modified: 2009-12-11 21:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-11 21:12:01 UTC
Target Upstream Version:
Embargoed:



Description Alan Conway 2007-12-05 14:16:18 UTC
Description of problem:

The AMQP session.resume command is not fully implemented, although it is
available in the API. Attempting to resume after being disconnected will fail
if the disconnect occurred partway through delivery of a message.

This feature will be implemented fully and delivered with the clustering
features which are currently under development.

Comment 1 Alan Conway 2008-01-03 19:12:03 UTC
The relevant sections of the AMQP spec are under revision. No further work is
planned until the spec has solidified.

Comment 2 Alan Conway 2009-11-09 13:29:33 UTC
This is a use case for session resume from a customer:

On 11/06/2009 11:39 AM, Carl Trieloff wrote:
> Ted Ross wrote:
>> On 11/06/2009 09:30 AM, Alan Conway wrote:
>>> On 11/06/2009 09:10 AM, Ted Ross wrote:
>>>> On 11/05/2009 11:55 AM, Cullen Davis wrote:
>>>>> Our Qpid based product will be deployed with brokers being federated
>>>>> brokers networks that are characterized as "disconnected, interrupted,
>>>>> and low-bandwidth". We are using a dedicated hardware based network
>>>>> shaper to simulate the network conditions in order to test our solution.
>>>>>
>>>>> Qpid is performing very well in the tests involving high bit error
>>>>> rates, packet loss, and high latencies. However, our solution is not
>>>>> meeting threshold objectives in tests involving extended network
>>>>> outages (packet loss = 100%).
>>>>>
>>>>> Our solution utilizes Qpid 0.5 C++ brokers and clients running on
>>>>> RedHat Enterprise Linux 5.4. The brokers are utilizing direct
>>>>> exchanges and have been federated as follows:
>>>>> qpid-route --durable dynamic add brokerB brokerA fed.direct
>>>>>
>>>>> The qpid-route command created a new queue, named "bridge-queue" at
>>>>> brokerA. The new queue had queue properties of durable=False,
>>>>> exclusive=True and autoDelete=True.
>>>>>
>>>>> Our test begins with 1000 messages being published into broker A at a
>>>>> rate of 1 per second. The network connection between broker A and
>>>>> broker B is set to run at 56kbps for 5 minutes and then degrade to a
>>>>> network outage stage (100% packet loss) for 15 minutes.
>>>>>
>>>>> The test begins and broker B starts receiving the messages through the
>>>>> federated route at a frequency of 1 per second. About seven minutes
>>>>> into the network outage stage, broker A throws a timeout error:
>>>>>
>>>>> Connection timed out: closing
>>>>> DISCONNECTED 150.nnn.nnn.nnn (broker B's ip)
>>>>>
>>>>> This results in the bridge-queue on broker A being deleted. When the
>>>>> network connection is re-established, the bridge-queue is rebuilt, but
>>>>> none of the messages that were published into Broker A during the
>>>>> network outage were federated to broker B. Essentially, this means
>>>>> that broker B never receives more than half of the messages received
>>>>> by broker A.
>>>>>
>>>>> The current theory is that the federated route is backed by a
>>>>> bridge-queue with an autoDelete property of true. When the network
>>>>> outage occurs, the queue is deleted and the message counts are
>>>>> flushed. The durable flag on the route causes the bridge-queue to be
>>>>> rebuilt when the brokers reconnect, but there is no way for the
>>>>> bridge-queue to establish what messages have not been federated. Could
>>>>> setting the autoDelete property fix the problem? I am unsure of how to
>>>>> properly set this property on a "system management" queue.
>>>>>
>>>>> Any thoughts on how to properly configure a broker link / route that
>>>>> can survive extended network outages would be greatly appreciated.
>>>>>
>>>>> Cullen J. Davis
>>>>> CommIT Enterprises, Inc.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> Apache Qpid - AMQP Messaging Implementation
>>>>> Project: http://qpid.apache.org
>>>>> Use/Interact: users-subscribe.org
>>>>>
>>>> Your theory is correct. An "exchange" route causes a temporary transit
>>>> queue to be created to hold messages waiting to be sent from broker to
>>>> broker. Even though the route is durable, meaning it will be
>>>> re-established after a broker restart, the temporary queue is not
>>>> (it is exclusive/auto-delete) and any messages in the queue when a
>>>> restart occurs will be lost.
>>>>
>>>> You can use a "queue" route where rather than connecting to a remote
>>>> exchange, the destination broker subscribes to an existing queue. This
>>>> queue can be non exclusive and durable. Be sure to use the --ack N
>>>> option in qpid-route where N is a number greater than zero. This will
>>>> cause the inter-broker route to use message acknowledgement in such a
>>>> way that recovery will be clean (i.e. the source broker will not
>>>> discard messages from the queue until they are acknowledged by the
>>>> destination broker).
>>>>
>>>> The downside of the queue route solution is that you don't get the
>>>> dynamic binding behavior. It is possible (though not implemented) to
>>>> use durable transit queues when durable routes are created so that no
>>>> messages would ever be lost in the event of broker failure.
>>>>
>>> That creates a garbage collection problem though. We need a way to
>>> detect the difference between:
>>> - remote target is done with the queue, so it should be deleted
>>> - remote target is temporarily inaccessible but may reconnect
>>>
>>> <soapbox>since we don't support session resume, we don't currently 
>>> have a way of distinguishing the two cases. If the lifecycle of the 
>>> queue was tied to a session that could outlive a connection that 
>>> would be one way to solve this problem.
>>>
>>> We could perhaps tie destruction of the source queue to destruction
>>> of the destination queue, with a configurable timeout to get rid of
>>> the source queue if the destination goes away and never comes back.
>>> However that seems like a non-standard way of duplicating a standard 
>>> feature that would also be useful in other situations.
>>> </soapbox>
>> This is a big part of why this hasn't been done. It's easy enough to 
>> create the queues, it's quite difficult to determine when they should 
>> be deleted. And worse, when they're not deleted, they accumulate 
>> messages.
>>
>> -Ted
>>
> 
> correct, we really need to use session timeout for this, it would solve 
> this case.
> 
> Can we DZ/Jira it with that in the BZ.
> 
> Carl.
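
The session-timeout idea discussed in the thread above can be sketched as a
toy model. This is only an illustration under stated assumptions, not Qpid
code: the class and method names (TransitQueue, SessionBoundQueues, attach,
detach, expire) are hypothetical, and time is passed in explicitly rather
than read from a clock. The point it tries to show is that a transit queue
tied to a session (rather than to a connection) can survive an outage, and
is only garbage-collected once the session has stayed detached for longer
than a configurable timeout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TransitQueue:
    """Hypothetical stand-in for the bridge queue on the source broker."""
    name: str
    messages: List[str] = field(default_factory=list)


class SessionBoundQueues:
    """Toy model: a queue lives as long as its session, not its connection."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                      # seconds a detached session survives
        self.queues: Dict[str, TransitQueue] = {}   # session name -> queue
        self.detached_at: Dict[str, float] = {}     # session name -> time of detach

    def attach(self, session: str, now: float) -> TransitQueue:
        """Attach or resume a session, reusing its queue if it has not expired."""
        self.expire(now)
        self.detached_at.pop(session, None)
        if session not in self.queues:
            self.queues[session] = TransitQueue(f"bridge-{session}")
        return self.queues[session]

    def detach(self, session: str, now: float) -> None:
        """Connection lost: keep the queue and start the timeout clock."""
        if session in self.queues:
            self.detached_at[session] = now

    def publish(self, session: str, message: str) -> None:
        """Messages keep accumulating while the session is detached."""
        self.queues[session].messages.append(message)

    def expire(self, now: float) -> List[str]:
        """Delete queues whose session stayed detached longer than the timeout."""
        dead = [s for s, t in self.detached_at.items() if now - t > self.timeout]
        for s in dead:
            del self.detached_at[s]
            del self.queues[s]
        return dead


if __name__ == "__main__":
    broker_a = SessionBoundQueues(timeout=1800)     # assumed 30 minute timeout
    broker_a.attach("brokerB", now=0)
    broker_a.publish("brokerB", "m1")
    broker_a.detach("brokerB", now=300)             # outage starts
    broker_a.publish("brokerB", "m2")               # published during the outage
    resumed = broker_a.attach("brokerB", now=1200)  # link returns 15 minutes later
    print(resumed.messages)
```

In this model the two cases from the thread are told apart purely by the
timeout: a target that reconnects within it finds its queue and messages
intact, and one that never comes back has its queue removed on the next
expire() call. The cost noted in the thread still applies, since messages
accumulate in the queue for as long as the session is detached.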

Comment 3 Alan Conway 2009-12-11 21:12:01 UTC
Closing this BZ as session resume is not in immediate plans.

