Red Hat Bugzilla – Bug 441698
Feature: Support async queue replication
Last modified: 2009-04-21 12:16:22 EDT
For disaster recovery there is a desire to have asynchronous replication of
changes to particular queues from one data centre to a passive replica at
On failure some updates could be lost due to the async nature of the
replication; this is acceptable. The switch to the passive backup would require
manual intervention. The backup system should be available within seconds of
being made active; it could take a further hour or so to recover all the
messages for very large queues.
Asynchronous queue replication from one clustered "primary" broker to another
clustered "secondary" broker (asynchronous site-to-site replication). For
disaster recovery (DR) purposes there is a need for asynchronous, sequential
replication of changes (enqueues and dequeues of messages) made to queues on
the primary broker in one data center to a secondary broker at another data
center. Although replication is asynchronous, updates should be applied to the
secondary broker's queues in precisely the same sequence as they were applied
to the primary broker's queues. Transactional changes to the primary broker's
queues should be applied to the secondary broker's queues in the same
transactional fashion. The secondary broker's queues are not used by any AMQP
clients (no messages are sent to or read from them) until the replica is
manually activated for DR. On failure, any updates made on the primary broker
that have not yet been copied to the secondary broker could be lost due to the
asynchronous nature of the replication. Switching the secondary broker to
active is manual, and it could then take some time to recover all the messages
for very large queues.
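The ordering and transactional requirements above can be illustrated with a small toy model. This is only a sketch of the semantics being requested, not qpid's implementation: the ReplicaQueue class, the (sequence, kind, message id) event tuples and the method names are all hypothetical and chosen purely for illustration.

```python
from collections import deque


class ReplicaQueue:
    """Toy model of a passive backup queue (hypothetical, not a qpid API)."""

    def __init__(self):
        self.messages = deque()   # message ids currently on the replica
        self.last_applied = 0     # sequence number of the last applied event

    def apply(self, event):
        """Apply one (seq, kind, msg_id) event in strict sequence order."""
        seq, kind, msg_id = event
        if seq <= self.last_applied:
            return False          # already applied, e.g. re-sent after a link drop
        if seq != self.last_applied + 1:
            raise ValueError("gap in replication stream at %d" % seq)
        if kind == "enqueue":
            self.messages.append(msg_id)
        elif kind == "dequeue":
            self.messages.remove(msg_id)   # raises ValueError if not present
        else:
            raise ValueError("unknown event kind: %r" % kind)
        self.last_applied = seq
        return True

    def apply_transaction(self, events):
        """Apply a batch of events atomically: all of them or none."""
        saved = (deque(self.messages), self.last_applied)
        try:
            for event in events:
                self.apply(event)
        except ValueError:
            self.messages, self.last_applied = saved   # roll back
            raise
```

Events are replayed on the replica in the order the primary produced them, a re-sent event is ignored rather than applied twice, and a failed transactional batch leaves the replica unchanged.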
Created attachment 329978 [details]
Notes on setting up and using the async replication feature
This text has also been added to the upstream wiki:
See attachment above for details on this feature. There are also two test scenarios checked in to the qpid svn:
Simple low-volume verification of the basic functionality of replication: after configuration, messages are sent and acknowledged, then the state of the backup queues is validated. Both enqueue-and-dequeue and enqueue-only modes are tested. This is run as part of 'make check'.
This tests reliability in the face of bridge/link failure. It only deals with the enqueue-only mode, as the other functionality is covered by the test above. During replication of a large volume of messages the link is periodically destroyed and re-created, and at the end the test ensures that all expected messages were replicated with no duplicates.
Should have also added that src/tests/reliable_replication_test is automated as part of 'make check-long' (need to be in src/tests for that).
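The final check described for the reliability test (all expected messages replicated, no duplicates) could be sketched as below. This is an illustrative stand-in and not the code in src/tests/reliable_replication_test; the function name and the use of plain message-id lists are assumptions.

```python
from collections import Counter


def check_replication(expected_ids, replicated_ids):
    """Compare ids sent to the primary with ids found on the backup queue.

    Returns (missing, duplicated, unexpected); all three lists are empty
    when replication was complete and free of duplicates.
    """
    counts = Counter(replicated_ids)
    missing = [m for m in expected_ids if counts[m] == 0]
    duplicated = sorted(m for m, n in counts.items() if n > 1)
    unexpected = sorted(set(counts) - set(expected_ids))
    return missing, duplicated, unexpected
```

For example, check_replication(["m1", "m2", "m3"], ["m1", "m3", "m3"]) reports "m2" as missing and "m3" as duplicated.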
The feature has been implemented and the above-mentioned tests are passing on RHEL 4.7 / 5.3 i386 / x86_64 with packages:
qpidd-0.4.744917-1.el5, rhm-0.4.3116-3.el5, python-qpid-0.4.743856-1.el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.