Description of problem:
Durable exchanges do not survive a full cluster restart.

Version-Release number of selected component (if applicable):
qpidd-0.5.752581-34.el5
qpidc-0.5.752581-34.el5
python-qpid-0.5.752581-4.el5
qpidd-cluster-0.5.752581-34.el5
rhm-0.5.3206-27.el5
openais-0.80.6-8.el5
Red Hat Enterprise Linux Server release 5.4

How reproducible:
100%

Steps to Reproduce:
1. Start qpidd on broker1 and broker2 (service qpidd start)
2. Execute the following commands:
   "qpid-config add exchange direct nfl.scores --durable"
   "qpid-config add queue falcons --durable"
   "qpid-config bind nfl.scores falcons falcons"
3. Stop qpidd on broker1 and broker2 (service qpidd stop)
4. Start qpidd on broker1 and broker2 (service qpidd start)

Actual results:
Before stopping the brokers, send some durable test messages to "nfl.scores" using "falcons" as the routing key. All works as expected. The following commands list the "nfl.scores" exchange as durable and bound to the "falcons" queue:
"qpid-stat -e"
"qpid-config -b exchanges"

After stopping both brokers and restarting them one at a time, the "nfl.scores" exchange and its bindings are gone. The "nfl.scores" exchange does not show up in the output of the management tools:
"qpid-stat -e"
"qpid-config -b exchanges"

Sending the same test messages produces the following error:
qpid.session.SessionException: exception(error_code=404, command_id=serial(0), class_code=0, command_code=0, field_index=0, description=u'not-found: Exchange not found: nfl.scores (qpid/broker/ExchangeRegistry.cpp:92)', error_info={})

Running "qpid-stat -q" shows that the "falcons" queue is still listed and marked durable. The message count matches the number of test messages sent.

Expected results:
Durable exchanges and bindings should survive a cluster restart.

Additional info:
After further testing, if I replace the "nfl.scores" exchange with one of the default exchanges, "amq.direct", I can achieve the expected results. Repeating the test, the bindings are preserved after a full cluster restart. I had to omit creating the durable exchange, as the amq.direct exchange is available by default. This issue seems to be isolated to user-defined exchanges.
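For anyone re-running the scenario, the setup commands from step 2 can be scripted. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the `amq.` prefix test is only a shorthand for "default exchange, do not create it", matching the amq.direct variant described above. The qpid-config arguments are copied from the report. By default the script only prints the commands; pass `dry_run=False` on a host with a running broker.

```python
import subprocess


def reproduction_commands(exchange="nfl.scores", queue="falcons", key="falcons"):
    """Return the step 2 qpid-config commands as argv lists."""
    commands = []
    # Assumption: exchanges named amq.* already exist on the broker,
    # so the "add exchange" command is skipped for them.
    if not exchange.startswith("amq."):
        commands.append(
            ["qpid-config", "add", "exchange", "direct", exchange, "--durable"]
        )
    commands.append(["qpid-config", "add", "queue", queue, "--durable"])
    commands.append(["qpid-config", "bind", exchange, queue, key])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # Raises CalledProcessError if qpid-config exits non-zero.
            subprocess.check_call(argv)


if __name__ == "__main__":
    run_all(reproduction_commands())
```

Calling `reproduction_commands(exchange="amq.direct")` yields only the queue and bind commands, which corresponds to the variant that did survive the restart.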
Testing on trunk, I am unable to reproduce this error. It is possible that the issue was fixed by recent updates to the cluster code on trunk since 1.2 was released. Currently, the "falcons" queue and its binding are being recovered in the above scenario, but the recovery of messages fails; this is a separate bug (see bug 557243). I'm not comfortable closing this bug until bug 557243 is resolved and this scenario can be completed successfully. I am setting this bug to depend on bug 557243.
Bug 557243 is now in state MODIFIED. I retested the above scenario; it completes successfully. Durable exchange "nfl.scores", queue "falcons" and the binding between them are recovered, as are the persistent messages on the queue. Setting to MODIFIED (although I did not make any specific fix, this was solved by one of the numerous clustering bugfixes/updates). QA: the above scenario should be easy to verify.
Above tested with qpid r.933222 / store r.3903.
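For QA, the "is the exchange still there after restart" check could be automated roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the exact output layout of "qpid-config -b exchanges" is not shown in this report, so the matcher only looks for the exchange name as a whitespace-separated token after stripping surrounding quotes and brackets.

```python
import subprocess


def exchange_listed(tool_output, exchange):
    """True if `exchange` appears as a standalone token in the tool output."""
    for line in tool_output.splitlines():
        # Strip quotes/brackets in case the tool decorates names (assumed).
        tokens = [token.strip("'\"()[],") for token in line.split()]
        if exchange in tokens:
            return True
    return False


def check_broker(exchange="nfl.scores"):
    """Run the listing command from the report; needs a running broker."""
    output = subprocess.check_output(["qpid-config", "-b", "exchanges"])
    return exchange_listed(output.decode("utf-8", "replace"), exchange)
```

Run `check_broker()` once before stopping the brokers and once after the full restart; the scenario passes only if both calls return True.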
Tested: the bug appears on 752581 and does not appear on 946106. The fix has been validated on RHEL 5.5 i386 / x86_64. It was not tested on RHEL4 because there are no clustering packages for it.

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u
openais-0.80.6-16.el5_5.1
openais-debuginfo-0.80.6-16.el5_5.1
python-qpid-0.7.946106-1.el5
qpid-cpp-client-0.7.946106-2.el5
qpid-cpp-client-devel-0.7.946106-2.el5
qpid-cpp-client-devel-docs-0.7.946106-2.el5
qpid-cpp-client-ssl-0.7.946106-2.el5
qpid-cpp-mrg-debuginfo-0.7.946106-1.el5
qpid-cpp-server-0.7.946106-2.el5
qpid-cpp-server-cluster-0.7.946106-2.el5
qpid-cpp-server-devel-0.7.946106-2.el5
qpid-cpp-server-ssl-0.7.946106-2.el5
qpid-cpp-server-store-0.7.946106-2.el5
qpid-cpp-server-xml-0.7.946106-2.el5
qpid-java-client-0.7.946106-3.el5
qpid-java-common-0.7.946106-3.el5
qpid-tools-0.7.946106-4.el5
rhm-docs-0.7.946106-1.el5

-> VERIFIED
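To triage other hosts, the revision embedded in the package names can be compared against the two builds tested here. This is a rough sketch with hypothetical helper names. It assumes the third dotted component of the version (for example 946106) is the qpid source revision and that revisions increase over time. Builds older than 752581 are assumed affected, and builds between the two tested packages are reported as unknown because no such package was tested in this report.

```python
import re

KNOWN_AFFECTED = 752581  # build on which the bug was reproduced
KNOWN_FIXED = 946106     # build on which verification passed

# Matches the "-<major>.<minor>.<revision>-" part of an RPM name.
_REVISION = re.compile(r"-\d+\.\d+\.(\d+)-")


def qpid_revision(package):
    """Return the revision number of a qpid package name, else None."""
    if "qpid" not in package:
        return None
    match = _REVISION.search(package)
    return int(match.group(1)) if match else None


def classify(package):
    """Classify a package as 'fixed', 'affected', 'unknown' or 'not qpid'."""
    revision = qpid_revision(package)
    if revision is None:
        return "not qpid"
    if revision >= KNOWN_FIXED:
        return "fixed"
    if revision <= KNOWN_AFFECTED:
        return "affected"  # assumption for builds older than the tested one
    return "unknown"
```

For example, `classify("qpidd-0.5.752581-34.el5")` falls in the affected range, while every qpid package in the rpm listing above classifies as fixed.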
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Restarting a cluster which contains a durable exchange loses the durable exchange and its bindings.

Consequence: The restart is incomplete, as exchanges which should be present after restart are absent.

Fix: No specific fix was made; various code changes in the cluster code seem to have solved this independently of this bug.

Result: The durable exchanges and their bindings are now recovered as expected after restarting the cluster.
Technical note updated.

Diffed Contents:
@@ -1,7 +1 @@
-Cause: Restarting a cluster which contains durable exchange will lose the durable exchange and its bindings.
+Previously, performing a full restart on a cluster with durable exchanges caused such exchanges to be lost. This error has been fixed, and all durable exchanges are now recovered as expected.
-
-Consequence: The restart is incomplete as exchanges which should be present after restart are absent.
-
-Fix: No specific fix was made; various code changes in the cluster code seem to have solved this independently of this bug.
-
-Result: The durable exchanges and their bindings are now recovered as expected after restarting the cluster.
Technical note updated.

Diffed Contents:
@@ -1 +1 @@
-Previously, performing a full restart on a cluster with durable exchanges caused such exchanges to be lost. This error has been fixed, and all durable exchanges are now recovered as expected.
+Previously , performing a full restart on clusters containing durable exchange lost both the durable exchange and its bindings. This resulted in incomplete exchanges on restart. Consequent to multiple, independent changes to the underlying cluster code, this problem no longer presents: durable exchanges and their bindings are now recovered as expected after restarting a cluster.
Technical note updated.

Diffed Contents:
@@ -1 +1 @@
-Previously , performing a full restart on clusters containing durable exchange lost both the durable exchange and its bindings. This resulted in incomplete exchanges on restart. Consequent to multiple, independent changes to the underlying cluster code, this problem no longer presents: durable exchanges and their bindings are now recovered as expected after restarting a cluster.
+Previously, performing a full restart on clusters containing durable exchange lost both the durable exchange and its bindings. This resulted in incomplete exchanges on restart. Consequent to multiple, independent changes to the underlying cluster code, this problem no longer presents: durable exchanges and their bindings are now recovered as expected after restarting a cluster.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html