556351 – clustered qpidd - durable exchanges do not survive cluster restart.

Summary: clustered qpidd - durable exchanges do not survive cluster restart.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	qpid-cpp
Sub Component:
Version:	1.2
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	1.3
Target Release:	---
Assignee:	Kim van der Riet
QA Contact:	Jiri Kolar
Docs Contact:
URL:
Whiteboard:
Depends On:	557243
Blocks:

Reported:	2010-01-18 02:19 UTC by Kelsey Hightower
Modified:	2010-10-14 16:00 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, performing a full restart on clusters containing durable exchange lost both the durable exchange and its bindings. This resulted in incomplete exchanges on restart. Consequent to multiple, independent changes to the underlying cluster code, this problem no longer presents: durable exchanges and their bindings are now recovered as expected after restarting a cluster.
Clone Of:
Environment:
Last Closed:	2010-10-14 16:00:48 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2010:0773	0	normal	SHIPPED_LIVE	Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3	2010-10-14 15:56:44 UTC

Description Kelsey Hightower 2010-01-18 02:19:13 UTC

Description of problem:

Durable exchanges do not survive a full cluster restart.

Version-Release number of selected component (if applicable):
qpidd-0.5.752581-34.el5
qpidc-0.5.752581-34.el5
python-qpid-0.5.752581-4.el5
qpidd-cluster-0.5.752581-34.el5
rhm-0.5.3206-27.el5
openais-0.80.6-8.el5
Red Hat Enterprise Linux Server release 5.4

How reproducible:
100%

Steps to Reproduce:
1. Start qpidd on broker1 and broker2 (service qpidd start)
2. Execute the following commands:

"qpid-config add exchange direct nfl.scores --durable"
"qpid-config add queue falcons --durable"
"qpid-config bind nfl.scores falcons falcons"

3. Stop qpidd on broker1 and broker2 (service qpidd stop)
4. Start qpidd on broker1 and broker2 (service qpidd start)
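
The steps above can be sketched as an ordered plan, which makes it explicit that every broker is stopped before any broker is started again. This is only a sketch: the command lines are the ones quoted in this report, while the host names, the `reproduction_plan`, `run_plan` and `exchange_survived` helpers, and the `runner` callback (for example a wrapper around ssh) are hypothetical. The substring check assumes only that the exchange name appears somewhere in the listing; the exact output format of `qpid-config -b exchanges` is not assumed.

```python
# Command lines copied from the steps in this report.
SETUP_COMMANDS = [
    ["qpid-config", "add", "exchange", "direct", "nfl.scores", "--durable"],
    ["qpid-config", "add", "queue", "falcons", "--durable"],
    ["qpid-config", "bind", "nfl.scores", "falcons", "falcons"],
]
STOP_COMMAND = ["service", "qpidd", "stop"]
START_COMMAND = ["service", "qpidd", "start"]
CHECK_COMMAND = ["qpid-config", "-b", "exchanges"]


def reproduction_plan(brokers):
    """Return (host, command) pairs for a full cluster restart.

    Setup and the final check run against the first broker; all brokers
    are stopped before any of them is started again.
    """
    first = brokers[0]
    plan = [(first, command) for command in SETUP_COMMANDS]
    plan += [(broker, STOP_COMMAND) for broker in brokers]
    plan += [(broker, START_COMMAND) for broker in brokers]
    plan.append((first, CHECK_COMMAND))
    return plan


def run_plan(plan, runner):
    """Run each step with runner(host, command); return the last output.

    `runner` is a hypothetical callback supplied by the caller that
    executes the command on the given host and returns its text output.
    """
    output = ""
    for host, command in plan:
        output = runner(host, command)
    return output


def exchange_survived(listing, name="nfl.scores"):
    """True if the exchange name appears anywhere in the listing text."""
    return name in listing
```

With a real `runner`, `exchange_survived(run_plan(reproduction_plan(["broker1", "broker2"]), runner))` would be expected to return False on the affected packages.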

Actual results:

Before stopping the brokers, send some durable test messages to "nfl.scores" using "falcons" as the routing key. All works as expected.

The following commands list the "nfl.scores" exchange as durable and bound to the "falcons" queue.
"qpid-stat -e"
"qpid-config -b exchanges"

After stopping both brokers and restarting them one at a time, the "nfl.scores" exchange and bindings are gone. The "nfl.scores" exchange does not show up in the output of the management tools:

"qpid-stat -e"
"qpid-config -b exchanges"

Sending the same test messages produces the following error:

qpid.session.SessionException: exception(error_code=404, command_id=serial(0), class_code=0, command_code=0, field_index=0, description=u'not-found: Exchange not found: nfl.scores (qpid/broker/ExchangeRegistry.cpp:92)', error_info={})

Running:
"qpid-stat -q"

The "falcons" queue is still listed in the output and marked durable. The message count matches the number of test messages sent.
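
When scripting this check, the lost exchange can be recognised from the text of the 404 error quoted above. The following is a minimal sketch that works purely on the error string as it appears in this report; the `missing_exchange` helper is hypothetical and does not rely on any attribute of the python-qpid exception object.

```python
import re

# Matches the error code and the exchange name in the message text.
# DOTALL lets the pattern survive a message wrapped across lines.
_NOT_FOUND = re.compile(
    r"error_code=(\d+).*?Exchange not found: (\S+)", re.DOTALL
)


def missing_exchange(error_text):
    """Return the exchange named in a 404 'Exchange not found' error.

    Returns None when the text is not such an error.
    """
    match = _NOT_FOUND.search(error_text)
    if match and match.group(1) == "404":
        return match.group(2)
    return None
```

A test harness could call `missing_exchange(str(exc))` in its exception handler and treat a non-None result as a reproduction of this bug.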

Expected results:
Durable exchanges and bindings should survive a cluster restart.

Additional info:

Comment 1 Kelsey Hightower 2010-01-18 03:35:07 UTC

After further testing, if I replace the "nfl.scores" exchange with one of the default exchanges "amq.direct", I can achieve the expected results. 

Repeating the test, the bindings are preserved after a full cluster restart. I had to omit creating the durable exchange as the amq.direct exchange is available by default.

This issue seems to be isolated to user-defined exchanges.

Comment 2 Kim van der Riet 2010-01-26 15:43:00 UTC

Testing on trunk, I am unable to reproduce this error. It is possible that the issue has been fixed by updates made to the cluster code on trunk since 1.2 was released.

Currently, the "falcons" queue and its binding are being recovered in the above scenario, but the recovery of messages fails - this is a separate bug (see bug 557243).

I'm not comfortable closing this bug until bug 557243 is resolved and this scenario can be completed successfully. I am setting this bug to depend on bug 557243.

Comment 3 Kim van der Riet 2010-04-12 14:07:51 UTC

Bug 557243 is now in state MODIFIED.

I retested the above scenario; it completes successfully. Durable exchange "nfl.scores", queue "falcons" and the binding between them are recovered, as are the persistent messages on the queue.

Setting to MODIFIED (although I did not make any specific fix, this was solved by one of the numerous clustering bugfixes/updates).

QA: the above scenario should be easy to verify.

Comment 4 Kim van der Riet 2010-04-12 14:08:38 UTC

Above tested with qpid r.933222 / store r.3903.

Comment 5 Jiri Kolar 2010-06-17 08:53:31 UTC

Tested:
On 752581 the bug appears.
On 946106 it does not; it has been fixed.

Validated on RHEL 5.5 i386 / x86_64; not on RHEL 4, which has no clustering.

packages:

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5_5.1
openais-debuginfo-0.80.6-16.el5_5.1
python-qpid-0.7.946106-1.el5
qpid-cpp-client-0.7.946106-2.el5
qpid-cpp-client-devel-0.7.946106-2.el5
qpid-cpp-client-devel-docs-0.7.946106-2.el5
qpid-cpp-client-ssl-0.7.946106-2.el5
qpid-cpp-mrg-debuginfo-0.7.946106-1.el5
qpid-cpp-server-0.7.946106-2.el5
qpid-cpp-server-cluster-0.7.946106-2.el5
qpid-cpp-server-devel-0.7.946106-2.el5
qpid-cpp-server-ssl-0.7.946106-2.el5
qpid-cpp-server-store-0.7.946106-2.el5
qpid-cpp-server-xml-0.7.946106-2.el5
qpid-java-client-0.7.946106-3.el5
qpid-java-common-0.7.946106-3.el5
qpid-tools-0.7.946106-4.el5
rhm-docs-0.7.946106-1.el5

->VERIFIED
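
The two revisions named in this comment can be turned into a rough check on installed package names. This is a sketch under stated assumptions: the `revision_of` and `classify` helpers are hypothetical, the name pattern is inferred only from the package lists in this report, and treating every revision at or above 946106 as fixed is an assumption, since no specific fixing change was identified.

```python
import re

KNOWN_BAD_REVISION = 752581   # "On 752581 the bug appears"
KNOWN_GOOD_REVISION = 946106  # verified by QA in this comment

# name-<major>.<minor>.<revision>-<release>, e.g. qpidd-0.5.752581-34.el5
_PACKAGE = re.compile(
    r"^(?P<name>.+)-\d+\.\d+\.(?P<revision>\d+)-(?P<release>[^-]+)$"
)


def revision_of(package):
    """Return the revision embedded in a qpid package name, else None."""
    match = _PACKAGE.match(package.strip())
    if not match or "qpid" not in match.group("name"):
        return None  # other packages (openais, rhm) number differently
    return int(match.group("revision"))


def classify(package):
    """Classify a package as 'affected', 'fixed', 'untested' or 'unknown'."""
    revision = revision_of(package)
    if revision is None:
        return "unknown"
    if revision >= KNOWN_GOOD_REVISION:
        return "fixed"  # assumption: later builds keep the fix
    if revision <= KNOWN_BAD_REVISION:
        return "affected"
    return "untested"  # between the two revisions QA tried
```

Revisions between the two tested builds are reported as untested rather than guessed at; comment 4 notes that trunk r.933222 already passed, so the actual boundary lies somewhere in that range.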

Comment 6 Kim van der Riet 2010-10-05 15:49:06 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Restarting a cluster which contains durable exchange will lose the durable exchange and its bindings.

Consequence: The restart is incomplete as exchanges which should be present after restart are absent.

Fix: No specific fix was made; various code changes in the cluster code seem to have solved this independently of this bug.

Result: The durable exchanges and their bindings are now recovered as expected after restarting the cluster.

Comment 7 Jaromir Hradilek 2010-10-05 23:03:29 UTC

    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1 @@
-Cause: Restarting a cluster which contains durable exchange will lose the durable exchange and its bindings.
+Previously, performing a full restart on a cluster with durable exchanges caused such exchanges to be lost. This error has been fixed, and all durable exchanges are now recovered as expected.
-
-Consequence: The restart is incomplete as exchanges which should be present after restart are absent.
-
-Fix: No specific fix was made; various code changes in the cluster code seem to have solved this independently of this bug.
-
-Result: The durable exchanges and their bindings are now recovered as expected after restarting the cluster.

Comment 8 Brian Forte 2010-10-05 23:29:26 UTC

    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, performing a full restart on a cluster with durable exchanges caused such exchanges to be lost. This error has been fixed, and all durable exchanges are now recovered as expected.
+Previously , performing a full restart on clusters containing durable exchange lost both the durable exchange and its bindings. This resulted in incomplete exchanges on restart. Consequent to multiple, independent changes to the underlying cluster code, this problem no longer presents: durable exchanges and their bindings are now recovered as expected after restarting a cluster.

Comment 9 Jaromir Hradilek 2010-10-06 11:08:23 UTC

    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously , performing a full restart on clusters containing durable exchange lost both the durable exchange and its bindings. This resulted in incomplete exchanges on restart. Consequent to multiple, independent changes to the underlying cluster code, this problem no longer presents: durable exchanges and their bindings are now recovered as expected after restarting a cluster.
+Previously, performing a full restart on clusters containing durable exchange lost both the durable exchange and its bindings. This resulted in incomplete exchanges on restart. Consequent to multiple, independent changes to the underlying cluster code, this problem no longer presents: durable exchanges and their bindings are now recovered as expected after restarting a cluster.

Comment 11 errata-xmlrpc 2010-10-14 16:00:48 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html
