Bug 483807 - resolve join state for store recover in cluster for joining nodes
Summary: resolve join state for store recover in cluster for joining nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.2
Assignee: Kim van der Riet
QA Contact: Jan Sarenik
URL:
Whiteboard:
Duplicates: 486991 539287
Depends On:
Blocks: 527551
 
Reported: 2009-02-03 18:04 UTC by Carl Trieloff
Modified: 2018-10-27 14:57 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Messaging bug fix
C: When a node in a cluster failed and was then brought back up, it attempted to sync with both the store and the running cluster.
C: The node attempting to rejoin the running cluster failed.
F: Only the first node started in a cluster will restore from the store. All subsequent nodes added to the cluster will discard the store data and will synchronize with the master node in the cluster.
R: Rejoining a running cluster now operates as expected.

When a node in a cluster failed and was then brought back up, it attempted to restore using information from both the store and the running master node. This resulted in the failure of the node attempting to rejoin. This has been corrected so that only the first node started in a cluster will restore from the store. All subsequent nodes added to the cluster will discard the store data and will synchronize with the master node in the cluster. Rejoining a running cluster now operates as expected.
Clone Of:
Environment:
Last Closed: 2009-12-03 09:17:43 UTC
Target Upstream Version:
Embargoed:




Links:
System ID:    Red Hat Product Errata RHEA-2009:1633
Priority:     normal
Status:       SHIPPED_LIVE
Summary:      Red Hat Enterprise MRG Messaging and Grid Version 1.2
Last Updated: 2009-12-03 09:15:33 UTC

Description Carl Trieloff 2009-02-03 18:04:13 UTC
Logically the following scenario can exist:

1. start a cluster, more than one node
2. publish durable messages (to durable queue) to one node in the cluster
3. confirm, they are on all nodes
4. kill one of the nodes
5. (optional) publish some more messages
6. rejoin the cluster with the failed node (this will fail)

Reason: the rejoining node will be synced from the running cluster, but will also try to recover from its own store.

What needs to happen is:

a.) The first node in a cluster to start needs to recover the store.
b.) All joining nodes need to sync data, as they do today, but ignore any store they may have (the bug: they don't ignore their store if they have one).

Comment 1 Carl Trieloff 2009-02-03 18:05:08 UTC
This can be worked around by identifying the node to start first, and removing the stores from the other nodes before restart.

Comment 2 Carl Trieloff 2009-02-03 18:19:02 UTC
In broker.cpp:

    // Recover queues, exchanges, links and dtx records from the local store.
    if (store.get() != 0) {
        RecoveryManagerImpl recoverer(queues, exchanges, links, dtxManager,
                                      conf.stagingThreshold);
        store->recover(recoverer);
    }

This block must not be called for joining nodes.
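A minimal sketch of such a guard, assuming a boolean flag on the Broker (the name recoverFromStore below is illustrative, not the actual qpid-cpp identifier):

    // Hypothetical: only the designated first cluster member recovers
    // from its local store; all other joiners skip recovery entirely.
    if (store.get() != 0 && recoverFromStore) {
        RecoveryManagerImpl recoverer(queues, exchanges, links, dtxManager,
                                      conf.stagingThreshold);
        store->recover(recoverer);
    }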

Comment 3 Alan Conway 2009-02-04 17:05:16 UTC
In revision 740793

Cluster sets recovery flag on Broker for first member in cluster.
Disable recovery from local store if the recovery flag is not set.
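A rough sketch of what the cluster side of this could look like (the class and method names here are assumptions based on the commit message, not verified against r740793):

    // Hypothetical: when the broker joins a cluster, store recovery stays
    // enabled only for the first member; every later joiner has it
    // switched off before recovery would run.
    void Cluster::initialize(broker::Broker& broker) {
        broker.setRecovery(isFirstMember());
    }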

Comment 4 Carl Trieloff 2009-02-04 17:29:02 UTC
Need store test case, tbd kim

Comment 5 Kim van der Riet 2009-02-04 17:42:35 UTC
Changing priority to high; set target milestone to 1.1.2.

Comment 6 Kim van der Riet 2009-05-08 13:52:33 UTC
*** Bug 486991 has been marked as a duplicate of this bug. ***

Comment 7 Kim van der Riet 2009-05-08 14:28:29 UTC
The error described in Bug 486991 (marked as a dup of this one) is the result of BDB errors when trying to set up mandatory broker exchanges when they have already been restored. This happens on all cluster nodes which are not the first in the cluster and are restored from the persistence store.

The work-around up until now has been to delete the store directory from all the nodes (or all the nodes except the first to be restarted) when there are messages to be recovered.

A fix now modifies the startup sequence of the store: when a node is not the first in a cluster to restart and has been restored, the restored data is discarded and the store files are "pushed down" into a bak folder (in case the order of cluster recovery was incorrect and the store contents are still needed); the node is then restarted without recovery.
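For illustration, a sketch of the "push down" idea using std::filesystem (C++17); the actual store implementation predates this and manages its journal files differently, and pushDownStore is a hypothetical helper name:

    #include <filesystem>
    namespace fs = std::filesystem;

    // Move an existing store directory aside into a "bak" folder so the
    // joining node starts clean, but the old data remains recoverable.
    void pushDownStore(const fs::path& storeDir) {
        if (!fs::exists(storeDir)) return;
        fs::path bakDir = storeDir.parent_path() / "bak";
        fs::create_directories(bakDir);
        // A real implementation would pick a unique target name to avoid
        // clobbering an earlier backup.
        fs::rename(storeDir, bakDir / storeDir.filename());
        fs::create_directories(storeDir);  // fresh, empty store directory
    }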

QA: This bug is easy to reproduce:
1. Start a multi-node cluster.
2. Shut down any node in the cluster.
3. Restart that node. The broker start will fail with the message "Exchange already exists: amq.direct (MessageStoreImpl.cpp:488)".
4. If all nodes are shut down and then restarted, all nodes after the first will fail with this error.

Built-in store python test test_Cluster_04_SingleClusterRemoveRestoreNodes tests this scenario.

qpid r. 773004
store r. 3368

Comment 8 Jan Sarenik 2009-05-12 09:05:28 UTC
Reproduced on RHEL5.3 i386.

Related packages (mrg-devel repo):
 qpidd-cluster-0.5.752581-5.el5
 qpidd-0.5.752581-5.el5
 openais-0.80.3-22.el5_3.4

Waiting for new packages to verify.

Comment 10 Kim van der Riet 2009-10-05 18:35:19 UTC
Backported qpid r.773004 onto git mrg_1.1.x branch: http://git.et.redhat.com/git/qpid.git/?p=qpid.git;a=commitdiff;h=441c88204cb0135564669d7b004d62a1bc03828a

Comment 11 Jan Sarenik 2009-10-08 13:02:25 UTC
Verified on qpidd-0.5.752581-28.el5, both i386 and x86_64.

Comment 12 Kim van der Riet 2009-10-08 13:48:43 UTC
Included in store backport for 1.2.

Comment 13 Jan Sarenik 2009-10-09 08:31:06 UTC
I forgot to mention rhm-0.5.3206-14.el5

Comment 14 Irina Boverman 2009-10-28 17:35:05 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cluster joining nodes now recover correctly by preserving (instead of replicating) any stored data they already had prior to rejoining (483807)

Comment 15 Kim van der Riet 2009-10-29 19:31:15 UTC
Modified the release note to the following:

Only the first node started in a cluster will restore from the store. All subsequent nodes added to the cluster will discard the store data (the store files will be pushed down into a bak directory) and will instead synchronize with the master node in the cluster. (483807)

Comment 16 Kim van der Riet 2009-10-29 19:31:15 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-Cluster joining nodes now recover correctly by preserving (instead of replicating) any stored data they already had prior to rejoining (483807)
+Only the first node started in a cluster will restore from the store. All subsequent nodes added to the cluster will discard the store data (the store files will be pushed down into a bak directory) and will instead synchronize with the master node in the cluster. (483807)

Comment 17 Gordon Sim 2009-11-19 19:59:27 UTC
*** Bug 539287 has been marked as a duplicate of this bug. ***

Comment 18 Lana Brindley 2009-11-23 06:50:01 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-Only the first node started in a cluster will restore from the store. All subsequent nodes added to the cluster will discard the store data (the store files will be pushed down into a bak directory) and will instead synchronize with the master node in the cluster. (483807)
+Messaging bug fix
+
+C: When a node in a cluster failed and was then brought back up, it attempted to sync with both the store and the running cluster.
+C: The node attempting to rejoin the running cluster failed.
+F: Only the first node started in a cluster will restore from the store. All subsequent nodes added to the cluster will discard the store data and will synchronize with the master node in the cluster.
+R: Rejoining a running cluster now operates as expected.
+
+When a node in a cluster failed and was then brought back up, it attempted to restore using information from both the store and the running master node. This resulted in the failure of the node attempting to rejoin. This has been corrected so that only the first node started in a cluster will restore from the store. All subsequent nodes added to the cluster will discard the store data and will synchronize with the master node in the cluster. Rejoining a running cluster now operates as expected.

Comment 20 errata-xmlrpc 2009-12-03 09:17:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html

