Bug 518773 - Recovering from a total crash with clustering + persistence is problematic
Summary: Recovering from a total crash with clustering + persistence is problematic
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.3
Assignee: Alan Conway
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-08-22 17:41 UTC by Jonathan Robie
Modified: 2011-06-29 19:13 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-29 19:13:14 UTC
Target Upstream Version:
Embargoed:



Description Jonathan Robie 2009-08-22 17:41:27 UTC
If all nodes in a cluster crash and persistence is enabled, the node with the most recent journal should be started first to ensure a consistent state. But it is not easy for the user to determine which node that is.

This might be solved with a tool that identifies which node to start first, or perhaps a tool that restarts a failed cluster.

In practice this problem arises infrequently, since it only occurs when every node in the cluster has failed.

Comment 1 Alan Conway 2009-09-01 21:08:21 UTC
This is not just about a total crash. If some nodes in a cluster crash and the remainder are shut down cleanly, then any of the nodes with clean journals can be the first member in the cluster.

The right solution is to have this handled automatically by the cluster during startup, so the user can start nodes in any order; brokers with dirty journals will wait for a broker with a clean journal to start.

A running broker would mark its journal dirty, so the journal stays marked dirty if the broker dies unexpectedly. The journal is marked clean in two cases:
 - the broker shuts down as part of an orderly cluster shutdown.
 - the broker becomes last-man-standing.

In a partial crash + shutdown, only the cleanly shut-down nodes can be first-in-cluster. In a total crash where the cluster was reduced to one member that finally crashed, only that last member can be first-in-cluster.

In a crash where more than one member died without any of them becoming a clear last-man-standing, manual intervention is required.
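
A minimal sketch of this dirty/clean marking scheme, in C++ with purely hypothetical names (this is not the actual qpid-cpp store API):

    // The journal is marked dirty while the broker runs, so a crash leaves
    // the dirty mark in place. It is marked clean only on orderly cluster
    // shutdown or on becoming last-man-standing.
    enum class JournalState { CLEAN, DIRTY };

    class JournalMarker {
        JournalState state = JournalState::CLEAN;
    public:
        void onBrokerStart()            { state = JournalState::DIRTY; }
        void onOrderlyClusterShutdown() { state = JournalState::CLEAN; }
        void onBecameLastManStanding()  { state = JournalState::CLEAN; }
        // Only a broker with a clean journal may be first-in-cluster.
        bool eligibleAsFirstInCluster() const { return state == JournalState::CLEAN; }
    };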

Comment 2 Alan Conway 2009-09-02 13:41:06 UTC
To avoid the manual intervention case we could write a cluster sequence ID to the journal headers. On restart, if all members have a dirty store, the journal(s) with the highest sequence ID are eligible to be first-in-cluster (see the sketch after the list below).

This requires the members to know what "all members" means. This could be:
 - configure a list of members
 - configure an expected count of members
 - use a timeout (in case some members can't be started)
 - user runs config tool to tell cluster when all members are present.
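
A minimal sketch of that selection rule, assuming a hypothetical Member record carrying the clean/dirty flag and the sequence ID read from the journal header (not the actual qpid-cpp cluster code):

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    struct Member {
        std::string name;
        bool cleanJournal;
        uint64_t clusterSequenceId; // as written to the journal header
    };

    // Any member with a clean journal may be first-in-cluster; if every
    // journal is dirty, only the member(s) with the highest sequence ID
    // are eligible.
    std::vector<Member> firstInClusterCandidates(const std::vector<Member>& all) {
        std::vector<Member> clean;
        for (const Member& m : all)
            if (m.cleanJournal) clean.push_back(m);
        if (!clean.empty()) return clean;

        uint64_t best = 0;
        for (const Member& m : all) best = std::max(best, m.clusterSequenceId);
        std::vector<Member> candidates;
        for (const Member& m : all)
            if (m.clusterSequenceId == best) candidates.push_back(m);
        return candidates;
    }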

Comment 3 Alan Conway 2009-09-02 13:42:19 UTC
On a clean startup, cluster members with clean journals should check that they all have the same journal sequence number, and refuse to start if they do not.
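
A minimal sketch of that consistency check, again with hypothetical names rather than the real qpid-cpp code:

    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    // Refuse to start unless every clean journal reports the same sequence number.
    void checkCleanJournalsConsistent(const std::vector<uint64_t>& cleanJournalSeqNos) {
        for (uint64_t seq : cleanJournalSeqNos)
            if (seq != cleanJournalSeqNos.front())
                throw std::runtime_error(
                    "Inconsistent journal sequence numbers; refusing to start");
    }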

Comment 5 Alan Conway 2009-11-25 21:34:19 UTC
Addressed in commits up to 883999. See user description at 
http://cwiki.apache.org/qpid/persistent-cluster-restart-design-note.html

The remaining piece is manual recovery from a complete cluster failure: Bug 541426

