Bug 514054 - [store] Journal can fill under some conditions, and recover from full condition not possible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.2
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.3
: ---
Assignee: Kim van der Riet
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-07-27 19:07 UTC by Kim van der Riet
Modified: 2015-11-16 01:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Because a write operation is required when processing a message queue, reaching the maximum storage capacity rendered any recovery impossible. To target this issue, this update introduces a "resize" utility that allows storage to be resized so that the messages can be recovered and delivered as expected. Note that the broker must be stopped in order to run this tool.
Clone Of:
Environment:
Last Closed: 2010-10-14 16:00:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0773 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3 2010-10-14 15:56:44 UTC

Description Kim van der Riet 2009-07-27 19:07:05 UTC
Under some conditions, a store can fill up without triggering an enqueue capacity full condition. A full store journal is a fatal condition and results in the store closing. No data is lost when this occurs, as the messages remain in the store journal files, but it is currently not possible to restart the broker and consume those messages through it. Recovery of a full journal currently results in a condition identical to the one that caused the shutdown in the first place: a full journal which cannot be written to without the risk of data loss.
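
The deadlock described above can be illustrated with a toy model. This is only a sketch and not the store's actual code: the class ToyJournal, its size units, and the one-unit dequeue record are all assumptions made for illustration, and the model ignores the reclaiming of journal files that a real journal performs. It shows only that when consuming a message requires appending a record, a completely full journal cannot be drained.

```python
class JournalFull(Exception):
    """Raised when a record does not fit in the remaining journal space."""


class ToyJournal:
    """Hypothetical append-only journal: enqueues AND dequeues consume space."""

    def __init__(self, capacity, dequeue_record_size=1):
        self.capacity = capacity
        self.used = 0
        self.dequeue_record_size = dequeue_record_size
        self.live = {}  # record id -> size of the enqueued message

    def _append(self, size):
        free = self.capacity - self.used
        if size > free:
            raise JournalFull(f"need {size}, only {free} free")
        self.used += size

    def enqueue(self, rid, size):
        self._append(size)
        self.live[rid] = size

    def dequeue(self, rid):
        # Consuming a message is itself a write: a dequeue record is appended.
        self._append(self.dequeue_record_size)
        del self.live[rid]


journal = ToyJournal(capacity=10)
journal.enqueue("m1", 5)
journal.enqueue("m2", 5)       # the journal is now exactly full
try:
    journal.dequeue("m1")      # the dequeue record cannot be written
    stuck = False
except JournalFull:
    stuck = True
```

In this model both messages are still present after the failed dequeue, which mirrors the report: nothing is lost, but nothing can be consumed either.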

There are two issues to be considered:

1. Whether the store can be prevented from ever filling up (i.e. a hard guarantee) through limits and restrictions; and/or

2. Whether, if such a condition does occur, the store can remedy the situation during recovery and allow the messages to be consumed.

The addition of auto-expand to the store will minimize, but not preclude, the occurrence of this condition, primarily because it can be defeated (i.e. turned off, since it is an option), and because the store will limit how far it can ultimately expand by this means.

Comment 2 Kim van der Riet 2009-12-14 19:56:51 UTC
A Python tool called resize was written to analyze and resize the journal. The original journal is moved into a backup directory and a new journal is created. The remaining records in the old journal are then transferred to the new journal.

Note that this procedure cannot be carried out on a running broker, only on the store of a stopped broker. The broker is then restarted on the new store. This strategy, while not ideal, does provide a path to data recovery for a journal which has stopped because it became full.
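
The back-up-and-transfer procedure can be sketched roughly as follows. This is not the resize tool itself and does not use the real journal file format: the JSON-lines journal, the file names journal.jsonl and capacity, the _backup directory, and the functions read_live_records and resize_offline are all hypothetical, chosen only to show the three steps (back up the old journal, create a new one, transfer the remaining records).

```python
import json
import os
import shutil
import tempfile


def read_live_records(journal_path):
    """Return enqueued records that have no matching dequeue record."""
    live = {}
    with open(journal_path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["op"] == "enq":
                live[rec["id"]] = rec
            elif rec["op"] == "deq":
                live.pop(rec["id"], None)
    return list(live.values())


def resize_offline(store_dir, new_capacity, journal_name="journal.jsonl"):
    """Back up the old journal, then write the remaining records to a new one.

    Capacity is counted in records here, purely to keep the sketch small.
    """
    old_path = os.path.join(store_dir, journal_name)
    live = read_live_records(old_path)
    if len(live) >= new_capacity:
        raise ValueError("new capacity leaves no room for dequeue records")

    # Step 1: move the original journal into a backup directory.
    backup_dir = os.path.join(store_dir, "_backup")
    os.makedirs(backup_dir, exist_ok=True)
    shutil.move(old_path, os.path.join(backup_dir, journal_name))

    # Steps 2 and 3: create the new journal and transfer the live records.
    with open(old_path, "w") as f:
        for rec in live:
            f.write(json.dumps(rec) + "\n")
    with open(os.path.join(store_dir, "capacity"), "w") as f:
        f.write(str(new_capacity))
    return live


# Usage on a throwaway store directory (the broker is assumed to be stopped).
store = tempfile.mkdtemp()
with open(os.path.join(store, "journal.jsonl"), "w") as f:
    for rec in [{"op": "enq", "id": 1, "body": "a"},
                {"op": "enq", "id": 2, "body": "b"},
                {"op": "deq", "id": 1}]:
        f.write(json.dumps(rec) + "\n")
remaining = resize_offline(store, new_capacity=8)
```

Keeping the original journal in the backup directory means the transfer can be repeated if anything goes wrong before the broker is restarted.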

There are other broker-side strategies which could also be developed, such as tracking the amount of space needed to fully dequeue all existing records and using this value as a dynamic threshold for enqueue threshold exceptions. These will, however, be left for a later time and version.
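
A minimal sketch of the dynamic-threshold idea, under the same toy assumptions as any simplified model (abstract size units, a fixed-size dequeue record, no reclaiming of journal files). The class ReservingJournal and the exception EnqueueThresholdExceeded are hypothetical names, not part of the store. An enqueue is accepted only if, afterwards, there would still be room to write a dequeue record for every live message.

```python
class EnqueueThresholdExceeded(Exception):
    """Raised when an enqueue would eat into space reserved for dequeues."""


class ReservingJournal:
    """Hypothetical journal that reserves space for future dequeue records."""

    def __init__(self, capacity, dequeue_record_size=1):
        self.capacity = capacity
        self.used = 0
        self.dequeue_record_size = dequeue_record_size
        self.live = set()

    def reserved(self):
        """Space needed to dequeue every record currently in the journal."""
        return len(self.live) * self.dequeue_record_size

    def enqueue(self, rid, size):
        # Accept only if the message plus dequeue records for all live
        # messages (including this one) would still fit afterwards.
        needed = size + self.reserved() + self.dequeue_record_size
        if self.used + needed > self.capacity:
            raise EnqueueThresholdExceeded(rid)
        self.used += size
        self.live.add(rid)

    def dequeue(self, rid):
        # Always fits, because used + reserved() never exceeds capacity.
        self.used += self.dequeue_record_size
        self.live.remove(rid)


journal = ReservingJournal(capacity=10)
journal.enqueue("m1", 4)
journal.enqueue("m2", 4)
try:
    journal.enqueue("m3", 1)   # would leave no room to dequeue m1..m3
    rejected = False
except EnqueueThresholdExceeded:
    rejected = True
journal.dequeue("m1")
journal.dequeue("m2")
```

The threshold moves with the number of live records, so the journal rejects enqueues earlier as it holds more messages, and every accepted message can still be dequeued.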

Last update to resize tool: r.3735

Comment 4 Frantisek Reznicek 2010-10-01 08:21:50 UTC
The resize tool is included and functional, tested on RHEL 4.8 / 5.5, i386 / x86_64 on packages:
python-qmf-0.7.946106-13.el5
python-qpid-0.7.946106-14.el5
qmf-0.7.946106-17.el5
qmf-devel-0.7.946106-17.el5
qpid-cpp-*-0.7.946106-17.el5
qpid-dotnet-0.4.738274-2.el5
qpid-java-*-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
ruby-qmf-0.7.946106-17.el5
ruby-qpid-0.7.946106-2.el5

-> VERIFIED

Comment 5 Kim van der Riet 2010-10-05 15:25:41 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Under some limited conditions in which the store file size is too small, the store can be filled such that no recovery is possible.

Consequence: While messages are not lost per se, the store cannot deliver the messages because dequeueing them requires a write operation, and this is not possible when the store is full.

Fix: A tool which allows the store to be resized off-line (ie when the broker is not running) has been written.

Result: The store can now be increased in size, which in turn allows messages on the store to be dequeued after recovery.

Comment 6 Jaromir Hradilek 2010-10-05 22:31:54 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1 @@
-Cause: Under some limited conditions in which the store file size is too small, the store can be filled such that no recovery is possible.
+Since a write operation is required when processing a message queue, reaching the maximum storage capacity rendered any recovery impossible. To target this issue, this update introduces a utility that allows the storage to be resized, so that the messages can be recovered and delivered as expected. Note that the broker must be stopped in order to run this tool.
-
-Consequence: While messages are not lost per se, the store cannot deliver the messages because dequeueing them requires a write operation, and this is not possible when the store is full.
-
-Fix: A tool which allows the store to be resized off-line (ie when the broker is not running) has been written.
-
-Result: The store can now be increased in size, which in turn allows messages on the store to be dequeued after recovery.

Comment 7 Douglas Silas 2010-10-11 14:03:18 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Since a write operation is required when processing a message queue, reaching the maximum storage capacity rendered any recovery impossible. To target this issue, this update introduces a utility that allows the storage to be resized, so that the messages can be recovered and delivered as expected. Note that the broker must be stopped in order to run this tool.
+Because a write operation is required when processing a message queue, reaching the maximum storage capacity rendered any recovery impossible. To target this issue, this update introduces a "resize" utility that allows storage to be resized so that the messages can be recovered and delivered as expected. Note that the broker must be stopped in order to run this tool.

Comment 9 errata-xmlrpc 2010-10-14 16:00:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

