Bug 413021 - Journal recovery fails when flush did not occur and records don't end on sblk boundary
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Hardware: All Linux
Priority: urgent
Severity: urgent
Assigned To: Kim van der Riet
Reported: 2007-12-05 16:57 EST by Kim van der Riet
Modified: 2012-12-07 12:46 EST
Doc Type: Bug Fix

Description Kim van der Riet 2007-12-05 16:57:12 EST
Under normal conditions, the journal flushes after inactivity or when closing
down. This fills the journal with filler records until an sblk boundary is
crossed and then writes to disk. However, if a crash or stoppage occurs such
that this write is interrupted and the last full record does not coincide with
an sblk boundary, then the journal cannot be recovered.

This can be fixed by filling the remaining space with filler records during the
recovery process, but this requires that the journal files be written to during
recovery, something that is not presently allowed.
Comment 1 Kim van der Riet 2007-12-05 17:23:34 EST
Currently recovery does not use O_DIRECT (since performance is not an issue
during this process), and this relieves the sblk boundary restriction for reads
and writes. This should make it easier to add the required fill records in this
Comment 2 Kim van der Riet 2007-12-06 16:51:20 EST
The strategy to fix this is as follows:

Step 1:
Check the record tail of each record during the analysis phase. A bad tail
indicates either a corrupted record header or an incomplete record write. If a
bad tail is found in any file *other* than the last logical file, then this is a
fatal error (and should never happen). However, if this occurs in the last
logical file, then this indicates an incomplete write at the file overwrite
boundary.

In this context, the first logical file is the last complete file to not be
overwritten (i.e. the oldest complete file) and thus the first to be read during
recovery, while the last logical file is the most recent file to be overwritten,
and possibly contains an overwrite boundary. It is the last file to be read
during recovery.
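
The tail check in Step 1 can be sketched as follows. This is only an illustration in Python, not the journal's actual code: the Record type, the tail_ok flag, find_truncation and JournalCorruptError are all hypothetical names, and the files are assumed to have already been put into logical order (oldest first).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


class JournalCorruptError(Exception):
    """A bad record tail was found outside the last logical file."""


@dataclass
class Record:
    # Hypothetical in-memory view of one journal record.
    offset: int    # byte offset of the record header within its file
    tail_ok: bool  # True if the record tail is consistent with its header


def find_truncation(files: List[List[Record]]) -> Optional[Tuple[int, int]]:
    """Scan records file by file, in logical order (oldest file first).

    Returns (file_index, offset) of the first record with a bad tail in
    the last logical file, or None if every tail is good. A bad tail in
    any earlier file is treated as fatal.
    """
    last = len(files) - 1
    for idx, records in enumerate(files):
        for rec in records:
            if rec.tail_ok:
                continue
            if idx != last:
                raise JournalCorruptError(
                    f"bad record tail in file {idx} at offset {rec.offset}"
                )
            # Incomplete write in the last logical file: recoverable.
            return (idx, rec.offset)
    return None
```

The returned offset is the position of the truncated record's header, which is where Step 2 would start writing filler records.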

Step 2:
If the record that has been truncated starts on a dblock boundary that is not
also an sblock boundary, then filler records need to be written to the file
which will overwrite the truncated record. These will start at the record
header, each consuming one dblock, until the next sblock boundary is reached. At
this point, the journal is once again usable, as O_DIRECT reads and writes, which
must be sblock-aligned, can again take place without interleaving bad or
corrupted records.
Presently, the sblock size is 512 bytes (although this can be set to any
multiple of 512 bytes), and there are 4 dblocks per sblock, i.e. each dblock is
128 bytes.
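
As a worked example of the Step 2 arithmetic, the sketch below computes how many one-dblock filler records are needed to get from the truncated record's header to the next sblock boundary. It assumes the present sizes (4 dblocks per sblock, 128 bytes per dblock); the function and constant names are hypothetical and not taken from the journal code.

```python
from typing import List

DBLK_SIZE = 128      # bytes per dblock (512-byte sblock / 4)
DBLKS_PER_SBLK = 4   # dblocks per sblock
SBLK_SIZE = DBLK_SIZE * DBLKS_PER_SBLK


def fillers_needed(record_offset: int) -> int:
    """Number of one-dblock filler records needed to reach the next
    sblock boundary, starting at the truncated record's header."""
    if record_offset % DBLK_SIZE != 0:
        raise ValueError("record header must start on a dblock boundary")
    dblk_index = record_offset // DBLK_SIZE
    # 0 if the header already sits on an sblock boundary.
    return (-dblk_index) % DBLKS_PER_SBLK


def filler_offsets(record_offset: int) -> List[int]:
    """Byte offsets at which the filler records would be written."""
    count = fillers_needed(record_offset)
    return [record_offset + i * DBLK_SIZE for i in range(count)]


print(fillers_needed(384))   # header at dblock 3 -> 1 filler
print(filler_offsets(640))   # header at dblock 5 -> [640, 768, 896]
print(fillers_needed(512))   # already sblock-aligned -> 0
```

A truncated record starting at byte 640 therefore needs three fillers, after which the next write starts at byte 1024, an sblock boundary.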
Comment 3 Kim van der Riet 2007-12-07 14:58:31 EST
RHM svn r1442
Cruisecontrol 64-bit build 337
Cruisecontrol 32-bit build 49
