Red Hat Bugzilla – Bug 404611
GFS2: gfs2_fsck dupl. blocks between EA and data
Last modified: 2010-01-11 22:40:40 EST
Description of problem:
If you have a data block that's duplicated as an extended attribute
block, gfs2_fsck gets all confused and makes some bad decisions.
For example, if the data block is discovered first (and is correct)
and the EA duplicate is discovered second (and is wrong) the code
will destroy the file in favor of the bad EA block. If you run
fsck a second time, it will notice that the EA is bad and delete it,
but by then the file has already been destroyed.
In the EA handling code in pass1.c, if a duplicate is encountered,
fsck should see if the duplicated block is really part of an EA or not.
If it isn't, it can fix the problem there and leave the user's file intact.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Use gfs2_edit to change an EA for a file so that it points to a
data block of another file.
2. Run gfs2_fsck on the modified file system
Actual results:
The entire file is destroyed and the bad EA is left out there.

Expected results:
The bad EA should be destroyed and the file left intact (or the
reverse, depending on which reference to the duplicated block is
actually valid).
I've got a working prototype of pass1.c to fix this. However, there
are three different instances of this bug and only one is fixed.
I'd like to rearrange the code a bit to eliminate the redundant code.
The likelihood of hitting this is very low. I only encountered it
after designing a special test case for bug #325151.
Therefore, I recommend deferring this to 5.3. We have higher
priorities than this.
Created attachment 272991 [details]
File system metadata to recreate problem
You can recreate this problem by bunzip2ing and restoring this metadata
and then running gfs2_fsck.
I missed mentioning in the comment that this is low probability. Moved to 5.3 and lowered priority.
Created attachment 273071 [details]
First prototype patch
This is only a third of the patch, but it seems to work.
There are three EA cases in pass1.c, and this patch only fixes
one of those cases. Before this can be shipped, the other two
cases need to be coded and tested.
devel & 5.3 ACKs please.
Created attachment 311604 [details]
Second prototype patch
This patch covers all three cases. However, it is completely untested
and I need to go over the logic again. So it's likely to change.
Created attachment 311988 [details]
Third prototype patch
I found all kinds of "exceptions to the rule" once I ran my prototype
through a set of saved metadata I had previously used to recreate the
problem. This is my "fscktest" metadata, also known as my "leukemia"
test for gfs2_fsck, where I purposely blasted blocks, duplicated blocks
and stuck EA pointers where they shouldn't be. My goal was to test
every combination of EA corruption, but I don't guarantee the test is
exhaustive.
This patch successfully recovers all the kinds of corruption simulated
by the leukemia test and a second run won't find additional corruption.
I still need to do more tests. For one, I'm planning on testing it
with the (hopefully good) metadata left over from the benchp14 test of
a 2TB file system. That will take a while.
I also need to run it through a -n test to make sure it doesn't try
to write to the file system when it shouldn't.
I also fixed some cosmetic things: For example, there were a lot of
places where it would tell you that you had a bad Extended Attribute,
but it wouldn't tell you which inode it was associated with, etc.
I tried to make the messages more meaningful and consistent.
Created attachment 312174 [details]
fsck memory reduction patch
I've got a bug #445858 to fix gfs's fsck to reduce the amount of memory
it uses. Well, I've got the same problem with gfs2_fsck. The thing
is, I wanted to run my new gfs2_fsck (above patch) against a file system
populated by benchp14, but the problem is, the metadata is huge.
The file system is 2TB in size and the metadata save is 184GB. So when
I tried to run gfs2_fsck on it (to test the patch) I ran out of memory
big time. So I started investigating ways to reduce gfs2_fsck's memory
usage and came up with this patch.
Although the patch is huge and looks pervasive, it's really just a
couple of simple concepts. First, instead of allocating four bitmaps,
we only allocate one. For bad blocks, duplicate blocks and extended
attribute blocks, we just use a linked list of blocks. Since these
type of blocks should be rare, we don't need to allocate a huge amount
of memory for them. Second, for a couple of common data structures,
such as the phony buffer heads used in libgfs2/buf.c, I got rid of
a field to make the structure smaller. There's probably more fields
I can get rid of here as well. Eventually I want to get rid of the
whole thing and just let vfs do the buffering. Also, I reduced a link
count from 32-bits to 16 bits. Hopefully we won't have more than 16384
links between files, but I checked a (somewhat old) copy of e2fsprogs
and it also uses a 16-bit number to keep track of inode links, so I
figure that can't be too horrible.
Anyway, I'm letting the latest fsck code run on kool over the weekend
and I'll see what happens. My savings might not be enough.
But I thought I'd at least document what I had so far.
Created attachment 312404 [details]
I started rigorously testing gfs2_fsck and ran into more problems.
After I got it working properly for my ea-damage test (aka "leukemia"
test) I threw it against gfs2_fsck_hell and it uncovered several issues.
First, I found a small change I did for gfs_fsck that I needed to
crosswrite. Next, I discovered that the RG repair code had a problem:
Since the resource groups are journaled, it's not uncommon for RGs to
appear in journal blocks. But if the rindex file is damaged and we have
to repair the index via the hunt-and-peck method, those "false RGs"
confuse the code greatly. So I had to write some special code to first
search for journal blocks that look like RGs and handle that properly.
It turns out that the easiest way to do that (while taking into account
the fact that journals may not be contiguous on disk--e.g. if the user
created them through dd in the meta_fs) was to use some of the new code
I devised to save memory on block maps. So the memory patch has become
a prerequisite for this fix.
I tried to run this on a 2T file system populated by many hours of
testing with benchp14. The system became bogged down and was cpu-
bound. I ported the fix from bug #234627 and my cpu usage went down
greatly. In one trial run, a gfs2_fsck run went down from 19
seconds to about 6 seconds with the fix. The gfs2_fsck on kool
went from being cpu bound to being IO bound, which is how it should be.
Still, even with less memory and faster code, it only made it to
7% into pass1 on an overnight run.
Before I ship this, I will likely split the patch into its various
component fixes and do separate patch commits for each. That way
it's not such a big mess. Plus, if there's a regression, I can back
out one of the patches more easily. So the patches will likely be:
1. The third prototype patch as given above
2. The memory saving patch
3. The 234627 patch for bitmap speed
4. The gfs_fsck crosswrite for block number sanity checking
5. The RG-inside-a-journal patch
I'll split them out tomorrow. I just wanted to document what I have so
far in case I had a drive failure or something.
The "third prototype" plus the "addendum patch" from comment #9 passes
the lengthy gfs2_fsck_hell test which deliberately damages RGs and the
rindex file in certain ways and makes gfs2_fsck fix it.
Here is the final list of patches that I just pushed to master:
a d5b9e65: Speed up userspace bitmap manipulation code.
b a0d10d8: gfs_fsck crosswrite for block number sanity checking
c 8f39570: Fix some bad references to gfs_tool and gfs_fsck
d 7654295: Deleted unused function print_map
e c903a12: Shrink memory 1: eliminate b_size from pseudo-buffer-heads
f 5008483: Shrink memory 2: get rid of 3 huge in-core bitmaps
g 7e52932: Shrink memory 3: smaller link counts in inode_info
h 9fb74de: Better error reporting in gfs2_fsck
i 768d7f6: RGRepair: Account for RG blocks inside journals
j c9dfb0f: gfs2_fsck dupl. blocks between EA and data
Created attachment 312571 [details]
This tarball contains the ten patches for the "master" branch.
They should hopefully apply directly to the RHEL5 branch, since the
branches differ only in copyright notices and a few minor details.
Patches applied to the RHEL5 branch, tested fully on system roth-01 and
pushed to the RHEL5 branch for inclusion into 5.3. Changing status accordingly.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.