There are still some useful things we can rely on, though:
1. We know where the first RG will be, and the rest follow on sequentially.
2. Given that the disk hasn't been expanded, we can test to see if the RGs are
where we expect them to be.
3. In the worst case we can scan the disk looking for them based upon our
knowledge of where they ought to be, the fact that they should exactly cover
the available space and that we'd expect the bitmaps to bear at least some
relation to reality (e.g. blocks marked as inode should actually look like
inodes). We'd have to take into account that a filesystem with a load of spare
space at the end of it may be correct.
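Points 1 and 2 could be sketched roughly as below. This is only an illustration, not fsck code: the names first_rg, rg_size, fs_blocks and the looks_like_rg probe are hypothetical, and it assumes every RG has the same size in blocks (which holds only if the filesystem was never grown with a different RG size).

```python
def predict_rg_addrs(first_rg, rg_size, fs_blocks):
    """Expected RG start blocks, assuming one fixed RG size (an assumption)."""
    # Only keep RGs that fit completely inside the device.
    return list(range(first_rg, fs_blocks - rg_size + 1, rg_size))


def check_rgs(first_rg, rg_size, fs_blocks, looks_like_rg):
    """Split the predicted addresses into those found on disk and those missing.

    looks_like_rg is a hypothetical callback that reads a block and reports
    whether it carries a plausible RG header.
    """
    found, missing = [], []
    for addr in predict_rg_addrs(first_rg, rg_size, fs_blocks):
        (found if looks_like_rg(addr) else missing).append(addr)
    return found, missing
```

An address that lands in the missing list is a candidate for the slower scan described in point 3.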
It is probably possible to make some assumptions about the rindex too, such as
it being contained within the first RG, or at least it should start there.
Maybe it's worth thinking about modifying mkfs in order to put more of the
special files into known places to aid this process. Obviously we still have to
cope with older filesystems where they are not in known places, but it could
help future developments and be trivially forward and backwards compatible.
I've been working on a solution to this, based on my gfs v1 solution.
Many things are easier to implement in gfs2, thanks to libgfs2.
However, life becomes difficult if the file system has been
extended through gfs_grow several times.
For example, suppose you create a 100GB GFS file system, then grow
it by 200GB, then grow it by 100GB. The original rg size picked by
mkfs might be 2MB. The first grow might use 4MB, the second grow 2MB.
When fsck gets to the end of the first chunk and encounters the
second chunk, how is it supposed to know that (block + 2MB) isn't
an RG because the RG size switched to 4MB in the middle? How does it
know that there isn't supposed to be an RG at 2MB that got blasted?
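To make the ambiguity concrete, here is a small sketch of that layout. It is a simplified model, not the real on-disk arithmetic: offsets are in MB, headers and journals are ignored, and rg_starts and the chunk list are hypothetical names.

```python
def rg_starts(chunks):
    """chunks: list of (chunk_length, rg_size), one entry per mkfs or grow.

    Returns the start offset of every RG in the same unit as the inputs.
    """
    starts, base = [], 0
    for length, rg_size in chunks:
        starts.extend(range(base, base + length - rg_size + 1, rg_size))
        base += length
    return starts


MB = 1
GB = 1024 * MB

# 100GB at 2MB per RG, grown by 200GB at 4MB, then by 100GB at 2MB.
layout = [(100 * GB, 2 * MB), (200 * GB, 4 * MB), (100 * GB, 2 * MB)]
actual = set(rg_starts(layout))

# What fsck would predict if it assumed the mkfs RG size everywhere.
naive = set(rg_starts([(400 * GB, 2 * MB)]))

# Offsets where a fixed 2MB stride expects an RG that never existed.
phantom = sorted(naive - actual)
```

In this model the first phantom offset is 100GB + 2MB, exactly the spot where fsck cannot tell "never an RG" from "an RG that got blasted".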
We could force gfs2_grow to always use the same RG size as mkfs, but
that doesn't help the people who created their fs with gfs v1 tools
(gfs_mkfs then gfs_grow) followed by gfs2_convert.
Perhaps I can extrapolate the rg size from the size of the bitmap.
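A rough sketch of that extrapolation, assuming the usual GFS2 bitmap encoding of two bits per block (so each bitmap byte describes four blocks). The function name and the header_blocks parameter are hypothetical, and treating the RG span as header blocks plus data blocks is a simplification.

```python
GFS2_NBBY = 4  # blocks described per bitmap byte, at 2 bits per block


def rg_blocks_from_bitmap(bitmap_bytes, header_blocks):
    """Estimate an RG's total length in blocks from its bitmap size.

    bitmap_bytes:  total bytes of bitmap in the RG
    header_blocks: blocks occupied by the RG header and bitmap blocks
    """
    data_blocks = bitmap_bytes * GFS2_NBBY
    return header_blocks + data_blocks
```

For example, rg_blocks_from_bitmap(128, 1) gives 513 blocks. This only helps if the bitmap itself survived the damage.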
If we can change the ondisk format, we could improve fsck greatly
and make the fs more impervious to harm. I'd like to see a few changes:
1. Add the initial file system size to the superblock.
2. Add the initial rg size to the superblock.
3. Add chunk size and rg size for each gfs2_grow that's been done.
4. Make a copy of the superblock somewhere we can find it if the original
superblock gets blasted.
We don't currently have a gfs2_grow function, which is the biggest
monkey wrench in gfs2_fsck's rg repair concerns. The grow function is
to be added with this bugzilla:
Since I'm designing and writing the function, I decided it would be
best to design the new grow function with fsck in mind. Namely, I can
make fsck's life a whole lot easier if I can guarantee the size of the
RGs will always stay the same. I can do this by making gfs2_grow read
in the current rindex file and calculate the difference between the
ri_addr values. That means that mkfs.gfs2 decides on the RG size,
and subsequent gfs2_grow runs will always use that same size.
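The ri_addr calculation could look something like the sketch below. It is illustrative only: it takes a plain list of ri_addr values rather than reading the rindex file, and the function name is hypothetical.

```python
from collections import Counter


def rg_size_from_rindex(ri_addrs):
    """Return (stride, uniform) for a list of ri_addr values.

    stride is the most common gap between consecutive RG addresses;
    uniform is True only if every gap equals that stride.
    """
    addrs = sorted(ri_addrs)
    if len(addrs) < 2:
        raise ValueError("need at least two rindex entries to measure a gap")
    gaps = [b - a for a, b in zip(addrs, addrs[1:])]
    stride, _count = Counter(gaps).most_common(1)[0]
    return stride, all(g == stride for g in gaps)
```

A False in the second slot would flag a filesystem whose RG sizes are already mixed, such as one grown under gfs1 and then converted.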
Some people might not like that because we might not fill up a whole
file system if it's not an even multiple of the RG size. But I think
it's well worth it if gfs2_fsck can use that assumption to predict the
RG locations. There's still a monkey wrench, though, because file
systems may have been grown with gfs_grow under gfs1 and converted to
gfs2 by gfs2_convert. I can't do anything about that, I'm afraid.
Created attachment 152593
First go at a fix
Here's my initial prototype. I've done minimal testing; it needs more.
I definitely haven't tested all the paths through the code yet.
For example, I haven't tested the repairing of a damaged rindex at all,
only damaged RGs before and after gfs2_grow. Also, I haven't seen how it
reacts when repair is impossible.
I need to resurrect the "gfs2_fsck_hell" test case to exercise the fixing
of all the important types of damage.
I may have made changes to libgfs2 to support this capability, but
they're not included with this patch. The majority of libgfs2 changes
I have pending are for the new gfs2_grow function for bz 234844, so it
was hard to separate them out. I think the libgfs2 changes for gfs2_fsck
were minimal anyway.
I've resurrected and improved the gfs2_fsck_hell test case for this code.
Unfortunately, the test does not pass with the latest code. It uncovered
an additional complication: When a bitmap is found to be corrupt, it has
to be reinitialized (i.e. set to zeroes). If the bitmap recovered in this
way happens to be the first one, the system inodes (master directory,
root directory, inum, per_node, rindex, jindex, statfs, quota, etc.) are
all marked as free blocks. That makes them look like they're deleted
but still referenced.
Even though the inode information in the superblock is still good,
fsck wants to recover the "deleted" system files and toss them all
into lost+found. Since the system journals are inside the file system,
it applies to them too. Then they're recreated, which is not good
(although in some cases, it's not too tragic). So now I have to add
code to check the integrity of these system inodes, much like it
already does for the root directory in pass2. As a matter of fact,
the new code I'm working on replaces that code with a generic version
I can use for all the system inodes.
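The sketch below shows why a zeroed bitmap makes the system inodes look free, and one way the known-good ones could be re-marked afterwards. It is not the code in the patch: the helper names are hypothetical, block numbers are relative to the RG's first data block, and it assumes the usual GFS2 two-bit states (0 = free, 3 = dinode).

```python
GFS2_BLKST_FREE = 0
GFS2_BLKST_DINODE = 3


def set_block_state(bitmap, block, state):
    """Store a 2-bit state for block in the bitmap (a bytearray)."""
    byte, index = divmod(block, 4)
    shift = index * 2
    bitmap[byte] = (bitmap[byte] & ~(0x3 << shift) & 0xFF) | (state << shift)


def get_block_state(bitmap, block):
    """Read back the 2-bit state for block."""
    byte, index = divmod(block, 4)
    return (bitmap[byte] >> (index * 2)) & 0x3


def rebuild_bitmap(nblocks, system_inode_blocks):
    """Reinitialize a bitmap to all-free, then re-mark verified system inodes.

    system_inode_blocks is assumed to hold only blocks whose inodes have
    already passed an integrity check.
    """
    bitmap = bytearray((nblocks + 3) // 4)  # all zeroes means every block is free
    for blk in system_inode_blocks:
        set_block_state(bitmap, blk, GFS2_BLKST_DINODE)
    return bitmap
```

Without the re-marking loop, every block reads back as free, which is the "deleted but still referenced" state described above.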
I hope to have a better prototype tomorrow or at least by the end of
Created attachment 153017
Here's the untested gfs2_quota patch to use the libgfs2 functions
that were ported from gfs2_jadd.
Oops; please disregard the previous comment and attachment.
I accidentally attached that patch to the wrong bugzilla.
It belongs with 234844, not this one.
Created attachment 153868
The hopefully "Golden" patch for HEAD
This version has been tested against the gfs2_fsck_hellfire test
on system trin-10. This is the one I'm committing to CVS.
Created attachment 153876
The hopefully "Golden" patch for RHEL5
The code is the same as HEAD at this point, so either patch would
probably apply. I'm going to commit this to CVS.
Tested on system trin-10 in a variety of damaged RG conditions.
Code committed to CVS at HEAD and RHEL5 (for RHEL5.1).
Changing status to modified.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.