Description of problem:
This is a follow-up to bug #229220.
If gfs_fsck encounters damaged directory entries in a leaf page,
it does a poor job of recovering the directory. In fact,
in the cases I've seen, it deletes the whole directory and you're left
with zero files.
Version-Release number of selected component (if applicable):
RHEL5 Beta 2
Steps to Reproduce:
1. gfs_mkfs -t bob_cluster:bobs_gfs -p lock_dlm -j 3 /dev/bob_vg/lvol0
2. mount -t gfs /dev/bob_vg/lvol0 /mnt/gfs1
3. mkdir /mnt/gfs1/bob
4. for i in `seq 1 500` ; do touch /mnt/gfs1/bob/file$i ; done
5. umount /mnt/gfs1
6. Use a tool like gfs_edit to change the length of a directory
entry to something illegal, like 0x0010.
7. gfs_fsck -y /dev/bob_vg/lvol0
Actual results:
A whole lot of problems will be reported and in the end,
all files in directory /mnt/gfs1/bob will be lost.
Expected results:
Bad directory entries should be repaired or deleted.
Bad leaf pages shouldn't cause the other leaf pages to be deleted,
so at least some of the files should still be found, even if an
entire leaf page is blasted with zeroes.
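To make the expected behavior concrete, here is a minimal sketch of a directory walk that skips an unreadable leaf instead of giving up on the whole directory. It is not gfs_fsck code: the leaf layout, LEAF_MAGIC, and the helper names (make_leaf, read_leaf, salvage) are all made up for illustration.

```python
import struct

LEAF_MAGIC = 0x4C454146          # hypothetical marker, NOT the real GFS magic
HDR = struct.Struct(">IH")       # toy leaf header: magic, entry count


def make_leaf(names):
    """Build a toy leaf: header followed by length-prefixed names."""
    body = b"".join(bytes([len(n)]) + n.encode("latin-1") for n in names)
    return HDR.pack(LEAF_MAGIC, len(names)) + body


def read_leaf(block):
    """Return the names stored in one leaf, or None if the leaf is unusable."""
    if len(block) < HDR.size:
        return None
    magic, count = HDR.unpack_from(block)
    if magic != LEAF_MAGIC:
        return None
    names, pos = [], HDR.size
    for _ in range(count):
        if pos >= len(block):
            return None
        n = block[pos]
        if n == 0 or pos + 1 + n > len(block):
            return None
        names.append(block[pos + 1:pos + 1 + n].decode("latin-1"))
        pos += 1 + n
    return names


def salvage(leaves):
    """Collect names from every readable leaf; report bad leaves by index."""
    found, bad = [], []
    for i, block in enumerate(leaves):
        names = read_leaf(block)
        if names is None:
            bad.append(i)        # skip this leaf, keep going with the rest
        else:
            found.extend(names)
    return found, bad


# The middle leaf is blasted with zeroes; the other two should survive.
leaves = [make_leaf(["file1", "file2"]), bytes(64), make_leaf(["file3"])]
found, bad = salvage(leaves)
print(found, bad)
```

The point is only that a failed validation of one leaf is recorded and skipped, so the files in the remaining leaves are still found.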
I'm nearly done with a patch to fix the problems, but it needs work.
For now, I'm going to focus my efforts on this in RHEL5.
If it seems prudent, we can backport the fix to RHEL4, but it's too
late for 4.5. This problem also exists in gfs2_fsck, but that will
be a separate bug.
Created attachment 148958
First go at a fix
This is a patch to add a limited ability for gfs_fsck to recover
directories that are corrupt, whereas before they would just get
destroyed at the least bit of corruption.
The main features of the patch are:
1. For directory entries found to be corrupt (i.e. invalid entry
length or name length), it tries to figure out the correct
length(s) and fix them. I saw this at the customer site in Pune
when I was working on 229220.
2. For directory entries found to be pointing to trash (i.e. a leaf
pointer that really points to an inode) it gets rid of the bad
leaf entries by extending the previous "good" leaf entry to fill
where the bad entries were. I also saw this in Pune.
3. The leaf block validation code was moved from pass1.c to metawalk.c.
Metawalk.c was doing its own crude form of this validation, but it
wasn't good enough in all cases, and the calls made during pass 1
were redundant (they did twice the work). When I say it wasn't good
enough, what I mean is that I could get gfs_fsck to segfault in
later passes by introducing the right kind of corruption. We
should study the performance impact of this, however, using a very
big (multi-TB) file system.
4. I removed the check_leaf function from pass2.c. As far as I could
tell, this wasn't performing any valid function. It wasn't
repairing anything or even doing good checks. It was just burning
time. I need to go back and review the code more closely though,
just to make sure.
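As a rough illustration of point 1, the sketch below walks a buffer of directory entries and rewrites any record length that cannot be right. It is not the patch itself. The layout is a toy (a 4-byte big-endian header of rec_len and name_len, records aligned to 8 bytes), and min_len, plausible, repair_lengths and names are hypothetical helpers. It only guesses at the entry length, not the name length.

```python
import struct

HDR = struct.Struct(">HH")   # toy header: rec_len, name_len (assumed layout)
ALIGN = 8                    # toy record alignment (assumption)


def min_len(name_len):
    """Smallest aligned record that can hold the header plus the name."""
    return (HDR.size + name_len + ALIGN - 1) & ~(ALIGN - 1)


def pack_entry(name, rec_len=None):
    raw = name.encode("ascii")
    rec_len = rec_len or min_len(len(raw))
    return (HDR.pack(rec_len, len(raw)) + raw).ljust(rec_len, b"\0")


def plausible(buf, pos):
    """True if pos is the end of the buffer or looks like an entry start."""
    if pos == len(buf):
        return True
    if pos + HDR.size > len(buf):
        return False
    _, name_len = HDR.unpack_from(buf, pos)
    return name_len > 0 and pos + HDR.size + name_len <= len(buf)


def repair_lengths(buf):
    """Fix impossible rec_len values in place; return how many were fixed."""
    if len(buf) % ALIGN:
        raise ValueError("buffer length must be a multiple of ALIGN")
    pos, fixed = 0, 0
    while pos < len(buf):
        rec_len, name_len = HDR.unpack_from(buf, pos)
        need = min_len(name_len)
        ok = (rec_len >= need and rec_len % ALIGN == 0
              and plausible(buf, pos + rec_len))
        if not ok:
            # Guess: the minimal length if an entry seems to follow it,
            # otherwise let this entry run to the end of the buffer.
            rec_len = need if plausible(buf, pos + need) else len(buf) - pos
            struct.pack_into(">H", buf, pos, rec_len)
            fixed += 1
        pos += rec_len
    return fixed


def names(buf):
    """List entry names; only meaningful on a buffer that has been repaired."""
    out, pos = [], 0
    while pos < len(buf):
        rec_len, name_len = HDR.unpack_from(buf, pos)
        start = pos + HDR.size
        out.append(bytes(buf[start:start + name_len]).decode("ascii"))
        pos += rec_len
    return out


wanted = ["longer_file_001", "longer_file_002", "longer_file_003"]
leaf = bytearray(pack_entry(wanted[0]) + pack_entry(wanted[1])
                 + pack_entry(wanted[2], 32))
struct.pack_into(">H", leaf, 24, 0x0010)   # corrupt the second entry's rec_len
first_pass = repair_lengths(leaf)
second_pass = repair_lengths(leaf)
recovered = names(leaf)
print(first_pass, second_pass, recovered)
```

The names here are 15 characters long so that 0x0010 is too small to be legal in the toy layout. A real checker would need more evidence than this before trusting a guessed length.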
I tested this patch on trin-10 by creating a new file system, using
gfs2_edit to patch in corruption, and then letting gfs_fsck pick up the
pieces. I tested two kinds of corruption: (1) Changing the directory
entry length field to 0x10, and (2) Changing leaf pointers to point
to inodes or zeroes. I also tested with a couple of different file
name lengths: short (5 - 7 chars) and somewhat long (67 - 69 chars).
In my testing, I ran gfs_fsck twice to make sure it came up clean on
the second run.
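The second kind of corruption, and the "run it twice" check, can be sketched roughly as follows. This assumes (my simplification) that the directory's hash table is a flat list of block numbers in which each leaf owns a contiguous run of slots. repair_pointers, is_leaf and the block numbers are hypothetical. Borrowing the next good pointer when the very first slots are bad is my own assumption, not something the patch is stated to do.

```python
def repair_pointers(table, is_leaf):
    """Overwrite each bad pointer with the nearest good pointer before it.

    Returns the number of slots changed. Leading bad slots borrow the
    first good pointer that follows them (assumption for this sketch).
    """
    good = [p for p in table if is_leaf(p)]
    if not good:
        raise ValueError("no valid leaf pointers; nothing to extend")
    changed = 0
    prev = good[0]
    for i, ptr in enumerate(table):
        if is_leaf(ptr):
            prev = ptr
        else:
            table[i] = prev      # extend the previous good leaf over this slot
            changed += 1
    return changed


leaf_blocks = {100, 200, 300}    # hypothetical blocks that really hold leaves
table = [100, 100, 200, 200, 300, 300, 300, 300]
table[3] = 555                   # pointer that really points to an inode
table[5] = 0                     # pointer blasted with zeroes

first_run = repair_pointers(table, leaf_blocks.__contains__)
second_run = repair_pointers(table, leaf_blocks.__contains__)
print(table, first_run, second_run)
```

A second run that changes nothing is the same "comes up clean" check described above, applied to the toy table.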
Having reviewed the code again, I stand by my conclusion in
comment #1, point 4: the check_leaf function in pass2 was checking
some things, but not doing anything about them. At best, this was
just doing some validations without consequence. That means if you
got beyond it, your file system was guaranteed to have a little bit more
integrity. If it didn't get beyond it, the code would likely crash
and burn (segfault) due to corruption, but it didn't try to fix
anything. At worst, this was just burning time. I stand by my
decision to remove it.
I was trying to test this patch when I ran into a problem very much like:
I added an update to that bz. I'm using an upstream kernel that apparently
already has the fix. This reminds me of:
where we had dlm messages writing the wrong length. I verified my
upstream kernel has the fix for that as well. I think I'll add some
debug code to try to catch this like I did with 214595.
I finally got QE test case gfs_fsck_stress to run with this patched
version of gfs_fsck. It was a long battle. My biggest problem was
that, with an upstream kernel running, a simple mount of a gfs file
system would not automatically load the gfs driver, even though I
could manually modprobe it. I never figured out why. I spoke with
Steve Whitehouse about it, but we were left scratching our heads.
I didn't spend an excessive amount of time tracking down why because
I had higher priorities. So I went back to using the default RHEL5
kernel and then it was able to mount gfs without pre-loading the module.
Next, I discovered that my xml file used by the test case was specifying
the wrong qlogic driver. That caused the test to fail. Fixed that.
Next, it wasn't starting cman after the test killed off nodes (like it
does for revolver). That turned out to be because somehow I had two
copies of cman_tool, one in /sbin and one in /usr/sbin. Not sure how
that happened. I need to look at the Makefiles for cman_tool and talk
to Mr. Feist I guess. The cman service init script specified the one
that worked. The test case didn't specify the full path, so it took
the wrong one.
I finally got these issues straightened out and the test passed on
my trin-09,10,11 cluster.
Incidentally, to run this test case:
Log into system "try", then:
gfs/bin/gfs_fsck_stress -l $PWD -r /usr/tests/sts/ -f \
Fixing product name. Cluster Suite components were integrated into Enterprise
Linux for version 5.0.
Created attachment 154072
patch to fix the problem [try 2]
This version fixes another subtle form of corruption. It has also been
tested against a file system with 16GB worth of mp3's that have varying
directory conditions and file names.
Fix tested on trin-10 by manually damaging directory entries and
letting gfs_fsck pick up the pieces. Fix was committed to HEAD and
RHEL5 branches of CVS. Changing status to Modified.
Also, I opened a bug record for the gfs2 crosswrite:
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.