Red Hat Bugzilla – Bug 229484
gfs_fsck not good at fixing corrupt directory entries
Last modified: 2010-01-11 22:32:38 EST
Description of problem:
This is a follow-up to bug #229220. If gfs_fsck encounters damaged directory entries in a leaf page, it does a poor job of recovering the directory. In fact, in the cases I've seen, it deletes the whole directory and you're left with zero files.

Version-Release number of selected component (if applicable):
RHEL5 Beta 2

How reproducible:
Always

Steps to Reproduce:
1. gfs_mkfs -t bob_cluster:bobs_gfs -p lock_dlm -j 3 /dev/bob_vg/lvol0
2. mount -t gfs /dev/bob_vg/lvol0 /mnt/gfs1
3. mkdir /mnt/gfs1/bob
4. for i in `seq 1 500` ; do touch /mnt/gfs1/bob/file$i ; done
5. umount /mnt/gfs1
6. Use a tool like gfs_edit to change the length of a directory entry to something illegal, like 0x0010.
7. gfs_fsck -y /dev/bob_vg/lvol0

Actual results:
A whole lot of problems will be reported and, in the end, all files in directory /mnt/gfs1/bob will be lost.

Expected results:
Bad directory entries should be repaired or deleted. Bad leaf pages shouldn't cause the other leaf pages to be deleted, so at least some of the files should still be found, even if an entire leaf page is blasted with zeroes.

Additional info:
I'm nearly done with a patch to fix the problems, but it needs work. For now, I'm going to focus my efforts on this in RHEL5. If it seems prudent, we can backport the fix to RHEL4, but it's too late for 4.5. This problem also exists in gfs2_fsck, but that will be a separate bug.
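For anyone who wants to script the corruption in the gfs_edit step rather than patch it by hand, here is a minimal Python sketch of the idea. It is an illustration only, not gfs_edit: it overwrites one 16-bit field in a file and assumes the field is stored big-endian, as GFS metadata is. The function name patch_u16_be and the byte offset you pass in are hypothetical; the offset of a real directory entry's length field has to be found by inspecting the file system first, and this should only be pointed at a scratch image or test volume.

```python
import struct


def patch_u16_be(path, offset, value):
    """Overwrite a big-endian 16-bit field at `offset` in the file `path`.

    Returns the value that was there before, so the damage can be undone.
    `offset` is an assumption supplied by the caller; nothing here knows
    where a directory entry actually lives.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        raw = f.read(2)
        if len(raw) != 2:
            raise ValueError("offset is past the end of the file")
        old = struct.unpack(">H", raw)[0]
        # Go back and write the new (possibly illegal) value in place.
        f.seek(offset)
        f.write(struct.pack(">H", value))
    return old
```

A call such as patch_u16_be(image_path, rec_len_offset, 0x0010) would plant the illegal length used in this report; both arguments are placeholders for values from your own test setup.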
Created attachment 148958 [details]
First go at a fix

This is a patch to add a limited ability for gfs_fsck to recover directories that are corrupt, whereas before they would just get destroyed at the least bit of corruption. The main features of the patch are:

1. For directory entries found to be corrupt (i.e. invalid entry length or name length), it tries to figure out the correct length(s) and fix them. I saw this at the customer site in Pune when I was working on 229220.
2. For directory entries found to be pointing to trash (i.e. a leaf pointer that really points to an inode), it gets rid of the bad leaf entries by extending the previous "good" leaf entry to fill where the bad entries were. I also saw this in Pune.
3. The leaf block validation code was moved from pass1.c to metawalk.c. Metawalk.c was doing its own crude form of this validation, but it wasn't good enough in all cases, and the calls made during pass 1 were redundant (they did twice the work). When I say it wasn't good enough, what I mean is that I could get gfs_fsck to segfault in later passes by introducing the right kind of corruption. We should study the performance impact of this, however, using a very big (multi-TB) file system.
4. I removed the check_leaf function from pass2.c. As far as I could tell, this wasn't performing any valid function. It wasn't repairing anything or even doing good checks. It was just burning time. I need to go back and review the code more closely though, just to make sure.

I tested this patch on trin-10 by creating a new file system, using gfs2_edit to patch in corruption, and then letting gfs_fsck pick up the pieces. I tested two kinds of corruption: (1) changing the directory entry length field to 0x10, and (2) changing leaf pointers to point to inodes or zeroes. I also tested with a couple of different file name lengths: short (5 - 7 chars) and somewhat long (67 - 69 chars). In my testing, I ran gfs_fsck twice to make sure it came up clean on the second run.
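To make points 1 and 2 concrete, here is a rough Python sketch of the two repair ideas. It is not the code in the attached patch (gfs_fsck is written in C) and it does not use the real GFS on-disk layout. Everything in it is an assumption made for illustration: a toy 8-byte entry header (inode number, record length, name length), 8-byte alignment, a leaf block whose size is a multiple of 8, and the names repair_rec_lens and repair_leaf_pointers. The second function reflects one reading of point 2, namely that slots in the directory's leaf pointer table that do not reference a leaf are overwritten with the previous good pointer.

```python
import struct

HDR = struct.Struct(">IHH")  # toy header: inode number, rec_len, name_len
ALIGN = 8                    # toy alignment; block size must be a multiple


def min_len(name_len):
    """Smallest aligned record that can hold the header plus the name."""
    return (HDR.size + name_len + ALIGN - 1) & ~(ALIGN - 1)


def plausible(block, off):
    """True if a sane-looking entry header starts at `off`."""
    if off + HDR.size > len(block):
        return False
    _inum, rec_len, name_len = HDR.unpack_from(block, off)
    return (name_len > 0 and rec_len % ALIGN == 0
            and min_len(name_len) <= rec_len <= len(block) - off)


def repair_rec_lens(block):
    """Walk a leaf block (bytearray) and rewrite invalid rec_len fields.

    Returns the number of entries fixed.
    """
    fixed, off = 0, 0
    while off < len(block):
        inum, rec_len, name_len = HDR.unpack_from(block, off)
        if not plausible(block, off):
            if name_len == 0 or off + min_len(name_len) > len(block):
                break  # name length is unusable too; stop guessing
            # Guess: the entry runs up to the next plausible header,
            # or to the end of the block if there is none.
            nxt = off + min_len(name_len)
            while nxt < len(block) and not plausible(block, nxt):
                nxt += ALIGN
            rec_len = nxt - off
            HDR.pack_into(block, off, inum, rec_len, name_len)
            fixed += 1
        off += rec_len
    return fixed


def repair_leaf_pointers(table, is_leaf):
    """Overwrite pointers that fail `is_leaf` with the previous good one.

    Bad pointers ahead of the first good pointer are left alone in this
    sketch. Returns the number of slots rewritten.
    """
    fixed, good = 0, None
    for i, ptr in enumerate(table):
        if is_leaf(ptr):
            good = ptr
        elif good is not None:
            table[i] = good
            fixed += 1
    return fixed
```

Running repair_rec_lens a second time on the same block should report zero fixes, which mirrors the "run gfs_fsck twice" check described above. Scanning forward for the next plausible header can be fooled by name bytes that happen to look like a header, so a real checker needs stronger validation than this toy one.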
Having reviewed the code again, I stand by my conclusion in comment #1, point 4: the check_leaf function in pass2 was checking some things, but not doing anything about them. At best, this was just doing some validations without consequence. That means if you got beyond it, your file system was assured of having a little more integrity. If it didn't get beyond it, the code would likely crash and burn (segfault) due to corruption, but it didn't try to fix anything. At worst, this was just burning time. I stand by my decision to remove it.
I was trying to test this patch when I ran into a problem very much like: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=213289 I added an update to that bz. I'm using an upstream kernel that apparently already has the fix. This reminds me of: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=214595 where we had dlm messages writing the wrong length. I verified my upstream kernel has the fix for that as well. I think I'll add some debug code to try to catch this like I did with 214595.
I finally got QE test case gfs_fsck_stress to run with this patched version of gfs_fsck. It was a long battle.

My biggest problem was that, with an upstream kernel running, a simple mount of a gfs file system would not automatically load the gfs driver, even though I could manually modprobe it. I never figured out why. I spoke with Steve Whitehouse about it, but we were left scratching our heads. I didn't spend an excessive amount of time tracking down why because I had higher priorities. So I went back to using the default RHEL5 kernel, and then it was able to mount gfs without pre-loading the module.

Next, I discovered that the xml file used by the test case was specifying the wrong qlogic driver. That caused the test to fail. Fixed that.

Next, it wasn't starting cman after the test killed off nodes (like it does for revolver). That turned out to be because somehow I had two copies of cman_tool, one in /sbin and one in /usr/sbin. Not sure how that happened. I need to look at the Makefiles for cman_tool and talk to Mr. Feist, I guess. The cman service init script specified the one that worked. The test case didn't specify the full path, so it took the wrong one.

I finally got these issues straightened out and the test passed on my trin-09,10,11 cluster.
Incidentally, to run this test case: Log into system "try", then:
cd /local/bob/sts-rhel5/sts-root
gfs/bin/gfs_fsck_stress -l $PWD -r /usr/tests/sts/ -f \
/local/bob/sts-rhel5/sts-root/var/share/resource_files/trin.xml
Fixing product name. Cluster Suite components were integrated into Enterprise Linux for version 5.0.
Created attachment 154072 [details] patch to fix the problem [try 2] This version fixes another subtle form of corruption. It has also been tested against a file system with 16GB worth of mp3's that have varying directory conditions and file names.
Fix tested on trin-10 by manually damaging directory entries and letting gfs_fsck pick up the pieces. Fix was committed to HEAD and RHEL5 branches of CVS. Changing status to Modified. Also, I opened a bug record for the gfs2 crosswrite: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239023
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0576.html