Bug 229484

Summary: gfs_fsck not good at fixing corrupt directory entries
Product: Red Hat Enterprise Linux 5 Reporter: Robert Peterson <rpeterso>
Component: gfs-utils Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA QA Contact: GFS Bugs <gfs-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.0   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0576 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-07 17:57:52 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 239023    
Attachments:
Description                        Flags
First go at a fix                  none
patch to fix the problem [try 2]   none

Description Robert Peterson 2007-02-21 14:32:02 UTC
Description of problem:
This is a follow-up to bug #229220.
If gfs_fsck encounters damaged directory entries in a leaf page,
it doesn't do a very good job of recovering the directory.  In fact,
in cases I've seen, it deletes the whole directory and you're left
with zero files.

Version-Release number of selected component (if applicable):
RHEL5 Beta 2

How reproducible:
Always

Steps to Reproduce:
1. gfs_mkfs -t bob_cluster:bobs_gfs -p lock_dlm -j 3 /dev/bob_vg/lvol0
2. mount -t gfs /dev/bob_vg/lvol0 /mnt/gfs1
3. mkdir /mnt/gfs1/bob
4. for i in `seq 1 500` ; do touch /mnt/gfs1/bob/file$i ; done
5. umount /mnt/gfs1
6. Use a tool like gfs_edit to change the length of a directory
   entry to something illegal, like 0x0010.
7. gfs_fsck -y /dev/bob_vg/lvol0
  
Actual results:
A whole lot of problems will be reported and in the end,
all files in directory /mnt/gfs1/bob will be lost.

Expected results:
Bad directory entries should be repaired or deleted.
Bad leaf pages shouldn't cause the other leaf pages to be deleted,
so at least some of the files should still be found, even if an
entire leaf page is blasted with zeroes.
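
The expected behavior can be illustrated with a minimal sketch. It assumes
a toy entry layout (a 4-byte header holding rec_len and name_len) that is
much simpler than the real GFS on-disk format; HDR, make_entry, read_leaf
and salvage are hypothetical names used only for illustration and are not
taken from gfs_fsck.

```python
import struct

# Hypothetical, simplified entry header: rec_len, name_len (big-endian u16).
# The real GFS on-disk dirent carries more fields; this is only a toy layout.
HDR = struct.Struct(">HH")

def make_entry(name):
    """Build one well-formed toy entry with no padding."""
    return HDR.pack(HDR.size + len(name), len(name)) + name

def read_leaf(leaf):
    """Return (names, damaged); stop at the first entry that fails validation."""
    names, off = [], 0
    while off + HDR.size <= len(leaf):
        rec_len, name_len = HDR.unpack_from(leaf, off)
        too_short = rec_len < HDR.size + name_len
        overruns = off + rec_len > len(leaf)
        if name_len == 0 or too_short or overruns:
            return names, True
        names.append(bytes(leaf[off + HDR.size: off + HDR.size + name_len]))
        off += rec_len
    return names, False

def salvage(leaves):
    """Collect names from every leaf; a bad leaf costs only its own entries."""
    found, bad = [], []
    for idx, leaf in enumerate(leaves):
        names, damaged = read_leaf(leaf)
        found.extend(names)
        if damaged:
            bad.append(idx)
    return found, bad

# One leaf blasted with zeroes, sitting between two healthy leaves.
leaves = [make_entry(b"file1") + make_entry(b"file2"),
          bytes(32),
          make_entry(b"file3")]
found, bad = salvage(leaves)
```

In this sketch the zeroed leaf is flagged as damaged, while the names in
the other two leaves are still found.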

Additional info:
I'm nearly done with a patch to fix the problems, but it needs work.
For now, I'm going to focus my efforts on this in RHEL5.
If it seems prudent, we can backport the fix to RHEL4, but it's too
late for 4.5.  This problem also exists in gfs2_fsck, but that will
be a separate bug.

Comment 1 Robert Peterson 2007-02-28 20:26:02 UTC
Created attachment 148958
First go at a fix

This is a patch to add a limited ability for gfs_fsck to recover
directories that are corrupt, whereas before they would just get
destroyed at the least bit of corruption.

The main features of the patch are:

1. For directory entries found to be corrupt (i.e. invalid entry
   length or name length), it tries to figure out the correct
   length(s) and fix them.  I saw this at the customer site in Pune
   when I was working on 229220.
2. For directory entries found to be pointing to trash (i.e. a leaf
   pointer that really points to an inode), it gets rid of the bad
   leaf entries by extending the previous "good" leaf entry to fill
   where the bad entries were.  I also saw this in Pune.
3. The leaf block validation code was moved from pass1.c to metawalk.c.
   Metawalk.c was doing its own crude form of this validation, but it
   wasn't good enough in all cases, and the calls made during pass 1
   were redundant (they did twice the work).  When I say it wasn't good 
   enough, what I mean is that I could get gfs_fsck to segfault in 
   later passes by introducing the right kind of corruption.  We 
   should study the performance impact of this, however, using a very
   big (multi-TB) file system.
4. I removed the check_leaf function from pass2.c.  As far as I could
   tell, this wasn't performing any valid function.  It wasn't 
   repairing anything or even doing good checks.  It was just burning
   time.  I need to go back and review the code more closely though,
   just to make sure.
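
Points 1 and 2 above can be sketched roughly as follows. This is not the
code from the attached patch; it is a minimal illustration under stated
assumptions: a toy 4-byte entry header (rec_len, name_len), records padded
to 8 bytes, and a crude "printable name" test to decide whether an entry
is trash. The names HDR, ALIGN, padded, make_entry, plausible_name and
repair_leaf are all hypothetical. As a simplification, trash is absorbed
up to the end of the leaf rather than up to the next good entry.

```python
import struct

HDR = struct.Struct(">HH")  # hypothetical header: rec_len, name_len
ALIGN = 8                   # assumed record alignment

def padded(name_len):
    """Smallest aligned record length that can hold header plus name."""
    return (HDR.size + name_len + ALIGN - 1) // ALIGN * ALIGN

def make_entry(name, rec_len=0):
    """Build a toy entry; a nonzero rec_len overrides the stored length field."""
    size = padded(len(name))
    body = HDR.pack(rec_len or size, len(name)) + name
    return body + bytes(size - len(body))

def plausible_name(leaf, off, name_len):
    """Crude trash test: the name must fit and be printable with no '/'."""
    name = leaf[off + HDR.size: off + HDR.size + name_len]
    return 0 < name_len == len(name) and all(32 < b < 127 and b != 47 for b in name)

def repair_leaf(leaf):
    """Repair a bytearray leaf in place; return the number of fixes made."""
    fixes, off, prev = 0, 0, None
    while off + HDR.size <= len(leaf):
        rec_len, name_len = HDR.unpack_from(leaf, off)
        if not plausible_name(leaf, off, name_len):
            # Point 2: stretch the previous good entry over the trash.
            if prev is not None:
                _, prev_name_len = HDR.unpack_from(leaf, prev)
                HDR.pack_into(leaf, prev, len(leaf) - prev, prev_name_len)
                fixes += 1
            break
        want = min(padded(name_len), len(leaf) - off)
        if rec_len < want or off + rec_len > len(leaf):
            # Point 1: the name looks fine, so recompute the entry length.
            HDR.pack_into(leaf, off, want, name_len)
            rec_len = want
            fixes += 1
        prev, off = off, off + rec_len
    return fixes
```

With this toy layout, an entry whose 20-character name needs 24 bytes but
whose length field reads 0x10 is corrected by the first branch, and a run
of 0xff bytes after two good entries is absorbed by the second.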

I tested this patch on trin-10 by creating a new file system, using
gfs2_edit to patch in corruption, and then letting gfs_fsck pick up the
pieces.  I tested two kinds of corruption: (1) Changing the directory 
entry length field to 0x10, and (2) Changing leaf pointers to point
to inodes or zeroes.  I also tested with a couple of different file
name lengths: short (5 - 7 chars) and somewhat long (67 - 69 chars).
In my testing, I ran gfs_fsck twice to make sure it came up clean on
the second run.
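
The "run it twice" check can be expressed as a small harness. This is only
a sketch: run_twice and toy_repair are hypothetical names, and toy_repair
is a stand-in for a real repair pass that reports how many fixes it made.

```python
def run_twice(repair, image):
    """Run a repair pass twice on the same image; return both fix counts."""
    first = repair(image)
    second = repair(image)
    return first, second

def toy_repair(entries):
    """Stand-in repair pass: raise any length below its minimum; count changes."""
    fixes = 0
    for entry in entries:
        if entry["rec_len"] < entry["min_len"]:
            entry["rec_len"] = entry["min_len"]
            fixes += 1
    return fixes

# One entry with an illegal length of 0x10, one healthy entry.
image = [{"rec_len": 0x10, "min_len": 24}, {"rec_len": 16, "min_len": 16}]
first, second = run_twice(toy_repair, image)
```

A second count of zero means the first pass left nothing for the second
pass to fix, which is the property the testing above was checking for.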

Comment 2 Robert Peterson 2007-03-07 18:19:56 UTC
Having reviewed the code again, I stand by my conclusion in 
comment #1, point 4: the check_leaf function in pass2 was checking
some things, but not doing anything about them.  At best, this was
just doing some validations without consequence.  That means if you
got beyond it, your file system was known to have a little bit more
integrity.  If it didn't get beyond it, the code would likely crash
and burn (segfault) due to corruption, but it didn't try to fix 
anything.  At worst, this was just burning time.  I stand by my 
decision to remove it.


Comment 3 Robert Peterson 2007-03-12 15:09:02 UTC
I was trying to test this patch when I ran into a problem very much like:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=213289
I added an update to that bz.  I'm using an upstream kernel that apparently
already has the fix.  This reminds me of:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=214595
where we had dlm messages writing the wrong length.  I verified my
upstream kernel has the fix for that as well.  I think I'll add some
debug code to try to catch this like I did with 214595.


Comment 4 Robert Peterson 2007-03-14 21:53:50 UTC
I finally got QE test case gfs_fsck_stress to run with this patched
version of gfs_fsck.  It was a long battle.  My biggest problem was
that, with an upstream kernel running, a simple mount of a gfs file
system would not automatically load the gfs driver, even though I
could manually modprobe it.  I never figured out why.  I spoke with
Steve Whitehouse about it, but we were left scratching our heads.
I didn't spend an excessive amount of time tracking down why because
I had higher priorities.  So I went back to using the default RHEL5 
kernel and then it was able to mount gfs without pre-loading the module.  

Next, I discovered that my xml file used by the test case was specifying
the wrong qlogic driver.  That caused the test to fail.  Fixed that.

Next, it wasn't starting cman after the test killed off nodes (like it
does for revolver).  That turned out to be because somehow I had two 
copies of cman_tool, one in /sbin and one in /usr/sbin.  Not sure how 
that happened.  I need to look at the Makefiles for cman_tool and talk
to Mr. Feist I guess.  The cman service init script specified the one 
that worked.  The test case didn't specify the full path, so it took 
the wrong one.
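
A duplicate like this can be spotted by listing every match for a command
along the search path instead of only the first one. The sketch below
assumes nothing about the test harness; find_all_on_path is a hypothetical
helper, and the path string in the comment is an example, not the actual
PATH on the test nodes.

```python
import os

def find_all_on_path(name, path=None):
    """Return every executable called `name` on the search path, in order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits

# Example call (path value is illustrative only):
# find_all_on_path("cman_tool", "/usr/sbin:/sbin")
```

The first element of the result is the copy a script gets when it does not
give a full path; more than one element means a second copy is shadowed.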

I finally got these issues straightened out and the test passed on
my trin-09,10,11 cluster.


Comment 5 Robert Peterson 2007-03-14 21:54:57 UTC
Incidentally, to run this test case:

Log into system "try", then:

cd /local/bob/sts-rhel5/sts-root
gfs/bin/gfs_fsck_stress -l $PWD -r /usr/tests/sts/ -f \
/local/bob/sts-rhel5/sts-root/var/share/resource_files/trin.xml


Comment 6 Kiersten (Kerri) Anderson 2007-04-23 17:38:29 UTC
Fixing product name. Cluster Suite components were integrated into Enterprise
Linux for version 5.0.

Comment 7 Robert Peterson 2007-05-03 20:45:46 UTC
Created attachment 154072
patch to fix the problem [try 2]

This version fixes another subtle form of corruption.  It has also been
tested against a file system with 16GB worth of mp3's that have varying
directory conditions and file names.

Comment 8 Robert Peterson 2007-05-04 14:14:36 UTC
Fix tested on trin-10 by manually damaging directory entries and 
letting gfs_fsck pick up the pieces.  Fix was committed to HEAD and
RHEL5 branches of CVS.  Changing status to Modified.
Also, I opened a bug record for the gfs2 crosswrite:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239023


Comment 11 errata-xmlrpc 2007-11-07 17:57:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0576.html