Bug 624689

Summary: fsck.gfs2 deletes directories if they get too big
Product: Red Hat Enterprise Linux 5
Reporter: Robert Peterson <rpeterso>
Component: gfs2-utils
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: high
Version: 5.6
CC: adas, bmarzins, djansa, edamato, scooter, ssaha, swhiteho, theophanis_kontogiannis
Target Milestone: rc
Keywords: Regression
Target Release: 5.6
Hardware: x86_64
OS: Linux
Fixed In Version: gfs2-utils-0.1.62-26.el5
Doc Type: Bug Fix
Clone Of:
: 624691
Last Closed: 2011-01-13 23:21:11 UTC
Bug Depends On: 575968, 620384    
Bug Blocks: 622576, 624691    
Attachments:
Description Flags
RHEL56 patch none

Description Robert Peterson 2010-08-17 13:56:28 UTC
+++ This bug was initially created as a clone of Bug #620384 +++

Yesterday I did more rigorous testing for bug #620384 and
discovered this bug:  the latest fsck.gfs2 mishandles
directories that get really big (i.e. have lots of entries).
This is easier to hit with a small block size, such as
512-byte blocks.
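
A rough way to see why small blocks hit this sooner: fewer block
pointers fit in each block, so more levels of pointers are needed to
reach the same number of blocks. The sketch below is only arithmetic.
The header sizes (232 bytes for the dinode, 24 bytes for other metadata
blocks) and the 8-byte pointer size are my assumptions about the on-disk
format, not values taken from this report, and the function names are
made up for illustration.

```python
def pointers_per_block(block_size, header_size, pointer_size=8):
    """Number of block pointers that fit after the header in one block."""
    return (block_size - header_size) // pointer_size


def max_data_blocks(block_size, levels, dinode_header=232, meta_header=24):
    """Upper bound on blocks reachable through `levels` levels of pointers.

    levels=1 means the dinode points straight at the blocks; each extra
    level adds one layer of indirect blocks. Header sizes are assumptions.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    in_dinode = pointers_per_block(block_size, dinode_header)
    per_indirect = pointers_per_block(block_size, meta_header)
    return in_dinode * per_indirect ** (levels - 1)


if __name__ == "__main__":
    for block_size in (512, 4096):
        for levels in (1, 2):
            print(block_size, levels, max_data_blocks(block_size, levels))
```

Under these assumptions a 512-byte dinode has room for far fewer
pointers than a 4096-byte one, so the directory needs an extra level of
indirect blocks much earlier.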

For almost all normal directories, the metadata structure looks
like this:

height  structure
------  -------------------------------------------------
0.      dinode
1.      journaled data block (hash table block pointers)
2.      directory leaf blocks

When directories get really big their metadata structure gets
more complex and ends up looking like this:

height  structure
------  -------------------------------------------------
0.      dinode
1.      indirect block (block pointers to block pointers)
2.      journaled data block (hash table block pointers)
3.      directory leaf blocks

If there are enough directory entries, the structure can
grow taller still, with height 2 becoming another level of
indirect blocks:

height  structure
------  -------------------------------------------------
0.      dinode
1.      indirect block (block pointers to block pointers)
2.      indirect block (block pointers to block pointers)
3.      journaled data block (hash table block pointers)
4.      directory leaf blocks
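
The three layouts above follow one pattern: a dinode, some number of
indirect levels (possibly none), one level of journaled data blocks
holding the hash table, and then the leaf blocks. A minimal sketch of
that pattern, using a hypothetical helper name that does not come from
gfs2-utils:

```python
def directory_levels(indirect_levels):
    """Return the block type found at each height, top to bottom."""
    if indirect_levels < 0:
        raise ValueError("indirect_levels cannot be negative")
    levels = ["dinode"]
    # Each extra indirect level pushes the hash table one height lower.
    levels += ["indirect block (block pointers to block pointers)"] * indirect_levels
    levels.append("journaled data block (hash table block pointers)")
    levels.append("directory leaf blocks")
    return levels


if __name__ == "__main__":
    for n in range(3):
        print("indirect levels:", n)
        for height, structure in enumerate(directory_levels(n)):
            print(f"  {height}.  {structure}")
```

Calling it with 0, 1 and 2 indirect levels reproduces the three tables
above.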

Right now, fsck.gfs2 can only handle directories of the first
form.  Large directories with four or more metadata levels
are flagged as errors and their data is destroyed.  This is very
serious and needs to get fixed ASAP.  I've written a patch for
this issue and I'm testing it now.  So far the patch has passed
a simple unit test using a four-level directory.
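
To illustrate the difference in behavior (this is not the patch, just a
toy model with made-up names), compare a check that only accepts the
first form with one that allows any number of indirect levels between
the dinode and the hash table blocks:

```python
DINODE, INDIRECT, JDATA, LEAF = "dinode", "indirect", "jdata", "leaf"


def accepts_first_form_only(path):
    """Stand-in for the broken behavior: only dinode -> jdata -> leaf passes."""
    return list(path) == [DINODE, JDATA, LEAF]


def accepts_any_height(path):
    """Accept dinode, zero or more indirect levels, then jdata, then leaf."""
    path = list(path)
    if len(path) < 3:
        return False
    return (
        path[0] == DINODE
        and all(block == INDIRECT for block in path[1:-2])
        and path[-2] == JDATA
        and path[-1] == LEAF
    )


if __name__ == "__main__":
    small = [DINODE, JDATA, LEAF]
    big = [DINODE, INDIRECT, JDATA, LEAF]
    print(accepts_first_form_only(small), accepts_first_form_only(big))
    print(accepts_any_height(small), accepts_any_height(big))
```

In this model the four-level directory is rejected by the first check
and accepted by the second, which is the outcome the fix needs.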

Comment 1 Robert Peterson 2010-08-17 15:27:29 UTC
I did some testing and found that this bug does not exist in
gfs1's fsck, gfs_fsck.  I also tested gfs2-utils-0.1.60-1.el5,
and it does not have this problem either, so it is, in fact, a
regression.

Here's how to recreate the failure and what it looks like:

[root@kool ~]# mkfs.gfs2 -O -b512 -p lock_nolock -t "kool:bob" -j1 /dev/kool_vg/kool_bob
Device:                    /dev/kool_vg/kool_bob
Blocksize:                 512
Device Size                40.00 GB (83886080 blocks)
Filesystem Size:           40.00 GB (83886078 blocks)
Journals:                  1
Resource Groups:           160
Locking Protocol:          "lock_nolock"
Lock Table:                "kool:bob"
UUID:                      05060249-E9BB-1DCF-C9C8-112EF09BD56C

You have new mail in /var/spool/mail/root
[root@kool ~]# sync
[root@kool ~]# mount -tgfs2 /dev/kool_vg/kool_bob  /mnt/bob
[root@kool ~]# mkdir /mnt/bob/bob
[root@kool ~]# for i in `seq 1 10000` ; do touch /mnt/bob/bob/file_name_$i ; done
[root@kool ~]# !umo
umount /mnt/bob
[root@kool ~]# /sbin/fsck.gfs2 -V
GFS2 fsck DEVEL.1274286054 (built May 19 2010 11:22:48)
Copyright (C) Red Hat, Inc.  2004-2006  All rights reserved.
[root@kool ~]# fsck.gfs2 /dev/kool_vg/kool_bob 
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Block 287425 (0x462c1) seems to be free space, but is marked as data in the bitmap.
Okay to fix the bitmap? (y/n)y
The bitmap was fixed.
Block 287426 (0x462c2) seems to be free space, but is marked as data in the bitmap.
Okay to fix the bitmap? (y/n)y
The bitmap was fixed.
Error: inode 269038 (0x41aee) has unrecoverable errors; invalidating.
Block 269038 (0x41aee) seems to be free space, but is marked as inode in the bitmap.
Okay to fix the bitmap? (y/n)y
The bitmap was fixed.
Pass1 complete      
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Directory entry 'bob' referencing inode 269038 (0x41aee) in dir inode 398 (0x18e) block type 0: was deleted or is not an inode.
Clear directory entry to non-inode block? (y/n) 

Obviously, the fsck of the file system should come up clean.
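
When rerunning this by hand it can help to scan a saved fsck.gfs2 log
for the line that shows the damage. A small sketch, assuming the error
text stays exactly as in the transcript above; the function name is
hypothetical and nothing here runs fsck itself:

```python
import re

# Matches the pass1 error line seen in the transcript above.
INVALIDATED = re.compile(
    r"Error: inode (\d+) \(0x[0-9a-fA-F]+\) has unrecoverable errors; invalidating"
)


def invalidated_inodes(fsck_output):
    """Return the decimal inode numbers that fsck reported as invalidated."""
    return [int(match.group(1)) for match in INVALIDATED.finditer(fsck_output)]


if __name__ == "__main__":
    sample = (
        "Starting pass1\n"
        "Error: inode 269038 (0x41aee) has unrecoverable errors; invalidating.\n"
        "Pass1 complete\n"
    )
    print(invalidated_inodes(sample))
```

An empty list for a freshly made and populated file system is what a
clean run should give.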

Comment 2 Robert Peterson 2010-08-17 16:57:54 UTC
Created attachment 439165 [details]
RHEL56 patch

Here is the RHEL5.6 patch I'm testing for this problem.
This one is separate from the other patch, so it is in its
final form.  I'll crosswrite this to RHEL6.0 and attach that
shortly.

Comment 3 Robert Peterson 2010-08-17 17:17:35 UTC
Patch tested on system kool and found to fix the problem.

Comment 4 Robert Peterson 2010-08-18 14:22:35 UTC
I pushed the patch to the RHEL56 branch of the cluster git tree
for inclusion into 5.6.  Changing status to POST until we get
this built.

Comment 5 Robert Peterson 2010-09-20 14:50:29 UTC
Build 2770902 successful.  Changing status to Modified.
This fix is in gfs2-utils-0.1.62-26.el5.

Comment 7 Nate Straz 2010-11-11 22:16:51 UTC
Verified that fsck.gfs2 does not remove entries if di_height = 3.

gfs2-utils-0.1.62-28.el5

Comment 9 errata-xmlrpc 2011-01-13 23:21:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0135.html