Bug 493727 - GFS: gfs_fsck can delete everything in a corrupt file system
Summary: GFS: gfs_fsck can delete everything in a corrupt file system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-utils
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-04-02 20:26 UTC by Robert Peterson
Modified: 2010-01-12 03:35 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:01:30 UTC
Target Upstream Version:
Embargoed:


Attachments
First patch (864 bytes, patch)
2009-04-03 15:44 UTC, Robert Peterson
Core file backtrace (23.44 KB, text/plain)
2009-06-12 13:23 UTC, Jaroslav Kortus
metadata of FS with the corruption (491.73 KB, application/octet-stream)
2009-06-12 13:25 UTC, Jaroslav Kortus


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:1336 | 0 | normal | SHIPPED_LIVE | gfs-utils bug fix update | 2009-09-01 10:41:37 UTC

Description Robert Peterson 2009-04-02 20:26:03 UTC
Description of problem:
Nate encountered a data corruption problem involving duplicate
blocks; that is, two files pointed to the same physical block.
This is documented in bug #471141.  I restored a copy of his
metadata and ran gfs_fsck.  Rather than repairing the file system
sensibly, gfs_fsck deleted all the files in the root directory.
This should be improved upon.

I've done a bunch of this work already in gfs2_fsck, but I need to
get to the bottom of why it deleted everything and fix it.

Version-Release number of selected component (if applicable):
RHEL5.3

How reproducible:
Unknown

Steps to Reproduce:
1. gfs2_edit restoremeta /home/bob/metadata/north-gfs_setbit.img /dev/some/device
2. gfs_fsck /dev/some/device
  
Actual results:
Found dup block at 140809186 (and a bunch more)
Leaf(77589849) entry count in directory 26 doesn't match number of entries found - is 3, found 0
Update leaf entry count? (y/n)
Directory 26 is the root directory, so answering y deletes
everything.

Expected results:
Only the corrupt files / files with duplicated blocks should be
deleted.

Additional info:

Comment 1 Robert Peterson 2009-04-03 15:44:12 UTC
Created attachment 338068
First patch

Here's the culprit.  This patch is cross-written from gfs2.
Duplicate block processing was not returning the proper number of
leaf entries.  The "metablock" scanner took that to mean there were
no directory entries, and therefore, it should destroy the whole
root directory.  I'm testing the fix now.
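
The failure mode described above can be sketched roughly as follows. This is a simplified Python model, not the gfs_fsck source: `count_leaf_entries`, `reconcile_leaf`, and the data layout are hypothetical names invented for illustration. It shows how a counter that gives up when it meets a duplicate-block reference can report zero entries, and how a caller that trusts that count then empties the leaf.

```python
def count_leaf_entries(entries, dup_blocks, buggy):
    """Count the directory entries in one leaf (hypothetical model).

    entries:    list of (name, block) pairs stored in the leaf
    dup_blocks: set of block numbers flagged as duplicates
    buggy:      True models the pre-patch behaviour
    """
    count = 0
    for name, block in entries:
        if buggy and block in dup_blocks:
            # Pre-patch model: bail out on the first duplicate and
            # report nothing, discarding entries already counted.
            return 0
        count += 1
    return count


def reconcile_leaf(recorded, entries, dup_blocks, buggy):
    """Return the entries left after answering y to the count prompt."""
    found = count_leaf_entries(entries, dup_blocks, buggy)
    if found != recorded:
        # The scanner trusts its own count and trims the leaf to match.
        return entries[:found]
    return entries


# Assumed example: three entries, one of which references a duplicate block.
root = [("a", 10), ("b", 88), ("c", 12)]
print(reconcile_leaf(3, root, {88}, buggy=True))
print(reconcile_leaf(3, root, {88}, buggy=False))
```

With `buggy=True` the model reproduces the "is 3, found 0" mismatch and the leaf ends up empty; with `buggy=False` the count matches and the entries are left for later passes to judge individually.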

Comment 2 Robert Peterson 2009-04-06 01:33:48 UTC
Requesting ack flags--we have the fix in hand.

Comment 3 Robert Peterson 2009-04-06 16:36:53 UTC
The attached patch has been pushed to the master branch of the
gfs1-utils git tree, and the STABLE3, STABLE2 and RHEL5 branches
of the cluster git tree for inclusion into 5.4.  It has been
tested on system roth-01.  Changing status to Modified.

Comment 5 Jaroslav Kortus 2009-06-12 13:22:05 UTC
I tested a very simple FS corruption, and it produced quite interesting results.

Corruption description:
1. A new GFS filesystem is created (mkfs.gfs -O -t a3cluster:a3gfs2 -p lock_nolock -j 2 -J 32 /dev/GFSVG/GFS).
2. The FS is mounted and three 10M files are created (file-01 file-02 file-03).
3. The FS is unmounted.
4. gfs2_edit is used to create a duplicate link from the first file to the second. The last link in the first section of the first file (the first link pointing to a block containing data) is made the same for the first and second file. In other words, the first data block of file-02 (or whichever file is second in the FS) is the same as that of the first file.

Note: the same scenario also works for indirect links (links to another block of links) and for gfs2_fsck on GFS2.
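
As a rough illustration of what this corruption looks like to a checker, the sketch below finds blocks referenced by more than one inode. It is a toy model under stated assumptions, not how gfs_fsck pass1b is implemented: `find_duplicate_blocks` and the dictionary layout are hypothetical, and the block numbers in the example are made up apart from 24 and 88, which are borrowed from the log further down.

```python
from collections import defaultdict


def find_duplicate_blocks(inodes):
    """Map each block referenced by more than one inode to those inodes.

    inodes: dict of inode block number -> list of block numbers it points to
    """
    owners = defaultdict(list)
    for inode, blocks in inodes.items():
        for block in blocks:
            owners[block].append(inode)
    # Keep only blocks that have more than one referencing inode.
    return {block: refs for block, refs in owners.items() if len(refs) > 1}


# Assumed layout: inode 1 stands in for the first file, inode 24 for file-02.
crosslinked = {1: [88, 89], 24: [88, 90]}
print(find_duplicate_blocks(crosslinked))
```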

Versions used:
x86_64: GFS fsck 0.1.19 (built May  4 2009 19:34:42)
ia64: GFS fsck 0.1.19 (built May  4 2009 19:35:05)

gfs_fsck -y was then run. The corruption was fixed on ia64 but not on x86_64, where gfs_fsck crashed with a segmentation fault.

ia64:
gfs_fsck -y /dev/sdc1
Initializing fsck
Clearing journals (this may take a while).
Journals cleared.
Starting pass1
Pass1 complete      
Starting pass1b
Found dup block at 88
Block 88 has 2 inodes referencing it fora total of 2 duplicate references
Inode (null) has 1 reference(s) to block 88
Clearing...
Found dup in inode "unknown name" (block #24) with block #88
inode  is in directory 0
Pass1b complete      
Starting pass1c
Pass1c complete      
Starting pass2
Found directory entry 'file-02' in block 23 to something not a file or directory!
Directory entry 'file-02' cleared
Entries is 6 - should be 5 for 23
Entries updated
Pass2 complete      
Starting pass3
Pass3 complete      
Starting pass4
Pass4 complete      
Starting pass5
ondisk and fsck bitmaps differ at block 24
Succeeded.
ondisk and fsck bitmaps differ at block 2648
Succeeded.
RG #1 free count inconsistent: is 20715 should be 20717
RG #1 used inode count inconsistent: is 9 should be 8
Resource group counts updated
Pass5 complete      
Writing changes to disk


x86_64:
gfs_fsck -y /dev/GFSVG/GFS
Initializing fsck
Clearing journals (this may take a while).
Journals cleared.
Starting pass1
Pass1 complete      
Starting pass1b
Found dup block at 88
Block 88 has 2 inodes referencing it fora total of 2 duplicate references
Inode (null) has 1 reference(s) to block 88
Clearing...
make: *** [checkgfs1] Segmentation fault (core dumped)

I will attach the metadata of the corrupted FS (x86 version) and a backtrace from the core file.

Comment 6 Jaroslav Kortus 2009-06-12 13:23:35 UTC
Created attachment 347563
Core file backtrace

Backtrace of the x86_64 core from gfs_fsck.

Comment 7 Jaroslav Kortus 2009-06-12 13:25:03 UTC
Created attachment 347564
metadata of FS with the corruption

Metadata containing the simple corruption described in the comments.

Comment 8 Jaroslav Kortus 2009-07-20 16:07:20 UTC
Verified with gfs-utils-0.1.20-1.el5.

gfs_fsck no longer deletes everything when crosslinked files are found in the root directory.
The crosslink test passed on x86_64 and ia64.

Comment 9 Jaroslav Kortus 2009-07-20 16:08:30 UTC
To fully fix the filesystem, gfs_fsck has to be run twice. The second run fixes the bitmap differences. This applies until bug 509225 is fixed.
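
A minimal sketch of the two-run workaround, assuming gfs_fsck is on PATH and that -y behaves as in comment 5. `run_fsck_passes` and its `runner` parameter are hypothetical helpers for illustration, not part of gfs-utils, and the meaning of the exit statuses is not interpreted here.

```python
import subprocess


def run_fsck_passes(device, passes=2, runner=subprocess.call):
    """Run `gfs_fsck -y <device>` `passes` times in a row.

    Returns the list of exit statuses, one per run. `runner` is
    injectable so the loop can be exercised without a real device.
    """
    statuses = []
    for _ in range(passes):
        statuses.append(runner(["gfs_fsck", "-y", device]))
    return statuses
```

For example, `run_fsck_passes("/dev/GFSVG/GFS")` would invoke gfs_fsck twice against the unmounted device used in comment 5.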

Comment 12 errata-xmlrpc 2009-09-02 11:01:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1336.html

