Description of problem:
Nate encountered a data corruption problem involving duplicate blocks. That is, two files pointed to the same physical block. This is documented in bug #471141. I restored a copy of his metadata and ran gfs_fsck. Rather than repairing the file system sensibly, gfs_fsck deleted all the files in the root directory. This should be improved upon. I've already done much of this work in gfs2_fsck, but I need to get to the bottom of why gfs_fsck deleted everything and fix it.

Version-Release number of selected component (if applicable):
RHEL5.3

How reproducible:
Unknown

Steps to Reproduce:
1. gfs2_edit restoremeta /home/bob/metadata/north-gfs_setbit.img /dev/some/device
2. gfs_fsck /dev/some/device

Actual results:
Found dup block at 140809186
(and a bunch more)
Leaf(77589849) entry count in directory 26 doesn't match number of entries found - is 3, found 0
Update leaf entry count? (y/n)

Directory 26 is the root directory, and if you answer y, gfs_fsck deletes everything.

Expected results:
Only the corrupt files / files with duplicated blocks should be deleted.

Additional info:
Created attachment 338068 [details]
First patch

Here's the culprit. This patch is cross-written from gfs2. Duplicate block processing was not returning the proper number of leaf entries. The "metablock" scanner took that to mean there were no directory entries and concluded that it should destroy the whole root directory. I'm testing the fix now.
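The failure mode described above can be modeled with a short Python sketch. This is not the gfs_fsck C code and not the attached patch; the names `handle_entry_buggy`, `handle_entry_fixed`, and `check_leaf`, as well as the leaf layout and block numbers, are hypothetical and chosen only for illustration. It assumes a scanner that sums per-entry counts returned by a handler and compares the total with the count stored in the leaf header.

```python
# Hypothetical model of a directory leaf check; this is NOT gfs_fsck source code.

def handle_entry_buggy(entry, dup_blocks, dup_names):
    """Duplicate-block handler that records duplicates but reports no entries."""
    if entry["block"] in dup_blocks:
        dup_names.append(entry["name"])
    return 0  # bug: the entry is never added to the leaf's running count


def handle_entry_fixed(entry, dup_blocks, dup_names):
    """Same handler, but every entry it visits is reported back as counted."""
    if entry["block"] in dup_blocks:
        dup_names.append(entry["name"])
    return 1


def check_leaf(leaf, handler, dup_blocks):
    """Return (header_count, found_count, dup_names) for one leaf."""
    dup_names = []
    found = sum(handler(e, dup_blocks, dup_names) for e in leaf["entries"])
    return leaf["header_count"], found, dup_names


# Illustrative leaf: three entries, one of which points at a duplicated block.
leaf = {
    "header_count": 3,
    "entries": [
        {"name": "a", "block": 100},
        {"name": "b", "block": 200},
        {"name": "c", "block": 300},
    ],
}
dup_blocks = {200}

buggy = check_leaf(leaf, handle_entry_buggy, dup_blocks)
fixed = check_leaf(leaf, handle_entry_fixed, dup_blocks)
```

In this model the buggy handler yields a header count of 3 against a found count of 0, which mirrors the "is 3, found 0" message in the report. With the fixed handler the two counts agree, so a scanner of this kind would have no reason to rewrite the leaf.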
Requesting ack flags--we have the fix in hand.
The attached patch has been pushed to the master branch of the gfs1-utils git tree, and the STABLE3, STABLE2 and RHEL5 branches of the cluster git tree for inclusion into 5.4. It has been tested on system roth-01. Changing status to Modified.
I have tested a very simple FS corruption and it produced quite interesting results.

Corruption description:
1. A new GFS filesystem is created (mkfs.gfs -O -t a3cluster:a3gfs2 -p lock_nolock -j 2 -J 32 /dev/GFSVG/GFS)
2. The FS is mounted and 3x10M files are created (file-01 file-02 file-03)
3. The FS is unmounted
4. gfs2_edit is used to create a duplicate link from the first file to the second file. The last link in the first section of the first file (the first link pointing to a block containing data) was made the same for the first and second files. In other words, the first data block of file-02 (or whatever the 2nd file in the FS is) is the same as in the first file.

Note: the same scenario is usable for indirect links (links to another block of links) and for gfs2_fsck@GFS2.

Versions used:
x86_64: GFS fsck 0.1.19 (built May 4 2009 19:34:42)
ia64: GFS fsck 0.1.19 (built May 4 2009 19:35:05)

Then gfs_fsck -y was run. The corruption was fixed on ia64 but not on x86_64.

ia64:
gfs_fsck -y /dev/sdc1
Initializing fsck
Clearing journals (this may take a while).
Journals cleared.
Starting pass1
Pass1 complete
Starting pass1b
Found dup block at 88
Block 88 has 2 inodes referencing it fora total of 2 duplicate references
Inode (null) has 1 reference(s) to block 88
Clearing...
Found dup in inode "unknown name" (block #24) with block #88
inode is in directory 0
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Found directory entry 'file-02' in block 23 to something not a file or directory!
Directory entry 'file-02' cleared
Entries is 6 - should be 5 for 23
Entries updated
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
ondisk and fsck bitmaps differ at block 24
Succeeded.
ondisk and fsck bitmaps differ at block 2648
Succeeded.
RG #1 free count inconsistent: is 20715 should be 20717
RG #1 used inode count inconsistent: is 9 should be 8
Resource group counts updated
Pass5 complete
Writing changes to disk

x86_64:
gfs_fsck -y /dev/GFSVG/GFS
Initializing fsck
Clearing journals (this may take a while).
Journals cleared.
Starting pass1
Pass1 complete
Starting pass1b
Found dup block at 88
Block 88 has 2 inodes referencing it fora total of 2 duplicate references
Inode (null) has 1 reference(s) to block 88
Clearing...
make: *** [checkgfs1] Segmentation fault (core dumped)

I will attach the metadata of the corrupted FS (x86 version) and a backtrace from the core file.
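For readers unfamiliar with what "Found dup block at 88" means, the idea of a duplicate-reference scan can be sketched in a few lines of Python. This is only an illustration of the concept, not how gfs_fsck pass1b is implemented. Block 88 and the inode at block 24 come from the log above; the first file's inode number (1000) and the other data block numbers are hypothetical placeholders, since the report does not give them.

```python
from collections import defaultdict


def find_duplicate_blocks(inode_blocks):
    """Map each block referenced by more than one inode to those inodes.

    inode_blocks: dict of inode number -> list of block numbers it points to.
    """
    refs = defaultdict(list)
    for inode, blocks in inode_blocks.items():
        for block in blocks:
            refs[block].append(inode)
    return {block: inodes for block, inodes in refs.items() if len(inodes) > 1}


# Hypothetical layout: inode 1000 stands in for the first file (its real
# inode block is not stated in the report); inode 24 is file-02 per the log.
inode_blocks = {
    1000: [88, 89],  # first file, first data block is 88
    24: [88, 91],    # file-02, first data block redirected to 88
}

dups = find_duplicate_blocks(inode_blocks)
for block, inodes in dups.items():
    print(f"block {block} is referenced by {len(inodes)} inodes: {inodes}")
```

A real checker also has to decide which of the referencing inodes to keep, which this sketch deliberately leaves out.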
Created attachment 347563 [details]
Core file backtrace

Backtrace of the x86_64 core from gfs_fsck.
Created attachment 347564 [details]
Metadata of FS with the corruption

Metadata containing the simple corruption described in the comments.
Verified with gfs-utils-0.1.20-1.el5: gfs_fsck no longer deletes everything if crosslinked files are found in the root directory. It passed the crosslink test on x86_64 and ia64.
To fully fix the filesystem, gfs_fsck has to be run twice. The second run fixes the bitmap differences. This applies until bug 509225 is fixed.
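As a stopgap, the two-run workaround could be scripted. The sketch below is a hypothetical Python helper, not something shipped with gfs-utils; the function name `run_fsck_twice` and the injectable `runner` parameter are assumptions made for illustration. It only uses the `gfs_fsck -y <device>` invocation that appears in this report and does not interpret exit codes, because their meaning is not described here.

```python
import subprocess


def run_fsck_twice(device, runner=subprocess.run):
    """Run `gfs_fsck -y <device>` two times in a row and return both results.

    `runner` defaults to subprocess.run; it is injectable so the helper can
    be exercised without touching a real device.
    """
    results = []
    for _ in range(2):
        results.append(runner(["gfs_fsck", "-y", device]))
    return results


# Example call, left as a comment because it modifies the target device:
# run_fsck_twice("/dev/some/device")
```

The filesystem must be unmounted before running this, as with any manual gfs_fsck invocation.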
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1336.html