Description of problem: If you have a data block that's duplicated as an extended attribute block, gfs2_fsck gets all confused and makes some bad decisions. For example, if the data block is discovered first (and is correct) and the EA duplicate is discovered second (and is wrong) the code will destroy the file in favor of the bad EA block. If you run fsck a second time, it will notice that the EA is bad and delete that too. In the EA handling code in pass1.c, if a duplicate is encountered, fsck should see if the duplicated block is really part of an EA or not. If it isn't, it can fix the problem there and leave the user's file intact. Version-Release number of selected component (if applicable): RHEL5.2 prototype How reproducible: Always Steps to Reproduce: 1. Use gfs2_edit to change an EA for a file so that it points to a data block of another file. 2. Run gfs2_fsck on the modified file system Actual results: The entire file is destroyed and the bad EA is left out there. Expected results: The bad EA should be destroyed and the file left intact, depending on the duplicated block. Additional info: I've got a working prototype of pass1.c to fix this. However, there are three different instances of this bug and only one is fixed. I'd like to rearrange the code a bit to eliminate the redundant code. The likelihood of hitting this is very low. I only encountered it after designing a special test case for bug #325151. Therefore, I recommend deferring this to 5.3. We have higher priorities than this.
Created attachment 272991 [details] File system metadata to recreate problem You can recreate this problem by unbipz2ing and restoring this metadata and then running gfs2_fsck.
Missed in comment that this is low probability. Moved to 5.3 and lowered priority.
Created attachment 273071 [details] First prototype patch This is only a third of the patch, but it seems to work. There are three EA cases in pass1.c, and this patch only fixes one of those cases. Before this can be shipped, the other two cases need to be coded and tested.
devel & 5.3 ACKs please.
Created attachment 311604 [details] Second prototype patch This patch covers all three cases. However, it is completely untested and I need to go over the logic again. So it's likely to change.
Created attachment 311988 [details] Third prototype patch I found all kinds of "exceptions to the rule" once I ran my prototype through a set of saved metadata I had previously used to recreate the problem. This is my "fscktest" metadata, also known as my "leukemia" test for gfs2_fsck, where I purposely blasted blocks, duplicated blocks and stuck EA pointers where they shouldn't be. My goal was to test every combination of EA corruption, but I don't guarantee the test is comprehensive. This patch successfully recovers all the kinds of corruption simulated by the leukemia test and a second run won't find additional corruption. I still need to do more tests. For one, I'm planning on testing it with the (hopefully good) metadata left over from the benchp14 test of a 2TB file system. That will take a while. I also need to run it through a -n test to make sure it doesn't try to write to the file system when it shouldn't. I also fixed some cosmetic things: For example, there were a lot of places where it would tell you that you had a bad Extended Attribute, but it wouldn't tell you which inode it was associated with, etc. I tried to make the messages more meaningful and consistent.
Created attachment 312174 [details] fsck memory reduction patch I've got a bug #445858 to fix gfs's fsck to reduce the amount of memory it uses. Well, I've got the same problem with gfs2_fsck. The thing is, I wanted to run my new gfs2_fsck (above patch) against a file system populated by benchp14, but the problem is, the metadata is huge. The file system is 2TB in size and the metadata save is 184GB. So when I tried to run gfs2_fsck on it (to test the patch) I ran out of memory big time. So I started investigating ways to reduce gfs2_fsck's memory usage and came up with this patch. Although the patch is huge and looks pervasive, it's really just a couple of simple concepts. First, instead of allocating four bitmaps, we only allocate one. For bad blocks, duplicate blocks and extended attribute blocks, we just use a linked list of blocks. Since these type of blocks should be rare, we don't need to allocate a huge amount of memory for them. Second, for a couple of common data structures, such as the phony buffer heads used in libgfs2/buf.c, I got rid of a field to make the structure smaller. There's probably more fields I can get rid of here as well. Eventually I want to get rid of the whole thing and just let vfs do the buffering. Also, I reduced a link count from 32-bits to 16 bits. Hopefully we won't have more than 16384 links between files, but I checked a (somewhat old) copy of e2fsprogs and it also uses a 16-bit number to keep track of inode links, so I figure that can't be too horrible. Anyway, I'm letting the latest fsck code run on kool over the weekend and I'll see what happens. My savings might not be enough. But I thought I'd at least document what I had so far.
Created attachment 312404 [details] addendum patch I started rigorously testing gfs2_fsck and ran into more problems. After I got it working properly for my ea-damage test (aka "leukemia" test) I threw it against gfs2_fsck_hell and it uncovered several issues. First, I found a small change I did for gfs_fsck that I needed to crosswrite. Next, I discovered that the RG repair code had a problem: Since the resource groups are journaled, it's not uncommon for RGs to appear in journal blocks. But if the rindex file is damaged and we have to repair the index via the hunt-and-peck method, those "false RGs" confuse the code greatly. So I had to write some special code to first search for journal blocks that look like RGs and handle that properly. It turns out that the easiest way to do that (while taking into account the fact that journals may not be contiguous on disk--e.g. if the user created them through dd in the meta_fs) was to use some of the new code I devised to save memory on block maps. So the memory patch has become essential now. I tried to run this on a 2T file system populated by many hours of testing with benchp14. The system became bogged down and was cpu- bound. I ported the fix from bug #234627 and my cpu usage went down greatly. In one trial run, an gfs2_fsck run time went down from 19 seconds to about 6 seconds with the fix. The gfs2_fsck on kool went from being cpu bound to being IO bound, which is how it should be. Still, even with less memory and faster code, it only made it to 7% into pass1 on an overnight run. Before I ship this, I will likely split the patch into its various component fixes and do separate patch commits for each. That way it's not such a big mess. Plus, if there's a regression, I can back out one of the patches more easily. So the patches will likely be: 1. The third prototype patch as given above 2. The memory saving patch 3. The 234627 patch for bitmap speed 4. The gfs_fsck crosswrite for block number sanity checking 5. The RG-inside-a-journal patch I'll split them out tomorrow. I just wanted to document what I have so far in case I had a drive failure or something.
The "third prototype" plus the "addendum patch" from comment #9 passes the lengthy gfs2_fsck_hell test which deliberately damages RGs and the rindex file in certain ways and makes gfs2_fsck fix it.
Here is the final list of patches that I just pushed to master: a d5b9e65: Speed up userspace bitmap manipulation code. b a0d10d8: gfs_fsck crosswrite for block number sanity checking c 8f39570: Fix some bad references to gfs_tool and gfs_fsck d 7654295: Deleted unused function print_map e c903a12: Shrink memory 1: eliminate b_size from pseudo-buffer-heads f 5008483: Shrink memory 2: get rid of 3 huge in-core bitmaps g 7e52932: Shrink memory 3: smaller link counts in inode_info h 9fb74de: Better error reporting in gfs2_fsck i 768d7f6: RGRepair: Account for RG blocks inside journals j c9dfb0f: gfs2_fsck dupl. blocks between EA and data
Created attachment 312571 [details] Master patches This tarball contains the ten patches for the "master" branch. They should hopefully apply directly to the RHEL5 branch because there isn't much difference except for the copyright notices and a few minor things.
Patches applied to the RHEL5 branch, tested fully on system roth-01 and pushed to the RHEL5 branch for inclusion into 5.3. Changing status to Modified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0087.html