Description of problem:
Running 'gfs_fsck -v -n <block device>' on a GFS filesystem with inconsistencies reports an EA leaf block problem:

Initializing fsck
Initializing lists...
Initializing special inodes...
Validating Resource Group index.
Level 1 check.
3798 resource groups found.
(passed)
Setting block ranges...
Creating a block list of size 249167869...
Starting pass1
Checking metadata in Resource Group 0
Checking metadata in Resource Group 1
...
Checking metadata in Resource Group 3065
Checking metadata in Resource Group 3066
Checking metadata in Resource Group 3067
Checking metadata in Resource Group 3068
EA leaf block has incorrect type.

gfs_fsck then aborts.

When gfs_fsck is run without the '-n' flag so that it can fix the problem, it crashes with a glibc double-free error:

# gfs_fsck -y <block device>
Initializing fsck
Clearing journals (this may take a while)....
Journals cleared.
Starting pass1
13 percent complete.
25 percent complete.
37 percent complete.
48 percent complete.
49 percent complete.
61 percent complete.
72 percent complete.
EA leaf block has incorrect type.
*** glibc detected *** gfs_fsck: double free or corruption (fasttop): 0x00000000059aaf20 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3ca0871634]
/lib64/libc.so.6(cfree+0x8c)[0x3ca0874c5c]
gfs_fsck[0x416546]
gfs_fsck[0x4166ed]
gfs_fsck[0x403c9c]
gfs_fsck[0x404119]
gfs_fsck[0x404370]
gfs_fsck[0x40148a]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3ca081d8b4]
gfs_fsck[0x4010e9]
======= Memory map: ========
00400000-00422000 r-xp 00000000 fd:00 8053142 /sbin/gfs_fsck
00622000-00623000 rw-p 00022000 fd:00 8053142 /sbin/gfs_fsck
03d2f000-059ba000 rw-p 03d2f000 00:00 0 [heap]
3ca0400000-3ca041a000 r-xp 00000000 fd:00 1146035 /lib64/ld-2.5.so
3ca061a000-3ca061b000 r--p 0001a000 fd:00 1146035 /lib64/ld-2.5.so
3ca061b000-3ca061c000 rw-p 0001b000 fd:00 1146035 /lib64/ld-2.5.so
3ca0800000-3ca094a000 r-xp 00000000 fd:00 1146049 /lib64/libc-2.5.so
3ca094a000-3ca0b49000 ---p 0014a000 fd:00 1146049 /lib64/libc-2.5.so
3ca0b49000-3ca0b4d000 r--p 00149000 fd:00 1146049 /lib64/libc-2.5.so
3ca0b4d000-3ca0b4e000 rw-p 0014d000 fd:00 1146049 /lib64/libc-2.5.so
3ca0b4e000-3ca0b53000 rw-p 3ca0b4e000 00:00 0
3ca4000000-3ca400d000 r-xp 00000000 fd:00 1146061 /lib64/libgcc_s-4.1.2-20080102.so.1
3ca400d000-3ca420d000 ---p 0000d000 fd:00 1146061 /lib64/libgcc_s-4.1.2-20080102.so.1
3ca420d000-3ca420e000 rw-p 0000d000 fd:00 1146061 /lib64/libgcc_s-4.1.2-20080102.so.1
2adb93530000-2adb93531000 rw-p 2adb93530000 00:00 0
2adb93540000-2adba0532000 rw-p 2adb93540000 00:00 0
2adba4000000-2adba4021000 rw-p 2adba4000000 00:00 0
2adba4021000-2adba8000000 ---p 2adba4021000 00:00 0
7fff17535000-7fff1757a000 rw-p 7ffffffba000 00:00 0 [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
Aborted

This is followed by a core dump.
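For readers unfamiliar with this glibc message: it is raised when free() is handed a pointer that has already been released. The sketch below is illustrative only and is not taken from the gfs_fsck source; `struct ea_buf`, `ea_buf_init` and `ea_buf_release` are hypothetical names. It shows one common defensive pattern, assuming the crash comes from an error path releasing a buffer that the normal path releases again: clear the pointer on release so a second release does nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer holder; gfs_fsck's real structures differ. */
struct ea_buf {
    char  *data;
    size_t len;
};

/* Allocate a zeroed buffer. Returns 0 on success, -1 on failure. */
int ea_buf_init(struct ea_buf *b, size_t len)
{
    assert(b != NULL);
    b->data = calloc(1, len);
    b->len = b->data ? len : 0;
    return b->data ? 0 : -1;
}

/* Release the buffer and clear the pointer, so that a second call
 * (for example from an error path) is a no-op, not a double free. */
void ea_buf_release(struct ea_buf *b)
{
    assert(b != NULL);
    free(b->data);      /* free(NULL) is defined to do nothing */
    b->data = NULL;
    b->len = 0;
}
```

This pattern only helps when every release goes through the same owner; two separate pointers to one allocation would still need an ownership rule.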
Version-Release number of selected component (if applicable):
kernel-2.6.18-120.el5.bz470074.0
gfs2-utils-0.1.44-1.el5_2.1
glibc-common-2.5-24
cman-2.0.98-1.el5
kernel-headers-2.6.18-121.el5
kernel-doc-2.6.18-92.1.18.el5
kernel-2.6.18-134.el5
glibc-2.5-24
gfs-utils-0.1.18-1.el5

How reproducible:
Every time on this particular filesystem, which is damaged.

Steps to Reproduce:
1. Run the command as shown above.

Actual results:
gfs_fsck aborts with a glibc double-free error and dumps core.

Expected results:
gfs_fsck fixes the filesystem.

Additional info:
Created attachment 339902
Preliminary patch

A similar problem was fixed in gfs2_fsck. This patch is a gfs-crosswrite from the gfs2 patch. It is completely untested. I'm waiting to get the customer's metadata so I can test it properly to make sure it fixes the problem.
Setting NEEDINFO flag until I can get the metadata in to make sure it will fix the file system properly. Since the patch has not been run even once, it's likely to need a few changes before it ships. I do not recommend running the patch on a production machine until the testing of the patch is complete.
This one seems to be bad too. Setting NEEDINFO again until I can get a clean copy.
This copy of the metadata is perfect. I ran the patch I posted with comment #13 on it, and it correctly fixes the file system. I'll start the process of getting this into RHEL5 asap.
The patch was pushed to the master branch of the gfs1-utils git tree, and the STABLE2, STABLE3 and RHEL5 branches of the cluster git tree for inclusion into 5.4. It was tested on system roth-01 using the customer's metadata that failed before the patch. Changing status to Modified.
Created attachment 341575
Addendum patch

The previous patch forgot to actually write the changes to disk. This was an oversight on my part, mainly because I made incorrect assumptions based on how gfs2_fsck (from which the patch came) operates. Hopefully this fixes it.
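To illustrate the kind of omission described here, the following is a minimal sketch of a read-modify-write repair on a single block. It is not the patch itself: `repair_block_type`, the 4096-byte `BLOCK_SIZE`, and the idea that the type field occupies the first four bytes are all assumptions made for the example. The point is only that a change made to the in-memory buffer has no effect until the buffer is written back.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* assumed block size for this sketch */

/* Hypothetical repair step: overwrite the first 4 bytes of a block
 * (standing in for a metadata type field) and write the block back.
 * Returns 0 on success, -1 on I/O error. */
int repair_block_type(int fd, off_t offset, uint32_t new_type)
{
    unsigned char buf[BLOCK_SIZE];

    if (pread(fd, buf, sizeof(buf), offset) != (ssize_t)sizeof(buf)) {
        fprintf(stderr, "read failed at offset %lld\n", (long long)offset);
        return -1;
    }

    /* At this point the change exists only in memory. */
    memcpy(buf, &new_type, sizeof(new_type));

    /* Without this write the repair is lost when buf goes out of scope. */
    if (pwrite(fd, buf, sizeof(buf), offset) != (ssize_t)sizeof(buf)) {
        fprintf(stderr, "write failed at offset %lld\n", (long long)offset);
        return -1;
    }

    return fsync(fd);   /* ask the kernel to flush the data to storage */
}
```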
Thanks for the good news. The addendum patch was pushed to the master branch of the gfs1-utils git tree, and the STABLE2, STABLE3 and RHEL5 branches of the cluster git tree for inclusion into 5.4.
Using customer metadata, I have determined that bug #507775 was caused by a regression introduced with this bug's patch. I have an addendum patch that corrects the problem and allows gfs_fsck to repair both sets of gfs metadata from bug #507775. I will post the addendum patch immediately and start the process of respinning this fix for all the appropriate releases. Temporarily changing the status to FAILS_QA, but I should be able to push the addendum fix today.
Created attachment 350176
Addendum patch for the 507775 problem.

This patch fixes both sets of corrupt metadata from bug #507775.
We can't commit this fix unless the blocker or exception flag is set for this bug.
Chris, the flags were set when I originally did the commit for RHEL5.4, and that previous commit is defective. We can't ship a defective fix, so we really have no choice. Do I really need the exception flag? If so, I can likely get it. The addendum has been pushed now to master in gfs1-utils, and STABLE3, STABLE2, RHEL5 and RHEL54 branches of the cluster.git repository. It was tested on system roth-01.
*** Bug 507775 has been marked as a duplicate of this bug. ***
Requesting the exception flag.
Verified with gfs-utils-0.1.20-1.el5. To fully fix the filesystem, gfs_fsck has to be run twice. This applies until bug 509225 is fixed. Passed the eatype test on x86_64 and ia64.
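The two-pass workaround can be scripted. The sketch below is an assumption-laden helper, not part of gfs-utils: `run_fsck_passes` and `run_once` are hypothetical names, and the only gfs_fsck option used is the `-y` flag shown in the original report. It runs the program the requested number of times and reports each exit status without interpreting it, since the meaning of individual gfs_fsck exit codes is not stated in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `prog -y device` once. Returns the child's exit status
 * (127 if exec failed), or -1 if fork/wait failed or the child
 * was killed by a signal. */
static int run_once(const char *prog, const char *device)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp(prog, prog, "-y", device, (char *)NULL);
        _exit(127);                 /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Run the checker `passes` times in sequence and return the status
 * of the last pass that ran. Stops early only if a pass could not
 * be run at all (status -1). */
int run_fsck_passes(const char *prog, const char *device, int passes)
{
    assert(prog != NULL && device != NULL && passes > 0);

    int rc = -1;
    for (int i = 1; i <= passes; i++) {
        rc = run_once(prog, device);
        fprintf(stderr, "pass %d: exit status %d\n", i, rc);
        if (rc < 0)
            break;
    }
    return rc;
}
```

A call such as `run_fsck_passes("gfs_fsck", device, 2)` would match the workaround above, with `device` set to the block device of the affected filesystem.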
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1336.html