Description of problem:
A small GFS volume (20 GB, 3 GB used) was corrupted when a server was shut down uncleanly. Running fsck.gfs on it ends in a segfault. Afterwards, when the device is mounted with lockproto=lock_nolock on a single machine, the filesystem reports 3 GB of data in use, but only around 500 MB of data is still accessible. Most files on the filesystem seem to be missing.

Version-Release number of selected component (if applicable):
RHEL 5 U3 with GFS utils

How reproducible:
Always with this filesystem

Steps to Reproduce:
1. Run fsck.gfs on the device.
2. Wait a few minutes.
3. See the segfault.

Actual results:
fsck.gfs -y /dev/mapper/vg-gfs_lv-gfs
Initializing fsck
Clearing journals (this may take a while)....
Journals cleared.
Starting pass1
Found unused inode marked in-use
Pass1 complete
Starting pass1b
Found dup block at 724461
Block 724461 has 2 inodes referencing it for a total of 2 duplicate references
Inode (null) has 1 reference(s) to block 724461
Clearing...
Jun 22 19:26:55 (none) kernel: fsck.gfs[30737]: segfault at 0000000000000018 rip 0000000000416406 rsp 00007fff232c09b0 error 4

Expected results:
fsck.gfs should not segfault. Errors should be corrected.

Additional info:
# uname -a
Linux C2N1 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
rpm -qf /sbin/gfs_fsck
gfs-utils-0.1.18-1.el5
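For readers less familiar with the kernel segfault line quoted in the report, here is a minimal sketch of how it can be decoded. It assumes the 2.6.18-era x86_64 message format shown above and 4 KiB pages; the function name decode_segfault and the "near_null" heuristic are illustrative and not part of any GFS tool. The value after "error" is the x86 page-fault error code, so "error 4" at address 0x18 reads as a user-mode read of an unmapped page very close to address zero. That pattern is commonly what a field access through a NULL pointer looks like, which would fit the "Inode (null)" message, but this is an inference and not something the report confirms.

```python
import re

# x86 page-fault error code bits, as printed after "error" in the kernel log.
PF_PROT = 0x1    # set: protection violation; clear: page not present
PF_WRITE = 0x2   # set: write access; clear: read access
PF_USER = 0x4    # set: the fault happened in user mode

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Assumption: the line follows the format quoted in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

# The kernel line from the report, copied verbatim.
LINE = (
    "Jun 22 19:26:55 (none) kernel: fsck.gfs[30737]: segfault at "
    "0000000000000018 rip 0000000000416406 rsp 00007fff232c09b0 error 4"
)


def decode_segfault(line):
    """Return the decoded fields of a segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    err = int(m.group("err"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": addr,
        "rip": int(m.group("rip"), 16),
        "page_present": bool(err & PF_PROT),
        "write": bool(err & PF_WRITE),
        "user_mode": bool(err & PF_USER),
        # Heuristic only: a fault inside the first page often means a
        # structure member was accessed through a NULL pointer.
        "near_null": addr < PAGE_SIZE,
    }


if __name__ == "__main__":
    for key, value in decode_segfault(LINE).items():
        print(key, value)
```

The rip value (0x416406) is the instruction address inside the fsck.gfs binary; mapping it to a source line would need the matching debuginfo, which is outside the scope of this sketch.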
This is very likely to be the result of one of the bugs I found in gfs2's fsck in bug #500483. I stated in this comment: https://bugzilla.redhat.com/show_bug.cgi?id=500483#c8 that many of the bugs I found and fixed should be back-ported to gfs's fsck, since gfs2's fsck was based on gfs_fsck. So I'll use this bug to do that back-porting. However, in order to make sure that the problem is fixed, I'd like to get a copy of the GFS metadata that reproduces this segfault, if that's possible.
Created attachment 349598
Rough Cut Patch

This is my first crack at a crosswrite patch from bug #500483 that I hope will fully fix the problem. It is COMPLETELY untested (and likely dangerous), but I wanted to save what I've got so far. I need to debug this and run it through all the GFS metadata I've got in my collection, like I did for gfs2's fsck, and that's likely to take several days. I am still waiting to get the original metadata so I can make sure it fixes the problem. There is still the possibility that this bug is the same as 506550, in which case there is still a fair amount of work to do. Either way, this preliminary port will be needed before I can get to that, and either way, I need the original metadata that shows the problem.
We were able to recreate this error in a test environment, where we created a complete dump of the metadata. The dump is available here: http://www.files.to/get/718921/7t711aftr0 See the attachment fsck.gfs.dump for the complete output of the fsck.gfs command.
Could this be related to bug 493727? I ran into the very same issue during my testing there.
I doubt that it is the same problem, but I just received their metadata today, so I can likely run the different versions of gfs_fsck against the metadata and find out.
Here is the metadata of the original filesystem where the bug was first experienced: http://www.files.to/get/719467/3f4fb20cqu
I have determined that this problem was caused by a regression introduced by the patch for bug #495774. That bug record now contains an addendum patch that fixes the segfault. Therefore, I'm closing this one as a duplicate of 495774. I have opened a new bug #509225 for the gfs crosswrite work from gfs2 bug #500483 (GFS2: fsck.gfs2 sometimes needs to be run twice).

*** This bug has been marked as a duplicate of bug 495774 ***