Red Hat Bugzilla – Bug 507775
fsck.gfs segfaults when repairing a corrupt GFS volume. Data gets lost!
Last modified: 2010-01-11 22:35:20 EST
Description of problem:
A small GFS volume (20GB, 3GB used) was corrupted when a server was shut down uncleanly.
Running fsck.gfs on the volume ends in a segfault.
After that, when the device is mounted with lockproto=lock_nolock on a single machine, the filesystem is reported to hold 3GB of data, but only around 500MB of it is still accessible. Most files on the filesystem seem to be missing.
Version-Release number of selected component (if applicable):
RHEL 5 U3 with GFS utils
Always with this filesystem
Steps to Reproduce:
1. Run fsck.gfs on device
2. Wait a few minutes.
3. See segfault
fsck.gfs -y /dev/mapper/vg-gfs_lv-gfs
Clearing journals (this may take a while)....
Found unused inode marked in-use
Found dup block at 724461
Block 724461 has 2 inodes referencing it fora total of 2 duplicate references
Inode (null) has 1 reference(s) to block 724461
Jun 22 19:26:55 (none) kernel: fsck.gfs: segfault at 0000000000000018 rip 0000000000416406 rsp 00007fff232c09b0 error 4
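For readers unfamiliar with the kernel's segfault line, here is a minimal sketch of how its fields can be decoded. It is not part of fsck.gfs or of the original report; the function name decode_segfault and the regular expression are illustrative assumptions, and the sketch assumes the x86 page-fault error code layout (bit 0: protection violation, bit 1: write, bit 2: user mode) with the error value printed in hex.

```python
import re

# x86 page-fault error code bits, as reported in the "error N" field.
PF_PROT = 0x1   # 0: page not present, 1: protection violation
PF_WRITE = 0x2  # 0: read access,      1: write access
PF_USER = 0x4   # 0: kernel mode,      1: user mode

# Hypothetical pattern for the message format seen in this report.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+): segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

def decode_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: error code is printed in hex
    return {
        "program": m.group("prog"),
        "fault_address": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
        "cause": "protection violation" if err & PF_PROT else "page not present",
    }

line = ("Jun 22 19:26:55 (none) kernel: fsck.gfs: segfault at 0000000000000018 "
        "rip 0000000000416406 rsp 00007fff232c09b0 error 4")
info = decode_segfault(line)
print(info["program"], hex(info["fault_address"]), info["access"], info["mode"], info["cause"])
```

Read this way, the line reports a user-mode read of an unmapped page at address 0x18. A fault address that small would be consistent with reading a structure field through a NULL pointer, which fits the "Inode (null)" line printed just before, although the log alone does not prove it.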
fsck.gfs should not segfault.
Errors should be corrected.
# uname -a
Linux C2N1 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
rpm -qf /sbin/gfs_fsck
This is very likely to be the result of one of the bugs I found
in gfs2's fsck in bug #500483. I stated in this comment:
that many of the bugs I found and fixed should be back-ported to
gfs's fsck since gfs2's fsck was based on gfs_fsck. So I'll use
this bug to do that back-porting. However, in order to make sure
that the problem is fixed, I'd like to get a copy of the GFS
metadata that recreated this segfault, if that's possible.
Created attachment 349598 [details]
Rough Cut Patch
This is my first crack at a crosswrite patch from bug #500483
that I hope will fully fix the problem. It is COMPLETELY untested
(and likely dangerous), but I wanted to save what I've got so far.
I need to debug this and run it through all the GFS metadata
I've got in my collection, like I did for gfs2's fsck, and that's
likely to take several days.
I am still waiting to get the original metadata so I can make
sure it fixes the problem. There is still the possibility that
this bug is the same as 506550, in which case there is still
a fair amount of work to do. Either way, this preliminary port
will be needed before I can get to that, and either way, I need
the original metadata that shows the problem.
We were able to recreate this error in a test environment, where we created a complete dump of the metadata.
The dump is available here: http://www.files.to/get/718921/7t711aftr0
Within a test environment
See the attachment fsck.gfs.dump for the complete output of the fsck.gfs command.
Could this be related to bug 493727? I have met the very same issue during my testing there.
I doubt that it is the same problem, but I just received their
metadata today, so I can likely run the different versions of
gfs_fsck against the metadata and find out.
Here is the metadata of the original filesystem where the bug was first experienced:
I have determined that this problem was caused by a regression
introduced by the patch for bug #495774. That bug record now
contains an addendum patch that fixes the segfault. Therefore,
I'm closing this one as a duplicate of 495774.
I have opened a new bug #509225 for the gfs crosswrite work from
gfs2 bug #500483 (GFS2: fsck.gfs2 sometimes needs to be run twice).
*** This bug has been marked as a duplicate of bug 495774 ***