Bug 519721 - GFS2: filesystem consistency error and gfs2_fsck
Summary: GFS2: filesystem consistency error and gfs2_fsck
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs2-utils
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-08-27 14:51 UTC by Florencia Fotorello
Modified: 2018-10-27 15:23 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-04-08 12:34:41 UTC


Attachments

Description Florencia Fotorello 2009-08-27 14:51:11 UTC
Description of problem:
The GFS2 filesystem reported consistency errors and can no longer be used: it is not possible to read from or write to it.

The message is:

Aug 23 12:34:33 server kernel: GFS2: fsid=cluster01:gfsv2vg04vol02.0: fatal: filesystem consistency error
Aug 23 12:34:33 server kernel: GFS2: fsid=cluster01:gfsv2vg04vol02.0:   RG = 130666
Aug 23 12:34:33 server kernel: GFS2: fsid=cluster01:gfsv2vg04vol02.0:   function = gfs2_setbit, file = /builddir/build/BUILD/gfs2-kmod-1.92/_kmod_build_/rgrp.c, line = 97
Aug 23 12:34:33 server kernel: GFS2: fsid=cluster01:gfsv2vg04vol02.0: about to withdraw this file system
Aug 23 12:34:33 server kernel: GFS2: fsid=cluster01:gfsv2vg04vol02.0: telling LM to withdraw

The output of gfs2_fsck is attached.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Linux server 2.6.18-128.4.1.el5 #1 SMP Thu Jul 23 19:59:19 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
The filesystem shows consistency errors after gfs2_fsck is run.

Expected results:
gfs2_fsck repairs the GFS2 filesystem, and it can be determined whether the consistency error is caused by a bug.

Additional info:
This bug was opened as suggested in Bug #490136 (internal note). This is a production system.

Comment 2 Robert Peterson 2009-09-02 14:47:32 UTC
The output of fsck.gfs2 does not seem to be attached as stated
in the description.  I recommend they download and untar this
file:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/gfs2_fsck_edit.tgz

It contains new versions of fsck.gfs2 and gfs2_edit.

I recommend that they first save off their metadata with this
version of gfs2_edit, using a command like this:

gfs2_edit savemeta /dev/their/device /tmp/519721.meta

Then perhaps they can run this version of fsck.gfs2 to see if it
fixes their file system.

Comment 4 Robert Peterson 2009-09-02 18:18:12 UTC
The consistency error is caused by file system corruption.
There are many ways the file system may become corrupted.
Some of them are due to hardware problems, such as defective
storage or a defective Host Bus Adapter.  Some of them are due to user
error, such as running fsck.gfs2 while the file system is
mounted on a different node in the cluster.  Some of them are
due to bugs in the GFS2 file system.  It is nearly impossible
to say what caused the file system corruption at this time
because there is almost no information here to analyze.
File system corruption problems are very difficult to solve
unless we have a scenario we can use to recreate the corruption
reliably, starting with a "clean" file system.

Comment 5 Robert Peterson 2009-09-08 15:31:22 UTC
This may be related to bug #519049.  I'm hoping to attach a patch
for that bug later today.  Perhaps the customer would be willing
to try it.

Comment 9 Robert Peterson 2009-09-15 12:39:15 UTC
These two newest attachments have syslogs that are nearly a
month old.  They indicate the customer is still using the
obsolete GFS2 overlay module.  That needs to be removed.
They should be able to remove it with this command:

rpm -e gfs2-kmod

The syslog also indicates possible hardware problems with their
storage (or possibly multipath problems).  I don't see further
evidence of GFS2 doing the wrong thing, but that gfs2-kmod rpm
needs to be removed in favor of a newer GFS2 module that is
built into the kernel.
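Whether the overlay package is still present on a node can be checked before removing it. This is a minimal sketch, not part of the original advice: it assumes `rpm -q` is available (as on any RHEL 5 node), and the function name `check_gfs2_kmod` is hypothetical. It only reports; the removal itself is still `rpm -e gfs2-kmod`.

```shell
#!/bin/sh
# Hypothetical helper: report whether the obsolete gfs2-kmod overlay
# package is installed on this node. It never removes anything.
check_gfs2_kmod() {
    # rpm -q exits 0 only when the named package is installed;
    # any failure (including rpm being absent) is treated as "not installed"
    if rpm -q gfs2-kmod >/dev/null 2>&1; then
        echo "gfs2-kmod is installed: remove it with 'rpm -e gfs2-kmod'"
    else
        echo "gfs2-kmod is not installed"
    fi
}

check_gfs2_kmod
```

Since each node has its own installed packages and loads its own module, the check would need to be run on every cluster node.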

Comment 11 Robert Peterson 2009-10-12 21:16:12 UTC
Setting NEEDINFO until I get feedback

Comment 21 Robert Peterson 2010-02-18 20:40:50 UTC
Hot off the press, this is my latest and greatest 5.5 version
of fsck.gfs2:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/fsck.gfs2

It has passed all the tests I've run so far and it's faster than
the official version.  I recommend they save their gfs2 metadata
and run this version, saving the output.  In other words:

(1) Save this version of fsck.gfs2 into some directory like ~/Download.
(2) Unmount the file system from all nodes (run this on every node):
    umount /mnt/gfs2
(3) On one node:
    gfs2_edit savemeta /dev/your/device ~/519721.savemeta
    cd ~/Download    (or the directory where you saved the new fsck.gfs2)
    chmod +x fsck.gfs2    (a downloaded file is not executable by default)
    ./fsck.gfs2 -y /dev/your/device &> /tmp/fsck.gfs2.output
(4) Re-mount the file system as normal.
(5) Please post the output (/tmp/fsck.gfs2.output) to the bugzilla.
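For the node that runs step (3), the sequence above can be wrapped in a small script so that nothing is forgotten or mistyped. This is a minimal sketch under stated assumptions: the variables `DEVICE`, `MOUNTPOINT`, `FSCK_DIR`, `SAVEMETA`, `FSCK_LOG` and `DRY_RUN` are hypothetical, the new fsck.gfs2 is assumed to be already downloaded and made executable, and unmounting on the other nodes is not handled. By default it only prints the commands.

```shell
#!/bin/sh
# Minimal sketch of steps (2)-(5) for ONE node. All variables below are
# hypothetical; set them for your environment. The file system must already
# be unmounted on every other node. Assumes fsck.gfs2 is executable.
set -eu

DEVICE=${DEVICE:-/dev/your/device}
MOUNTPOINT=${MOUNTPOINT:-/mnt/gfs2}
FSCK_DIR=${FSCK_DIR:-$HOME/Download}
SAVEMETA=${SAVEMETA:-$HOME/519721.savemeta}
FSCK_LOG=${FSCK_LOG:-/tmp/fsck.gfs2.output}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

# run: print a command, and execute it only when DRY_RUN=0
# (with set -e, a failing command such as umount aborts the script)
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

# Step (2), local node only
run umount "$MOUNTPOINT"

# Step (3): save the metadata before fsck.gfs2 changes anything
run gfs2_edit savemeta "$DEVICE" "$SAVEMETA"

# Step (3), continued: run the new fsck.gfs2, keeping stdout and stderr
echo "+ $FSCK_DIR/fsck.gfs2 -y $DEVICE > $FSCK_LOG 2>&1"
if [ "$DRY_RUN" = 0 ]; then
    status=0
    # '|| status=$?' stops set -e from aborting if fsck.gfs2 exits non-zero
    "$FSCK_DIR/fsck.gfs2" -y "$DEVICE" > "$FSCK_LOG" 2>&1 || status=$?
    echo "fsck.gfs2 exited with status $status"
fi

# Steps (4) and (5) are left to the administrator
echo "Re-mount $MOUNTPOINT as normal, then post $FSCK_LOG to the bugzilla."
```

Running it once as-is shows exactly what would be executed; setting `DRY_RUN=0` (for example `DRY_RUN=0 DEVICE=/dev/your/device sh ./script.sh`, where the script name is hypothetical) performs the steps. The script calls fsck.gfs2 by its full path instead of using `cd`, which has the same effect.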

