Bug 1032337

Summary: gfs2 filesystem always "fatal: invalid metadata block"
Product: Red Hat Enterprise Linux 5
Reporter: skk <skkwish>
Component: gfs2-utils
Assignee: Robert Peterson <rpeterso>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: unspecified
Version: 5.3
CC: adas, anprice, bmarzins, cluster-maint, pevans, rpeterso, skkwish, swhiteho
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-04-25 14:28:45 UTC
Type: Bug

Description skk 2013-11-20 02:00:33 UTC
After I run fsck.gfs2, my system always hits the following error:


Nov 19 08:21:13 kss2 kernel: GFS2: fsid=pcmk:secbox.0: fatal: invalid metadata block
Nov 19 08:21:13 kss2 kernel: GFS2: fsid=pcmk:secbox.0:   bh = 1669014485 (magic number)
Nov 19 08:21:13 kss2 kernel: GFS2: fsid=pcmk:secbox.0:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 334
Nov 19 08:21:13 kss2 kernel: GFS2: fsid=pcmk:secbox.0: about to withdraw this file system
Nov 19 08:21:13 kss2 kernel: GFS2: fsid=pcmk:secbox.0: telling LM to withdraw
Nov 19 08:21:14 kss2 kernel: GFS2: fsid=pcmk:secbox.0: withdrawn
Nov 19 08:21:14 kss2 kernel: 
Nov 19 08:21:14 kss2 kernel: Call Trace:
Nov 19 08:21:14 kss2 kernel:  [<ffffffff888447b0>] :gfs2:gfs2_lm_withdraw+0xd3/0x100
Nov 19 08:21:14 kss2 kernel:  [<ffffffff80063a1a>] __wait_on_bit+0x60/0x6e
Nov 19 08:21:14 kss2 kernel:  [<ffffffff800155d0>] sync_buffer+0x0/0x3f
Nov 19 08:21:14 kss2 kernel:  [<ffffffff80063a94>] out_of_line_wait_on_bit+0x6c/0x78
Nov 19 08:21:14 kss2 kernel:  [<ffffffff800a2ebe>] wake_bit_function+0x0/0x23
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8001abd7>] submit_bh+0x10d/0x114
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8885876b>] :gfs2:gfs2_meta_check_ii+0x2c/0x38
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8884821a>] :gfs2:gfs2_meta_indirect_buffer+0x108/0x162
Nov 19 08:21:14 kss2 kernel:  [<ffffffff88842ec2>] :gfs2:gfs2_inode_refresh+0x20/0x2d1
Nov 19 08:21:14 kss2 kernel:  [<ffffffff88842450>] :gfs2:inode_go_lock+0x44/0xc3
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8883fef5>] :gfs2:do_promote+0xbc/0x195
Nov 19 08:21:14 kss2 kernel:  [<ffffffff88841b98>] :gfs2:finish_xmote+0x315/0x333
Nov 19 08:21:14 kss2 kernel:  [<ffffffff88841bc6>] :gfs2:glock_work_func+0x0/0xe2
Nov 19 08:21:14 kss2 kernel:  [<ffffffff88841be7>] :gfs2:glock_work_func+0x21/0xe2
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8004d342>] run_workqueue+0x9e/0xfb
Nov 19 08:21:14 kss2 kernel:  [<ffffffff80049b71>] worker_thread+0x0/0x122
Nov 19 08:21:14 kss2 kernel:  [<ffffffff800a2c78>] keventd_create_kthread+0x0/0xc4
Nov 19 08:21:14 kss2 kernel:  [<ffffffff80049c61>] worker_thread+0xf0/0x122
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8008e8b4>] default_wake_function+0x0/0xe
Nov 19 08:21:14 kss2 kernel:  [<ffffffff800a2c78>] keventd_create_kthread+0x0/0xc4
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8003271a>] kthread+0xfe/0x132
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Nov 19 08:21:14 kss2 kernel:  [<ffffffff800a2c78>] keventd_create_kthread+0x0/0xc4
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8003261c>] kthread+0x0/0x132
Nov 19 08:21:14 kss2 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11


My OS is RHEL 5.3. The bh value is 1669014485 every time. Is block 1669014485 the one with the error?
Can I fsck just that specific block?
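The kernel message "bh = 1669014485 (magic number)" appears to report the number of the block whose metadata header failed the magic-number check. As a read-only way to look at that single block, here is a minimal Python sketch (not a Red Hat tool) that reads the start of a block and decodes it using the layout of struct gfs2_meta_header (mh_magic, mh_type, a 64-bit pad, mh_format, mh_jid, all big-endian). It assumes a 4096-byte file system block size (the mkfs.gfs2 default; pass the real value if it differs) and that it is run against the unmounted device, because buffered reads on one cluster node may not reflect writes made by another. It only reports; it repairs nothing.

```python
import struct

GFS2_MAGIC = 0x01161970  # GFS2_MAGIC from the kernel's gfs2_ondisk.h
GFS2_METATYPE_DI = 4     # mh_type of an on-disk inode (dinode)
HEADER = struct.Struct(">IIQII")  # gfs2_meta_header: 24 bytes, big-endian


def read_meta_header(path, block, block_size=4096):
    """Read the metadata header at the start of `block` (read-only)."""
    with open(path, "rb") as dev:
        dev.seek(block * block_size)
        raw = dev.read(HEADER.size)
    if len(raw) < HEADER.size:
        raise ValueError(f"block {block} is past the end of {path}")
    magic, mh_type, _pad, mh_format, _jid = HEADER.unpack(raw)
    return magic, mh_type, mh_format


def check_block(path, block, block_size=4096):
    """Describe whether `block` carries a valid GFS2 metadata magic number."""
    magic, mh_type, mh_format = read_meta_header(path, block, block_size)
    if magic != GFS2_MAGIC:
        return f"block {block}: bad magic 0x{magic:08x} (expected 0x{GFS2_MAGIC:08x})"
    kind = "dinode" if mh_type == GFS2_METATYPE_DI else f"type {mh_type}"
    return f"block {block}: magic ok, {kind}, format {mh_format}"
```

For this report the call would be something like print(check_block("/dev/your/device", 1669014485)), reusing the placeholder device path from the savemeta example. A "bad magic" result would be consistent with the kernel's complaint; it says nothing about why the block was damaged.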

Comment 1 Ondrej Vasik 2013-11-20 10:08:17 UTC
While Red Hat welcomes bug reports on Red Hat products here in our public bugzilla database, please keep in mind that bugzilla is not a support tool or means of accessing support. If you would like technical support please visit our support portal at access.redhat.com or call us for information on subscription offerings to suit your needs.
In addition, "filesystem" is not the component you are looking for: it creates and owns the basic system directory layout and has nothing to do with gfs2. Reassigning (let's try gfs2-utils).

Comment 2 Robert Peterson 2013-11-20 13:31:21 UTC
This problem looks very familiar, and I'm pretty sure we've
seen it and fixed it in later releases. I've seen this symptom
in both kernel problems and fsck.gfs2 problems.

It's easiest to check whether fsck.gfs2 is the problem; it's harder
to check whether it's a kernel problem.

Since the number is always the same, 1669014485, there's a good
chance that your version of fsck.gfs2 is not fixing the real
problem. I recommend you try running a more recent version of
fsck.gfs2. The RHEL6.4 version or newer is preferred: we've
probably done more than 150 patches to that code since RHEL5.3.
We've made it much faster, more accurate, and able to find and
fix lots more problems.

If the problem remains after a newer version of fsck.gfs2 has
checked the file system, it could be a kernel problem. Again,
a lot of problems have been fixed since 5.3, and I recommend
upgrading to a newer release if you can.

If neither of these options is possible, you can save off your
gfs2 metadata, gzip it, and put that on a server for me to
download. (None of your data should be saved, only file system
structures, etc.) I can then restore it and analyze what's
wrong with it. I can also run it through a newer fsck.gfs2 to
see if it detects and fixes the problem. To save off the
metadata, use a command like this:

gfs2_edit savemeta /dev/your/device /tmp/bz1032337.meta
gzip /tmp/bz1032337.meta

Also, if you're a Red Hat customer, please contact our Global
Support Services (GSS) and have them open a case.

Comment 3 skk 2013-11-25 05:00:18 UTC
  The metadata file is too large.
  My disk holds more than 10G of data, and a single fsck.gfs2 run takes more than 12 hours, but the system can't be shut down.
  I have found the damaged file corresponding to block 1669014485; whenever I access this file, the file system is withdrawn. I renamed its parent directory, then re-created the old directory with mkdir and copied the other files into it. But I don't know why the error occurs, and I worry that the same error will happen again.

  In addition, my OS is RHEL 5.4, and fsck.gfs2 is version 3.0.12.
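One plausible way to map block 1669014485 to a file relies on two assumptions: that GFS2 uses the block address of a file's on-disk inode (dinode) as the inode number it reports, and that the failing block is a dinode, which the call trace above hints at since gfs2_inode_refresh is the reader. Under those assumptions (and a 64-bit kernel, so the number is not truncated), the minimal Python sketch below walks a mount point and lists entries whose directory-entry inode number matches. It uses os.scandir so the number comes from readdir rather than from stat(), and it does not descend into a matching entry, because touching the corrupt inode itself is what triggers the withdraw. The function name find_by_inode is hypothetical.

```python
import os


def find_by_inode(root, target_ino):
    """Return paths under `root` whose directory entry reports `target_ino`.

    DirEntry.inode() is taken from the directory entry (d_ino) on POSIX,
    so the matching file's own inode is never stat()ed or opened.
    """
    matches = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.inode() == target_ino:
                        # Record the match but never look inside it.
                        matches.append(entry.path)
                    # is_dir() uses d_type when the file system supplies it;
                    # otherwise Python falls back to an lstat() call.
                    elif entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable directory (permissions, I/O error): skip it.
            continue
    return matches
```

Usage would look like find_by_inode("/mnt/secbox", 1669014485), where /mnt/secbox is a hypothetical mount point for the fsid pcmk:secbox file system. This only locates the file; it does not explain the corruption, which still needs the metadata or a newer fsck.gfs2.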

Comment 4 Robert Peterson 2013-11-25 13:33:37 UTC
I cannot tell you why the file at block 1669014485 is causing
GFS2 problems unless I can analyze the file system metadata
to determine the root cause.

As I said, the RHEL6.4 and newer versions of fsck.gfs2 are
better than the RHEL5.4 version. The rhel6 fsck.gfs2 might
tell you what's wrong with that block and fix it, but that is
not guaranteed. It is also much faster, so it probably won't
take 12 hours; it might only take 2 hours. There is no way to
predict this.

So your only choices are:

(1) Rebuild the file system completely, and copy everything
    but the corrupt file to the new file system.
(2) Run a RHEL6 version of fsck.gfs2 on your file system,
    either by adding a new RHEL6 machine or a RHEL6 virt that
    can access the same physical device. This is not guaranteed
    to find or fix the problem, but the odds are pretty good.
(3) As an alternative, you could try an experimental version
    of RHEL5 fsck.gfs2 from my people page to see if it finds
    and fixes the problem. The experimental version is here:
    http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/fsck.gfs2
    This fsck.gfs2 is NOT quality tested by Red Hat, and you
    run it at your own risk.
(4) Somehow give me access to the metadata so I can determine
    what exactly is wrong with it.
(5) Ignore the problem and hope there are no other corrupt
    files like it.

The choice is yours.

Comment 5 RHEL Program Management 2014-01-29 10:33:45 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 6 Steve Whitehouse 2014-04-25 14:28:45 UTC
This bug has been in needinfo since Nov last year. Without additional information we are unable to continue to debug this issue, so we are closing the bug at this stage. Please reopen, including the requested information, if you think this is incorrect.

Comment 7 Red Hat Bugzilla 2023-09-14 01:54:03 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.