Bug 553447 - GFS2: fatal: filesystem consistency error in gfs2_ri_update
Summary: GFS2: fatal: filesystem consistency error in gfs2_ri_update
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Ben Marzinski
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 526947 581009
 
Reported: 2010-01-07 21:51 UTC by Nate Straz
Modified: 2010-04-09 18:53 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 581009
Environment:
Last Closed: 2010-03-30 07:29:39 UTC
Target Upstream Version:
Embargoed:


Attachments
Do not withdraw on partial rindex entries (633 bytes, patch)
2010-01-20 16:27 UTC, Ben Marzinski


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0178 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update 2010-03-29 12:18:21 UTC

Description Nate Straz 2010-01-07 21:51:38 UTC
Description of problem:

While running gfs2_grow regression tests I started hitting this problem:

GFS2: fsid=morph-cluster:grow1.0: fatal: filesystem consistency error
GFS2: fsid=morph-cluster:grow1.0:   inode = 19 99327
GFS2: fsid=morph-cluster:grow1.0:   function = gfs2_ri_update, file = fs/gfs2/rgrp.c, line = 590
GFS2: fsid=morph-cluster:grow1.0: about to withdraw this file system
GFS2: fsid=morph-cluster:grow1.0: telling LM to withdraw

Version-Release number of selected component (if applicable):
kernel-2.6.18-183.el5

How reproducible:
Easily

Steps to Reproduce:
1. run growfs test
  
Actual results:
gfs2_grow hangs and file system withdraws.

Expected results:
gfs2_grow should complete without errors or withdrawing the file system.

Additional info:

Comment 2 Ben Marzinski 2010-01-08 19:00:51 UTC
This seems to be a regression caused by my fix for bz 482756.

Comment 3 Ben Marzinski 2010-01-09 07:34:41 UTC
That consistency error is

 if (do_div(rgrp_count, sizeof(struct gfs2_rindex))) {

Whenever it fails, rgrp_count is definitely not a multiple of sizeof(struct gfs2_rindex); instead, it's always a multiple of 4k. Looking at gfs2_write_begin() and gfs2_write_end() when this error occurs, gfs2_write_begin() is only being told to write up to a page boundary, which it correctly does. That's because gfs2_perform_write() is only writing a page at a time, and it's dropping the exclusive glock on sd_rindex between pages.  This means that if another process calls gfs2_rindex_hold() in this window, it sees that the index is not up to date, calls gfs2_ri_update(), and ends up looking at a partially written rindex, which trips the consistency error.

This problem could be lessened by deferring the clearing of gl->gl_sbd->sd_rindex_uptodate until after the entire rindex file has been written out.  However, that would still likely leave some nasty corner cases, since another process would still be able to grab the rindex before it is fully up to date.
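
For illustration, here is a small standalone program (not GFS2 code; it assumes the on-disk struct gfs2_rindex of this era is 96 bytes) showing why writeback that stops on a page boundary usually leaves a partial rindex entry behind:

 /* Standalone illustration only -- not GFS2 code.  ENTRY_SIZE is an
  * assumption: the on-disk struct gfs2_rindex is taken to be 96 bytes.
  * The check quoted above fails whenever the rindex inode size leaves a
  * non-zero remainder when divided by the entry size, i.e. whenever a
  * writeback that stopped on a page boundary split an entry. */
 #include <stdio.h>
 #include <stdint.h>

 #define ENTRY_SIZE 96ULL      /* assumed sizeof(struct gfs2_rindex) on disk */
 #define PAGE_BYTES 4096ULL

 int main(void)
 {
         uint64_t pages;

         for (pages = 1; pages <= 6; pages++) {
                 uint64_t size = pages * PAGE_BYTES;
                 uint64_t rem = size % ENTRY_SIZE;  /* what do_div() reports in the check above */

                 printf("%llu page(s) = %5llu bytes: %3llu whole entries, remainder %2llu%s\n",
                        (unsigned long long)pages,
                        (unsigned long long)size,
                        (unsigned long long)(size / ENTRY_SIZE),
                        (unsigned long long)rem,
                        rem ? "  -> gfs2_ri_update() withdraws" : "");
         }
         return 0;
 }

With a 96-byte entry, only every third page boundary (12 KiB) happens to land on an entry boundary, which fits the observation above that the failing sizes are multiples of 4k but not of the entry size.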

Comment 4 Steve Whitehouse 2010-01-14 10:11:51 UTC
Can we just change the kernel code to ignore partial entries rather than withdraw or whatever?

Comment 5 Ben Marzinski 2010-01-20 16:27:46 UTC
Created attachment 385713 [details]
Do not withdraw on partial rindex entries

This patch fixes the problems as long as you do not have two nodes trying to grow the fs at the same time.  That can be fixed by grabbing a flock on the rindex file in userspace before writing to it.
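
A minimal userspace sketch of that serialization, assuming flocks are honoured cluster-wide here (GFS2 implements flock through the DLM unless mounted with localflocks); this is not gfs2_grow's actual code, and the rindex path below is only a placeholder for however the tool has the file open:

 /* Sketch only -- not gfs2_grow's actual code.  Take an exclusive flock
  * on the rindex file before appending new entries so two nodes growing
  * the same filesystem cannot interleave their writes. */
 #include <fcntl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/file.h>
 #include <unistd.h>

 int main(void)
 {
         const char *rindex_path = "/path/to/metafs/rindex";  /* placeholder */
         int fd = open(rindex_path, O_WRONLY | O_APPEND);

         if (fd < 0) {
                 perror("open rindex");
                 return EXIT_FAILURE;
         }
         if (flock(fd, LOCK_EX) < 0) {   /* blocks until this node owns the lock */
                 perror("flock rindex");
                 close(fd);
                 return EXIT_FAILURE;
         }

         /* ... append the new rindex entries here ... */

         flock(fd, LOCK_UN);
         close(fd);
         return EXIT_SUCCESS;
 }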

Comment 6 RHEL Program Management 2010-01-20 16:42:17 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Ben Marzinski 2010-01-20 17:00:07 UTC
Posted

Comment 8 Ben Marzinski 2010-02-02 16:15:23 UTC
Reposted

Comment 11 Robert Peterson 2010-02-22 20:27:13 UTC
Nate seems to have recreated this problem by doing gfs2_grow while
a gfs2 load was running.  He got:

GFS2: fsid=morph-cluster:grow1.2: fatal: invalid metadata block
GFS2: fsid=morph-cluster:grow1.2:   bh = 16744776 (magic number)
GFS2: fsid=morph-cluster:grow1.2:   function = gfs2_rgrp_bh_get, file = fs/gfs2/rgrp.c, line = 754
GFS2: fsid=morph-cluster:grow1.2: about to withdraw this file system
GFS2: fsid=morph-cluster:grow1.2: telling LM to withdraw

16744776 = 0xff8148

Excerpt from the rindex indirect pointers:
1865B220 00000000 00061994 00000000 00061995 [................] 
1865B230 00000000 00061996 00000000 00061997 [................] 
1865B240 00000000 004B60A0 00000000 004DDA4B [.....K`......M.K] 
1865B250 00000000 004DDA4C 00000000 004DDA4D [.....M.L.....M.M] 
1865B260 00000000 004DDA4E 00000000 0054D6CD [.....M.N.....T..] 
1865B270 00000000 0054D6CE 00000000 0054D6CF [.....T.......T..] 
1865B280 00000000 0054D6D0 00000000 005BD34D [.....T.......[.M] 
1865B290 00000000 005BD34E 00000000 005BD34F [.....[.N.....[.O] 
1865B2A0 00000000 005BD350 00000000 00634F8D [.....[.P.....cO.] 

Excerpt from indirect block 0x4dda4e:
0000000137693BA0 00000000 00FF0180 00000009 00000000 [................] 
0000000137693BB0 00000000 00FF0189 00007FB4 00001FED [................] 
0000000137693BC0 00000000 00000000 00000000 00000000 [................] 
0000000137693BD0 00000000 00000000 00000000 00000000 [................] 

All the new rgrps contain "complete trash" rather than the rgrp
headers and bitmaps they should have.  But gfs2_grow writes the new
rgrp blocks before it writes its changes to the rindex file.

I examined all 3 of the journals and none of the three loads
were accessing blocks anywhere near the new section of rgrps.

My theory is that gfs2_grow needs to open and write "2" to
/proc/sys/vm/drop_caches to get the kernel to re-read the
modified blocks.  If we can recreate this reliably, I should
be able to test this very easily.
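
For reference, the experiment would amount to something like this (a sketch, not anything gfs2_grow does today); writing "2" to drop_caches asks the kernel to drop reclaimable dentries and inodes, and requires root:

 /* Sketch only: write "2" to /proc/sys/vm/drop_caches so the kernel
  * drops reclaimable dentries and inodes and the grown rindex/rgrp
  * blocks get re-read from disk. */
 #include <fcntl.h>
 #include <stdio.h>
 #include <unistd.h>

 int main(void)
 {
         int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);

         if (fd < 0) {
                 perror("open /proc/sys/vm/drop_caches");
                 return 1;
         }
         if (write(fd, "2\n", 2) != 2)
                 perror("write drop_caches");
         close(fd);
         return 0;
 }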

Comment 12 Robert Peterson 2010-02-22 20:36:12 UTC
Correction: The block 0x4dda4e was the wrong one.  Here's the
right one:

Block #5560013    (0x54d6cd)  of 38522880 (0x24BD000)
(p.1 of 1--Data )
00000001535B3400 00000000 00FF8140 00000009 00000000 [.......@........]
00000001535B3410 00000000 00FF8149 00007FB4 00001FED [.......I........]
00000001535B3420 00000000 00000000 00000000 00000000 [................]
00000001535B3430 00000000 00000000 00000000 00000000 [................]

Since ri_addr == 0x00ff8140 and ri_length == 9, the rgrp's header and
bitmaps should occupy blocks 0x00ff8140 through 0x00ff8148; so the
block in question, 0xff8148, is supposed to be a rgrp bitmap block,
and it was calculated to be in the correct location.  The data
is just trash though; it doesn't even have a gfs2 metadata header.
None of the new rgrps do.  So it's as if gfs2_grow did not write
the new rgrps or their bitmaps at all.

Comment 13 Ben Marzinski 2010-02-22 21:08:08 UTC
I don't see any reason to think that Nate's new issue is related to this bug.  The original issue is that sometimes another node reads the rindex file before the node doing the grow has written it out completely.  This is a transient issue. Moments after the node hits the consistency error, the rest of the rindex file is written out.

In this new issue, the rgrps are never written out, and they should have been already written out and synced to disk before the rindex file was modified in the first place. Since a fix for the original issue has already gone into the kernel, and this new issue isn't suggesting that there is anything wrong with that fix, we should probably open a new bug instead.

Comment 15 Nate Straz 2010-03-08 15:19:54 UTC
Made it through 100 iterations of our growfs test without hitting this.

Comment 17 errata-xmlrpc 2010-03-30 07:29:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

