Bug 382581 - GFS2: gfs2_fsck: buffer still held for block: 162646118 (0x9b1c866)
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs2-utils
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Chris Feist
QA Contact: GFS Bugs
Reported: 2007-11-14 10:07 EST by Dean Jansa
Modified: 2010-01-11 22:40 EST (History)
Fixed In Version: RHBA-2008-0350
Doc Type: Bug Fix
Last Closed: 2008-05-21 13:20:30 EDT

Attachments
Preliminary patch to fix the problem (1015 bytes, patch)
2007-11-14 10:15 EST, Robert Peterson
Better patch for HEAD (4.27 KB, patch)
2007-11-15 17:13 EST, Robert Peterson
Patch to fix the problem (try #3) (6.46 KB, patch)
2007-11-16 18:17 EST, Robert Peterson

Description Dean Jansa 2007-11-14 10:07:10 EST
Description of problem:

Running gfs2_fsck after filesystem recovery resulted in the following error:

gfs2_fsck: buffer still held for block: 162646118 (0x9b1c866)

Version-Release number of selected component (if applicable):

2.6.18-53.el5 + kmod-gfs2-1.53-7

Steps to Reproduce:
1. Mount fs, start IO, shoot node, allow recovery to finish
2. umount fs, run gfs2_fsck
Comment 1 Robert Peterson 2007-11-14 10:09:54 EST
I've found the bug and fixed it.  Requesting some ACK flags for
inclusion into RHEL5.2.
Comment 2 RHEL Product and Program Management 2007-11-14 10:14:23 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 3 Robert Peterson 2007-11-14 10:15:02 EST
Created attachment 258161
Preliminary patch to fix the problem

This is a preliminary patch that fixes this bugzilla's case.  However,
I want to do an exhaustive search through the gfs2_fsck code for any
other cases where this could happen.
Comment 4 Robert Peterson 2007-11-15 17:13:07 EST
Created attachment 260481
Better patch for HEAD

As I suspected, there were more places in the code where we could
run into the same problem.  This is a revised patch to fix more cases.
Using my new, improved gfs2_edit, I am developing and running a new test
case that covers various scenarios where extended attributes are found
to be in a bad state and fixed by gfs2_fsck.  It also tests both
failing and non-failing cases.  It checks EAs kept in both direct
blocks (the EAs are in a single block) and indirect blocks (where EAs
are kept in multiple blocks with indirect EA data block pointers).
I hope to be done with this testing and check the patch in to CVS
tomorrow before I go on vacation.
Comment 5 Robert Peterson 2007-11-16 17:25:55 EST
These problems all have to do with the checking of extended attributes.
There are apparently three kinds of extended attributes:
(1) Simple EAs, where all the data is chained together into one EA block.
(2) Complex EAs where the EA has indirect pointers to more EA blocks.
(3) Indirect EAs where the inode points to an indirect block list, which
    points to blocks of EAs of the other types.
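
A rough sketch of the three layouts described above, written as a toy
Python model rather than anything taken from gfs2_fsck.  The class names
(EABlock, IndirectBlock), the ea_blocks() helper, and the block numbers
are all hypothetical; the sketch only shows which blocks a checker would
have to visit for each kind of EA.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class EABlock:
    """Hypothetical EA block: a list of (name, extra data block numbers).

    An empty pointer list means the value lives in this block (kind 1);
    a non-empty list means the EA points to more blocks (kind 2).
    """
    records: List[Tuple[str, List[int]]]


@dataclass
class IndirectBlock:
    """Hypothetical indirect block list pointing at EA blocks (kind 3)."""
    pointers: List[int]


Disk = Dict[int, Union[EABlock, IndirectBlock]]


def ea_blocks(disk: Disk, eattr_ptr: int) -> List[int]:
    """Return every block number reachable from an inode's EA pointer."""
    seen = set()

    def visit_ea_block(blkno: int) -> None:
        seen.add(blkno)
        for _name, data_ptrs in disk[blkno].records:
            seen.update(data_ptrs)  # extra blocks used by a kind-2 EA

    root = disk[eattr_ptr]
    if isinstance(root, IndirectBlock):
        seen.add(eattr_ptr)
        for ptr in root.pointers:
            visit_ea_block(ptr)
    else:
        visit_ea_block(eattr_ptr)
    return sorted(seen)


# Made-up block numbers, for illustration only.
disk: Disk = {
    100: EABlock([("user.small", [])]),          # kind 1
    200: EABlock([("user.big", [201, 202])]),    # kind 2
    300: IndirectBlock([100, 200]),              # kind 3
}

simple = ea_blocks(disk, 100)
complex_ = ea_blocks(disk, 200)
indirect = ea_blocks(disk, 300)
```

In this model an inode's EA pointer may name either an EA block or an
indirect block list, so a checker has to handle both shapes before it
can walk the individual EA records.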
I hit a snag during unit testing: I discovered that gfs2_fsck has a
pass1c (which checks the EAs) that is COMPLETELY useless.
Pass1c skips across the special EA bitmap trying to find blocks that are
marked as EAs.  When it finds one, it reads it in, assumes it is a dinode,
and checks the EAs for that dinode.  However, in pass1, dinodes are marked
as dinodes, and EAs are marked as EAs.  Therefore, pass1c will NEVER find
a block marked as an EA that is also a dinode.
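
The reasoning above can be illustrated with a small Python model.  This
is only a sketch under the assumption that each block carries exactly
one mark; Mark, pass1() and pass1c_candidates() are made-up names, not
gfs2_fsck functions.

```python
from enum import Enum


class Mark(Enum):
    DINODE = 1
    EATTR = 2
    DATA = 3


def pass1(disk):
    """Give each block exactly one mark, based on what it really is."""
    kinds = {"dinode": Mark.DINODE, "eattr": Mark.EATTR, "data": Mark.DATA}
    return {blkno: kinds[kind] for blkno, kind in disk.items()}


def pass1c_candidates(disk, block_map):
    """Blocks pass1c could usefully check: marked EATTR and also a dinode."""
    return [
        blkno
        for blkno, mark in sorted(block_map.items())
        if mark is Mark.EATTR and disk[blkno] == "dinode"
    ]


# Made-up block numbers: two dinodes, one EA block, one data block.
disk = {10: "dinode", 11: "eattr", 12: "data", 13: "dinode"}
block_map = pass1(disk)
candidates = pass1c_candidates(disk, block_map)
print(candidates)  # prints []
```

Because a dinode is never marked as an EA in this model, the candidate
list is always empty, which matches the observation that pass1c never
finds anything to process.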

GFS1's fsck goes about it differently.  It somehow marks inodes in the
special EA bitmap, so pass1c finds it, reads it in, then processes its
EAs.  This needs more investigation.

I tried marking the dinode as an EA in pass1 so that pass1c would find
the dinode and process it as GFS1's fsck does.  That much worked.  However, the act
of marking the dinode as an EA also UNMARKED it as an inode.  Therefore,
when pass5 ran, it thought the dinode was marked wrong, and wanted to
change it to a user data block.

I'm thinking that perhaps pass1 should mark it as an EA, pass1c should
process it, then mark it back as an inode again.  That way pass1c
will process the EAs as it should and pass5 will have the correct
block type for the inode.  That might be what GFS1 does.
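
A minimal Python sketch of the idea in comment 5, assuming a block map
that holds a single mark per block.  The function names and block
numbers are hypothetical, real EA blocks are left out of the map to keep
the model small, and this is not the code from the attached patches.

```python
DINODE, EATTR = "dinode", "eattr"


def pass1_mark(block_map, blkno, has_eattrs):
    """One mark per block: marking a dinode as EATTR replaces DINODE."""
    block_map[blkno] = EATTR if has_eattrs else DINODE


def pass1c(block_map, restore):
    """Process every block marked EATTR; optionally mark it back as a dinode."""
    processed = []
    for blkno in sorted(block_map):
        if block_map[blkno] == EATTR:
            processed.append(blkno)  # stand-in for checking the dinode's EAs
            if restore:
                block_map[blkno] = DINODE
    return processed


def pass5_mismatches(block_map, dinodes):
    """Dinodes whose mark is no longer DINODE, which pass5 would 'fix'."""
    return [b for b in sorted(dinodes) if block_map.get(b) != DINODE]


def run(restore):
    dinodes = {10: True, 13: False}  # blkno -> has extended attributes
    block_map = {}
    for blkno, has_eattrs in dinodes.items():
        pass1_mark(block_map, blkno, has_eattrs)
    processed = pass1c(block_map, restore)
    return processed, pass5_mismatches(block_map, dinodes)


without_restore = run(restore=False)
with_restore = run(restore=True)
```

Without the restore step, block 10 is processed by pass1c but is then
reported by pass5 as mismarked.  With the restore step, pass1c still
processes it and pass5 sees the dinode mark it expects.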
Comment 6 Robert Peterson 2007-11-16 18:17:28 EST
Created attachment 261981
Patch to fix the problem (try #3)

Third time's a charm, hopefully.
This version has a functioning pass1c.
Comment 7 Robert Peterson 2007-11-16 18:29:25 EST
I tested this on roth-01 and checked it in to CVS at the HEAD and RHEL5
branches for inclusion into RHEL5.2.

I'm a little bit concerned that it might not have undergone as much testing
as I'd like to see, but it passes all the tests I did anyway.
I'd like to formalize these test cases and submit them to QE eventually.
They are basically this:

for i in $(seq 1 6) ; do
    # Restore the saved metadata image, then check that fsck can repair it.
    gfs2_edit restoremeta /home/bob/fsckeattrtest$i.savemeta /dev/roth_vg/roth_lv
    gfs2_fsck /dev/roth_vg/roth_lv
done

The fsckeattrtest{1,2,3,4,5,6}.savemeta files are saved off metadata
from the file system in various states of extended attribute damage.
File fsckeattrtest1.savemeta is undamaged, but it has a bunch of
eattrs in various places.  The rest are damaged in various ways, and
fsck needs to be able to fix them.

At any rate, the code is checked in and I'm marking it as modified.
Comment 10 errata-xmlrpc 2008-05-21 13:20:30 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

