Bug 382581
Summary: GFS2: gfs2_fsck: buffer still held for block: 162646118 (0x9b1c866)
Product: Red Hat Enterprise Linux 5
Component: gfs2-utils
Version: 5.1
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Dean Jansa <djansa>
Assignee: Chris Feist <cfeist>
QA Contact: GFS Bugs <gfs-bugs>
Fixed In Version: RHBA-2008-0350
Doc Type: Bug Fix
Last Closed: 2008-05-21 17:20:30 UTC
Description
Dean Jansa
2007-11-14 15:07:10 UTC
I've found the bug and fixed it. Requesting some ACK flags for inclusion into RHEL5.2.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Created attachment 258161 [details]
Preliminary patch to fix the problem
This is a preliminary patch that fixes this bugzilla's case. However,
I want to do an exhaustive search through the gfs2_fsck code for any
other cases where this could happen.
Created attachment 260481 [details]
Better patch for HEAD
As I suspected, there were more places in the code where we could
run into the same problem. This is a revised patch to fix more cases.
Using my new improved gfs2_edit I am developing and running a new test
case that tests various scenarios where extended attributes are found
to be in a bad state and fixed by gfs2_fsck. It also tests both
failing and non-failing cases. It checks EAs kept in both direct
blocks (where the EAs are in a single block) and indirect blocks (where EAs
are kept in multiple blocks with indirect EA data block pointers).
I hope to be done with this testing and check the patch in to CVS
tomorrow before I go on vacation.
These problems all have to do with the checking of extended attributes. There are apparently three kinds of extended attributes:
(1) Simple EAs, where all the data is chained together into one EA block.
(2) Complex EAs, where the EA has indirect pointers to more EA blocks.
(3) Indirect EAs, where the inode points to an indirect block list, which points to blocks of EAs of the other types.

I hit a snag during unit testing: I discovered that gfs2_fsck has a pass1c (which checks the EAs) that is COMPLETELY useless. Pass1c skips across the special EA bitmap trying to find blocks that are marked as EAs. When it finds one, it reads it in, assumes it is a dinode, and checks the EAs for that dinode. However, in pass1, dinodes are marked as dinodes, and EAs are marked as EAs. Therefore, pass1c will NEVER find a block marked as an EA that is also a dinode.

GFS1's fsck goes about it differently. It somehow marks inodes in the special EA bitmap, so pass1c finds it, reads it in, then processes its EAs. This needs more investigation.

I tried marking the dinode as an EA in pass1 so that pass1c would find the dinode and process it like gfs2. That much worked. However, the act of marking the dinode as an EA also UNMARKED it as an inode. Therefore, when pass5 ran, it thought the dinode was marked wrong, and wanted to change it to a user data block.

I'm thinking that perhaps pass1 should mark it as an EA, pass1c should process it, then mark it back as an inode again. That way pass1c will process the EAs as it should and pass5 will have the correct block type for the inode. That might be what GFS1 does.

Created attachment 261981 [details]
Patch to fix the problem (try #3)
Third time's a charm, hopefully.
This version has a functioning pass1c.
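
For readers following the pass1c discussion above, here is a toy model of the marking problem, written in Python for brevity. It is only a sketch under stated assumptions: the names (Mark, DISK, pass1, pass1c, pass5, run) and the one-mark-per-block map are hypothetical stand-ins, not gfs2_fsck's actual data structures, and it does not claim to show what the attached patch does. It only replays the three situations described in the comments: pass1c finding nothing, the dinode losing its inode mark, and the mark/process/restore idea.

```python
from enum import Enum


class Mark(Enum):
    INODE = "inode"
    EATTR = "eattr"
    DATA = "data"


# Toy "disk" (hypothetical): block number -> (what the block really is,
# the EA block it points to, if any).
DISK = {
    10: ("dinode", 11),    # dinode with extended attributes in block 11
    11: ("eattr", None),   # the EA block itself
    12: ("dinode", None),  # dinode without extended attributes
    13: ("data", None),
}

TRUE_MARK = {"dinode": Mark.INODE, "eattr": Mark.EATTR, "data": Mark.DATA}


def pass1(disk, flag_dinodes_with_eas):
    """Build a block map that holds exactly one mark per block."""
    blockmap = {}
    for blkno, (kind, ea_blk) in disk.items():
        blockmap[blkno] = TRUE_MARK[kind]
        if flag_dinodes_with_eas and kind == "dinode" and ea_blk is not None:
            # This overwrites the INODE mark: a block carries only one mark.
            blockmap[blkno] = Mark.EATTR
    return blockmap


def pass1c(disk, blockmap, restore_mark):
    """Scan EATTR-marked blocks and 'check' the ones that are dinodes."""
    checked = []
    for blkno, mark in sorted(blockmap.items()):
        kind, _ = disk[blkno]
        if mark is Mark.EATTR and kind == "dinode":
            checked.append(blkno)  # stand-in for the real EA checks
            if restore_mark:
                blockmap[blkno] = Mark.INODE  # hand pass5 the right type
    return checked


def pass5(disk, blockmap):
    """Return the blocks whose mark disagrees with what the block really is."""
    return [b for b in sorted(blockmap)
            if blockmap[b] is not TRUE_MARK[disk[b][0]]]


def run(flag, restore):
    blockmap = pass1(DISK, flag)
    checked = pass1c(DISK, blockmap, restore)
    return checked, pass5(DISK, blockmap)


if __name__ == "__main__":
    print(run(False, False))  # ([], [])      pass1c never finds a dinode
    print(run(True, False))   # ([10], [10])  EAs checked, pass5 objects
    print(run(True, True))    # ([10], [])    EAs checked, pass5 satisfied
```

The first value in each result is the list of dinodes whose EAs pass1c got to check; the second is the list of blocks pass5 would consider wrongly marked.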
I tested this on roth-01 and checked it in to CVS at the HEAD and RHEL5 branches for inclusion into RHEL5.2. I'm a little bit concerned that it might not have undergone as much testing as I'd like to see, but it passes all the tests I did anyway. I'd like to formalize these test cases and submit them to QE eventually. They are basically this:

    for i in `seq 1 6` ; do
        gfs2_edit restoremeta /home/bob/fsckeattrtest$i.savemeta /dev/roth_vg/roth_lv
        sync
        gfs2_fsck /dev/roth_vg/roth_lv
    done

The fsckeattrtest{1,2,3,4,5,6}.savemeta files are saved off metadata from the file system in various states of extended attribute damage. File fsckeattrtest1.savemeta is undamaged, but it has a bunch of eattrs in various places. The rest are damaged in various ways, and fsck needs to be able to fix them. At any rate, the code is checked in and I'm marking it as modified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2008-0350.html