Bug 179069 - gfs_fsck unable to fix file system
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: All   OS: Linux
Priority: medium   Severity: medium
Assigned To: Robert Peterson
QA Contact: GFS Bugs
Depends On:
Blocks: 180185
Reported: 2006-01-26 18:00 EST by Henry Harris
Modified: 2010-01-11 22:09 EST

Fixed In Version: RHBA-2006-0560
Doc Type: Bug Fix
Last Closed: 2006-08-10 17:28:44 EDT


Attachments
Output from running gfs_fsck (308 bytes, text/plain)
2006-01-26 18:02 EST, Henry Harris
Patch to fix the problem (42.17 KB, patch)
2006-03-31 16:54 EST, Robert Peterson
Better patch to fix the problem (52.03 KB, patch)
2006-04-19 10:10 EDT, Robert Peterson

Description Henry Harris 2006-01-26 18:00:36 EST
Description of problem: We ran gfs_fsck, but it was unable to fix the file system.
See attachment gfs_fsck.out.  The nodes in the cluster had crashed as
described under Additional info below.  We were getting I/O errors when trying to
access the GFS file system, so we ran gfs_fsck.


Version-Release number of selected component (if applicable):


How reproducible:
Happened only once, on all three nodes in the cluster.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
1) The process and stack were identical in both crashes: a process
     called 'o5_wait_for_sys' whose parent process is 'hotplug'.
 
2) The crash occurred in the read() system call,
    and the stack looked like:
 
show_cfsmnt()
seq_read()
vfs_read()
sys_read()  <---- crash with null pointer
Comment 1 Henry Harris 2006-01-26 18:02:46 EST
Created attachment 123750
Output from running gfs_fsck
Comment 2 Robert Peterson 2006-01-30 17:51:58 EST
The gfs_fsck messages indicate corruption in the gfs resource group information
for the filesystem.  It's nearly impossible to say whether the crash caused the
corruption or whether the corruption caused the crash.

Is there any way I can get a copy of the corrupted filesystem to examine?
I'd like to see the corruption first-hand, if possible.  Sometimes the only way
to find a smoking gun is to find the embedded bullet and follow the trail of
smoke backward to its source.
Comment 3 Henry Harris 2006-01-31 11:05:24 EST
Unfortunately, the file system has been recreated and is no longer in the 
corrupted state.
Comment 4 Robert Peterson 2006-03-31 16:54:08 EST
Created attachment 127163
Patch to fix the problem

Attached is an extensive patch that attempts to fix corrupted RGs and
corrupted RG index entries.  Several rudimentary tests have been run
under a variety of conditions in which RGs and RG index entries were
purposely corrupted.  The patch seems to work properly in all cases
tested.
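To make "corrupted RG index entries" a little more concrete, here is a minimal sketch of the kind of consistency check an fsck could run over a resource-group index before trusting it. Everything in it is hypothetical: struct rg_entry is a simplified model (header address, header/bitmap length, first data block, data block count), not the real GFS on-disk rindex layout, and first_bad_entry() is not taken from the attached patch, whose logic is not reproduced in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified RG index entry; NOT the real GFS on-disk format. */
struct rg_entry {
    uint64_t addr;    /* block address of the RG header */
    uint32_t length;  /* number of header + bitmap blocks */
    uint64_t data0;   /* first data block of this RG */
    uint32_t data;    /* number of data blocks in this RG */
};

/*
 * Scan a sorted RG index and return the position of the first entry that
 * looks inconsistent, or -1 if every entry passes these simple checks.
 * fs_blocks is the total number of blocks in the file system.
 */
static long first_bad_entry(const struct rg_entry *ri, size_t n,
                            uint64_t fs_blocks)
{
    uint64_t prev_end = 0;  /* first block after the previous RG's data */

    assert(ri != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct rg_entry *e = &ri[i];

        /* An RG with no header/bitmap blocks or no data blocks is suspect. */
        if (e->length == 0 || e->data == 0)
            return (long)i;
        /* In this model, data blocks must immediately follow the bitmaps. */
        if (e->data0 != e->addr + e->length)
            return (long)i;
        /* Entries must be in order and must not overlap the previous RG. */
        if (e->addr < prev_end)
            return (long)i;
        /* The RG must fit inside the file system (written to avoid overflow). */
        if (e->data0 > fs_blocks || e->data > fs_blocks - e->data0)
            return (long)i;
        prev_end = e->data0 + e->data;
    }
    return -1;
}
```

Detecting a bad entry is only half the job; a repair pass would then have to rebuild it, for example from the neighbouring entries or from the RG headers actually found on disk, which this sketch does not attempt.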
Comment 5 Robert Peterson 2006-04-19 10:10:27 EDT
Created attachment 127984
Better patch to fix the problem

This patch is much better.  Several code problems from the previous
patch were found and fixed.  This version has passed a newly designed
battery of test cases that use gfs_fsck to fix 245 different
variations of:

(1) filesystem size, (2) number of journals, (3) location of RG 
corruption, (4) location of RG index corruption, (5) filesystem
resizing by gfs_grow, and (6) RG size and number of RGs.

I won't promise that it can fix all forms of RG and RG index
corruption, but it does pretty well.
Comment 8 Red Hat Bugzilla 2006-08-10 17:28:45 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0560.html
