Bug 186125 - gfs_fsck on GFS 6.1 leaves volume in an unmountable state
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Robert Peterson
QA Contact: GFS Bugs
Duplicates: 191708
Depends On:
Blocks: 180185
Reported: 2006-03-21 14:35 EST by Issue Tracker
Modified: 2010-01-11 22:10 EST

See Also:
Fixed In Version: RHBA-2006-0560
Doc Type: Bug Fix
Last Closed: 2006-08-10 17:28:48 EDT

Attachments
sysreport from cluster machine (823.87 KB, application/octet-stream)
2006-03-21 14:37 EST, Jeff Layton
Patch to fix the problem (651 bytes, patch)
2006-03-23 15:33 EST, Robert Peterson

Description Issue Tracker 2006-03-21 14:35:20 EST
Escalated to Bugzilla from IssueTracker
Comment 8 Jeff Layton 2006-03-21 14:37:40 EST
Created attachment 126424
sysreport from cluster machine
Comment 12 Robert Peterson 2006-03-22 17:23:03 EST
I was able to recreate this failure using a sparse device and gdb.
The problem should be easy to figure out and, with a little luck, easy to fix.
Comment 13 Robert Peterson 2006-03-23 15:33:45 EST
Created attachment 126571
Patch to fix the problem

This problem occurs when gfs_fsck is run on a filesystem larger than
5TB, and possibly under other conditions.  The actual trigger is the
hidden/internal resource group index file (rgindex) having more than
one level of indirection.  In my recreation, the rgindex had 20888
resource group entries.

While gfs_fsck is running, it changes the lock protocol to ensure
that no one can mount the filesystem while it is being checked.
Once the error has occurred and gfs_fsck has given up on the filesystem,
it exits but forgets to change the lock protocol back to lock_dlm,
which leaves the filesystem unmountable.  To make the filesystem usable
again, change the protocol back to lock_dlm as follows:

gfs_tool sb /dev/<your device> proto lock_dlm
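
For anyone who wants to script the recovery above, here is a rough sketch
of the same step wrapped in a shell function.  The function name
reset_gfs_lockproto and the DRY_RUN variable are made up for this
example and are not part of GFS.  It assumes the filesystem is not
mounted on any node, and that "gfs_tool sb <device> proto" with no new
value just prints the current protocol.  gfs_tool may also ask for
confirmation before it writes the superblock.

```shell
#!/bin/sh
# Sketch only: reset the superblock lock protocol after a failed gfs_fsck.
# reset_gfs_lockproto is a hypothetical helper name, not a GFS command.
# DRY_RUN=1 (the default here) only prints the commands it would run.
reset_gfs_lockproto() {
    device="$1"
    proto="${2:-lock_dlm}"    # protocol to restore; lock_dlm as in this bug

    if [ -z "$device" ]; then
        echo "usage: reset_gfs_lockproto <device> [protocol]" >&2
        return 2
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "gfs_tool sb $device proto"
        echo "gfs_tool sb $device proto $proto"
        return 0
    fi

    # Show the protocol that gfs_fsck left behind, then write the new one.
    gfs_tool sb "$device" proto &&
    gfs_tool sb "$device" proto "$proto"
}
```

Run it once as is to see the commands, then again with
"DRY_RUN=0 reset_gfs_lockproto /dev/<your device>" to apply the change.
The optional second argument is there in case the filesystem was
originally created with a lock protocol other than lock_dlm.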

The underlying failure was caused by an omission in gfs_fsck's bitmap
code; the corresponding gfs kernel code appears to be complete.  This
patch attempts to fix the problem.  It has been tested and works on a
5.1TB sparse device filesystem (basically, faking out the device
mapper to think the filesystem is bigger than it really is).

The fix is basically to handle the extra level of indirection rather
than exiting.
Comment 21 Robert Peterson 2006-05-15 09:44:14 EDT
*** Bug 191708 has been marked as a duplicate of this bug. ***
Comment 25 Red Hat Bugzilla 2006-08-10 17:28:56 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0560.html
