Bug 200883 - gfs_fsck seg faults
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Robert Peterson
QA Contact: GFS Bugs
Depends On:
Blocks:
Reported: 2006-08-01 06:35 EDT by Stephen Willey
Modified: 2010-01-11 22:12 EST (History)

See Also:
Fixed In Version: RHBA-2007-0139
Doc Type: Bug Fix
Last Closed: 2007-05-10 16:59:29 EDT


Attachments: None
Description Stephen Willey 2006-08-01 06:35:21 EDT
Description of problem:

gfs_fsck uses all available RAM and swap before seg faulting.

Version-Release number of selected component (if applicable):


How reproducible:

Every time the filesystem is checked.

Steps to Reproduce:
1. gfs_fsck -y /dev/blah
  
Actual results:

Started checking because the following errors were appearing:

GFS: fsid=nearlineA:gfs1.0: fatal: invalid metadata block
GFS: fsid=nearlineA:gfs1.0:   bh = 2644310219 (type: exp=4, found=5)
GFS: fsid=nearlineA:gfs1.0:   function = gfs_get_meta_buffer
GFS: fsid=nearlineA:gfs1.0:   file = /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dio.c, line = 1223
GFS: fsid=nearlineA:gfs1.0:   time = 1154425344
GFS: fsid=nearlineA:gfs1.0: about to withdraw from the cluster
GFS: fsid=nearlineA:gfs1.0: waiting for outstanding I/O
GFS: fsid=nearlineA:gfs1.0: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=nearlineA:gfs1.0: withdrawn

And another instance:

GFS: fsid=nearlineA:gfs1.1: fatal: filesystem consistency error
GFS: fsid=nearlineA:gfs1.1:   inode = 2384574146/2384574146
GFS: fsid=nearlineA:gfs1.1:   function = dir_e_del
GFS: fsid=nearlineA:gfs1.1:   file = /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dir.c, line = 1495
GFS: fsid=nearlineA:gfs1.1:   time = 1154393717
GFS: fsid=nearlineA:gfs1.1: about to withdraw from the cluster
GFS: fsid=nearlineA:gfs1.1: waiting for outstanding I/O
GFS: fsid=nearlineA:gfs1.1: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=nearlineA:gfs1.1: withdrawn

And running gfs_fsck -vvv -y /dev/blah would return:

Initializing fsck
Initializing lists...
Initializing special inodes...
Setting block ranges...
Creating a block list of size 11105160192...
Unable to allocate bitmap of size 1388145025
Segmentation fault
[root@ns1a ~]# gfs_fsck -vvv -y /dev/gfs1_vg/gfs1_lv
Initializing fsck
Initializing lists...
(bio.c:140)     Writing to 65536 - 16 4096
Initializing special inodes...
(file.c:45)     readi:  Offset (640) is >= the file size (640).
(super.c:208)   8 journals found.
(file.c:45)     readi:  Offset (7116576) is >= the file size (7116576).
(super.c:265)   74131 resource groups found.
Setting block ranges...
Creating a block list of size 11105160192...
(bitmap.c:68)   Allocated bitmap of size 5552580097 with 2 chunks per byte
Unable to allocate bitmap of size 1388145025
(block_list.c:72)       <backtrace> - block_list_create()
Segmentation fault


Expected results:


Additional info:

Filesystem is roughly 45 TB and compiled on x86_64, so we're going to try adding a
137 GB swap disk to see if it gets any further.
Comment 1 Stephen Willey 2006-08-01 12:22:37 EDT
The fsck is now running after we added the 137 GB swap drive.  It appears to
consistently chew about 4 GB of RAM (sometimes more), but it is working (for now).
Comment 2 Robert Peterson 2006-09-19 18:10:37 EDT
Without a major design change, gfs_fsck will always need memory
for its in-core bitmaps based on the size of the file system.

I looked into the possibility of using the journals as a 
scratch-pad for keeping bitmap information, but they're just not
big enough to do the job.

The memory requirements are approximately 230 MB per terabyte of
storage, but that varies with the number of journals and other
factors.  Therefore, the only way to get it to run is to add
memory or increase swap space as needed, as the customer did in
comment #1.

However, gfs_fsck should not segfault when it runs out of memory.
Therefore I am fixing gfs_fsck so that it doesn't segfault, but 
rather reports the problem, how much additional memory is required
(rounding up to be on the safe side) and exits gracefully.
Comment 5 Red Hat Bugzilla 2007-05-10 16:59:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0139.html
