Bug 149706 - gfs_fsck seg fault after checking ref count
Summary: gfs_fsck seg fault after checking ref count
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: AJ Lewis
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks: 144795
Reported: 2005-02-25 16:24 UTC by Corey Marthaler
Modified: 2010-01-12 03:03 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-03-14 23:18:19 UTC
Embargoed:


Description Corey Marthaler 2005-02-25 16:24:04 UTC
Description of problem:
I ran fsck on four nodes to four different filesystems and hit this
seg fault on three of them.


morph-05: on /dev/gfs/gfs3
.
.
.
Checking reference count on inode at block 27474947
Checking reference count on inode at block 81844632
Checking reference count on inode at block 81710277
Checking reference count on inode at block 60097575
Checking reference count on inode at block 54619773
Checking reference count on inode at block 37394832
Found unlinked inode at 37394832
Locating/Creating lost and found directory
        Adjusting freemeta block count (178 -> 179).
        Adjusting used dinode block count (657 -> 656).
l+f directory at 25514
Added inode #37394832 to l+f dir
Checking reference count on inode at block 28443658
Segmentation fault


morph-04 on /dev/gfs/gfs2
.
.
.
Checking reference count on inode at block 263678
Checking reference count on inode at block 114926355
Checking reference count on inode at block 110600479
Found unlinked inode at 110600479
Locating/Creating lost and found directory
        Adjusting freemeta block count (107 -> 108).
        Adjusting used dinode block count (474 -> 473).
l+f directory at 32230
Added inode #110600479 to l+f dir
Checking reference count on inode at block 108896506
Checking reference count on inode at block 81765195
Checking reference count on inode at block 54699377
Checking reference count on inode at block 54540806
Checking reference count on inode at block 27790530
Checking reference count on inode at block 27788673
Checking reference count on inode at block 27535195
Checking reference count on inode at block 27283582
Checking reference count on inode at block 27207946
Checking reference count on inode at block 472286
Checking reference count on inode at block 422461
Checking reference count on inode at block 415425
Checking reference count on inode at block 352842
Segmentation fault


morph-02 on /dev/gfs/gfs0
.
.
.
Checking reference count on inode at block 54622778
Checking reference count on inode at block 27592053
Checking reference count on inode at block 27536744
Checking reference count on inode at block 27474947
Checking reference count on inode at block 81844632
Checking reference count on inode at block 81710277
Checking reference count on inode at block 60097575
Checking reference count on inode at block 54619773
Checking reference count on inode at block 37394832
Found unlinked inode at 37394832
Locating/Creating lost and found directory
        Adjusting freemeta block count (178 -> 179).
        Adjusting used dinode block count (657 -> 656).
l+f directory at 25514
Added inode #37394832 to l+f dir
Checking reference count on inode at block 28443658
Segmentation fault


Version-Release number of selected component (if applicable):
GFS fsck 6.1-0.pre16 (built Feb 23 2005 17:55:46)
Copyright (C) Red Hat, Inc.  2004-2005  All rights reserved.


How reproducible:
Sometimes

Comment 1 AJ Lewis 2005-02-25 16:34:34 UTC
Neat - did you do anything special to the filesystems before running the fsck? 
Load, crash, etc?  How big are the filesystems?

Comment 2 Corey Marthaler 2005-02-25 17:02:20 UTC
I had been running revolver with a heavy load so there was a lot of
I/O and nodes going up and down before I rebooted everyone and
attempted to fsck all the filesystems. The file systems are each 518G.

Comment 3 AJ Lewis 2005-02-25 17:06:44 UTC
Bleh - you might be running out of memory, and I'm not detecting it until the
NULL pointer is accessed.  Do you see anything before that that says "Unable to
allocate" anywhere?
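
The failure mode suspected here (an allocation that fails silently and is only noticed when the NULL pointer is dereferenced) can be guarded against with a checked allocator. The following is a minimal sketch, not the gfs_fsck source: the helper name checked_zalloc and its "what" parameter are hypothetical, and only the "Unable to allocate" wording is taken from this comment.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from gfs_fsck): allocate zeroed memory and
 * report a failure immediately, instead of letting a NULL pointer
 * travel until something dereferences it.
 */
static void *checked_zalloc(size_t count, size_t size, const char *what)
{
    void *p = calloc(count, size);

    if (p == NULL)
        fprintf(stderr, "Unable to allocate %s\n", what);
    return p; /* caller must still test for NULL and bail out */
}
```

A caller would test the return value and abort the pass cleanly, so an out-of-memory condition shows up as a message in the log rather than as a segmentation fault later on.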

Comment 4 Corey Marthaler 2005-02-25 17:09:07 UTC
No out-of-the-ordinary messages on any of the nodes, at least as far
back as the scroll buffer goes.

Comment 5 Corey Marthaler 2005-02-25 20:59:10 UTC
I reproduced this by running the exact same scenario; this time I
chose only one node and one filesystem to fsck.


Checking reference count on inode at block 72709144
Checking reference count on inode at block 262810
Checking reference count on inode at block 72709161
Checking reference count on inode at block 54362216
Checking reference count on inode at block 72774653
Checking reference count on inode at block 197100
Checking reference count on inode at block 54625627
Checking reference count on inode at block 36497898
Found unlinked inode at 36497898
Locating/Creating lost and found directory
        Adjusting freemeta block count (62 -> 63).
        Adjusting used dinode block count (197 -> 196).
l+f directory at 1104
Added inode #36497898 to l+f dir
Checking reference count on inode at block 72774624
Checking reference count on inode at block 54625515
Segmentation fault


Comment 6 Kiersten (Kerri) Anderson 2005-03-02 20:54:55 UTC
Blocker bug - added it to the list

Comment 7 AJ Lewis 2005-03-02 21:24:15 UTC
Haven't been able to reproduce this with the same setup - setting to NEEDINFO

Comment 8 AJ Lewis 2005-03-03 20:51:36 UTC
Crap - I am seeing this.  There is no longer a segfault, because the invalid
memory reference is now being checked.  What used to cause a segfault now prints
the following error:
Unable to find l+f inode in inode_hash!!

So, I know why it was segfaulting; now I just need to figure out why I can't
find the l+f inode in the hash.
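
The difference between the old and new behaviour can be illustrated with a checked hash lookup. This is a sketch under stated assumptions, not the actual gfs_fsck code: the struct layout, the bucket count, and the function names inode_hash_insert, inode_hash_search and bump_lf_link_count are all hypothetical, as is the idea that the caller wants to bump a link count. Only the name inode_hash and the error text come from this comment.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define HASH_BUCKETS 64 /* assumed size, for illustration only */

/* Hypothetical per-inode record; not the real gfs_fsck structure. */
struct inode_info {
    uint64_t block;          /* block number of the dinode */
    uint32_t counted_links;  /* references counted so far */
    struct inode_info *next; /* chain within one bucket */
};

static struct inode_info *inode_hash[HASH_BUCKETS];

/* Insert a record for `block`; returns NULL if allocation fails. */
static struct inode_info *inode_hash_insert(uint64_t block)
{
    struct inode_info *ii = calloc(1, sizeof(*ii));

    if (ii == NULL)
        return NULL;
    ii->block = block;
    ii->next = inode_hash[block % HASH_BUCKETS];
    inode_hash[block % HASH_BUCKETS] = ii;
    return ii;
}

/* Return the record for `block`, or NULL if it is not in the hash. */
static struct inode_info *inode_hash_search(uint64_t block)
{
    struct inode_info *ii;

    for (ii = inode_hash[block % HASH_BUCKETS]; ii != NULL; ii = ii->next)
        if (ii->block == block)
            return ii;
    return NULL;
}

/*
 * Count one more link on the l+f directory. Without the NULL check,
 * a missing entry would be dereferenced and crash; with it, the miss
 * is reported and the caller gets -1.
 */
static int bump_lf_link_count(uint64_t lf_block)
{
    struct inode_info *lf = inode_hash_search(lf_block);

    if (lf == NULL) {
        fprintf(stderr, "Unable to find l+f inode in inode_hash!!\n");
        return -1;
    }
    lf->counted_links++;
    return 0;
}
```

In this sketch the check only turns the crash into a diagnosable error. It does not explain why the entry is missing, which is the question still open at this point in the thread.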

Comment 9 AJ Lewis 2005-03-03 21:55:55 UTC
Fix will be in next build.

Comment 10 AJ Lewis 2005-03-07 22:17:37 UTC
This fix is in the 3/4 build.

Comment 11 Corey Marthaler 2005-03-14 23:18:19 UTC
fix verified.

