Description of problem:
I ran fsck on four nodes to four different filesystems and hit this seg fault on three of them.

morph-05: on /dev/gfs/gfs3
. . .
Checking reference count on inode at block 27474947
Checking reference count on inode at block 81844632
Checking reference count on inode at block 81710277
Checking reference count on inode at block 60097575
Checking reference count on inode at block 54619773
Checking reference count on inode at block 37394832
Found unlinked inode at 37394832
Locating/Creating lost and found directory
Adjusting freemeta block count (178 -> 179).
Adjusting used dinode block count (657 -> 656).
l+f directory at 25514
Added inode #37394832 to l+f dir
Checking reference count on inode at block 28443658
Segmentation fault

morph-04 on /dev/gfs/gfs2
. . .
Checking reference count on inode at block 263678
Checking reference count on inode at block 114926355
Checking reference count on inode at block 110600479
Found unlinked inode at 110600479
Locating/Creating lost and found directory
Adjusting freemeta block count (107 -> 108).
Adjusting used dinode block count (474 -> 473).
l+f directory at 32230
Added inode #110600479 to l+f dir
Checking reference count on inode at block 108896506
Checking reference count on inode at block 81765195
Checking reference count on inode at block 54699377
Checking reference count on inode at block 54540806
Checking reference count on inode at block 27790530
Checking reference count on inode at block 27788673
Checking reference count on inode at block 27535195
Checking reference count on inode at block 27283582
Checking reference count on inode at block 27207946
Checking reference count on inode at block 472286
Checking reference count on inode at block 422461
Checking reference count on inode at block 415425
Checking reference count on inode at block 352842
Segmentation fault

morph-02 on /dev/gfs/gfs0
. . .
Checking reference count on inode at block 54622778
Checking reference count on inode at block 27592053
Checking reference count on inode at block 27536744
Checking reference count on inode at block 27474947
Checking reference count on inode at block 81844632
Checking reference count on inode at block 81710277
Checking reference count on inode at block 60097575
Checking reference count on inode at block 54619773
Checking reference count on inode at block 37394832
Found unlinked inode at 37394832
Locating/Creating lost and found directory
Adjusting freemeta block count (178 -> 179).
Adjusting used dinode block count (657 -> 656).
l+f directory at 25514
Added inode #37394832 to l+f dir
Checking reference count on inode at block 28443658
Segmentation fault

Version-Release number of selected component (if applicable):
GFS fsck 6.1-0.pre16 (built Feb 23 2005 17:55:46)
Copyright (C) Red Hat, Inc. 2004-2005 All rights reserved.

How reproducible:
Sometimes
Neat - did you do anything special to the filesystems before running the fsck? Load, crash, etc? How big are the filesystems?
I had been running revolver with a heavy load so there was a lot of I/O and nodes going up and down before I rebooted everyone and attempted to fsck all the filesystems. The file systems are each 518G.
Bleh - you might be running out of memory, and I'm not detecting it until the NULL pointer is accessed. Do you see anything before that that says "Unable to allocate" anywhere?
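The failure mode suspected here (an allocation fails, nobody notices, and a later access through the NULL pointer segfaults) can be avoided by checking the result at the allocation site. The following is only a minimal sketch of that pattern: `struct inode_info` and `inode_info_new` are hypothetical names chosen for illustration, not the actual fsck code, and only the "Unable to allocate" wording is taken from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-inode record; not the actual fsck structure. */
struct inode_info {
    unsigned long long block;   /* disk block holding the inode */
    unsigned int link_count;    /* references counted so far */
};

/*
 * Allocate a zeroed record for `block` and store it in *out.
 * Returns 0 on success, -1 if memory is exhausted. On failure *out is
 * NULL and a message is printed, so the caller can stop instead of
 * dereferencing a NULL pointer later.
 */
int inode_info_new(unsigned long long block, struct inode_info **out)
{
    assert(out != NULL);    /* a caller bug, not a runtime condition */

    *out = calloc(1, sizeof(**out));
    if (*out == NULL) {
        fprintf(stderr, "Unable to allocate inode_info for block %llu\n",
                block);
        return -1;
    }
    (*out)->block = block;
    return 0;
}
```

A caller would test the return value and abort the pass on -1, so that memory exhaustion shows up as an explicit message in the log rather than as a bare "Segmentation fault".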
No out-of-the-ordinary messages on any of the nodes, at least as far back as the scroll buffer goes.
I reproduced this by running the exact same scenario; this time I chose only one node and one filesystem to fsck.

Checking reference count on inode at block 72709144
Checking reference count on inode at block 262810
Checking reference count on inode at block 72709161
Checking reference count on inode at block 54362216
Checking reference count on inode at block 72774653
Checking reference count on inode at block 197100
Checking reference count on inode at block 54625627
Checking reference count on inode at block 36497898
Found unlinked inode at 36497898
Locating/Creating lost and found directory
Adjusting freemeta block count (62 -> 63).
Adjusting used dinode block count (197 -> 196).
l+f directory at 1104
Added inode #36497898 to l+f dir
Checking reference count on inode at block 72774624
Checking reference count on inode at block 54625515
Segmentation fault
Blocker bug - added it to the list
Haven't been able to reproduce this with the same setup - setting to NEEDINFO
Crap - I am seeing this. There is no longer a segfault, because the invalid memory reference is now being checked. What used to cause a segfault now prints the following error:

Unable to find l+f inode in inode_hash!!

So I know why it was segfaulting; now I just need to figure out why I can't find the l+f inode in the hash.
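The change described above amounts to checking the result of the hash lookup before using it. A rough sketch of that idea follows, assuming a simple chained hash table keyed by block number. The structure layout and the function names (`inode_hash_insert`, `inode_hash_search`, `lf_add_link`) are assumptions made for illustration and are not taken from the fsck source; only the error string comes from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define HASH_BUCKETS 64     /* arbitrary size for this sketch */

/* Hypothetical per-inode record, chained within a bucket. */
struct inode_info {
    unsigned long long block;
    unsigned int link_count;
    struct inode_info *next;
};

struct inode_hash {
    struct inode_info *bucket[HASH_BUCKETS];
};

static size_t hash_block(unsigned long long block)
{
    return (size_t)(block % HASH_BUCKETS);
}

/* Push a record onto the front of its bucket's chain. */
void inode_hash_insert(struct inode_hash *h, struct inode_info *ii)
{
    size_t b;

    assert(h != NULL && ii != NULL);
    b = hash_block(ii->block);
    ii->next = h->bucket[b];
    h->bucket[b] = ii;
}

/* Return the record for `block`, or NULL if it was never inserted. */
struct inode_info *inode_hash_search(const struct inode_hash *h,
                                     unsigned long long block)
{
    struct inode_info *ii;

    assert(h != NULL);
    for (ii = h->bucket[hash_block(block)]; ii != NULL; ii = ii->next)
        if (ii->block == block)
            return ii;
    return NULL;
}

/*
 * Count one more link on the l+f directory. Returns 0 on success and
 * -1 if the l+f inode is missing from the hash; the lookup result is
 * checked before it is dereferenced.
 */
int lf_add_link(struct inode_hash *h, unsigned long long lf_block)
{
    struct inode_info *lf = inode_hash_search(h, lf_block);

    if (lf == NULL) {
        fprintf(stderr, "Unable to find l+f inode in inode_hash!!\n");
        return -1;
    }
    lf->link_count++;
    return 0;
}
```

With a check like this, a missing entry produces the error message instead of a crash. It does not explain why the l+f inode is absent from the hash in the first place, which is the open question in this comment.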
Fix will be in next build.
This fix is in the 3/4 build.
fix verified.