Bug 624691
Summary: fsck.gfs2 deletes directories if they get too big
Product: Red Hat Enterprise Linux 6
Reporter: Robert Peterson <rpeterso>
Component: cluster
Assignee: Robert Peterson <rpeterso>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.0
CC: adas, bmarzins, cluster-maint, ddumas, fdinitto, lhh, rpeterso, rwheeler, ssaha, swhiteho
Target Milestone: rc
Keywords: Regression
Target Release: 6.0
Hardware: x86_64
OS: Linux
Fixed In Version: cluster-3.0.12-23.el6
Doc Type: Bug Fix
Clone Of: 624689
Clones: 628013
Last Closed: 2010-11-10 20:00:34 UTC
Bug Depends On: 575968, 620384, 624689
Bug Blocks: 622576
Description
Robert Peterson
2010-08-17 14:00:14 UTC
I did some testing and found that this bug does not exist in gfs_fsck, the GFS1 fsck. I also tested gfs2-utils-0.1.60-1.el5, and it does not have this problem either, so this is in fact a regression. Here's how to recreate the failure and what it looks like:

[root@kool ~]# mkfs.gfs2 -O -b512 -p lock_nolock -t "kool:bob" -j1 /dev/kool_vg/kool_bob
Device: /dev/kool_vg/kool_bob
Blocksize: 512
Device Size 40.00 GB (83886080 blocks)
Filesystem Size: 40.00 GB (83886078 blocks)
Journals: 1
Resource Groups: 160
Locking Protocol: "lock_nolock"
Lock Table: "kool:bob"
UUID: 05060249-E9BB-1DCF-C9C8-112EF09BD56C
You have new mail in /var/spool/mail/root
[root@kool ~]# sync
[root@kool ~]# mount -tgfs2 /dev/kool_vg/kool_bob /mnt/bob
[root@kool ~]# mkdir /mnt/bob/bob
[root@kool ~]# for i in `seq 1 10000` ; do touch /mnt/bob/bob/file_name_$i ; done
[root@kool ~]# !umo
umount /mnt/bob
[root@kool ~]# /sbin/fsck.gfs2 -V
GFS2 fsck DEVEL.1274286054 (built May 19 2010 11:22:48)
Copyright (C) Red Hat, Inc. 2004-2006 All rights reserved.
[root@kool ~]# fsck.gfs2 /dev/kool_vg/kool_bob
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Block 287425 (0x462c1) seems to be free space, but is marked as data in the bitmap.
Okay to fix the bitmap? (y/n)y
The bitmap was fixed.
Block 287426 (0x462c2) seems to be free space, but is marked as data in the bitmap.
Okay to fix the bitmap? (y/n)y
The bitmap was fixed.
Error: inode 269038 (0x41aee) has unrecoverable errors; invalidating.
Block 269038 (0x41aee) seems to be free space, but is marked as inode in the bitmap.
Okay to fix the bitmap? (y/n)y
The bitmap was fixed.
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Directory entry 'bob' referencing inode 269038 (0x41aee) in dir inode 398 (0x18e) block type 0: was deleted or is not an inode.
Clear directory entry to non-inode block? (y/n)

Inode 269038 is the "bob" directory itself: pass1 invalidates it, and pass2 then offers to clear its entry from the root directory (inode 398). The fsck of this freshly populated file system should obviously come up clean.

Created attachment 439170
Proposed patch for STABLE3
Here is the STABLE3 crosswrite patch. I'm testing it now.
I tested this patch and found it to be correct on system roth-08. I pushed the patch to the master branch of the gfs2-utils git tree, and to the STABLE3, RHEL6, and RHEL60 branches of the cluster git tree for inclusion in 6.0. Changing status to POST until a new cluster package gets built.

Now built with a cherry-pick from the RHEL60 branch.

Created attachment 441066
fsck.gfs2 core dump
I'm trying to verify this, but I think I'm missing something from Bob's description. I ran mkfs with a 512-byte block size:
mkfs -t gfs2 -O -b 512 -j 1 -p lock_dlm -t west:brawl0 /dev/brawl/brawl0
/dev/brawl/brawl0 is a 1G LV.
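Then I ran a loop to fill a directory with entries:
i=0
# bigdir_ino must hold the block number of bigdir's dinode; the snippet
# as posted does not show how it was set.
while touch bigdir/a_long_filename_to_make_quick_use_of_dentry_blocks_$i; do
    i=$((i + 1))
    # Every 10,000 files, print the count and check the directory's di_height
    if ((i % 10000 == 0)); then
        echo "$i"
        height=$(gfs2_edit -p "$bigdir_ino" field di_height /dev/brawl/brawl0)
        # Stop once the directory inode's metadata tree reaches height 4
        if [ "$height" -eq 4 ]; then
            break
        fi
    fi
done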
This loop ran until the file system was filled and did not reach a di_height of 4.
...
760000
touch: cannot touch `bigdir/a_long_filename_to_make_quick_use_of_dentry_blocks_767252': No space left on device
[root@west-01 ~]# fsck.gfs2 -y /dev/brawl/brawl0; echo $?
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Entries is 767253 - should be 442972 for inode block 269009 (0x41ad1)
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Found unlinked inode at 508636 (0x7c2dc)
Added inode #508636 (0x7c2dc) to lost+found dir
Found unlinked inode at 508665 (0x7c2f9)
Added inode #508665 (0x7c2f9) to lost+found dir
Found unlinked inode at 519865 (0x7eeb9)
Added inode #519865 (0x7eeb9) to lost+found dir
Found unlinked inode at 520069 (0x7ef85)
Segmentation fault (core dumped)
139
<abrt deleted core dump>
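The 139 printed by `echo $?` above is the shell's encoding of a fatal signal: POSIX shells report 128 plus the signal number, and 139 - 128 = 11, which is SIGSEGV on Linux, matching the "Segmentation fault (core dumped)" message. A minimal POSIX sh sketch that decodes such a status; the function name `describe_status` is hypothetical and not part of any tool in this report:

```shell
# Hypothetical helper: turn a shell exit status into a readable description.
# Assumption: a status above 128 means "terminated by signal (status - 128)";
# a program could also exit with such a value on purpose.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l with a signal number prints the signal name, e.g. SEGV
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_status 139   # prints: killed by signal 11 (SEGV)
```

Running it right after a command, as in `fsck.gfs2 -y /dev/brawl/brawl0; describe_status $?`, distinguishes a crash like the one here from an ordinary non-zero fsck result.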
Rerunning fsck.gfs2:
[root@west-01 ~]# fsck.gfs2 -y /dev/brawl/brawl0; echo $?
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Found unlinked inode at 508636 (0x7c2dc)
Added inode #508636 (0x7c2dc) to lost+found dir
Found unlinked inode at 508665 (0x7c2f9)
Added inode #508665 (0x7c2f9) to lost+found dir
Found unlinked inode at 519865 (0x7eeb9)
Added inode #519865 (0x7eeb9) to lost+found dir
Found unlinked inode at 520069 (0x7ef85)
Segmentation fault (core dumped)
Backtrace of the resulting core dump:
#0 bwrite (bh=0xcc03070000000000)
at /usr/src/debug/cluster-3.0.12/gfs2/libgfs2/buf.c:63
#1 0x000000000041d80d in dir_make_exhash (dip=0x534c1d0,
filename=0x7fff14898640 "lost_file_520069", len=16, inum=0xd0d2e0, type=8)
at /usr/src/debug/cluster-3.0.12/gfs2/libgfs2/fs_ops.c:1193
#2 dir_l_add (dip=0x534c1d0, filename=0x7fff14898640 "lost_file_520069",
len=16, inum=0xd0d2e0, type=8)
at /usr/src/debug/cluster-3.0.12/gfs2/libgfs2/fs_ops.c:1202
#3 dir_add (dip=0x534c1d0, filename=0x7fff14898640 "lost_file_520069",
len=16, inum=0xd0d2e0, type=8)
at /usr/src/debug/cluster-3.0.12/gfs2/libgfs2/fs_ops.c:1220
#4 0x0000000000406c3d in add_inode_to_lf (ip=0xd0d2c0)
at /usr/src/debug/cluster-3.0.12/gfs2/fsck/lost_n_found.c:168
#5 0x000000000041549c in scan_inode_list (sbp=0x7fff14898820)
at /usr/src/debug/cluster-3.0.12/gfs2/fsck/pass4.c:130
#6 pass4 (sbp=0x7fff14898820)
at /usr/src/debug/cluster-3.0.12/gfs2/fsck/pass4.c:195
#7 0x0000000000407a27 in main (argc=<value optimized out>,
argv=<value optimized out>)
at /usr/src/debug/cluster-3.0.12/gfs2/fsck/main.c:311
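In the first fsck run, pass2 reports "Entries is 767253 - should be 442972". This message appears to compare the entry count stored in the directory's dinode with the number of entries pass2 could actually reach, so the difference, 767253 - 442972 = 324281, is roughly how many entries fell out of the directory; presumably those are the files pass4 then starts moving into lost+found as unlinked inodes. A minimal sketch that extracts this difference from saved fsck.gfs2 output, assuming only the message format shown in this report; the function name `missing_entries` and the log file name are hypothetical:

```shell
# Hypothetical helper (not part of gfs2-utils): read fsck.gfs2 output on
# stdin and, for each pass2 "Entries is X - should be Y" message, print
# X, Y and the difference X - Y.
missing_entries() {
    sed -n 's/.*Entries is \([0-9][0-9]*\) - should be \([0-9][0-9]*\).*/\1 \2/p' |
        while read -r claimed found; do
            echo "$claimed $found $((claimed - found))"
        done
}

# Example usage with a saved log (hypothetical file name):
#   missing_entries < fsck-output.txt
```

The same arithmetic applied to the morph-04 run below (516033 - 408582) gives 107451.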
I reproduced this again without running out of space on the device. Once the directory holds more than 500,000 entries, fsck.gfs2 truncates it and segfaults while moving files into lost+found.

[root@morph-04 ~]# fsck.gfs2 -y /dev/morph-cluster/bigdir
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
root inode 398 (0x18e): Entries is 516033 - should be 408582
Entries updated
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Found unlinked inode at 377457 (0x5c271)
Added inode #377457 (0x5c271) to lost+found dir
Found unlinked inode at 377486 (0x5c28e)
Added inode #377486 (0x5c28e) to lost+found dir
Found unlinked inode at 388686 (0x5ee4e)
Added inode #388686 (0x5ee4e) to lost+found dir
Found unlinked inode at 388890 (0x5ef1a)
Segmentation fault (core dumped)

I ran the new test against the old fsck.gfs2 and found that the problem in comment #7 is different from the original report. I was able to get much further with gfs2-utils-3.0.12-23.el6 than with gfs2-utils-3.0.12-21.el6. I consider the fix good and this bug verified. I moved the issue in comment #7 to bug 628013.

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.