Bug 450641

Summary: gfs2 in 2.6.26-rc2 appears busted; data corruption, wrong statfs info
Product: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED UPSTREAM
Severity: low
Priority: low
Reporter: Eric Sandeen <esandeen>
Assignee: Ben Marzinski <bmarzins>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bmarzins, rpeterso, swhiteho
Doc Type: Bug Fix
Last Closed: 2008-06-30 19:47:34 UTC
Attachments:
patch to fix the block allocation (flags: none)

Description Eric Sandeen 2008-06-10 03:34:01 UTC
Make a 2T local / single-node gfs2 filesystem:

[root@east-10 ~]# mkfs.gfs2 -p lock_nolock /dev/sdc
This will destroy any data on /dev/sdc.
  It appears to contain a ext3 filesystem.

Are you sure you want to proceed? [y/n] y 

Device:                    /dev/sdc
Blocksize:                 4096
Device Size                2326.31 GB (609827840 blocks)
Filesystem Size:           2326.31 GB (609827839 blocks)
Journals:                  1
Resource Groups:           9306
Locking Protocol:          "lock_nolock"
Lock Table:                ""
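(As a quick sanity check of the mkfs output above, treating mkfs's "GB" as 2^30 bytes, which matches the printed figure:

```python
# Verify that 609827840 blocks of 4096 bytes is the 2326.31 "GB"
# that mkfs.gfs2 reports (mkfs is using GB to mean 2**30 bytes here).
blocks = 609827840
block_size = 4096
size_gb = blocks * block_size / 2**30
assert round(size_gb, 2) == 2326.31
```
)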


Mount it:

[root@east-10 ~]# mount /dev/sdc /mnt/test

Write a 1G file (using xfs_io to write a pattern, 0x01 in this case):

[root@east-10 ~]# xfs_io -f -F -c "pwrite -S 1 0 1G" /mnt/test/file
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 262144 ops; 0:00:17.00 (57.244 MiB/sec and 14654.5153 ops/sec)
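(The summary line is internally consistent: 262144 writes cover the full 1 GiB at one 4096-byte block per pwrite, and the reported ops/sec and MiB/sec figures agree:

```python
# Cross-check the xfs_io summary line above.
total_bytes = 1073741824   # 1 GiB
ops = 262144
assert total_bytes // ops == 4096                    # one block per pwrite
# ops/sec * bytes/op should reproduce the MiB/sec figure:
assert round(14654.5153 * 4096 / 2**20, 3) == 57.244
```
)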

All done.  Check df:

[root@east-10 ~]# df -h /mnt/test
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdc              2.3T     0  2.3T   0% /mnt/test

0 blocks used?

The file claims to be using space:

[root@east-10 ~]# du -hc /mnt/test/*
1.1G	/mnt/test/file
1.1G	total

unmount, remount:

[root@east-10 ~]# umount /mnt/test
[root@east-10 ~]# mount /dev/sdc /mnt/test

Check file contents, find large swath of 0s where our data should be:

[root@east-10 ~]# hexdump -C /mnt/test/file
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
3c057000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01  |................|
*
40000000
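(Reading the hexdump: the file is zeros from offset 0 up to 0x3c057000, and holds the expected 0x01 pattern from there to 0x40000000, the 1 GiB end of file; roughly the first 960 MiB of data is missing:

```python
# Quantify the corrupted span shown by hexdump above.
zero_end = 0x3c057000   # first offset where the 0x01 pattern appears
file_end = 0x40000000   # end of the file

assert file_end == 1 << 30            # the 1 GiB file size
assert zero_end == 1006989312         # bytes of data lost to zeros
assert file_end - zero_end == 66752512  # bytes that survived
```
)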

Try repairing the filesystem, find corruption:

[root@east-10 ~]# gfs2_fsck /dev/sdc
Initializing fsck
Recovering journals (this may take a while).
Journal recovery complete.
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Inode 33342 (0x823e): Ondisk block count (262664) does not match what fsck found
(16333)
Fix ondisk block count? (y/n) y
Pass1 complete      
Starting pass1b
...

Versions:

[root@east-10 ~]# rpm -q gfs2-utils
gfs2-utils-0.1.44-1.el5
[root@east-10 ~]# uname -a
Linux east-10 2.6.26-rc2 #3 SMP Mon Jun 9 11:20:13 CDT 2008 x86_64 x86_64 x86_64
GNU/Linux


-Eric

Comment 1 Steve Whitehouse 2008-06-10 10:26:48 UTC
Ben, this might be a clue to what you are looking for.

Comment 2 Ben Marzinski 2008-06-12 00:31:58 UTC
Well, the reason that there is no pattern until byte 1006989312 (0x3c057000) is
that the first 483 pointers in the indirect pointer block are all zero,
according to gfs2_edit: 483 * 509 * 4096 = 1006989312. Now the real question is
why the first 483 pointers are all zero. That I'm still looking into.
Incidentally, when I try to grow a file to this size, I get exactly the same
thing: no data until byte 1006989312. It happens somewhere between growing the
file to 100MB and 1000MB. This should make it pretty easy to track down.
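(Ben's figure checks out: each of the 483 zeroed pointers hides one indirect block of 509 pointers, each covering a 4096-byte data block:

```python
# Sanity-check the arithmetic in the comment above: 483 zeroed
# pointers, each hiding an indirect block of 509 data pointers,
# each pointer covering one 4096-byte block.
zeroed_ptrs = 483       # zeroed slots in the top indirect block
ptrs_per_indirect = 509 # pointer slots in an indirect block
block_size = 4096

missing = zeroed_ptrs * ptrs_per_indirect * block_size
assert missing == 1006989312 == 0x3c057000
```
)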

Comment 3 Steve Whitehouse 2008-06-13 15:58:45 UTC
I wonder whether the data that's left is the data from the start of the file or
the data which actually belongs in that place. If the former, then I suspect
that the order in which the new indirect blocks are added to the metadata tree
might be to blame.

Comment 4 Ben Marzinski 2008-06-17 21:58:14 UTC
Created attachment 309673 [details]
patch to fix the block allocation

This patch changes the computation in zero_metapath_length(). When you are
extending the metadata tree, the indirect blocks that point to the new data
block must diverge from the existing tree either at the inode or at the
first indirect block. They can diverge at the first indirect block because the
inode has room for 483 pointers while the indirect blocks have room for 509
pointers, so when the tree is grown, there is some free space in the first
indirect block. What zero_metapath_length() now computes is the height where
the first new indirect block for the new data block is located: either 1 (if
it diverges at the inode) or 2 (if it diverges at the first indirect block).
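(A hypothetical sketch of the rule the patch encodes -- this is illustrative Python, not the kernel code, and the function name and structure are invented; only the pointer counts come from the comment above. After the tree is grown, the inode's 483 pointers move into a new 509-slot indirect block, so a new branch can still split off there until its 26 free slots run out, and otherwise must split at the inode:

```python
# Illustrative model only -- not the actual gfs2 zero_metapath_length().
# Pointer counts are from Ben's comment: 483 slots in the inode,
# 509 slots in an indirect block.
INODE_PTRS = 483
INDIRECT_PTRS = 509

def divergence_height(top_slot):
    """Height at which the branch to a new data block splits off.

    top_slot is the slot the new branch needs in the first indirect
    block, which inherited the inode's 483 pointers when the tree
    grew and so still has 509 - 483 = 26 free slots (483..508).
    """
    if top_slot < INDIRECT_PTRS:
        return 2  # diverges inside the first indirect block
    return 1      # needs a fresh pointer in the inode itself

assert divergence_height(483) == 2  # first free slot after the grow
assert divergence_height(508) == 2  # last slot in the indirect block
assert divergence_height(509) == 1  # must diverge at the inode
```
)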

Comment 5 Steve Whitehouse 2008-06-25 13:00:56 UTC
The patch is now upstream in Linus' kernel. Can we close this bz now, or are
there other issues still left unresolved?

Comment 6 Eric Sandeen 2008-06-25 13:47:20 UTC
Perhaps I should have filed 2 bugs; does the statfs issue remain?

-Eric

Comment 7 Ben Marzinski 2008-06-26 16:04:15 UTC
It works fine for me.

Comment 8 Steve Whitehouse 2008-06-30 11:38:09 UTC
So we ought to be able to close this now?