450641 – gfs2 in 2.6.26-rc2 appears busted; data corruption, wrong statfs info

Summary: gfs2 in 2.6.26-rc2 appears busted; data corruption, wrong statfs info

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Ben Marzinski
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-06-10 03:34 UTC by Eric Sandeen
Modified:	2008-06-30 19:47 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-06-30 19:47:34 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
patch to fix the block allocation (517 bytes, patch) 2008-06-17 21:58 UTC, Ben Marzinski	no flags

Description Eric Sandeen 2008-06-10 03:34:01 UTC

Make a 2T local / single-node gfs2 filesystem:

[root@east-10 ~]# mkfs.gfs2 -p lock_nolock /dev/sdc
This will destroy any data on /dev/sdc.
  It appears to contain a ext3 filesystem.

Are you sure you want to proceed? [y/n] y 

Device:                    /dev/sdc
Blocksize:                 4096
Device Size                2326.31 GB (609827840 blocks)
Filesystem Size:           2326.31 GB (609827839 blocks)
Journals:                  1
Resource Groups:           9306
Locking Protocol:          "lock_nolock"
Lock Table:                ""


Mount it:

[root@east-10 ~]# mount /dev/sdc /mnt/test

Write a 1G file (using xfs_io to write a pattern, 0x01 in this case):

[root@east-10 ~]# xfs_io -f -F -c "pwrite -S 1 0 1G" /mnt/test/file
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 262144 ops; 0:00:17.00 (57.244 MiB/sec and 14654.5153 ops/sec)
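
The write step can also be reproduced without xfs_io. The Python sketch below is only a rough stand-in for `pwrite -S 1 0 1G`; `write_pattern` is a hypothetical helper, and the 4096-byte write size is an assumption chosen because 1 GiB / 4096 bytes = 262144, the ops count xfs_io reported above.

```python
def write_pattern(path: str, size: int, byte: int = 0x01, chunk: int = 4096) -> int:
    """Write `size` bytes of `byte` to `path` using `chunk`-sized writes.

    Returns the number of write calls, comparable to the "ops" count that
    xfs_io prints. Like the xfs_io command above, it does not fsync.
    """
    buf = bytes([byte]) * chunk
    ops = 0
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)  # last write may be shorter than chunk
            f.write(buf[:n])
            remaining -= n
            ops += 1
    return ops
```

For example, `write_pattern("/mnt/test/file", 1 << 30)` issues 262144 writes of 0x01 bytes.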

All done.  Check df:

[root@east-10 ~]# df -h /mnt/test
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdc              2.3T     0  2.3T   0% /mnt/test

0 blocks used?

The file claims to be using space:

[root@east-10 ~]# du -hc /mnt/test/*
1.1G	/mnt/test/file
1.1G	total
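
The df/du mismatch can be checked programmatically as well. This is a minimal sketch using Python's `os.statvfs` and `os.stat`; the helper names are hypothetical, and it assumes Linux semantics where `st_blocks` counts 512-byte units. df's "Used" column corresponds to `(f_blocks - f_bfree) * f_frsize`.

```python
import os

def df_used_bytes(mountpoint: str) -> int:
    """Filesystem-wide used space, computed the way df does from statfs."""
    st = os.statvfs(mountpoint)
    return (st.f_blocks - st.f_bfree) * st.f_frsize

def du_bytes(path: str) -> int:
    """Space allocated to one file, as du counts it (st_blocks is in 512-byte units on Linux)."""
    return os.stat(path).st_blocks * 512

def statfs_undercounts(mountpoint: str, path: str) -> bool:
    """True if the filesystem reports less used space than one file on it occupies."""
    return df_used_bytes(mountpoint) < du_bytes(path)
```

On the filesystem in this report, `statfs_undercounts("/mnt/test", "/mnt/test/file")` would be expected to return True, since df shows 0 used while du shows 1.1G for the file.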

Unmount and remount:

[root@east-10 ~]# umount /mnt/test
[root@east-10 ~]# mount /dev/sdc /mnt/test

Check the file contents and find a large swath of 0s where our data should be:

[root@east-10 ~]# hexdump -C /mnt/test/file
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
3c057000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01  |................|
*
40000000
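
Rather than reading the full hexdump, the boundary can be located directly. This minimal Python sketch (the helper name `first_offset_not_equal` is hypothetical) returns the offset of the first byte that differs from a given value; on the corrupted file above, calling it with `0` would be expected to return 0x3c057000, where the 0x01 pattern resumes.

```python
from typing import Optional

def first_offset_not_equal(path: str, byte: int, chunk: int = 1 << 20) -> Optional[int]:
    """Return the offset of the first byte != `byte`, or None if every byte matches."""
    pad = bytes([byte])
    offset = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                return None  # reached EOF without finding a different byte
            rest = buf.lstrip(pad)  # drop the leading run of `byte`
            if rest:
                return offset + (len(buf) - len(rest))
            offset += len(buf)
```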

Try repairing the filesystem and find corruption:

[root@east-10 ~]# gfs2_fsck /dev/sdc
Initializing fsck
Recovering journals (this may take a while).
Journal recovery complete.
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Inode 33342 (0x823e): Ondisk block count (262664) does not match what fsck found (16333)
Fix ondisk block count? (y/n) y
Pass1 complete      
Starting pass1b
...

Versions:

[root@east-10 ~]# rpm -q gfs2-utils
gfs2-utils-0.1.44-1.el5
[root@east-10 ~]# uname -a
Linux east-10 2.6.26-rc2 #3 SMP Mon Jun 9 11:20:13 CDT 2008 x86_64 x86_64 x86_64 GNU/Linux


-Eric

Comment 1 Steve Whitehouse 2008-06-10 10:26:48 UTC

Ben, this might be a clue to what you are looking for.

Comment 2 Ben Marzinski 2008-06-12 00:31:58 UTC

Well, the reason that there is no pattern until byte 1006989312 (0x3c057000) is
because the first 483 pointers in the indirect pointer block are all zero,
according to gfs2_edit. 483 * 509 * 4096 = 1006989312.  Now the real question is
why the first 483 pointers are all zero. That I'm still looking into.
Incidentally, when I try to grow a file to this size, I get the exact same
thing: no data until byte 1006989312. It happens somewhere between growing the
file to 100 MB and 1000 MB. This should make it pretty easy to track down.
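
The arithmetic above can be checked in a few lines of Python. The per-block pointer counts are derived here from a 4096-byte block and 8-byte pointers; the 232-byte inode header and 24-byte metadata header are assumptions chosen because they reproduce the 483 and 509 figures, not values quoted in this bug.

```python
BLOCK_SIZE = 4096
PTR_SIZE = 8  # 64-bit block pointers

# Assumed header sizes: these give exactly 483 and 509 pointers per block.
DINODE_HEADER = 232
META_HEADER = 24

dinode_ptrs = (BLOCK_SIZE - DINODE_HEADER) // PTR_SIZE   # pointers in the inode
indirect_ptrs = (BLOCK_SIZE - META_HEADER) // PTR_SIZE   # pointers per indirect block

# Each zeroed pointer referenced a lower-level indirect block that mapped
# `indirect_ptrs` data blocks of BLOCK_SIZE bytes each.
zeroed = 483  # count of zero pointers seen with gfs2_edit
first_good_offset = zeroed * indirect_ptrs * BLOCK_SIZE
print(first_good_offset, hex(first_good_offset))  # 1006989312 0x3c057000
```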

Comment 3 Steve Whitehouse 2008-06-13 15:58:45 UTC

I wonder whether the data that's left is the data from the start of the file or
the data which actually belongs in that place. If the former, then I suspect
that the order of addition of the new indirect blocks to the metadata tree might
be to blame.
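
One way to answer this is to write a file in which every 4096-byte block records its own position, so surviving data can be traced back to where it was written. The sketch below is a hypothetical test aid, not part of gfs2-utils or xfs_io: each block is filled with its block index plus one (so an all-zero block stands out as lost data), and `check_stamped` lists every block whose contents do not match its position.

```python
import struct
from typing import List, Optional, Tuple

BLOCK = 4096

def stamp_block(index: int) -> bytes:
    """A BLOCK-sized buffer filled with (index + 1) as 8-byte little-endian words.

    Adding 1 keeps a zero-filled block distinguishable from block 0's data.
    """
    return struct.pack("<Q", index + 1) * (BLOCK // 8)

def write_stamped(path: str, nblocks: int) -> None:
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(stamp_block(i))

def check_stamped(path: str) -> List[Tuple[int, Optional[int]]]:
    """Return (position, origin) for each misplaced block.

    origin is read from the block's first 8 bytes: None for a zero block,
    otherwise the block index the data was written at. Assumes the file
    length is a multiple of BLOCK.
    """
    problems = []
    with open(path, "rb") as f:
        pos = 0
        while True:
            block = f.read(BLOCK)
            if not block:
                break
            (tag,) = struct.unpack_from("<Q", block)
            origin = None if tag == 0 else tag - 1
            if origin != pos:
                problems.append((pos, origin))
            pos += 1
    return problems
```

If blocks past the zeroed region report no problems, the surviving data is where it belongs; origins near 0 at later positions would instead indicate data from the start of the file ending up in the wrong place.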

Comment 4 Ben Marzinski 2008-06-17 21:58:14 UTC

Created attachment 309673 [details]
patch to fix the block allocation

This patch changes the computation for zero_metapath_length(). When you are
extending the metadata tree, the indirect blocks that point to the new data
block must diverge from the existing tree either at the inode or at the
first indirect block. They can diverge at the first indirect block because the
inode has room for 483 pointers while the indirect blocks have room for 509
pointers, so when the tree is grown, there is some free space in the first
indirect block. What zero_metapath_length now computes is the height where the
first indirect block for the new data block is located.  It can either be 1 (if
the indirect block diverges from the inode) or 2 (if it diverges from the first
indirect block).
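
The tree shape described above can be modelled in a few lines. This is a simplified Python sketch, not the gfs2 code and not the real `zero_metapath_length()`: it assumes the tree grows by one level at a time, that a logical block number is split into per-level indexes by repeated division by 509 (inode level most significant), and it counts heights from 1 at the inode as the comment does. All helper names are hypothetical.

```python
from typing import List

DIPTRS = 483  # block pointers that fit in the inode
INPTRS = 509  # block pointers that fit in one indirect block

def required_height(nblocks: int) -> int:
    """Smallest height (1 = pointers live in the inode) able to map nblocks data blocks."""
    height, capacity = 1, DIPTRS
    while capacity < nblocks:
        height += 1
        capacity *= INPTRS
    return height

def metapath(lblock: int, height: int) -> List[int]:
    """Pointer index at each level for logical block lblock, inode level first."""
    path = []
    for _ in range(height):
        path.append(lblock % INPTRS)
        lblock //= INPTRS
    path.reverse()
    return path

def divergence_height(lblock: int, new_height: int) -> int:
    """Height where a new block's path leaves the old tree after growing by one level.

    Requires new_height >= 2. After growth, inode slot 0 points to a new first
    indirect block whose slots 0..DIPTRS-1 hold the old inode pointers, so its
    slots DIPTRS..INPTRS-1 are free for new blocks.
    """
    path = metapath(lblock, new_height)
    if path[0] != 0:
        return 1  # leaves the old tree at the inode
    if path[1] < DIPTRS:
        raise ValueError("block is already mapped by the old tree")
    return 2  # leaves at the first indirect block's free slots
```

For the 1 GiB file in this report, `required_height(262144)` is 3, because a height-2 tree maps only 483 * 509 = 245847 blocks. In this model, blocks 245847 through 259080 diverge at the first indirect block (height 2) and blocks from 259081 (509 * 509) onward diverge at the inode (height 1), so the 1 GiB write exercises both cases.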

Comment 5 Steve Whitehouse 2008-06-25 13:00:56 UTC

The patch is now upstream in Linus' kernel. Can we close this bz now, or are
there other issues still left unresolved?

Comment 6 Eric Sandeen 2008-06-25 13:47:20 UTC

Perhaps I should have filed 2 bugs; does the statfs issue remain?

-Eric

Comment 7 Ben Marzinski 2008-06-26 16:04:15 UTC

It works fine for me.

Comment 8 Steve Whitehouse 2008-06-30 11:38:09 UTC

So we ought to be able to close this now?
