Description of problem:
while backing up my btrfs filesystem with GNU tar yesterday, noticed that small files were not stored correctly: the content was replaced by zeroes, presumably because they were mis-detected as sparse files.

Version-Release number of selected component (if applicable):
tar-1.26-2.fc16.x86_64
kernel-3.2.0-0.rc2.git6.1.fc17.x86_64

How reproducible:

Steps to Reproduce:
1. take a small file (IIRC problem does not happen immediately for newly-created file, so use an existing one or wait a bit...)
2. pack it with: tar -cS
3. tar file contains "sparse" file, its content is lost

Actual results:
# tar -cS test | (cd /tmp && tar -xf - test && cat test)
# tar -c test | (cd /tmp && tar -xf - test && cat test)
foo

Expected results:
# tar -cS test | (cd /tmp && tar -xf - test && cat test)
foo
# tar -c test | (cd /tmp && tar -xf - test && cat test)
foo

Additional info:
don't know if this problem is in GNU tar or in btrfs...
This could be related to the following optimization:
http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html

Does stat return a non-zero block count for the file that causes problems?
very interesting find, Kamil! a bit of googling finds this:

http://pubs.opengroup.org/onlinepubs/009604599/basedefs/sys/stat.h.html
blkcnt_t st_blocks Number of blocks allocated for this object.

AFAIK btrfs stores "small" files inside the metadata tree, so they take up 0 filesystem data blocks. so it is entirely plausible that this patch which you found is the reason why these files are mis-detected as entirely sparse files.

perhaps it could be fixed by handling files whose size is < 1 blocksize as not sparse? there aren't big savings in that case anyway...

or perhaps the better solution would be that btrfs stat reports 1 block allocated in this case?

sorry but i can't try anything out because the reason why i did the backup and why i actually verified it is that i replaced the btrfs on my laptop with ext4 because it was unusably slow, and right now i don't have a btrfs anywhere...
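To make the suspected failure mode concrete, here is a minimal Python sketch of the shortcut as this comment describes it, together with the proposed size guard. It is not GNU tar's actual code: the function names and the 4096-byte default block size are assumptions chosen for illustration.

```python
# Illustrative model of the st_blocks shortcut discussed in this thread.
# Not GNU tar's code: the names and the 4096-byte default are assumptions.

def looks_entirely_sparse(st_size, st_blocks):
    """Shortcut as described: data present but no allocated blocks means all holes."""
    return st_size > 0 and st_blocks == 0


def looks_entirely_sparse_guarded(st_size, st_blocks, blocksize=4096):
    """Proposed workaround: never trust the shortcut for files under one block."""
    if st_size < blocksize:
        return False
    return looks_entirely_sparse(st_size, st_blocks)


# A small file that reports 0 blocks, as suspected for btrfs inline storage.
print(looks_entirely_sparse(6, 0))          # shortcut fires, content would be skipped
print(looks_entirely_sparse_guarded(6, 0))  # guard forces a normal read
```

The guard only gives up the shortcut for files smaller than one block, where skipping the read saves very little anyway.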
dtardon->kdudka: I tested that (on a newly created, loop-mounted btrfs filesystem; I am not crazy enough to use btrfs on my machine .-), with the following results:

echo hello > hello.txt
stat hello.txt
  File: `hello.txt'
  Size: 6  Blocks: 8  IO Block: 4096  regular file
Device: 29h/41d  Inode: 259  Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 501/ dtardon)  Gid: ( 501/ dtardon)
Access: 2011-12-13 06:25:36.928032479 +0100
Modify: 2011-12-13 06:26:41.111524638 +0100
Change: 2011-12-13 06:26:41.111524638 +0100
 Birth: -

vim hello.txt # edit & save
stat hello.txt
  File: `hello.txt'
  Size: 6  Blocks: 0  IO Block: 4096  regular file
Device: 29h/41d  Inode: 262  Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 501/ dtardon)  Gid: ( 501/ dtardon)
Access: 2011-12-13 06:32:51.581716486 +0100
Modify: 2011-12-13 06:32:51.581716486 +0100
Change: 2011-12-13 06:32:51.581716486 +0100
 Birth: -
(In reply to comment #4)
> stat hello.txt
> File: `hello.txt'
> Size: 6 Blocks: 0 IO Block: 4096 regular file

Thank you for testing it. The above confirms that files with zero blocks but non-zero size may appear on btrfs. Those would be mistakenly detected as sparse files by tar -S.

(In reply to comment #3)
> http://pubs.opengroup.org/onlinepubs/009604599/basedefs/sys/stat.h.html
> blkcnt_t st_blocks Number of blocks allocated for this object.
>
> AFAIK btrfs stores "small" files inside the metadata tree,
> so they take up 0 filesystem data blocks.
> so it is entirely plausible that this patch which you found
> is the reason why these files are mis-detected as entirely sparse files.

Thank you for the pointer and the explanation.

> perhaps it could be fixed by handling files whose size
> is < 1 blocksize as not sparse?
> there aren't big savings in that case anyway...

That would mean reverting the aforementioned optimization patch, if I am not mistaken.

> or perhaps the better solution would be that btrfs stat
> reports 1 block allocated in this case?

This would solve the problem without decreasing performance, which sounds even better. We should probably notify the btrfs guys about this issue.
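For anyone who wants to check whether an existing tree contains files in this state, a rough Python sketch follows. The helper name `zero_block_files` is made up for this example, and it relies on `st_blocks` being available, which is the case on Linux and other Unix systems. Note that a genuinely sparse file consisting only of holes is reported as well, so a hit is a candidate, not proof.

```python
import os
import stat


def zero_block_files(root):
    """Return regular files under root with non-zero size but st_blocks == 0."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size > 0 and st.st_blocks == 0:
                hits.append(path)
    return hits
```

Calling `zero_block_files("/mnt/btrfs")` (a hypothetical mount point) lists the files that tar -S would be expected to treat as entirely sparse according to the analysis above.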
Josef, is there a way to address the issue in btrfs so that it does not report a zero block count for files with non-empty content?
Created attachment 546359
patch to fix the problem

Yup, sorry about that. We were just doing bytes >> 9 for blocks, which doesn't work out so well if bytes < 512. So this should fix it to always say 1 block for something that's less than 512 bytes. Please verify this fixes the problem for you.
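The truncation described here can be checked with a few lines of arithmetic. The Python sketch below compares the plain shift with a round-up variant. The round-up formula is only an assumed shape of a fix for illustration; the attached patch is not shown in this thread and may compute the value differently.

```python
SECTOR_SHIFT = 9  # the comment's "bytes >> 9": blocks counted in 512-byte units


def blocks_truncating(nbytes):
    """Plain shift: rounds down, so anything under 512 bytes becomes 0 blocks."""
    return nbytes >> SECTOR_SHIFT


def blocks_rounding_up(nbytes):
    """Round up to whole 512-byte units (an assumption, not the actual patch)."""
    return (nbytes + (1 << SECTOR_SHIFT) - 1) >> SECTOR_SHIFT


for size in (0, 6, 511, 512, 513):
    print(size, blocks_truncating(size), blocks_rounding_up(size))
```

For the 6-byte hello.txt from the earlier test, the plain shift yields 0 blocks while the round-up variant yields 1, which is enough to keep a block-count check from treating the file as empty of data.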
As this seems to be stalled: Josef, was this already applied to the F16 kernel? Should I reassign it to you and the kernel component? I assume no change in tar will be required once the btrfs behaviour of stat->blocks is fixed.
This was fixed upstream with fadc0d8be4dfca80f6c568bc5874931893c6709b. I assume it's in the f16 kernel.