Bug 757557 - GNU tar -S eats data when storing files from btrfs
Summary: GNU tar -S eats data when storing files from btrfs
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: tar
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-11-27 19:15 UTC by Michael Stahl
Modified: 2016-07-01 14:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-20 18:07:19 UTC
Type: ---
Embargoed:


Attachments
patch to fix the problem (1.50 KB, patch)
2011-12-13 18:53 UTC, Josef Bacik


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla 1352061 | 0 | unspecified | CLOSED | btrfs: stat reports the st_blocks with delay (data loss in archivers) | 2021-02-22 00:41:40 UTC

Internal Links: 1352061

Description Michael Stahl 2011-11-27 19:15:20 UTC
Description of problem:
While backing up my btrfs filesystem with GNU tar yesterday,
I noticed that small files were not stored correctly:
their content was replaced by zeroes, presumably because
they were mis-detected as sparse files.

Version-Release number of selected component (if applicable):
tar-1.26-2.fc16.x86_64
kernel-3.2.0-0.rc2.git6.1.fc17.x86_64

How reproducible:


Steps to Reproduce:
1. Take a small file (IIRC the problem does not happen immediately for a newly created file, so use an existing one or wait a bit...)
2. Pack it with: tar -cS
3. The tar archive stores it as a "sparse" file and its content is lost
  
Actual results:
# tar -cS test | (cd /tmp && tar -xf - test && cat test)
# tar -c test | (cd /tmp && tar -xf - test && cat test)
foo


Expected results:
# tar -cS test | (cd /tmp && tar -xf - test && cat test)
foo
# tar -c test | (cd /tmp && tar -xf - test && cat test)
foo

Additional info:

Comment 1 Michael Stahl 2011-11-27 19:18:38 UTC
I don't know whether this problem is in GNU tar or in btrfs...

Comment 2 Kamil Dudka 2011-12-12 13:30:55 UTC
This could be related to the following optimization:

http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html

Does stat return a non-zero block count for the file that causes problems?
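
One way to answer this is to read the size and block count straight from the stat result. A minimal Python sketch: `size_and_blocks` is a hypothetical helper name and the 4-byte demo file is only an illustration. On Linux, `st_blocks` is counted in 512-byte units.

```python
import os
import tempfile


def size_and_blocks(path):
    """Return (st_size, st_blocks) as reported by stat for path."""
    st = os.stat(path)
    return st.st_size, st.st_blocks


# Demo on a throwaway file. A non-empty file that reports blocks=0
# is the case being asked about.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"foo\n")
size, blocks = size_and_blocks(tmp.name)
os.unlink(tmp.name)
print(f"size={size} blocks={blocks}")
```

The block count printed here depends on the filesystem that holds the temporary directory, so it is not expected to reproduce the bug by itself.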

Comment 3 Michael Stahl 2011-12-12 21:21:02 UTC
very interesting find, Kamil!

a bit of googling finds this:

http://pubs.opengroup.org/onlinepubs/009604599/basedefs/sys/stat.h.html
 blkcnt_t  st_blocks  Number of blocks allocated for this object.

AFAIK btrfs stores "small" files inside the metadata tree,
so they take up 0 filesystem data blocks.
So it is entirely plausible that the patch you found
is the reason why these files are mis-detected as entirely sparse files.

perhaps it could be fixed by handling files whose size
is < 1 blocksize as not sparse?
there aren't big savings in that case anyway...

or perhaps the better solution would be that btrfs stat
reports 1 block allocated in this case?
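
The first idea, a guard on the tar side, could look roughly like the sketch below. It is not GNU tar's code: `treat_as_entirely_sparse` and the 4096-byte `block_size` default are assumptions made for illustration, and the zero-block shortcut is modeled only as it is described in this report.

```python
def treat_as_entirely_sparse(size, blocks, block_size=4096):
    """Guarded shortcut: trust a zero block count only for files of at least one block.

    size       -- st_size in bytes
    blocks     -- st_blocks as reported by stat
    block_size -- assumed filesystem block size (hypothetical default)
    """
    if size < block_size:
        # Small files may be stored inline in metadata, so read them normally.
        return False
    return blocks == 0


print(treat_as_entirely_sparse(4, 0))        # False: the 4-byte "foo\n" file is read
print(treat_as_entirely_sparse(1 << 20, 0))  # True: a 1 MiB file with no blocks is skipped
```

The trade-off is that every file smaller than one block gets read, even if it really is a hole.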

Sorry, but I can't try anything out: the reason I did the backup
(and actually verified it) is that I was replacing the btrfs on my
laptop with ext4 because it was unusably slow, and right now
I don't have a btrfs filesystem anywhere...

Comment 4 David Tardon 2011-12-13 05:37:27 UTC
dtardon->kdudka: I tested that (on a newly created, loop-mounted btrfs filesystem; I am not crazy enough to use btrfs on my machine .-), with the following results:

echo hello > hello.txt
stat hello.txt 
  File: `hello.txt'
  Size: 6         	Blocks: 8          IO Block: 4096   regular file
Device: 29h/41d	Inode: 259         Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  501/ dtardon)   Gid: (  501/ dtardon)
Access: 2011-12-13 06:25:36.928032479 +0100
Modify: 2011-12-13 06:26:41.111524638 +0100
Change: 2011-12-13 06:26:41.111524638 +0100
 Birth: -

vim hello.txt
# edit & save
stat hello.txt 
  File: `hello.txt'
  Size: 6         	Blocks: 0          IO Block: 4096   regular file
Device: 29h/41d	Inode: 262         Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  501/ dtardon)   Gid: (  501/ dtardon)
Access: 2011-12-13 06:32:51.581716486 +0100
Modify: 2011-12-13 06:32:51.581716486 +0100
Change: 2011-12-13 06:32:51.581716486 +0100
 Birth: -

Comment 5 Kamil Dudka 2011-12-13 12:58:14 UTC
(In reply to comment #4)
> stat hello.txt 
>   File: `hello.txt'
>   Size: 6          Blocks: 0          IO Block: 4096   regular file

Thank you for testing it.  The above confirms that files with zero blocks but non-zero size may appear on btrfs.  Those would be mistakenly detected as sparse files by tar -S.
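
The misdetection can be illustrated with the two stat results from comment 4. This is only a sketch of the shortcut as described in this report, not GNU tar's actual code; `StatResult` and `looks_entirely_sparse` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical stand-in for the two fields of a stat result used here.
StatResult = namedtuple("StatResult", ["st_size", "st_blocks"])


def looks_entirely_sparse(st):
    """Zero allocated blocks but a non-zero size: assumed to be all holes."""
    return st.st_size > 0 and st.st_blocks == 0


before_edit = StatResult(st_size=6, st_blocks=8)  # first stat output in comment 4
after_edit = StatResult(st_size=6, st_blocks=0)   # second stat output in comment 4

print(looks_entirely_sparse(before_edit))  # False
print(looks_entirely_sparse(after_edit))   # True
```

An archiver relying on such a check would skip reading the second file and store it as a hole, which matches the zero-filled content in the original report.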

(In reply to comment #3)
> http://pubs.opengroup.org/onlinepubs/009604599/basedefs/sys/stat.h.html
>  blkcnt_t  st_blocks  Number of blocks allocated for this object.
> 
> AFAIK btrfs stores "small" files inside the metadata tree,
> so they take up 0 filesystem data blocks.
> so it is entirely plausible that this patch which you found
> is the reason why these files are mis-detected as entirely sparse files.

Thank you for the pointer and the explanation.

> perhaps it could be fixed by handling files whose size
> is < 1 blocksize as not sparse?
> there aren't big savings in that case anyway...

If I am not mistaken, this would mean reverting the aforementioned optimization patch.

> or perhaps the better solution would be that btrfs stat
> reports 1 block allocated in this case?

This would solve the problem without decreasing performance, which sounds even better.  We should probably notify the btrfs guys about this issue.

Comment 7 Kamil Dudka 2011-12-13 16:45:58 UTC
Josef, is there a way to address the issue in btrfs so that it does not return a zero block count for files that contain data?

Comment 8 Josef Bacik 2011-12-13 18:53:57 UTC
Created attachment 546359
patch to fix the problem

Yup, sorry about that. We were just doing bytes >> 9 for blocks, which doesn't work out so well if bytes < 512.  So this should fix it to always say 1 block for something that's less than 512 bytes.  Please verify this fixes the problem for you.
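
The arithmetic can be checked with a short Python sketch. `blocks_truncating` models the `bytes >> 9` computation mentioned above. `blocks_rounded_up` is a hypothetical round-up variant shown only for comparison; it is not the attached patch or the upstream commit.

```python
def blocks_truncating(nbytes):
    """Block count as bytes >> 9: any partial 512-byte unit is dropped."""
    return nbytes >> 9


def blocks_rounded_up(nbytes):
    """Hypothetical alternative: round up to whole 512-byte units."""
    return (nbytes + 511) >> 9


# Print both counts for a few sizes around the 512-byte boundary.
for nbytes in (0, 6, 511, 512, 513):
    print(nbytes, blocks_truncating(nbytes), blocks_rounded_up(nbytes))
```

For the 6-byte hello.txt from comment 4, the truncating version yields 0 blocks while the rounded-up version yields 1.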

Comment 9 Ondrej Vasik 2012-04-20 09:00:26 UTC
As this seems to be stalled: Josef, was this already applied to the F16 kernel? Should I reassign it to you and the kernel component? I assume no change in tar will be required once the btrfs behaviour of stat->blocks is fixed.

Comment 10 Josef Bacik 2012-04-20 18:07:19 UTC
This was fixed upstream with

fadc0d8be4dfca80f6c568bc5874931893c6709b

I assume it's in the F16 kernel.

