Bug 1352061 - btrfs: stat reports the st_blocks with delay (data loss in archivers)
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 24
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-01 14:10 UTC by Pavel Raiskup
Modified: 2016-07-04 04:34 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-01 15:23:23 UTC
Type: Bug
Embargoed:


Attachments
Reproducer. (10.00 KB, application/x-tar)
2016-07-01 14:26 UTC, Pavel Raiskup


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 757557 0 unspecified CLOSED GNU tar -S eats data when storing files from btrfs 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1024095 0 unspecified CLOSED tar --sparse silently corrupts files on filesystems where non-empty files may have zero blocks 2021-02-22 00:41:40 UTC

Internal Links: 757557 1024095

Description Pavel Raiskup 2016-07-01 14:10:54 UTC
Data archivers (tar, rsync, ...) contain optimizations that rely on the
'st_blocks' value reported by stat.

It looks like btrfs does not report the correct 'st_blocks' value until the data
are synced.  For example, tar has an optimization that checks whether 'st_size'
reports more data than 'st_blocks' can hold; if so, tar considers the file
sparse and takes additional steps.

At the moment, the following can happen:

    a) some "tool" creates a sparse file
    b) that tool does not flush explicitly
    c) tar is called immediately to archive that sparse file
    d) tar considers [1] the file completely sparse (because st_blocks is
       zero) and archives no data.  This is where the data loss occurs.
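
The sequence above can be probed with a sketch like the following.  It is
not the attached reproducer; the function name blocks_before_and_after_sync()
is hypothetical, and the idea that an fsync() is enough to make 'st_blocks'
settle is an assumption rather than something this report confirms:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical probe: write len bytes of non-zero data to path without
 * flushing, record st_blocks, then fsync() and record st_blocks again.
 * Returns 0 on success, -1 on any I/O error.  The file is removed. */
static int blocks_before_and_after_sync(const char *path, size_t len,
                                        long long *before, long long *after)
{
    char buf[4096];
    struct stat st;
    int rc = -1;
    int fd;

    assert(before != NULL && after != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    memset(buf, 'x', sizeof buf);
    while (len > 0) {
        size_t chunk = len < sizeof buf ? len : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n <= 0)
            goto out;
        len -= (size_t)n;
    }

    /* Steps b) and c): no explicit flush before the first stat. */
    if (fstat(fd, &st) != 0)
        goto out;
    *before = (long long)st.st_blocks;

    /* Assumption: after fsync() the block count reflects the data. */
    if (fsync(fd) != 0)
        goto out;
    if (fstat(fd, &st) != 0)
        goto out;
    *after = (long long)st.st_blocks;
    rc = 0;

out:
    close(fd);
    unlink(path);
    return rc;
}
```

On a file system showing the behaviour described here, the first value would
be expected to be 0 for a file that already contains data; on other file
systems both values are typically non-zero.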

Because we already fixed btrfs to report a non-zero 'st_blocks' when the file
data are inlined, I consider this a real btrfs bug worth fixing.

[1] http://git.savannah.gnu.org/cgit/paxutils.git/tree/lib/system.h?id=ec72abd9dd63bbff4534ec77e97b1a6cadfc3cf8#n392

Comment 1 Pavel Raiskup 2016-07-01 14:26:52 UTC
Created attachment 1174941
Reproducer.

$ make
    gcc main.c -O0 -g3 -o binary
    test `stat --file-system --format '%T' main.c` = btrfs
    while : ; do ./reproducer; done
    stat reported zero blocks!
    still zero blocks
    (mostly?) synced
    stat reported zero blocks!
    (mostly?) synced
    stat reported zero blocks!
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    still zero blocks
    (mostly?) synced
    ....


Expected output:
    gcc main.c -O0 -g3 -o binary
    test `stat --file-system --format '%T' main.c` = btrfs
    while : ; do ./reproducer; done
    (busy loop here)

Comment 2 Josh Boyer 2016-07-01 15:23:23 UTC
Please report this directly to the upstream btrfs maintainers:

Chris Mason <clm> (maintainer:BTRFS FILE SYSTEM)
Josef Bacik <jbacik> (maintainer:BTRFS FILE SYSTEM)
David Sterba <dsterba> (maintainer:BTRFS FILE SYSTEM)
linux-btrfs.org (open list:BTRFS FILE SYSTEM)
linux-kernel.org (open list)

Comment 3 Pavel Raiskup 2016-07-04 04:34:15 UTC
Upstream thread:
https://mail-archive.com/linux-btrfs@vger.kernel.org/msg55383.html

