Created attachment 816894 [details]
sample file that tar --sparse does not archive correctly

Description of problem:
When I create tar archives with the --sparse flag, some files are corrupted silently. I do not see this bug in 1.23, but it is present in 1.26 and in 1.27.

Version-Release number of selected component (if applicable):
tar-1.26-24.fc19.x86_64

How reproducible:
A sample file is attached. Try the following:

bash-4.2$ tar --version
tar (GNU tar) 1.27
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
bash-4.2$ mkdir out
bash-4.2$ tar --sparse -cf - tar.sparse.broken.file | tar -C out --sparse -xpBf -
bash-4.2$ ls -l tar.sparse.broken.file out/tar.sparse.broken.file
-r--r--r-- 1 ajs ead 35 Oct 28 15:12 out/tar.sparse.broken.file
-r--r--r-- 1 schorr ead 35 Oct 28 15:12 tar.sparse.broken.file
bash-4.2$ md5sum tar.sparse.broken.file out/tar.sparse.broken.file
20b4497c7bdc00effbb5ad65d04a3bc3  tar.sparse.broken.file
c54104d7894a1941ca710981da437f9f  out/tar.sparse.broken.file
bash-4.2$ od -c tar.sparse.broken.file
0000000 037 213  \b  \b 274 243   u   Q 002 003   c   u   s   t   _   a
0000020   u   d   i   t   .   t   a   g  \0 003  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0
0000043
bash-4.2$ od -c out/tar.sparse.broken.file
0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0000040  \0  \0  \0
0000043

Steps to Reproduce:
1. create a tar archive of the attached file using the --sparse flag
2. extract the archive
3. note that the extracted file does not match the archived file

Actual results:
The files are different. The extracted file contains only zero bytes.

Expected results:
The files should match.

Additional info:
This worked in 1.23, but not in 1.24. I'd run git bisect except that I can't get the autotools to run properly.
I think I see the problem. The ChangeLog says, in part:

2010-08-25  Paul Eggert  <eggert.edu>

        tar: optimize -c --sparse when file is entirely sparse
        * src/sparse.c (sparse_scan_file): If the file is entirely sparse,
        that is, if ST_NBLOCKS is zero, don't bother scanning for nonzero
        blocks.  Idea by Kit Westneat, communicated by Bernd Schubert in
        <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>.
        Also, omit unnecessary lseek at start of file.

On my Network Appliance fileserver, a small file may have zero blocks even though it is not empty. In other words, this patch is not correct on some filesystems. The bug is occurring only on my Netapp filesystem, not on ext4.
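To make the failure mode concrete, here is a minimal sketch in C of the two ways one could decide that a file holds no data. It is not tar's code: the helper names blocks_claim_no_data and file_has_nonzero_data are hypothetical, and the only assumption taken from the report is that a nonempty file can have st_blocks equal to zero on some filesystems.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Shortcut of the kind described in the ChangeLog entry: trust the
   block count alone. Hypothetical helper, not tar's implementation. */
bool blocks_claim_no_data(const struct stat *st)
{
    return st->st_size > 0 && st->st_blocks == 0;
}

/* Conservative alternative: read the file and look for any nonzero byte.
   Returns 1 if data was found, 0 if every byte is zero, -1 on error. */
int file_has_nonzero_data(const char *path)
{
    unsigned char buf[4096];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] != 0) {
                close(fd);
                return 1;   /* real data: the file is not one big hole */
            }
        }
    }
    close(fd);
    return n < 0 ? -1 : 0;
}
```

On a filesystem that stores small files in the inode, blocks_claim_no_data() can return true for a file where file_has_nonzero_data() returns 1. That disagreement is exactly the case in which skipping the scan would lose the file's contents.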
Created attachment 816928 [details]
fix --sparse on filesystems where small files may appear to have zero blocks

This patch reverts the shortcut added in the following commit, which decides that a file is empty if st_blocks is zero:
http://git.savannah.gnu.org/cgit/tar.git/commit/?id=a9895fd20c957ce184091672f1623a5bedd82407

On some filesystems, such as Netapp, small files are contained in the inode and have st_blocks set to zero, so this test is not reliable.
Note: the ST_IS_SPARSE macro in lib/system.h will also give some false positives for the same reason. I don't know if that matters...
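For illustration, a block-count heuristic of this general kind can be sketched as below. This is an assumption about the shape of such a test, not a copy of the macro in lib/system.h: the name looks_sparse and the 512-byte BLOCK_UNIT are hypothetical choices made for the example.

```c
#include <stdbool.h>
#include <sys/types.h>

#define BLOCK_UNIT 512   /* assumed size of one st_blocks unit */

/* Returns true when fewer blocks are allocated than would be needed to
   hold `size` bytes. Hypothetical stand-in for a sparseness heuristic. */
bool looks_sparse(off_t size, blkcnt_t nblocks)
{
    off_t needed = size / BLOCK_UNIT + (size % BLOCK_UNIT != 0);
    return nblocks < needed;
}
```

With the 35-byte sample file and a reported block count of zero, looks_sparse(35, 0) is true even though the file contains no holes, which is the false positive in question. An empty file, looks_sparse(0, 0), is not flagged.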
Thanks for the report here and for reporting it upstream. I see it has already been fixed there:
http://lists.gnu.org/archive/html/bug-tar/2013-10/msg00031.html

I'll backport that fix and submit a Bodhi update.
https://lists.fedoraproject.org/pipermail/scm-commits/Week-of-Mon-20131028/1136016.html
(In reply to Andrew J. Schorr from comment #4)
> Note: the ST_IS_SPARSE macro in lib/system.h will also give some false
> positives for the same reason. I don't know if that matters...

Sorry, I missed this note. That false positive should result in the file being dumped into the archive as-is, rather than being recognized and stored as a sparse file. As this only concerns files smaller than 512 bytes, it should be OK.
tar-1.26-29.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/tar-1.26-29.fc20
tar-1.26-27.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/tar-1.26-27.fc19
Thanks for the prompt attention. I think the patch to ST_IS_SPARSE is probably OK, although I'm not 100% confident that the code in sparse.c:sparse_scan_file shouldn't be fixed as well. I guess if ST_IS_SPARSE is fixed, it may prevent the code from ever getting there. So maybe fixing ST_IS_SPARSE is enough...
Package tar-1.26-27.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing tar-1.26-27.fc19'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20256/tar-1.26-27.fc19
then log in and leave karma (feedback).
tar-1.26-27.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
tar-1.26-29.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.