Bug 2037839 - tar: error extracting sparse file >= 8G archived by bsdtar
Summary: tar: error extracting sparse file >= 8G archived by bsdtar
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: libarchive
Version: 8.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Lukas Javorsky
QA Contact: Vaclav Danek
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-06 16:09 UTC by zzambers
Modified: 2022-11-08 12:50 UTC (History)

Fixed In Version: libarchive-3.3.3-4.el8
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-08 10:54:59 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
tarissue.sh (165 bytes, application/x-shellscript), 2022-01-06 16:09 UTC, zzambers
archive.tar.gz (388 bytes, application/gzip), 2022-01-06 16:12 UTC, zzambers
archives-compare.png (154.32 KB, image/png), 2022-01-10 15:05 UTC, zzambers
archive-7G.tar (3.00 KB, application/x-tar), 2022-01-10 15:06 UTC, zzambers
archive-9G.tar (3.00 KB, application/x-tar), 2022-01-10 15:07 UTC, zzambers


Links
Red Hat Issue Tracker: RHELPLAN-106994 (last updated 2022-01-06 16:12:15 UTC)
Red Hat Product Errata: RHBA-2022:7788 (last updated 2022-11-08 10:55:01 UTC)

Description zzambers 2022-01-06 16:09:56 UTC
Created attachment 1849292 [details]
tarissue.sh

tar fails with an "Unexpected EOF" error when trying to extract an archive created by bsdtar that contains a sparse file of size >= 8G.

Error:
tar -xzf archive.tar.gz -C extract
tar: Ignoring unknown extended header keyword 'SCHILY.fflags'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.security.selinux'
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Reproducer:
reproducer attached

Package versions:
tar-1.30-5.el8.x86_64
bsdtar-3.3.3-1.el8.x86_64

Comment 1 zzambers 2022-01-06 16:12:24 UTC
Created attachment 1849293 [details]
archive.tar.gz

Example archive attached.

Comment 2 zzambers 2022-01-06 16:29:19 UTC
notes:
- problematic archive can still be extracted using bsdtar without problem
- also reproducible on fedora 34 using newer tar ( tar-1.34-1.fc34.x86_64 )

Comment 3 jiri vanek 2022-01-07 09:29:53 UTC
Hello!

To highlight the severity: this practically prevents proper usage of vagrant on RHEL 8, and most likely of other virtualisation tools where sparse disks are very common.

Comment 5 Lukas Javorsky 2022-01-07 14:34:28 UTC
I've opened a bug for the tar upstream as well, so they can help us fix this issue.

https://savannah.gnu.org/bugs/index.php?61768

Comment 6 Lukas Javorsky 2022-01-10 08:27:56 UTC
According to the answer from upstream, the only member of the archive file was stored using the SCHILY.fflags extended attribute.
This is not (yet) supported by GNU tar, and thus tar is handling this archive as expected.

Comment 7 zzambers 2022-01-10 14:02:22 UTC
That is really unfortunate, because it makes an archive created by bsdtar (using default settings) not extractable by tar when the packed file meets certain conditions... :(

Comment 8 zzambers 2022-01-10 15:02:09 UTC
Also, I am not sure I am satisfied with the upstream explanation. When I create one archive with a 7G sparse file and another with a 9G sparse file, one is extractable (7G) and the other is not (9G). I compared both tars using vbindiff and they seem structurally the same (they even have the same size). There are differences of just a few bytes here and there, and then there is a list of properties where the 9G archive has a size=... property but the 7G archive does not. I think this could be what makes the difference.

Comment 9 zzambers 2022-01-10 15:05:30 UTC
Created attachment 1849869 [details]
archives-compare.png

See difference in list of properties I was talking about. (size=... is only present for 9G archive)

Comment 10 zzambers 2022-01-10 15:06:52 UTC
Created attachment 1849870 [details]
archive-7G.tar

Comment 11 zzambers 2022-01-10 15:07:30 UTC
Created attachment 1849871 [details]
archive-9G.tar

Comment 12 zzambers 2022-01-10 15:35:16 UTC
I am not concerned by the warning messages about "unknown extended header keyword" as long as the archive can be unpacked successfully. Also, bsdtar on my Fedora 34 does not generate those, and their presence has no effect on this problem. (The 7G and 9G example archives above were generated on Fedora 34 and they do not have SCHILY.fflags.)

Btw, the test archives are created from sparse files that consist of just a single big hole. (So the archive probably does not need to store anything except metadata.)
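
A minimal sketch of how such a hole-only sparse file can be created, assuming a filesystem that supports sparse files and a platform that reports `st_blocks` in 512-byte units (as Linux does). The helper name `make_hole_only_file` is hypothetical and is not taken from the attached reproducer.

```python
import os

def make_hole_only_file(path, logical_size):
    """Create a file that is one big hole and report its sizes.

    Returns (logical size in bytes, bytes actually allocated on disk).
    """
    with open(path, "wb") as f:
        # Extending an empty file with truncate() writes no data blocks,
        # so the whole file becomes a single hole on sparse-capable filesystems.
        f.truncate(logical_size)
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512
```

Calling it with `9 * 1024**3` would give a file like the one packed into archive-9G.tar: a logical size of 9 GiB with close to nothing allocated on disk.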

Comment 13 Lukas Javorsky 2022-01-11 14:15:02 UTC
Jiri,

Could you also please explain more why this is a high Priority and also why it just now affects the proper usage of vagrant.

As I see it, this problem was there for a long time and it didn't affect it, so has anything changed?

Comment 14 zzambers 2022-01-11 15:33:46 UTC
I confirmed that the tar error is caused by the size attribute (for sparse files >= 8G). After studying some documentation and the source code of both tar and libarchive (used by bsdtar), my conclusion is that the size attribute is wrong. It looks like the problem is on bsdtar's side (despite bsdtar being able to unpack such an archive created by itself). The size attribute is only needed when the file data size is >= 8G (the limit of the old tar header), but the check for this in libarchive seems to be done too early, not taking into account the data size reduction caused by the file being sparse.
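
The 8G threshold follows from the classic tar header, whose size field holds 11 octal digits. The sketch below is only an illustration of that limit: it uses Python's standard `tarfile` module, not libarchive, and the helper names `needs_pax_size` and `header_bytes` are hypothetical.

```python
import tarfile

GIB = 1024 ** 3

# The old tar header stores the size as 11 octal digits plus a terminator.
USTAR_SIZE_LIMIT = 8 ** 11  # 8589934592 bytes, i.e. 8 GiB

def needs_pax_size(stored_size):
    """True if stored_size no longer fits the 11-digit octal size field."""
    return stored_size >= USTAR_SIZE_LIMIT

def header_bytes(size):
    """Return only the header block(s) tarfile would emit for a member of this size."""
    info = tarfile.TarInfo(name="disk.img")
    info.size = size
    return info.tobuf(format=tarfile.PAX_FORMAT)

# Logical sizes on either side of the limit.
print(needs_pax_size(7 * GIB), needs_pax_size(9 * GIB))
# Whether a pax "size" record appears in the generated headers.
print(b"size=" in header_bytes(7 * GIB), b"size=" in header_bytes(9 * GIB))
# A hole-only file stores only a small block of metadata, so the check
# applied to the stored size would not ask for a "size" record at all.
print(needs_pax_size(512))
```

The last line reflects the point made in this comment: if the check is applied to the data that is actually stored rather than to the logical size of the sparse file, the 9G test file would not need a size attribute.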

Comment 15 zzambers 2022-01-11 15:38:22 UTC
I have opened a PR with a fix for libarchive [1]. I'll probably wait for their response and then change the component of this bug to bsdtar.

[1] https://github.com/libarchive/libarchive/pull/1653

Comment 16 mkulik 2022-01-11 16:50:27 UTC
I looked very quickly.

9GB file: skip_file (size=9663675904) at /usr/src/debug/tar-1.34-2.fc35.x86_64/src/list.c:1408
7GB file: skip_file (size=0) at /usr/src/debug/tar-1.34-2.fc35.x86_64/src/list.c:1408

backtrace to extract_file in extract.c:

Diffing the two executions, we can see that sparse_extract_file in extract.c:1264 will return size=0 for the 7GB file and size=9663675904 for the 9GB file.

size variable in this function is defined as: *size = file.stat_info->archive_file_size - file.dumped_size

Looking at the end of execution of sparse_extract_file:

(9 GB file)
(gdb) print file.stat_info.archive_file_size
$26 = 9663676416

(7 GB file)
(gdb) print file.stat_info.archive_file_size
$11 = 512 <- MIN value defined in sourcecode, real is less

This information is set in xheader_decode from stat.st_size:

  /* The archived (effective) file size is always set directly in tar header
     field, possibly overridden by "size" extended header - in both cases,
     result is now decoded in st->stat.st_size */

(9 GB file)
$37 = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 420, st_uid = 1000, st_gid = 1000, __pad0 = 0, st_rdev = 0, st_size = 9663676416, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, 
    tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __glibc_reserved = {0, 0, 0}}

(7 GB file)
$23 = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 420, st_uid = 1000, st_gid = 1000, __pad0 = 0, st_rdev = 0, st_size = 512, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, 
  st_ctim = {tv_sec = 0, tv_nsec = 0}, __glibc_reserved = {0, 0, 0}}

From documentation:
  The size field is the size of the file in bytes; linked files are archived with this field specified as zero.

It seems that you are right and the size attribute is set incorrectly.
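
The numbers from the gdb session can be cross-checked with a little arithmetic. This is only a worked example using the values printed above; `dumped_size` is not shown in the session and is inferred here from the quoted formula.

```python
GIB = 1024 ** 3

# Values printed in the gdb session above.
archive_file_size_9g = 9663676416
skip_size_9g = 9663675904
archive_file_size_7g = 512
skip_size_7g = 0

# size = archive_file_size - dumped_size
# => dumped_size = archive_file_size - size (inferred, not printed by gdb)
dumped_9g = archive_file_size_9g - skip_size_9g
dumped_7g = archive_file_size_7g - skip_size_7g

# The 9 GB header carries the full logical size of the sparse file.
print(archive_file_size_9g == 9 * GIB)
# In both cases only one 512-byte block was actually consumed.
print(dumped_9g, dumped_7g)
```

Under that reading, tar consumes the same single block for both archives, but for the 9 GB one it then tries to skip another 9663675904 bytes in an archive that is only 3.00 KB long, which would explain the "Unexpected EOF" error.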

Comment 17 zzambers 2022-01-11 18:36:23 UTC
Also, according to the doc [2], the extension is designed in such a way that a tar implementation not supporting the sparse file extension should just extract such files in "condensed" form to a specially named directory. From there they can be restored by a separate tool. For this to work, the reported size needs to be the size of the actual data ("condensed" form). But this is not true for the archive with the 9GB sparse file. Your analysis also shows that for the 9GB file st_size is set to the size of the original sparse file, while for the 7GB file it holds the size of its "condensed" form.

[2] https://www.gnu.org/software/tar/manual/html_node/Sparse-Recovery.html
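
A rough sketch of what the "condensed" form looks like for the test files, assuming the GNU sparse format 1.0 layout (a newline-delimited decimal map padded with NULs to a 512-byte block, followed by the data) and assuming a hole-only file is recorded as a single zero-length entry at its end. The function names are hypothetical.

```python
BLOCK = 512
GIB = 1024 ** 3

def sparse_map_1_0(entries):
    """Encode (offset, size) pairs as a format 1.0 sparse map, NUL-padded to a block."""
    lines = [str(len(entries))]
    for offset, size in entries:
        lines += [str(offset), str(size)]
    raw = ("\n".join(lines) + "\n").encode("ascii")
    return raw + b"\0" * (-len(raw) % BLOCK)

def condensed_size(entries):
    """Size of the stored member data: the padded map plus the non-hole data bytes."""
    return len(sparse_map_1_0(entries)) + sum(size for _, size in entries)

# Hole-only files: one zero-length entry at the end of the file (assumption).
print(condensed_size([(7 * GIB, 0)]), condensed_size([(9 * GIB, 0)]))
```

With these assumptions both test files condense to a single 512-byte block, which matches the st_size of 512 seen for the 7GB archive and is the value the 9GB archive would be expected to report as well.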

Comment 18 zzambers 2022-01-12 13:24:50 UTC
My fix for this issue to libarchive was accepted upstream -> changed component to libarchive. (libarchive is used by bsdtar)

Comment 19 jiri vanek 2022-01-13 14:11:09 UTC
for record: https://github.com/libarchive/libarchive/pull/1653/commits/afef3d7fc131df0dac09a46b8673898860a193db

Would be nice to have it ported to el8 and fedoras.... Especially fedoras

Comment 20 jiri vanek 2022-01-13 14:12:46 UTC
(In reply to Lukas Javorsky from comment #13)
> Jiri,
> 
> Could you also please explain more why this is a high Priority and also why
> it just now affects the proper usage of vagrant.
> 
> As I see it, this problem was there for a long time and it didn't affect it,
> so has anything changed?

We moved the main vagrant master from rhel7 to rhel8, that's all.

Comment 21 jiri vanek 2022-01-13 14:13:50 UTC
This is actually an exemplary reason why people migrate to a newer RHEL only once it is at least halfway through its lifecycle.

Comment 24 Honza Horak 2022-01-14 10:40:51 UTC
(In reply to jiri vanek from comment #20)
> (In reply to Lukas Javorsky from comment #13)
> > Jiri,
> > 
> > Could you also please explain more why this is a high Priority and also why
> > it just now affects the proper usage of vagrant.
> > 
> > As I see it, this problem was there for a long time and it didn't affect it,
> > so has anything changed?
> 
> We moved the main vagrant master from rhel7 to rhel8, that's all.

Can you explain a bit more, please, what the impact of not fixing this (at this point) is for your team? It's not clear what master means in this context, and what the consequences are and for whom.

We already have too much for RHEL-8.6.0, so we currently plan this for RHEL-8.7.0. If you can provide more info on why this must be done for 8.6.0 or 8.5.0.z, we can revisit this plan.

Comment 25 Honza Horak 2022-05-16 12:42:20 UTC
Matej, do you think the patch from comment #19 is ok for RHEL? And we should take a look at RHEL-9 as well..

Comment 26 Matej Mužila 2022-05-30 10:41:09 UTC
(In reply to Honza Horak from comment #25)
> Matej, do you think the patch from comment #19 is ok for RHEL? And we should
> take a look at RHEL-9 as well..

I think that the patch from comment #19 would be ok for RHEL >= 8.

Comment 29 Lukas Javorsky 2022-07-18 09:04:57 UTC
Merged

Comment 38 errata-xmlrpc 2022-11-08 10:54:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libarchive bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7788

