Bug 1056672

Summary: tar uses wrong magic number for xz archives
Product: Red Hat Enterprise Linux 6 Reporter: Bryn M. Reeves <bmr>
Component: tar Assignee: Pavel Raiskup <praiskup>
Status: CLOSED ERRATA QA Contact: qe-baseos-daemons
Severity: medium Docs Contact:
Priority: medium    
Version: 6.5 CC: dkutalek, jpopelka, ovasik, psklenar
Target Milestone: rc Keywords: EasyFix
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: tar-1.23-12.el6 Doc Type: Bug Fix
Doc Text:
Previously, tar did not automatically detect archives compressed by the xz program if the user did not specify the "-J" or "--xz" option on the command line. As a consequence, if the processed archive had the ".xz" extension, tar extracted or listed the contents of the archive but printed an error message and eventually exited with a non-zero exit status. If the archive did not have this extension, tar failed. With this update, the automatic recognition mechanism has been improved. As a result, tar no longer prints an error message in this scenario, and it extracts or lists the contents of such archives correctly regardless of the extension.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-22 06:13:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1070830, 1159820    

Description Bryn M. Reeves 2014-01-22 17:01:28 UTC
Description of problem:
The tar program uses a built-in table of magic numbers to identify compressed archives. If the table fails to match a given file then tar will still attempt to open it by assuming the compression type based on file extension:

static struct zip_magic const magic[] = {
  { ct_tar },
  { ct_none, },
  { ct_compress, 2, "\037\235",  COMPRESS_PROGRAM, "-Z" },
  { ct_gzip,     2, "\037\213",  GZIP_PROGRAM,     "-z"  },
  { ct_bzip2,    3, "BZh",       BZIP2_PROGRAM,    "-j" },
  { ct_lzip,     4, "LZIP",      LZIP_PROGRAM,     "--lzip" },
  { ct_lzma,     6, "\xFFLZMA",  LZMA_PROGRAM,     "--lzma" },
  { ct_lzop,     4, "\211LZO",   LZOP_PROGRAM,     "--lzop" },
  { ct_xz,       6, "\0xFD7zXZ", XZ_PROGRAM,       "-J" },
};
[...]
        case ct_none:
          if (shortfile)
            ERROR ((0, 0, _("This does not look like a tar archive")));
          set_comression_program_by_suffix (archive_name_array[0], NULL);
          if (!use_compress_program_option)
            return archive;
          break;

The exception is "short" files (archives smaller than one block); in this case an error message is logged and tar eventually exits with a failure status:

# tar tf foo.tar.xz 
tar: This does not look like a tar archive
foo/
foo/bar
tar: Exiting with failure status due to previous errors
# echo $?
2

This is misleading as tar has guessed the compression type and successfully processed the archive.
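
For illustration, the table lookup described above can be sketched as a memcmp() over each entry. This is a simplified stand-in, not tar's actual code: the names sketch_type, sketch_magic, sketch_table and sketch_detect are hypothetical, and only three of the compression types are included. The xz entry uses a correctly encoded magic string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for tar's zip_magic table. */
enum sketch_type { SK_NONE, SK_GZIP, SK_BZIP2, SK_XZ };

struct sketch_magic
{
  enum sketch_type type;
  size_t length;        /* number of leading bytes to compare */
  const char *magic;    /* expected bytes at the start of the file */
};

static const struct sketch_magic sketch_table[] = {
  { SK_GZIP,  2, "\037\213" },
  { SK_BZIP2, 3, "BZh" },
  /* Five explicit bytes; the sixth compared byte is the literal's
     terminating NUL. */
  { SK_XZ,    6, "\xFD" "7zXZ" },
};

/* Return the first table entry whose magic matches the start of buf,
   or SK_NONE if nothing matches (the case where tar falls back to
   guessing from the file suffix). */
static enum sketch_type
sketch_detect (const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < sizeof sketch_table / sizeof sketch_table[0]; i++)
    if (len >= sketch_table[i].length
        && memcmp (buf, sketch_table[i].magic, sketch_table[i].length) == 0)
      return sketch_table[i].type;
  return SK_NONE;
}
```

Calling sketch_detect() on the first bytes of a file returns SK_NONE whenever no entry matches, which is the path that triggers the "short file" error in the excerpt above.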

Version-Release number of selected component (if applicable):
tar-1.23-11.el6

How reproducible:
100%

Steps to Reproduce:
1. Create an xz compressed archive that is < 1 block in size
2. Run tar on the resulting archive (t/x/whatever)

Actual results:
# rm -rf foo
# mkdir foo
# touch foo/bar
# tar cf foo.tar foo
# xz foo.tar
# tar tf foo.tar.xz 
tar: This does not look like a tar archive
foo/
foo/bar
tar: Exiting with failure status due to previous errors
# echo $?
2


Expected results:
# rm -rf foo
# mkdir foo
# touch foo/bar
# tar cf foo.tar foo
# xz foo.tar
# tar tf foo.tar.xz 
foo/
foo/bar
# echo $?
0


Additional info:
This happens because tar uses an invalid magic number for XZ files in its magic table:

  { ct_xz,       6, "\0xFD7zXZ", XZ_PROGRAM,       "-J" },

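The correct magic for XZ is the six bytes "fd37 7a58 5a00" (\xfd, then "7zXZ", then a null byte, which the string literal's terminator supplies). In the table entry, the leading '\0' encodes a null byte, causing the rest of the magic string to appear empty: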

(gdb) p *p
$38 = {type = ct_xz, length = 6, magic = 0x445e20 "", program = 0x44596a "xz", rpl_option = 0x44596d "-J"}

The characters "xFD7zXZ" that follow the null byte are then stored as plain ASCII.
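
A small sketch of what the compiler makes of the broken literal, assuming nothing beyond standard C escape rules. The names broken_magic, xz_header, broken_strlen and broken_matches_xz are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* The literal as shipped in tar-1.23-11.el6.  "\0" is an octal escape
   for NUL ('x' is not an octal digit, so the escape ends there), which
   leaves the array holding: NUL 'x' 'F' 'D' '7' 'z' 'X' 'Z' NUL. */
static const char broken_magic[] = "\0xFD7zXZ";

/* The first six bytes of a real .xz stream. */
static const unsigned char xz_header[6] =
  { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 };

/* Length as seen by string functions (and by gdb's string display). */
static size_t
broken_strlen (void)
{
  return strlen (broken_magic);
}

/* Nonzero if the six bytes tar compares would match a real xz header. */
static int
broken_matches_xz (void)
{
  return memcmp (broken_magic, xz_header, sizeof xz_header) == 0;
}
```

Here broken_strlen() returns 0, which is consistent with the empty string gdb prints, and broken_matches_xz() returns 0 because the first compared byte is NUL rather than 0xFD.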

It seems like the intent was to avoid gcc treating '\xfd7' as a single hex escape; that would cause a "hex escape sequence out of range" warning, because the 7 is interpreted as part of the escape.

It seems like the simplest way to solve this is to also use a hex escape for the '7' char; this leaves the string unambiguous and fixes the problem with very small tar archives for me:

  { ct_xz,       6, "\xFD\x37zXZ", XZ_PROGRAM,       "-J" },

Comment 1 Bryn M. Reeves 2014-01-22 17:04:04 UTC
Turns out this was already fixed upstream in a couple of commits in 2010:

commit 80a6ef7d94ce144db0249384e55846baa404f4dd
Author: Sergey Poznyakoff <gray.ua>
Date:   Mon Jun 28 00:04:49 2010 +0300

    Minor fix.
    
    * src/buffer.c (magic): Split the character constant to help
    cc recognize character boundaries (7 is a valid hex character).

diff --git a/src/buffer.c b/src/buffer.c
index 5b7cbc7..444f612 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -225,7 +225,7 @@ static struct zip_magic const magic[] = {
   { ct_lzip,     4, "LZIP",      LZIP_PROGRAM,     "--lzip" },
   { ct_lzma,     6, "\xFFLZMA",  LZMA_PROGRAM,     "--lzma" },
   { ct_lzop,     4, "\211LZO",   LZOP_PROGRAM,     "--lzop" },
-  { ct_xz,       6, "\xFD7zXZ",  XZ_PROGRAM,       "-J" },
+  { ct_xz,       6, "\xFD" "7zXZ",  XZ_PROGRAM,       "-J" },
 };

commit 9b31db388e6af753ec2e1c84db53a5d47e94ec15
Author: Sergey Poznyakoff <gray.ua>
Date:   Sun Jun 27 23:42:08 2010 +0300

    Minor fix.
    
    * src/buffer.c (magic): Fix xz magic.

diff --git a/src/buffer.c b/src/buffer.c
index 239d3f1..5b7cbc7 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -225,7 +225,7 @@ static struct zip_magic const magic[] = {
   { ct_lzip,     4, "LZIP",      LZIP_PROGRAM,     "--lzip" },
   { ct_lzma,     6, "\xFFLZMA",  LZMA_PROGRAM,     "--lzma" },
   { ct_lzop,     4, "\211LZO",   LZOP_PROGRAM,     "--lzop" },
-  { ct_xz,       6, "\0xFD7zXZ", XZ_PROGRAM,       "-J" },
+  { ct_xz,       6, "\xFD7zXZ",  XZ_PROGRAM,       "-J" },
 };
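
As a quick cross-check, the spelling proposed in the description ("\xFD\x37zXZ") and the split-literal spelling from upstream commit 80a6ef7d ("\xFD" "7zXZ") should produce identical bytes. The sketch below assumes only standard C string-literal rules; proposed_magic, upstream_magic, xz_header_bytes, same_bytes and matches_header are hypothetical names used for illustration.

```c
#include <string.h>

/* Spelling proposed in the bug description: '7' written as \x37, so
   the \xFD escape cannot swallow it. */
static const char proposed_magic[] = "\xFD\x37zXZ";

/* Spelling used upstream: adjacent literals are concatenated only
   after each escape sequence has been converted, so \xFD stays a
   single byte. */
static const char upstream_magic[] = "\xFD" "7zXZ";

/* The first six bytes of a real .xz stream. */
static const unsigned char xz_header_bytes[6] =
  { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 };

/* Nonzero if both spellings yield the same array, terminator included. */
static int
same_bytes (void)
{
  return sizeof proposed_magic == sizeof upstream_magic
         && memcmp (proposed_magic, upstream_magic,
                    sizeof proposed_magic) == 0;
}

/* Nonzero if the first six bytes of m equal the xz header. */
static int
matches_header (const char *m)
{
  return memcmp (m, xz_header_bytes, sizeof xz_header_bytes) == 0;
}
```

Both arrays are six bytes long including the terminating NUL, which is why a table length of 6 compares exactly the full xz header magic.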

Comment 8 errata-xmlrpc 2015-07-22 06:13:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1285.html