Bug 64410 - Tar produces corrupt files in some situations.
Product: Red Hat Linux
Classification: Retired
Component: tar
i386 Linux
Priority: medium, Severity: medium
Assigned To: Jeff Johnson
Ben Levenson
Reported: 2002-05-03 16:41 EDT by Ben Woodard
Modified: 2007-04-18 12:42 EDT (History)

Doc Type: Bug Fix
Last Closed: 2002-11-20 14:10:22 EST

Attachments
description of use cases for tar, bzip2 and output files (707 bytes, text/plain)
2002-11-20 14:10 EST, Jay Fenlason

Description Ben Woodard 2002-05-03 16:41:01 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020326

Description of problem:
I have a large CVS archive that I tried to tar up. However, tar creates a bad
archive in some cases. My guess is that it has something to do with the tty mode when
tar forks a subprocess such as bzip2 or gzip. This only seems to occur with some
archives. In particular it seems to happen when I try to tar up a CVS repository
containing the kernel.

This bug seems identical to the one reported in Bugzilla bug report 390.

Version-Release number of selected component (if applicable):
tar 1.13.19-6

How reproducible:

Steps to Reproduce:
1. check the whole kernel into CVS
2. tar cjf - cvs_archive >cvs_archive.tar.bz2
3. tar tjf cvs_archive.tar.bz2

Actual Results:  <lots of output>

bzip2: (stdin): trailing garbage after EOF ignored

Expected Results:  <lots of output>
(and no error message)

Additional info:

Tried it with bash and tcsh and the same thing occurred. (My first hypothesis was
that tcsh was not switching out of cooked mode properly.)

Tried it with gzip (z tar option) and bzip2 (j tar option) and the same thing
happened.

If you do:

tar cf - cvs_archive | bzip2 > cvs_archive.tar.bz2

the resulting archive lists without the error. So it seems to have something
to do with when tar forks off a subprocess.

Also note that the md5sums of the files created by

tar cjf cvs_archive-2.tar.bz2 cvs_archive

and

tar cf - cvs_archive | bzip2 > cvs_archive.tar.bz2

are different.
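
Differing md5sums only show that the compressed files differ, not that the archived data does. One way to check the data itself is to compare checksums of the decompressed streams. A minimal sketch follows; `same_payload` is a hypothetical helper name, not part of tar or bzip2, and the decompressor command is supplied by the caller:

```shell
# same_payload FILE_A FILE_B CMD [ARGS...]
# Hypothetical helper: feeds each file to CMD on stdin and succeeds when
# both outputs have the same POSIX cksum value and byte count.
# The exit status of CMD itself is not checked, so a decompressor that
# fails on both inputs would also compare equal.
same_payload() {
    sp_a=$1
    sp_b=$2
    shift 2
    sp_sum_a=$("$@" < "$sp_a" | cksum)
    sp_sum_b=$("$@" < "$sp_b" | cksum)
    [ "$sp_sum_a" = "$sp_sum_b" ]
}

# Example with the file names from this report:
#   same_payload cvs_archive.tar.bz2 cvs_archive-2.tar.bz2 bzip2 -dc \
#       && echo "decompressed streams match"
```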
Comment 1 Robert M. Riches Jr. 2002-07-02 13:02:51 EDT
I'm seeing the same symptoms on an Alpha machine
RH7.1 Linux using tar-1.13.19-4 and bzip2-1.0.1-4.
(It affects my nightly backups--yikes!  Time to add a
verification step to the script.)
Comment 2 Robert M. Riches Jr. 2002-07-02 17:28:36 EDT
After a little testing, using about 88MB of data, I have found the 'trailing
garbage' is apparently real but benign.  I did four runs: 1) with 'z' switch,
2) with 'j' switch, 3) no switch but piped to bzip2, and 4) no switch but piped
to gzip.  In both cases, the switch produced slightly larger files than the no
switch and pipe.  Only the switch complained about 'trailing garbage'.
Extracting the data from all four produced identical results (per 'diff -r').

So, it appears one workaround would be to use '-' as the output file and
pipe (with a shell script) the output into bzip2 (or gzip).

Examination of the file produced with the 'j' switch showed a block of \0 bytes
at the end of the file.  Removing them produced a file that is identical to the
one produced by no switch and manually piping the output through bzip2.  So, it
appears the bug is in 'tar' and involves padding the (compressed) output file
with a bunch of null bytes after the compression program has finished its job.
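
The check described above can be sketched in shell. `count_trailing_nuls` and `is_nul_padded_copy` are hypothetical helper names, and `head -c` is assumed to be available (it is common but not required by POSIX). The sketch only inspects files and does not strip anything, because a compressed file can legitimately end in a zero byte: a gzip file ends with a little-endian size field, so its last byte is zero whenever the uncompressed data is under 16 MiB.

```shell
# count_trailing_nuls FILE
# Prints how many NUL bytes sit at the very end of FILE.
# od dumps every byte as two hex digits; awk tracks the current run of "00".
# This reads the whole file, so it is slow on large archives.
count_trailing_nuls() {
    od -An -v -tx1 "$1" | tr -s ' ' '\n' |
        awk 'NF { if ($1 == "00") n++; else n = 0 } END { print n + 0 }'
}

# is_nul_padded_copy CLEAN PADDED
# Succeeds when PADDED is byte-for-byte CLEAN followed only by NUL bytes.
is_nul_padded_copy() {
    np_clean_size=$(($(wc -c < "$1")))
    np_padded_size=$(($(wc -c < "$2")))
    [ "$np_padded_size" -ge "$np_clean_size" ] || return 1
    # The leading bytes must match the clean file exactly.
    head -c "$np_clean_size" "$2" | cmp -s - "$1" || return 1
    # Whatever follows must vanish once NUL bytes are deleted.
    np_rest=$(tail -c +"$((np_clean_size + 1))" "$2" | tr -d '\000' | wc -c)
    [ "$((np_rest))" -eq 0 ]
}
```

With the piped archive as CLEAN and the 'j' switch archive as PADDED, a successful `is_nul_padded_copy` would confirm that the two differ only by the trailing padding.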

If I had time, I'd love to dig into the code and submit a patch.  But, I'll have
to leave that to the professionals for now.

Comment 3 Ben Woodard 2002-07-02 21:11:21 EDT
I believe that I saw this on a 7.1 based machine as well. It could be that the
problem was with the version of tar bundled with 7.1.

Comment 4 Preston Brown 2002-07-23 11:03:55 EDT
Have you tried tar 1.13.25-4 from Red Hat Linux 7.3?
Comment 5 Bob Matthews 2002-07-25 10:49:37 EDT
Preston, I tried this with 1.13.25, 1.13.25-4.7 and 1.13.25-7 (8.0 version).  It
is busted in all these versions.
Comment 6 Jay Fenlason 2002-11-20 14:10:15 EST
Created attachment 85742
description of use cases for tar, bzip2 and output files
Comment 7 Jay Fenlason 2002-11-20 14:44:23 EST
Tar is working as designed.  If the filename is given as '-', tar assumes the
user knows what they are doing, and that the output is going to be sent
(eventually) to an actual tape device.  Many tape devices will only accept
input in multiples of their physical blocksize, so this reblocking is required
to allow compression to work with those tape devices.
Upshot: to create a bzipped tar file that is not padded out to a multiple of
tar's blocksize, use either
tar -c -j -f {output filename}
(where output filename is not a device file)
or
tar -c -f - | bzip2 > {output filename}
This is all described in way too much detail in the tar documentation.  Read
the "Blocking Factor" section.
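
To make the padding concrete: GNU tar writes records of blocking factor x 512 bytes, and its default blocking factor is 20, which gives 10240-byte records. A minimal sketch of the rounding, assuming the output is padded up to a whole record; `padded_size` is a hypothetical helper, not a tar option:

```shell
# padded_size SIZE [BLOCKING_FACTOR]
# Prints SIZE rounded up to a whole number of tar records.
# A record is BLOCKING_FACTOR blocks of 512 bytes; 20 is GNU tar's default.
padded_size() {
    ps_factor=${2:-20}
    ps_record=$((ps_factor * 512))
    echo $(( ($1 + ps_record - 1) / ps_record * ps_record ))
}

padded_size 1000      # prints 10240
padded_size 10241     # prints 20480
padded_size 1000 1    # prints 1024
```

Under that assumption the padding is always less than one record, which would fit the "slightly larger files" seen in Comment 2. GNU tar's -b option sets the blocking factor.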
