Bug 64410 - Tar produces corrupt files in some situations.
Summary: Tar produces corrupt files in some situations.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: tar
Version: 7.2
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Johnson
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-05-03 20:41 UTC by Ben Woodard
Modified: 2007-04-18 16:42 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2002-11-20 19:10:22 UTC
Embargoed:


Attachments
description of use cases for tar, bzip2 and output files (707 bytes, text/plain)
2002-11-20 19:10 UTC, Jay Fenlason

Description Ben Woodard 2002-05-03 20:41:01 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020326

Description of problem:
I have a large CVS archive that I tried to tar up. However, tar creates a bad
archive in some cases. My guess is that it has something to do with the tty mode
when tar forks a subprocess such as bzip2 or gzip. This only seems to occur with
some archives. In particular, it seems to happen when I try to tar up a CVS
repository containing the kernel.

This bug seems identical to the one reported in Bugzilla bug report 390.

Version-Release number of selected component (if applicable):
tar 1.13.19-6


How reproducible:
Sometimes

Steps to Reproduce:
1. check the whole kernel into CVS
2. tar cjf - cvs_archive >cvs_archive.tar.bz2
3. tar tjf cvs_archive.tar.bz2
	

Actual Results:  <lots of output>

bzip2: (stdin): trailing garbage after EOF ignored


Expected Results:  <lots of output>
(and no error message)

Additional info:

Tried it with bash and tcsh and the same thing occurred. (My first hypothesis was
that tcsh was not switching out of cooked mode properly.)

Tried it with gzip (z tar option) and bzip2 (j tar option) and the same thing
happens.

If you instead run:

tar cf - cvs_archive | bzip2 > cvs_archive.tar.bz2

the error does not appear, so it seems to have something to do with when tar
forks off a subprocess.

Also note that the md5sums of the files created by

tar cjf cvs_archive-2.tar.bz2 cvs_archive 

and 

tar cf - cvs_archive | bzip2 > cvs_archive.tar.bz2 

are different.

Comment 1 Robert M. Riches Jr. 2002-07-02 17:02:51 UTC
I'm seeing the same symptoms on an Alpha machine
RH7.1 Linux using tar-1.13.19-4 and bzip2-1.0.1-4.
(It affects my nightly backups--yikes!  Time to add a
verification step to the script.)
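
A verification step of the kind mentioned above might look like the following
sketch. The function name verify_archive and its optional second argument are
hypothetical. It assumes the compressor accepts -t and -dc (as both bzip2 and
gzip do) and that tar can list an archive read from standard input. Whether
harmless trailing padding makes the integrity test return a nonzero status
depends on the compressor, so a failure is a prompt to inspect the file rather
than proof of data loss.

```sh
# Hypothetical helper: check a compressed tar archive after a backup run.
# $1 = archive file, $2 = compressor to use (defaults to bzip2; gzip also works).
verify_archive() {
    archive=$1
    tool=${2:-bzip2}

    # Step 1: ask the compressor to test the integrity of the compressed stream.
    "$tool" -t "$archive" 2>/dev/null || return 1

    # Step 2: decompress to stdout and let tar list every member.
    # The function's exit status is tar's exit status.
    "$tool" -dc "$archive" 2>/dev/null | tar tf - >/dev/null 2>&1
}
```

In a backup script this could be called as
verify_archive backup.tar.bz2 || echo "backup failed verification" >&2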


Comment 2 Robert M. Riches Jr. 2002-07-02 21:28:36 UTC
After a little testing, using about 88MB of data, I have found the 'trailing
garbage' is apparently real but benign.  I did four runs: 1) with 'z' switch,
2) with 'j' switch, 3) no switch but piped to bzip2, and 4) no switch but piped
to gzip.  For both compressors, the switch produced slightly larger files than
the no switch and pipe.  Only the 'j' switch complained about 'trailing
garbage'.  Extracting the data from all four runs produced identical results
(per 'diff -r').

So, it appears one workaround would be to use '-' as the output file and
manually pipe (with a shell script) the output into bzip2 (or gzip).

Examination of the file produced with the 'j' switch showed a block of \0 bytes
at the end of the file.  Removing them produced a file that is identical to the
file produced by no switch and manually piping the output through bzip2.  So, it
would appear the bug is in 'tar' and involves padding the (compressed) output
file with a bunch of null bytes after the compression program has finished its
job.

If I had time, I'd love to dig into the code and submit a patch.  But, I'll have
to leave that to the professionals for now.
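
The examination described above (a block of \0 bytes at the end of the file)
can be repeated with a small helper. This is only a sketch: the function name
trailing_nuls is hypothetical, and it relies on nothing beyond od, tr and awk.
It walks every byte of the file, so it will be slow on very large archives.

```sh
# Hypothetical helper: print the number of NUL bytes at the very end of a file.
trailing_nuls() {
    # od prints every byte as two hex digits (-v disables line folding,
    # -An drops the offset column); tr puts each byte on its own line.
    od -An -v -tx1 "$1" | tr -s ' ' '\n' |
        # Count the current run of "00" bytes; any other byte resets the run,
        # so the value left at the end is the length of the trailing run.
        awk 'NF { if ($1 == "00") n++; else n = 0 } END { print n + 0 }'
}
```

A nonzero result for the file written with the 'j' switch, and zero for the
file written through a manual pipe, would match the observation above.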



Comment 3 Ben Woodard 2002-07-03 01:11:21 UTC
I believe that I saw this on a 7.1 based machine as well. It could be that the
problem was with the version of tar bundled with 7.1.



Comment 4 Preston Brown 2002-07-23 15:03:55 UTC
Ben: have you tried tar 1.13.25-4 from Red Hat Linux 7.3?

Comment 5 Bob Matthews 2002-07-25 14:49:37 UTC
Preston, I tried this with 1.13.25, 1.13.25-4.7 and 1.13.25-7 (8.0 version).  It
is busted in all these versions.

Comment 6 Jay Fenlason 2002-11-20 19:10:15 UTC
Created attachment 85742 [details]
description of use cases for tar, bzip2 and output files

Comment 7 Jay Fenlason 2002-11-20 19:44:23 UTC
Tar is working as designed.  If the filename is given as '-', tar assumes the  
user knows what they are doing, and that the output is going to be sent  
(eventually) to an actual tape device.  Many tape devices will only accept  
input in multiples of their physical blocksize, so this reblocking is required  
to allow compression to work with those tape devices.  
  
Upshot: to create a bzipped tar file that is not padded out to a multiple of  
tar's blocksize, use either  
tar -c -j -f {output filename}  
(Where output filename is not a device file)  
or  
tar -c -f - | bzip2 > {output filename}  
  
This is all described in way too much detail in the tar documentation.  Read
the "Blocking Factor" section.
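
The size of the padding follows from simple arithmetic, sketched below. The
function name padded_size is hypothetical, and the sketch assumes 512-byte
blocks and GNU tar's documented default blocking factor of 20 (a record of
10240 bytes); a different factor can be passed as the second argument.

```sh
# Hypothetical helper: round a byte count up to a whole number of tar records.
# $1 = size in bytes, $2 = blocking factor (assumed default: 20).
padded_size() {
    size=$1
    factor=${2:-20}
    record=$((factor * 512))    # record size in bytes
    # Integer division rounds down, so add record-1 first to round up.
    echo $(( (size + record - 1) / record * record ))
}
```

For example, padded_size 123456 prints 133120, which would mean 9664 bytes of
padding under these assumptions.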

