From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020326

Description of problem:
I have a large CVS archive that I tried to tar up. However, tar creates a bad archive in some cases. My guess is that it has something to do with the tty mode when tar forks a subprocess such as bzip2 or gzip. This only seems to occur with some archives; in particular, it seems to happen when I try to tar up a CVS repository containing the kernel. This bug seems identical to the one reported in Bugzilla bug report 390.

Version-Release number of selected component (if applicable):
tar 1.13.19-6

How reproducible:
Sometimes

Steps to Reproduce:
1. check the whole kernel into CVS
2. tar cjf - cvs_archive >cvs_archive.tar.bz2
3. tar tjf cvs_archive.tar.bz2

Actual Results:
<lots of output>
bzip2: (stdin): trailing garbage after EOF ignored

Expected Results:
<lots of output> (and no error message)

Additional info:
Tried it with bash and tcsh and the same thing occurred. (My first hypothesis was that tcsh was not switching out of cooked mode properly.) Tried it with gzip (z tar option) and bzip2 (j tar option) and the same thing happens. If you do:

tar cf - cvs_archive | bzip2 > cvs_archive.tar.bz2

the error does not appear, so it seems to have something to do with when tar forks off a subprocess. Also note that the md5sums of the files created by

tar cjf cvs_archive-2.tar.bz2 cvs_archive

and

tar cf - cvs_archive | bzip2 > cvs_archive.tar.bz2

are different.
I'm seeing the same symptoms on an Alpha machine RH7.1 Linux using tar-1.13.19-4 and bzip2-1.0.1-4. (It affects my nightly backups--yikes! Time to add a verification step to the script.)
After a little testing, using about 88MB of data, I have found the 'trailing garbage' is apparently real but benign. I did four runs: 1) with 'z' switch, 2) with 'j' switch, 3) no switch but piped to bzip2, and 4) no switch but piped to gzip. In both cases, the switch produced slightly larger files than the no switch and pipe. Only the 'j' switch complained about 'trailing garbage'. Extracting the data from all four runs produced identical results (per 'diff -r').

So, it appears one workaround would be to use '-' as the output file and manually pipe (with a shell script) the output into bzip2 (or gzip).

Examination of the file produced with the 'j' switch showed a block of \0 bytes at the end of the file. Removing them produced a file that is identical to the file produced by no switch and manually piping the output through bzip2. So, it would appear the bug is in 'tar' and involves padding the (compressed) output file with a bunch of null bytes after the compression program has finished its job.

If I had time, I'd love to dig into the code and submit a patch. But, I'll have to leave that to the professionals for now.
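For anyone who wants to repeat the check described above without a hex editor, here is a minimal Python sketch. It assumes the archive holds a single bzip2 stream followed by optional padding, and it relies only on the standard-library bz2 module. The function names (split_trailing_data, is_nul_padding, strip_trailing_data) are made up for this illustration and are not part of tar or bzip2.

```python
import bz2
from typing import Tuple


def split_trailing_data(raw: bytes) -> Tuple[bytes, bytes]:
    """Decompress one bzip2 stream and return (payload, leftover bytes).

    The leftover bytes are whatever follows the end-of-stream marker,
    i.e. what bzip2 reports as "trailing garbage".
    Assumption: the input contains a single bzip2 stream.
    """
    decomp = bz2.BZ2Decompressor()
    payload = decomp.decompress(raw)
    if not decomp.eof:
        raise ValueError("bzip2 stream ended before its end-of-stream marker")
    return payload, decomp.unused_data


def is_nul_padding(trailing: bytes) -> bool:
    """True if there is trailing data and every byte of it is \\0."""
    return len(trailing) > 0 and trailing.count(0) == len(trailing)


def strip_trailing_data(raw: bytes) -> bytes:
    """Return the input with everything after the bzip2 stream removed."""
    _, trailing = split_trailing_data(raw)
    return raw[: len(raw) - len(trailing)]
```

Usage would be along the lines of reading the whole .tar.bz2 file in binary mode, passing the bytes to split_trailing_data, and checking is_nul_padding on the second return value. If the padding is all \0 bytes, strip_trailing_data should give back a file that compares equal to the manually piped one.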
I believe that I saw this on a 7.1 based machine as well. It could be that the problem was with the version of tar bundled with 7.1.
ben: have you tried tar 1.13.25-4 from Red Hat Linux 7.3?
Preston, I tried this with 1.13.25, 1.13.25-4.7 and 1.13.25-7 (8.0 version). It is busted in all these versions.
Created attachment 85742: description of use cases for tar, bzip2 and output files
Tar is working as designed. If the filename is given as '-', tar assumes the user knows what they are doing, and that the output is going to be sent (eventually) to an actual tape device. Many tape devices will only accept input in multiples of their physical blocksize, so this reblocking is required to allow compression to work with those tape devices.

Upshot: to create a bzipped tar file that is not padded out to a multiple of tar's blocksize, use either

tar -c -j -f {output filename}

(where output filename is not a device file) or

tar -c -f - | bzip2 > {output filename}

This is all described in way too much detail in the tar documentation. Read the "Blocking Factor" section.
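To make the reblocking arithmetic concrete, here is a small Python sketch of rounding an output size up to a whole number of records. It assumes 512-byte tar blocks and GNU tar's default blocking factor of 20 (10240-byte records). It only models the padding behaviour described in this comment and is not taken from tar's source. The names padded_size and padding_needed are invented for the example.

```python
BLOCK_SIZE = 512  # size of one tar block in bytes


def padded_size(n_bytes: int, blocking_factor: int = 20) -> int:
    """Round n_bytes up to the next multiple of the record size.

    The record size is BLOCK_SIZE * blocking_factor; 20 is assumed
    here as the default blocking factor.
    """
    record = BLOCK_SIZE * blocking_factor
    # Ceiling division without floating point.
    return -(-n_bytes // record) * record


def padding_needed(n_bytes: int, blocking_factor: int = 20) -> int:
    """Number of \\0 bytes appended to reach a full record."""
    return padded_size(n_bytes, blocking_factor) - n_bytes
```

Under these assumptions, a compressed stream of 10241 bytes would be written out as 20480 bytes, which is consistent with the block of \0 bytes reported at the end of the file earlier in this thread.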