Bug 154882 - tar 1.14 doesn't extract sparse files correctly, breaking amanda
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: tar (Show other bugs)
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Peter Vrabec
QA Contact: Ben Levenson
http://lists.gnu.org/archive/html/bug...
Blocks: 156322
Reported: 2005-04-14 13:53 EDT by Matt Hyclak
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHBA-2005-380
Doc Type: Bug Fix
Last Closed: 2005-10-05 09:34:20 EDT


Attachments:
Output of tar command (2.12 KB, text/plain)
2005-04-14 14:09 EDT, Matt Hyclak
Output of working and non-working options in tar 1.15.1 (3.85 KB, text/plain)
2005-04-15 11:52 EDT, Matt Hyclak

Description Matt Hyclak 2005-04-14 13:53:42 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050323 Galeon/1.3.20

Description of problem:
Amanda uses the --sparse option to tar when creating backups. Tar 1.14 is known to have problems with extracting sparse files (see http://lists.gnu.org/archive/html/bug-tar/2005-02/msg00003.html).

Version-Release number of selected component (if applicable):
tar-1.14-4, amanda-2.4.4p3-1

How reproducible:
Always

Steps to Reproduce:
1. /bin/tar --create --file /dev/null --directory / --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/euclid.math.ohiou.edu__0.new --sparse --ignore-failed-read --totals --exclude-from /var/log/amanda/sendsize._.20050414123909.exclude .

Where /var/lib/amanda/gnutar-lists/euclid.math.ohiou.edu__0.new and /var/log/amanda/sendsize._.20050414123909.exclude are both empty files.


Actual Results:  The tar output claimed I had 1.2TB on a 20GB partition

Expected Results:  It would have told me I was using about 3.5GB of my 20GB partition.

Additional info:
Comment 1 Matt Hyclak 2005-04-14 14:09:27 EDT
Created attachment 113173
Output of tar command

This seems to be exactly the problem mentioned in the Bug-Tar list post...it's
trying to seek on a file and is failing.
Comment 3 Matt Hyclak 2005-04-14 16:26:50 EDT
I rebuilt and installed tar 1.15.1 from rawhide and the problem persists.
Excluding the files in /var/lib/nscd, etc. returns the correct estimate.

I forgot to mention the client is x86_64 (Dual Opteron), which may have
something to do with it as well.

Last time you'll hear from me today, I promise :-)
Comment 4 Matt Hyclak 2005-04-15 11:52:56 EDT
Created attachment 113232
Output of working and non-working options in tar 1.15.1

I reported the bug on the bug-tar@gnu.org mailing list. I have full output that
demonstrates the problem and that ignoring the files works.
Comment 5 Peter Vrabec 2005-05-04 06:53:51 EDT
I don't understand what the problem is.

There is a 1.2T lastlog file on 64-bit machines:
$ du --apparent-size -h /var/log/lastlog
Comment 6 Matt Hyclak 2005-05-04 07:11:57 EDT
Yes, but that file does not take up 1.2T on the file system. 

[hyclak@euclid ~]$ du --apparent-size -h /var/log/lastlog 
1.2T    /var/log/lastlog
[hyclak@euclid ~]$ du -h /var/log/lastlog 
56K     /var/log/lastlog

Should tar not be reporting the size of the file as 56K? After all, the tarball
it would create will not be in the TB range...
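The gap between the two du figures above can be reproduced on any filesystem that supports holes. The following is a minimal sketch, not anything tar itself does: it creates a sparse file and compares its apparent size (st_size, what `du --apparent-size` and `ls -l` report) with the space actually allocated (st_blocks, in 512-byte units, what plain `du` reports). The helper names and the 64 MiB size are illustrative assumptions.

```python
import os
import tempfile


def make_sparse_file(path, apparent_size, payload=b"x"):
    """Create a file that is one big hole followed by a small payload."""
    with open(path, "wb") as f:
        # Seeking past the end and then writing leaves a hole behind.
        f.seek(apparent_size - len(payload))
        f.write(payload)


def apparent_and_allocated(path):
    """Return (apparent_size, allocated_bytes) for path."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units regardless of the
    # filesystem block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.bin")
        make_sparse_file(p, 64 * 1024 * 1024)
        apparent, allocated = apparent_and_allocated(p)
        print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem without sparse-file support the two numbers would be close to each other, so the comparison only illustrates the lastlog case where holes are actually kept.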
Comment 7 Joshua Baker-LePain 2005-05-04 07:32:52 EDT
In addition, it's the "Warning: Cannot seek to 0: Bad file descriptor" messages
that have me worried.  tar 1.13.25 on RHEL3, e.g., doesn't exhibit that behavior
on sparse files.

Doing a couple of quick tests with the stock tar, I also see that it exhibits
different behavior when actually writing a tarball vs. '-f /dev/null'.  On a 
32bit system, 
'tar cf /dev/null --sparse --ignore-failed-read --totals /var' says 
'Total bytes written: 70400000 (68MiB, ?/s)', and has the bad file descriptor
warnings, but 
'tar cf var.tar --sparse --ignore-failed-read --totals /var' says
'Total bytes written: 48076800 (46MiB, ?/s)' and doesn't have the bad
file descriptor warnings.

I don't have a 64bit RHEL4 system ATM, but it'd be interesting to see if it
shows the same differing behavior when actually writing a tarball...
Comment 8 Matt Hyclak 2005-05-04 21:51:40 EDT
Same occurs on x86_64. Seems to be a /dev/null problem.

[root@euclid ~]# tar cf /dev/null --sparse --ignore-failed-read --totals /var
-- cannot seek errors --
Total bytes written: 1254807336960 (1.2TiB, 11GiB/s)

[root@euclid ~]# tar cf /stuff/var.tar --sparse --ignore-failed-read --totals /var
Total bytes written: 671088640 (640MiB, 75KiB/s)
Comment 9 Peter Vrabec 2005-05-05 05:29:12 EDT
/dev/null is not seekable, that's all.

There used to be similar behaviour when extracting a tarball (with sparse files
in it) to a non-seekable file, pipe, etc. See #146225.

If tar's behaviour in reporting file size has not changed since previous
versions, I really don't want to change it. Try upstream.

Is there any serious problem here?
Comment 10 Joshua Baker-LePain 2005-05-05 07:47:39 EDT
The serious problem is that this completely breaks a backup tool (amanda).  And
it has changed against previous versions.  This from a RHEL3 machine 
(tar-1.13.25-13):

sudo tar cf /dev/null --sparse --ignore-failed-read --totals /var
.
Total bytes written: 358891520 (342MB, 86MB/s)

sudo tar cf var.tar --sparse --ignore-failed-read --totals /var 
.
Total bytes written: 358891520 (342MB, 8.6MB/s)

Comment 11 Matt Hyclak 2005-05-05 09:02:52 EDT
I sent an e-mail upstream to bug-tar and had not received any responses as of a
week ago. I've been unable to get onto the archives for the last week to see if
there have been any responses.
Comment 12 Jay Fenlason 2005-05-05 11:04:37 EDT
The error described in comment #10 causes problems for Amanda because it uses 
a similar invocation of tar to estimate the amount of tape that will be used 
by a dump.  When Amanda attempts to dump /var/log/lastlog, tar tells it the 
dump will be in the 1.2 terabyte range.  Very few people have tapes that 
large, so Amanda reports that the dump will not fit on the tape and aborts it.  
Even though an actual dump would have fit on the tape. 
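To make the failure mode concrete, here is a rough sketch of the kind of check described above. It is not Amanda's actual code; the function names and the 40 GB tape capacity are hypothetical. It pulls the byte count out of tar's "Total bytes written" line and compares it with the tape size, using the two totals reported earlier in this thread.

```python
import re

# Matches the summary line tar prints when run with --totals.
TOTALS_RE = re.compile(r"^Total bytes written: (\d+)", re.MULTILINE)


def parse_total_bytes(tar_output):
    """Return the byte count from a --totals line, or None if absent."""
    match = TOTALS_RE.search(tar_output)
    return int(match.group(1)) if match else None


def fits_on_tape(estimate_bytes, tape_capacity_bytes):
    """True if the estimated dump size does not exceed the tape capacity."""
    return estimate_bytes <= tape_capacity_bytes


if __name__ == "__main__":
    tape = 40 * 10**9  # hypothetical 40 GB tape
    bogus = parse_total_bytes(
        "Total bytes written: 1254807336960 (1.2TiB, 11GiB/s)\n")
    real = parse_total_bytes(
        "Total bytes written: 671088640 (640MiB, 75KiB/s)\n")
    print(fits_on_tape(bogus, tape), fits_on_tape(real, tape))
```

With the inflated estimate the dump is rejected even though the real archive would fit, which is the behaviour reported here.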
Comment 13 Tomas Mraz 2005-05-12 11:19:28 EDT
The problem is that on RHEL3 the nfsnobody user didn't have such a high uid, so
the /var/log/lastlog file isn't as large there.

So I don't think tar actually changed its behaviour in this regard between
RHEL3 and RHEL4.
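The link between the highest uid and the lastlog size can be checked with a little arithmetic. This is a sketch under stated assumptions, none of which come from this thread: lastlog holds one fixed-size record per uid, indexed by uid; each record is 292 bytes (4-byte time, 32-byte line, 256-byte host); and nfsnobody is uid 65534 on 32-bit and 4294967294 on 64-bit installs.

```python
# Assumption: size of one lastlog record (4 + 32 + 256 bytes).
LASTLOG_RECORD_BYTES = 292

# Assumption: nfsnobody uids, not stated anywhere in this report.
NFSNOBODY_UID_32BIT = 65534
NFSNOBODY_UID_64BIT = 4294967294


def lastlog_apparent_size(highest_uid, record_bytes=LASTLOG_RECORD_BYTES):
    """Apparent size of a lastlog that has a record for highest_uid.

    Records are indexed by uid starting at 0, so the file must span
    highest_uid + 1 records even if almost all of them are holes.
    """
    return (highest_uid + 1) * record_bytes


if __name__ == "__main__":
    print(lastlog_apparent_size(NFSNOBODY_UID_32BIT))  # 19136220
    print(lastlog_apparent_size(NFSNOBODY_UID_64BIT))  # 1254130450140
```

Under those assumptions the results are in line with the roughly 19M and 1.2T apparent sizes quoted elsewhere in this report.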
Comment 14 Joshua Baker-LePain 2005-05-12 11:30:24 EDT
The difference isn't as striking on RHEL3 (or 32bit RHEL4 for that matter)
since the "apparent size" (in du terms) is not in the TiB range, but it's still 
different.

RHEL3:
tar cf /dev/null  --sparse --ignore-failed-read --totals /var/log/lastlog 
tar: Removing leading `/' from member names
Total bytes written: 10240 (10kB, ?B/s)

tar cf tmp.tar  --sparse --ignore-failed-read --totals /var/log/lastlog 
tar: Removing leading `/' from member names
Total bytes written: 10240 (10kB, ?B/s)

Note: Same output

RHEL4 (32bit):
tar cf /dev/null --sparse --ignore-failed-read --totals /var/log/lastlog 
/bin/tar: Removing leading `/' from member names
/bin/tar: /var/log/lastlog: Warning: Cannot seek to 0: Bad file descriptor
Total bytes written: 19138560 (19MiB, 19MiB/s)

tar cf tmp.tar /dev/null --sparse --ignore-failed-read --totals /var/log/lastlog 
/bin/tar: Removing leading `/' from member names
Total bytes written: 10240 (10KiB, ?/s)

Note: Different output
Comment 15 Matt Hyclak 2005-05-12 11:58:39 EDT
Definitely seems to be a 64-bitism:

[hyclak@euclid ~]$ uname -a
Linux euclid.math.ohiou.edu 2.6.9-5.0.5.ELsmp #1 SMP Tue Apr 19 17:06:07 CDT
2005 x86_64 x86_64 x86_64 GNU/Linux
[hyclak@euclid ~]$ du -sh /var/log/lastlog 
56K     /var/log/lastlog
[hyclak@euclid ~]$ ls -lh /var/log/lastlog 
-r--------  1 root root 1.2T May 12 09:11 /var/log/lastlog

[hyclak@morton526-L01 ~]$ uname -a
Linux morton526-L01.math.ohiou.edu 2.6.9-5.0.5.ELsmp #1 SMP Wed Apr 20 00:16:40
BST 2005 i686 i686 i386 GNU/Linux
[hyclak@morton526-L01 ~]$ du -sh /var/log/lastlog 
72K     /var/log/lastlog
[hyclak@morton526-L01 ~]$ ls -lh /var/log/lastlog 
-r--------  1 root root 19M May 12 11:43 /var/log/lastlog
Comment 20 Peter Vrabec 2005-07-20 10:47:48 EDT
/dev/null is handled differently in tar >= 1.14, but you can obtain tar-1.13
behavior with:
$ tar cf - --sparse --totals lastlog | dd of=/dev/null

https://www.redhat.com/archives/fedora-list/2005-May/msg00786.html
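For anyone who wants to script the pipe workaround above, a minimal sketch follows. It assumes GNU tar is on the PATH; `estimate_via_pipe` is a hypothetical helper, not part of tar or Amanda. It runs tar with the archive on stdout and counts the bytes itself, which plays the role of `dd of=/dev/null` in the command above.

```python
import subprocess


def estimate_via_pipe(path, tar="tar"):
    """Run `tar cf - --sparse PATH` and return the number of archive bytes."""
    cmd = [tar, "cf", "-", "--sparse", path]
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    total = 0
    while True:
        # Read and discard the archive in 1 MiB chunks, counting as we go.
        chunk = proc.stdout.read(1 << 20)
        if not chunk:
            break
        total += len(chunk)
    proc.stdout.close()
    returncode = proc.wait()
    # GNU tar uses exit status 1 for non-fatal conditions such as
    # "file changed as we read it"; anything higher is a real failure.
    if returncode not in (0, 1):
        raise subprocess.CalledProcessError(returncode, cmd)
    return total
```

Because tar really does produce the archive here, the count reflects what a dump would write, at the cost of the extra run time discussed in the following comments.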
Comment 21 Jay Fenlason 2005-07-20 11:02:17 EDT
So you're recommending rewriting Amanda to work around this bug? 
Comment 22 Joshua Baker-LePain 2005-07-20 11:07:33 EDT
Or do we need to go upstream to the tar maintainers?  Given comments above, that
seems like it may not work very well.

To me, this walks like a tar bug, quacks like a tar bug, and is leaving
tar-bug droppings all over the place...
Comment 23 Peter Vrabec 2005-07-20 11:45:08 EDT
The bad thing about "$ tar cf - --sparse --totals lastlog | dd of=/dev/null" is that it
takes much more time to finish than "$ tar cf /dev/null --sparse --totals lastlog",
especially when lastlog is a huge sparse file. See #149407. Neither of these
choices helps us: one takes too much time, the other returns bad totals.
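A plausible reason for the slowness, offered as an assumption since this thread does not say how tar locates holes: if holes are found by reading the whole file and testing each block for all zeros, the work grows with the apparent size, not with the data actually stored. The sketch below illustrates that style of scan; `scan_data_blocks` is a hypothetical helper, not tar's code.

```python
def scan_data_blocks(path, block_size=512):
    """Read every block of path and count those that are not all zeros.

    Returns (blocks_read, data_blocks). blocks_read scales with the
    apparent size of the file, however little data it really holds.
    """
    zero_block = bytes(block_size)
    blocks_read = 0
    data_blocks = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            blocks_read += 1
            # A short final block is compared against zeros of equal length.
            if block != zero_block[:len(block)]:
                data_blocks += 1
    return blocks_read, data_blocks
```

For a file with a 1.2T apparent size, a scan like this would have to read on the order of two billion 512-byte blocks to discover a few kilobytes of data.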
Comment 24 Joshua Baker-LePain 2005-07-20 11:50:27 EDT
*Has* this been fixed upstream?  The original reporter pointed to a posting
on the bug-tar list, and there is a response to that post claiming the bug
is fixed in CVS after the 1.15.1 release (see 
<http://lists.gnu.org/archive/html/bug-tar/2005-02/msg00006.html>).  
I can't get the CVS version to build, though (bootstrap is dying), so I can't
confirm.
Comment 25 Jay Fenlason 2005-07-20 11:53:01 EDT
Tar being slow can be easily worked around by increasing the timeouts in 
Amanda.  Tar returning invalid data is unrecoverable.  I'll vote for slow and 
correct over fast and bogus. 
Comment 26 Peter Vrabec 2005-07-20 12:03:28 EDT
(In reply to comment #24)
> *Has* this been fixed upstream?  The original reporter pointed to a posting
> on the bug-tar list, and there is a response to that post claiming the bug
> is fixed in CVS after the 1.15.1 release (see 
> <http://lists.gnu.org/archive/html/bug-tar/2005-02/msg00006.html>).  
> I can't get the CVS version to build, though (bootstrap is dying), so I can't
> confirm.

Yes it's fixed, but that's a different problem.
Comment 27 Matt Hyclak 2005-07-20 13:26:03 EDT
Yes, I compiled 1.15.1 on my x86_64 system and the problem still exists. See
Comment #4.
Comment 28 Peter Vrabec 2005-07-25 04:01:50 EDT
I vote for the slow and correct solution too. Upstream is also inclined to use this one.
http://lists.gnu.org/archive/html/bug-tar/2005-07/msg00025.html
Comment 29 Peter Vrabec 2005-07-25 05:12:39 EDT
fix candidate:
http://people.redhat.com/pvrabec/tar-1.14-8.RHEL4.src.rpm
Comment 30 Matt Hyclak 2005-07-25 09:35:35 EDT
(In reply to comment #29)
> fix candidate:
> http://people.redhat.com/pvrabec/tar-1.14-8.RHEL4.src.rpm

Does this use patch 1 or patch 2 from
http://lists.gnu.org/archive/html/bug-tar/2005-07/msg00025.html ?

I have recompiled and will test tonight.

Also, has the problem 1.14 had extracting been fixed in a backport? That has
presented problems for amanda users in the past such that the recommended
versions are 1.13.25 and 1.15.1 at this time. See
http://www.amanda.org/docs/faq.html#id2554919

Comment 31 Joshua Baker-LePain 2005-07-25 09:44:14 EDT
"Slow and correct" definitely describes the behavior of this patched version.
This on a 64bit machine with /var on a 4 disk hardware RAID5:

time sudo ./bin/tar cf /dev/null --sparse --ignore-failed-read --totals
/var/log/lastlog 
./bin/tar: Removing leading `/' from member names
./bin/tar: /var/log/lastlog: file changed as we read it
Total bytes written: 10240 (10KiB, 1B/s)

real    106m20.334s
user    35m39.307s
sys     67m16.855s
Comment 32 Matt Hyclak 2005-07-25 13:10:52 EDT
(In reply to comment #31)
Same here:

time /bin/tar --create --file /dev/null --directory / --one-file-system
--listed-incremental /var/lib/amanda/gnutar-lists/euclid.math.ohiou.edu__1.new
--sparse --ignore-failed-read --totals .
Total bytes written: 5226465280 (4.9GiB, 503KiB/s)

real    169m19.540s
user    46m43.811s
sys     100m20.684s


That's a 5.5GB partition with dual Opteron 242's. tar had one CPU pegged for
that amount of time. 3 hours makes it essentially useless for amanda, as it
expects estimates to be done by default in 5 minutes. That can be bumped up, but
3 hours is a little excessive. 
Comment 33 Peter Vrabec 2005-07-26 03:27:38 EDT
I don't think this is useless for amanda. If we reduce lastlog size on 64bit
machines, it will start working.
Comment 34 Matt Hyclak 2005-07-26 09:39:44 EDT
Fair enough...I wasn't thinking in those terms. 

As far as I can tell, the 1.14-8 works just fine, and combined with the change
to lastlog size should fix the problem. 
Comment 35 Anchor Systems Managed Hosting 2005-09-20 01:38:50 EDT
We are also experiencing the problem here, via amanda backups. The /var
partition estimate ends up being 1.2TB which is way bigger than any tape and
hence the backup doesn't proceed for this partition.

This is on a dual Intel EM64T machine running RHEL4 x86_64.
Comment 36 Anchor Systems Managed Hosting 2005-09-20 01:39:55 EDT
Is a bugfix planned for this?
Comment 37 Peter Vrabec 2005-09-20 04:48:20 EDT
Bugfix planned in update #2.
Comment 38 Red Hat Bugzilla 2005-10-05 09:34:20 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-380.html
Comment 39 Peter Bieringer 2005-12-24 13:05:45 EST
The same problem hit me; the suggested erratum doesn't work. I had installed
tar-1.14-8.RHEL4. After updating to tar-1.15.1-11.FC4 from Fedora Core 4, it works
fine.

Please dig further into the issue and either adjust 1.14 or provide 1.15 for RHEL4.

Thank you.

