From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050323 Galeon/1.3.20

Description of problem:
Amanda uses the --sparse option to tar when creating backups. Tar 1.14 is known to have problems with extracting sparse files (see http://lists.gnu.org/archive/html/bug-tar/2005-02/msg00003.html).

Version-Release number of selected component (if applicable):
tar-1.14-4, amanda-2.4.4p3-1

How reproducible:
Always

Steps to Reproduce:
1. /bin/tar --create --file /dev/null --directory / --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/euclid.math.ohiou.edu__0.new --sparse --ignore-failed-read --totals --exclude-from /var/log/amanda/sendsize._.20050414123909.exclude .
   Where /var/lib/amanda/gnutar-lists/euclid.math.ohiou.edu__0.new and /var/log/amanda/sendsize._.20050414123909.exclude are both empty files.

Actual Results:
The tar output claimed I had 1.2TB on a 20GB partition.

Expected Results:
It would have told me I was using about 3.5GB of my 20GB partition.
Created attachment 113173
Output of tar command

This seems to be exactly the problem mentioned in the bug-tar list post: tar is trying to seek on a file and is failing.
I rebuilt and installed tar 1.15.1 from rawhide and the problem persists. Excluding the files in /var/lib/nscd, etc. returns the correct estimate. I forgot to mention the client is x86_64 (Dual Opteron), which may have something to do with it as well. Last time you'll hear from me today, I promise :-)
Created attachment 113232
Output of working and non-working options in tar 1.15.1

I reported the bug on the bug-tar mailing list. I have full output that demonstrates the problem and that ignoring the files works.
I don't understand what the problem is. There is a 1.2T lastlog file on 64-bit machines:

  $ du --apparent-size -h /var/log/lastlog
Yes, but that file does not take up 1.2T on the file system.

  [hyclak@euclid ~]$ du --apparent-size -h /var/log/lastlog
  1.2T /var/log/lastlog
  [hyclak@euclid ~]$ du -h /var/log/lastlog
  56K /var/log/lastlog

Should tar not be reporting the size of the file as 56K? After all, the tarball it would create will not be in the TB range...
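The gap between the two du figures is the difference between a file's apparent size and the blocks actually allocated to it. A minimal Python sketch of that comparison follows, assuming a Linux system where `st_blocks` is counted in 512-byte units. The helper names and the temporary demo file are illustrative only and come from neither tar nor Amanda.

```python
import os
import tempfile


def apparent_size(path):
    """Length of the file in bytes (what `du --apparent-size` and `ls -l` show)."""
    return os.stat(path).st_size


def allocated_size(path):
    """Bytes allocated on disk (what plain `du` shows).

    Assumes st_blocks is in 512-byte units, as on Linux.
    """
    return os.stat(path).st_blocks * 512


def make_sparse_file(path, length):
    """Hypothetical demo helper: a file of `length` bytes with data only at the end."""
    with open(path, "wb") as f:
        f.seek(length - 1)   # skip over the hole without writing anything
        f.write(b"\0")       # a single byte of real data fixes the file length


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "sparse.bin")
        make_sparse_file(demo, 64 * 1024 * 1024)
        print("apparent :", apparent_size(demo))
        print("allocated:", allocated_size(demo))
```

On a filesystem that supports holes, the allocated figure should come out far smaller than the apparent one, which is the same relationship as 56K versus 1.2T above.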
In addition, it's the "Warning: Cannot seek to 0: Bad file descriptor" messages that have me worried. tar 1.13.25 on RHEL3, e.g., doesn't exhibit that behavior on sparse files. Doing a couple of quick tests with the stock tar, I also see that it exhibits different behavior when actually writing a tarball vs. '-f /dev/null'. On a 32bit system, 'tar cf /dev/null --sparse --ignore-failed-read --totals /var' says 'Total bytes written: 70400000 (68MiB, ?/s)', and has the bad file descriptor warnings, but 'tar cf var.tar --sparse --ignore-failed-read --totals /var' says 'Total bytes written: 48076800 (46MiB, ?/s)' and doesn't have the bad file descriptor warnings. I don't have a 64bit RHEL4 system ATM, but it'd be interesting to see if it shows the same differing behavior when actually writing a tarball...
Same occurs on x86_64. Seems to be a /dev/null problem.

  [root@euclid ~]# tar cf /dev/null --sparse --ignore-failed-read --totals /var
  -- cannot seek errors --
  Total bytes written: 1254807336960 (1.2TiB, 11GiB/s)

  [root@euclid ~]# tar cf /stuff/var.tar --sparse --ignore-failed-read --totals /var
  Total bytes written: 671088640 (640MiB, 75KiB/s)
/dev/null is not seekable, that's it. There used to be similar behaviour when extracting a tarball containing sparse files to a non-seekable file, pipe, etc. See #146225. If tar's behaviour in reporting file size has not changed from previous versions, I really don't want to change it. Try upstream. Is there any serious problem here?
The serious problem is that this completely breaks a backup tool (amanda). And it has changed from previous versions. This is from a RHEL3 machine (tar-1.13.25-13):

  sudo tar cf /dev/null --sparse --ignore-failed-read --totals /var .
  Total bytes written: 358891520 (342MB, 86MB/s)

  sudo tar cf var.tar --sparse --ignore-failed-read --totals /var .
  Total bytes written: 358891520 (342MB, 8.6MB/s)
I sent an e-mail upstream to bug-tar and had not received any responses as of a week ago. I've been unable to get onto the archives for the last week to see if there have been any responses.
The error described in comment #10 causes problems for Amanda because it uses a similar invocation of tar to estimate the amount of tape that will be used by a dump. When Amanda attempts to dump /var/log/lastlog, tar tells it the dump will be in the 1.2 terabyte range. Very few people have tapes that large, so Amanda reports that the dump will not fit on the tape and aborts it, even though an actual dump would have fit on the tape.
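The estimate failure described here comes down to reading one number out of tar's `--totals` line and comparing it with the tape size. A rough Python sketch of that check follows, assuming the line format shown earlier in this report. `parse_total_bytes`, `fits_on_tape`, and the 40 GB tape capacity are hypothetical and are not Amanda's actual code.

```python
import re

# Matches the summary line tar prints with --totals, e.g.
# "Total bytes written: 671088640 (640MiB, 75KiB/s)"
TOTALS_RE = re.compile(r"Total bytes written:\s*(\d+)")


def parse_total_bytes(tar_output):
    """Return the byte count from a --totals line, or None if no such line exists."""
    match = TOTALS_RE.search(tar_output)
    return int(match.group(1)) if match else None


def fits_on_tape(estimate_bytes, tape_capacity_bytes):
    """Hypothetical planner check: does the estimated dump fit on one tape?"""
    return estimate_bytes <= tape_capacity_bytes


if __name__ == "__main__":
    tape = 40 * 10**9  # assumed tape capacity, for illustration only
    to_dev_null = "Total bytes written: 1254807336960 (1.2TiB, 11GiB/s)"
    to_real_file = "Total bytes written: 671088640 (640MiB, 75KiB/s)"
    for line in (to_dev_null, to_real_file):
        estimate = parse_total_bytes(line)
        print(estimate, fits_on_tape(estimate, tape))
```

Fed the two totals from the x86_64 runs above, a check like this rejects the /dev/null estimate and accepts the figure from the real archive, even though both describe the same /var.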
The problem is that on RHEL3 the nfsnobody user did not have such a high uid, so the /var/log/lastlog file isn't as large there. So I don't think tar actually changed its behaviour in this regard between RHEL3 and RHEL4.
The difference isn't as striking on RHEL3 (or 32-bit RHEL4 for that matter) since the "apparent size" (in du terms) is not in the TiB range, but it's still different.

RHEL3:
  tar cf /dev/null --sparse --ignore-failed-read --totals /var/log/lastlog
  tar: Removing leading `/' from member names
  Total bytes written: 10240 (10kB, ?B/s)

  tar cf tmp.tar --sparse --ignore-failed-read --totals /var/log/lastlog
  tar: Removing leading `/' from member names
  Total bytes written: 10240 (10kB, ?B/s)

Note: Same output

RHEL4 (32bit):
  tar cf /dev/null --sparse --ignore-failed-read --totals /var/log/lastlog
  /bin/tar: Removing leading `/' from member names
  /bin/tar: /var/log/lastlog: Warning: Cannot seek to 0: Bad file descriptor
  Total bytes written: 19138560 (19MiB, 19MiB/s)

  tar cf tmp.tar /dev/null --sparse --ignore-failed-read --totals /var/log/lastlog
  /bin/tar: Removing leading `/' from member names
  Total bytes written: 10240 (10KiB, ?/s)

Note: Different output
Definitely seems to be a 64-bitism:

  [hyclak@euclid ~]$ uname -a
  Linux euclid.math.ohiou.edu 2.6.9-5.0.5.ELsmp #1 SMP Tue Apr 19 17:06:07 CDT 2005 x86_64 x86_64 x86_64 GNU/Linux
  [hyclak@euclid ~]$ du -sh /var/log/lastlog
  56K /var/log/lastlog
  [hyclak@euclid ~]$ ls -lh /var/log/lastlog
  -r-------- 1 root root 1.2T May 12 09:11 /var/log/lastlog

  [hyclak@morton526-L01 ~]$ uname -a
  Linux morton526-L01.math.ohiou.edu 2.6.9-5.0.5.ELsmp #1 SMP Wed Apr 20 00:16:40 BST 2005 i686 i686 i386 GNU/Linux
  [hyclak@morton526-L01 ~]$ du -sh /var/log/lastlog
  72K /var/log/lastlog
  [hyclak@morton526-L01 ~]$ ls -lh /var/log/lastlog
  -r-------- 1 root root 19M May 12 11:43 /var/log/lastlog
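These sizes are what one would expect if lastlog is a flat array indexed by uid, so that its length is set by the highest uid with an entry. The back-of-the-envelope Python check below rests on assumptions not stated in this report: a 292-byte lastlog record (4-byte time, 32-byte line, 256-byte host), and nfsnobody having uid 65534 on the 32-bit install and 4294967294 on x86_64. The second function assumes tar's default 512-byte blocks and 20-block records.

```python
LASTLOG_RECORD = 292   # assumed sizeof(struct lastlog): 4 + 32 + 256 bytes
BLOCK = 512            # tar block size
RECORD = 20 * BLOCK    # default blocking factor of 20 gives 10240-byte records


def lastlog_apparent_size(highest_uid):
    """Apparent size of a lastlog file whose last record belongs to `highest_uid`."""
    return (highest_uid + 1) * LASTLOG_RECORD


def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple


def tar_total_for_dense_file(file_size):
    """Archive size if one file is stored in full, holes included.

    One header block, the data padded to a block boundary, two zero
    end-of-archive blocks, all rounded up to a whole record.
    """
    raw = BLOCK + round_up(file_size, BLOCK) + 2 * BLOCK
    return round_up(raw, RECORD)


if __name__ == "__main__":
    for label, uid in (("32-bit", 65534), ("x86_64", 4294967294)):
        size = lastlog_apparent_size(uid)
        print(label, size, tar_total_for_dense_file(size))
```

Under these assumptions the 32-bit case works out to 19138560 bytes, the same total tar printed when writing to /dev/null in the RHEL4 (32bit) run, which suggests tar counted the file's full apparent size there rather than only its non-hole data.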
/dev/null is handled differently in tar >= 1.14, but you can obtain the tar-1.13 behavior with:

  $ tar cf - --sparse --totals lastlog | dd of=/dev/null

https://www.redhat.com/archives/fedora-list/2005-May/msg00786.html
So you're recommending rewriting Amanda to work around this bug?
Or do we need to go upstream to the tar maintainers? Given comments above, that seems like it may not work very well. To me, this walks like a tar bug, quacks like a tar bug, and is leaving tar-bug droppings all over the place...
The bad thing about "$ tar cf - --sparse --totals lastlog | dd of=/dev/null" is that it takes much more time to finish than "$ tar cf /dev/null --sparse --totals lastlog", especially when lastlog is a huge sparse file. See #149407. Neither of these choices helps us. One takes too much time, the other returns bad totals.
*Has* this been fixed upstream? The original reporter pointed to a posting on the bug-tar list, and there is a response to that post claiming the bug is fixed in CVS after the 1.15.1 release (see <http://lists.gnu.org/archive/html/bug-tar/2005-02/msg00006.html>). I can't get the CVS version to build, though (bootstrap is dying), so I can't confirm.
Tar being slow can be easily worked around by increasing the timeouts in Amanda. Tar returning invalid data is unrecoverable. I'll vote for slow and correct over fast and bogus.
(In reply to comment #24)
> *Has* this been fixed upstream? The original reporter pointed to a posting
> on the bug-tar list, and there is a response to that post claiming the bug
> is fixed in CVS after the 1.15.1 release (see
> <http://lists.gnu.org/archive/html/bug-tar/2005-02/msg00006.html>).
> I can't get the CVS version to build, though (bootstrap is dying), so I can't
> confirm.

Yes, it's fixed, but that's a different problem.
Yes, I compiled 1.15.1 on my x86_64 system and the problem still exists. See Comment #4.
I vote for the slow and correct solution too. Upstream is also inclined to use this one. http://lists.gnu.org/archive/html/bug-tar/2005-07/msg00025.html
fix candidate: http://people.redhat.com/pvrabec/tar-1.14-8.RHEL4.src.rpm
(In reply to comment #29)
> fix candidate:
> http://people.redhat.com/pvrabec/tar-1.14-8.RHEL4.src.rpm

Does this use patch 1 or patch 2 from http://lists.gnu.org/archive/html/bug-tar/2005-07/msg00025.html? I have recompiled and will test tonight.

Also, has the extraction problem in 1.14 been fixed in a backport? It has caused problems for amanda users in the past, to the point that the recommended versions are 1.13.25 and 1.15.1 at this time. See http://www.amanda.org/docs/faq.html#id2554919
"Slow and correct" definitely describes the behavior of this patched version. This is on a 64-bit machine with /var on a 4-disk hardware RAID5:

  time sudo ./bin/tar cf /dev/null --sparse --ignore-failed-read --totals /var/log/lastlog
  ./bin/tar: Removing leading `/' from member names
  ./bin/tar: /var/log/lastlog: file changed as we read it
  Total bytes written: 10240 (10KiB, 1B/s)

  real 106m20.334s
  user 35m39.307s
  sys 67m16.855s
(In reply to comment #31)

Same here:

  time /bin/tar --create --file /dev/null --directory / --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/euclid.math.ohiou.edu__1.new --sparse --ignore-failed-read --totals .
  Total bytes written: 5226465280 (4.9GiB, 503KiB/s)

  real 169m19.540s
  user 46m43.811s
  sys 100m20.684s

That's a 5.5GB partition with dual Opteron 242's. tar had one CPU pegged for that amount of time. 3 hours makes it essentially useless for amanda, as it expects estimates to be done by default in 5 minutes. That can be bumped up, but 3 hours is a little excessive.
I don't think this is useless for amanda. If we reduce lastlog size on 64bit machines, it will start working.
Fair enough...I wasn't thinking in those terms. As far as I can tell, the 1.14-8 works just fine, and combined with the change to lastlog size should fix the problem.
We are also experiencing the problem here, via amanda backups. The /var partition estimate ends up being 1.2TB which is way bigger than any tape and hence the backup doesn't proceed for this partition. This is on a dual Intel EM64T machine running RHEL4 x86_64.
Is a bugfix planned for this?
Bugfix planned in update #2.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-380.html
The same problem hit me, and the suggested erratum doesn't work: I had installed tar-1.14-8.RHEL4. After updating to tar-1.15.1-11.FC4 from Fedora Core 4, it works fine. Please dig further into the issue and either adjust 1.14 or provide 1.15 for RHEL4. Thank you.