Bug 232462

Summary: Tar vs new 2.6.21 series kernels, tar strikes out.
Product: [Fedora] Fedora
Reporter: Gene Heskett <gene.heskett>
Component: tar
Assignee: Radovan Augustin <raugusti>
Status: CLOSED WORKSFORME
QA Contact: Ben Levenson <benl>
Severity: urgent
Priority: medium
Version: 6
CC: gene.heskett, pvrabec
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-05-31 14:57:25 UTC

Description Gene Heskett 2007-03-15 17:03:52 UTC
Description of problem:
Near total incompatibility with any kernel in the 2.6.21-rc* group.

Version-Release number of selected component (if applicable):
tar-1.15.1-24.fc6


How reproducible:
Boot to any of the .21-rc* kernels, fire off a run of amanda, the backup utility.

Steps to Reproduce:
1. Boot to any 2.6.21-rc* kernel
2. Start a run of amdump, or let the amanda crontab start it.
3. The run fails with an out-of-tape message unless the tape has more storage
than the total of the rest of the system.
  
Actual results:
Tar, regardless of the level of the backup it is requested to do, reports
totally bogus sizes for even small, single-directory disklist entries.

For instance, du -h /GenesAmandaHelper-0.6 reports 766 megs of data in this
subdir off the / filesystem.  Tar, however, apparently thinks there is 136.2GB
of data there to back up.  Where it's getting this figure I have NDI.

My amanda databases are so fubared by this now that it will take about a month
of running a 2.6.20.3 kernel to make them sane again.

Expected results:
A normal amanda backup

Additional info:
I do not think this is a filesystem problem.  It's all ext3, managed by LVM2,
and regardless of the kernel booted, an 'ls -lc' to display the m or c times,
as the convention might be, shows times that look absolutely sane to this user.
That's why I'm pointing the accusatory finger at tar.

I use vtapes so getting at this data is a piece of cake since a vtape is just
another dir on a separate hard drive.

My /usr/movies tree hasn't changed in several months as I haven't shot any
weddings lately.  According to du -h, it presently contains a hair over 8.1GB
of data.  From one of these backups done under a 2.6.21-rc* kernel, here is an
ls -l of that particular file as it exists as a level 1 under these conditions:
-rw------- 1 amanda disk 7624687616 Mar 12 00:43 00001.coyote._usr_movies.1
Not quite 8.1GB but waaaayyyyy too big when it really should be just a list of
files since nothing has been changed recently enough to knock any dust off it.

With kernel 2.6.20.3-rsdl-03.30 or any earlier kernel, that same file comes out
as shown by this ls -lR of the Dailys dir:
[root@coyote Dailys]# ls -lR|grep movies.1
-rw------- 1 amanda disk         15 Mar 12 00:43 00001-coyote._usr_movies.1
-rw------- 1 amanda disk 7624687616 Mar 12 00:43 00001.coyote._usr_movies.1
-rw------- 1 amanda disk         10 Mar  3 02:03 00040-coyote._usr_movies.1
-rw------- 1 amanda disk      65536 Mar  3 02:03 00040.coyote._usr_movies.1
-rw------- 1 amanda disk         15 Nov 22 11:14 00003-coyote._usr_movies.1
-rw------- 1 amanda disk         10 Mar  5 01:45 00026-coyote._usr_movies.1
-rw------- 1 amanda disk      65536 Mar  5 01:45 00026.coyote._usr_movies.1
-rw------- 1 amanda disk         10 Mar  6 01:15 00027-coyote._usr_movies.1
-rw------- 1 amanda disk      65536 Mar  6 01:15 00027.coyote._usr_movies.1
-rw------- 1 amanda disk         10 Mar  7 01:52 00027-coyote._usr_movies.1
-rw------- 1 amanda disk      65536 Mar  7 01:52 00027.coyote._usr_movies.1
-rw------- 1 amanda disk         10 Mar  8 02:18 00025-coyote._usr_movies.1
-rw------- 1 amanda disk      65536 Mar  8 02:18 00025.coyote._usr_movies.1
-rw------- 1 amanda disk         10 Mar 10 03:27 00028-coyote._usr_movies.1
-rw------- 1 amanda disk      65536 Mar 10 03:27 00028.coyote._usr_movies.1
-rw-------  1 amanda disk         10 Mar 11 01:02 00026-coyote._usr_movies.1
-rw-------  1 amanda disk      65536 Mar 11 01:02 00026.coyote._usr_movies.1

So you can see what the problem is: all of those should have been like the 65k
examples.  Nothing but a file listing, since the directory's contents have not
changed.  And no, I didn't know there was a Nov-dated one there; something is
apparently wrong with the label headers on that particular vtape.
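
One way to spot the affected dumps in a listing like the one above is to filter
on size.  The following Python sketch is illustrative only: the function name
and the 1 MiB threshold are assumptions made here, not anything amanda provides.

```python
def oversized_dumps(ls_lines, limit_bytes=1024 * 1024):
    """Return (name, size) for each `ls -l` line whose size exceeds limit_bytes.

    The 1 MiB default is an arbitrary illustrative threshold, not an amanda setting.
    """
    flagged = []
    for line in ls_lines:
        fields = line.split()
        # ls -l columns: mode, links, owner, group, size, month, day, time, name
        if len(fields) < 9 or not fields[4].isdigit():
            continue
        size = int(fields[4])
        if size > limit_bytes:
            flagged.append((fields[8], size))
    return flagged


# Three lines copied from the listing in this report.
listing = [
    "-rw------- 1 amanda disk         15 Mar 12 00:43 00001-coyote._usr_movies.1",
    "-rw------- 1 amanda disk 7624687616 Mar 12 00:43 00001.coyote._usr_movies.1",
    "-rw------- 1 amanda disk      65536 Mar  3 02:03 00040.coyote._usr_movies.1",
]
print(oversized_dumps(listing))
# [('00001.coyote._usr_movies.1', 7624687616)]
```

Only the Mar 12 dump is flagged; the 65536-byte files are what an unchanged
level 1 is expected to look like.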

I'd like to get up on a soapbox and preach for a stable api and return values
system for tar, but sadly I know that is never going to happen.

Comment 1 Peter Vrabec 2007-05-31 13:38:50 UTC
Could you send the tar command that amanda uses to archive your data? I'm not
familiar with amanda and I don't have a tape at the moment, but I have tried
to create and extract some archives without problems.


Comment 2 Gene Heskett 2007-05-31 14:57:25 UTC
I don't believe the exact version of the tar command has any bearing on this
problem.  I've had some correspondence with the tar maintainers and they are not
of a mind to fix tar, and would consider such a fixed version to be very badly
broken.

The problem stems from the vaporous nature of the disk major/minor device number
schemes set forth by LANANA.  With the LVM2 disk manager in use, the whole disk
is addressed through major device number 253 or 254.

Now, I come along and add the pktcdvd module to my kernel build so that my
burner can be used as a packet device.

On the next boot, pktcdvd (also an experimental service/device according to
LANANA) grabs the disk's major number and shoves the disk down to 252 or so.
Absolutely NOTHING else has changed, but tar, in its infinite wisdom, sees this
as a whole new disk and tries to do a level 0 backup of the whole 200GB drive.
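
To make the mechanism concrete, here is a minimal Python sketch of how a device
number splits into major and minor parts, together with a deliberately
simplified stand-in for the comparison an incremental backup makes.  The
function names, the (device, inode) tuple layout, the minor number 0 and the
inode 1234 are all hypothetical, chosen for illustration; this is not GNU tar's
actual snapshot format or code.

```python
import os


def device_numbers(path):
    """Return the (major, minor) device numbers of the filesystem holding path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def looks_unchanged(recorded, current):
    """Simplified model: a directory counts as unchanged only if both the
    device number and the inode number match what was recorded last run.

    `recorded` and `current` are hypothetical (dev, inode) tuples.
    """
    return recorded == current


if __name__ == "__main__":
    # Same directory, same inode, but the LVM2 volume moved from major 253
    # to major 252 between boots (majors taken from this report).
    before = (os.makedev(253, 0), 1234)
    after = (os.makedev(252, 0), 1234)
    print(looks_unchanged(before, after))  # False
    print(device_numbers("/"))
```

Under this model nothing on disk needs to change for the check to fail; a
different major number alone is enough to make every directory look new.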

The cure turned out to be very simple to apply.  I'd assume that it requires
the dm-mod module to remain a module, and that this fix will not work if it's
built into the kernel.  To remain a module and still boot from it, an initrd
must of course be built and installed as well, not a difficult requirement.

Add this line to your /etc/modprobe.conf:

options dm-mod major=238

Which will take it out of the LANANA 'experimental' category and give it a good
stable home at major device number 238.  Then write a small shell script to loop
amanda through at least a dumpcycle's worth of backups plus 2 or 3 more so that
this new number then becomes the 'std' number it is used to seeing.
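
After rebooting, the major number the kernel actually assigned can be read back
from /proc/devices.  The Python sketch below shows one way to do that; the
function name is hypothetical, and it assumes the usual /proc/devices layout
with a "Block devices:" section in which LVM2 volumes appear as device-mapper.

```python
def block_major(devices_text, name="device-mapper"):
    """Return the major number registered for `name` in the 'Block devices:'
    section of /proc/devices-style text, or None if it is not listed."""
    in_block_section = False
    for line in devices_text.splitlines():
        line = line.strip()
        if line.endswith("devices:"):
            # Section headers are "Character devices:" and "Block devices:".
            in_block_section = line.startswith("Block")
            continue
        parts = line.split()
        if in_block_section and len(parts) == 2 and parts[1] == name:
            return int(parts[0])
    return None
```

On a live system this would be called as block_major(open("/proc/devices").read());
if the modprobe option took effect, the result should match the major number
set there.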

Once this is done, then tar, and amanda, will be happier than a clam, till the
next time the kernel guys decide to do something without bothering to consult
all us frogs.  When that might be, I have NDI.  I personally would like to see
tar modified so that its reliance on this can be switched off until such time
as tar has updated all its index files, at which point that command option could
be removed with no ill effects.  But the tar people are adamant that this is not
going to happen.

Cheers Gene.