From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Description of problem:
tar does an admirable job of writing out files as fast as it can, but
when you wish to restrict how fast it writes (because of network
constraints over a networked filesystem, or due to scheduling issues
and device IO constraints), simply slowing its read times (e.g. by
buffering its intput file through a program that pauses periodically)
is often not granular enough.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. create a large tar file with many files
2. extract it over a networked filesystem
3. watch network bandwith go away
Actual Results: network bandwidth or local device throughput inpact.
Expected Results: ability to constrain that impact
To solve for this, I propose (and will attach a suggested patch for)
adding a command-line option to add a variable number of milliseconds
to pause after each record is written. Combining this with options to
change the record size (or number of blocks per record) allows you to
set any number of bytes-per-second through put that you like.
Hopefully Red Hat will adopt this change across products and/or
contribute it back to the project that maintains GNU tar. It would be
very useful to me and I assume to others.
Created attachment 100967 [details]
The command-line option added, plus docs
This is the patch to add the --write-pause-time=TIME command-line flag, along
with the "man" and "info" documentation change for the option.
Managing network bandwidth with a tar CLI option will "work",
but isn't a general solution, nor is it of sufficiently
general interest to add to the tar package and diverge from
Try sending the patch to the upstream tar maintainers.
Actually, network utilization was just an example off the top of my
head (and yes, there are other ways to attack that particular
problem), however scheduling issues and IO constraints were my primary
problem, and this patch HAS solved for those problems.
To replicate in specific, store a large tar file with many large
(preferably 1GB+) sub-files. Then attempt to un-tar this file while
most (say, 90% or so) of the system's physical memory is in use by a
program that is routinely using that memory.
Don't plan on being able to use the machine for much until it's done
unless you've applied this patch and used the given command-line argument.
Oh, and no it's not just memory starvation. The core problem is really
the fact that the scheduler is unable to look at system resources as a
whole and determine that the combination of swapping and large numbers
of user-generated reads and writes will starve the IO controler in
question and result in deadlocks that in turn result in near 0% CPU
utilization and massive redundancy in IO operations. This could be
shrugged off as simply "loading the system" if my patch (and the use
of the option in question) did not remove the problem by nudging the
scheduler in the right direction.
Now, I don't have the opportunity to pay Red Hat any more because I
can't afford RHEL, but I bought every retail release of RHL from 4.0
to 9.0 and I'm a stock-holder and this is the first significant thing
I've asked for. I don't think it's that harsh a request that Red Hat
(who have far more weight with the Gnu tar folks than I do) apply and
push upstream this relatively simple patch that doesn't affect anyone
who chooses not to use it... is it?
Thank you for your time.