Red Hat Bugzilla – Bug 125523
Need a way to slow writes from tar
Last modified: 2007-11-30 17:10:44 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Description of problem:
tar does an admirable job of writing out files as fast as it can, but
when you wish to restrict how fast it writes (because of network
constraints over a networked filesystem, or due to scheduling issues
and device IO constraints), simply slowing its read times (e.g. by
buffering its input file through a program that pauses periodically)
is often not granular enough.
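The read-side workaround described above can be sketched as a small buffer program that sleeps between chunks. This is a hypothetical illustration of that approach (not the attached patch); the function name and parameters are my own:

```python
import time

def throttled_copy(src, dst, chunk_size=64 * 1024, pause_s=0.05):
    """Copy src to dst, sleeping after each chunk.

    Throughput is capped at roughly chunk_size / pause_s bytes per
    second, but only in coarse chunk_size steps -- the granularity
    problem described above.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        time.sleep(pause_s)
    return total
```

Feeding tar's stdin through such a filter slows its reads, but tar's own internal buffering means its writes still arrive in bursts, which is why this is often not granular enough.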
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. create a large tar file with many files
2. extract it over a networked filesystem
3. watch network bandwidth go away
Actual Results: network bandwidth or local device throughput impact.
Expected Results: ability to constrain that impact
To solve this, I propose (and will attach a suggested patch for)
adding a command-line option that inserts a configurable pause, in
milliseconds, after each record is written. Combined with the existing
options to change the record size (the number of blocks per record),
this lets you cap throughput at any bytes-per-second rate you like.
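The arithmetic behind that claim can be shown with tar's fixed 512-byte block size and its default record of 20 blocks. The 10 ms pause below is just an illustrative value, not anything from the patch:

```python
BLOCK_SIZE = 512  # tar's fixed block size, in bytes

def max_throughput(blocks_per_record, pause_ms):
    """Upper bound on bytes/second when pausing after each record.

    Ignores the (small) time spent actually writing the record, so
    the real rate is slightly lower than this ceiling.
    """
    record_bytes = blocks_per_record * BLOCK_SIZE
    return record_bytes / (pause_ms / 1000.0)

# Default record size (20 blocks = 10240 bytes) with a 10 ms pause
# caps throughput at about 1 MB/s.
print(max_throughput(20, 10))  # 1024000.0
```

Shrinking the record or lengthening the pause scales the ceiling down proportionally, which is what makes the two options composable.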
Hopefully Red Hat will adopt this change across products and/or
contribute it back to the project that maintains GNU tar. It would be
very useful to me and I assume to others.
Created attachment 100967 [details]
The command-line option added, plus docs
This is the patch to add the --write-pause-time=TIME command-line flag, along
with the "man" and "info" documentation change for the option.
Managing network bandwidth with a tar CLI option will "work",
but it isn't a general solution, nor is it of sufficiently
general interest to add to the tar package and diverge from upstream.
Try sending the patch to the upstream tar maintainers.
Actually, network utilization was just an example off the top of my
head (and yes, there are other ways to attack that particular
problem); scheduling issues and IO constraints were my primary
problem, and this patch HAS solved those problems.
To replicate specifically, store a large tar file with many large
(preferably 1GB+) sub-files. Then attempt to un-tar this file while
most (say, 90% or so) of the system's physical memory is in use by a
program that is routinely using that memory.
Don't plan on being able to use the machine for much until it's done
unless you've applied this patch and used the given command-line argument.
Oh, and no it's not just memory starvation. The core problem is really
the fact that the scheduler is unable to look at system resources as a
whole and determine that the combination of swapping and large numbers
of user-generated reads and writes will starve the IO controller in
question and result in deadlocks that in turn result in near 0% CPU
utilization and massive redundancy in IO operations. This could be
shrugged off as simply "loading the system" if my patch (and the use
of the option in question) did not remove the problem by nudging the
scheduler in the right direction.
Now, I don't have the opportunity to pay Red Hat any more because I
can't afford RHEL, but I bought every retail release of RHL from 4.0
to 9.0 and I'm a stock-holder and this is the first significant thing
I've asked for. I don't think it's that harsh a request that Red Hat
(who have far more weight with the GNU tar folks than I do) apply and
push upstream this relatively simple patch that doesn't affect anyone
who chooses not to use it... is it?
Thank you for your time.