Red Hat Bugzilla – Bug 164722
sort ignores buffer-size when reading from stdin
Last modified: 2007-11-30 17:07:08 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc3 Firefox/1.0.6
Description of problem:
Here is a description of the fix from the changelog of a newer version of coreutils. See the bug report sent to the coreutils mailing list, referenced in that changelog entry.
2003-09-04 Paul Eggert <email@example.com>
Don't ignore -S if input is a pipe. Bug report by Michael McFarland.
* src/sort.c (sort_buffer_size): Omit SIZE_BOUND arg. Compute the
size_bound ourselves. If an input file is a pipe and the user
specified a size, use that size instead of trying to guess the
pipe size. This has the beneficial side effect of avoiding the
overhead of default_sort_size in that case. All callers changed.
(sort): Remove static var size; now done by sort_buffer_size.
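The corrected selection logic can be sketched roughly as follows. This is a minimal illustration, not the actual sort.c code: choose_buffer_size and its parameters are hypothetical names standing in for the real internals. The key change is that an explicit -S size is honored even when the input is a pipe, instead of being replaced by a guessed pipe size.

```c
#include <stddef.h>

/* Hypothetical sketch of the post-fix behavior; names are illustrative. */
static size_t
choose_buffer_size (size_t user_size,    /* from -S; 0 if not given */
                    int input_is_pipe,
                    size_t stat_size,    /* st_size for a regular file */
                    size_t default_size) /* guessed default sort size */
{
  if (user_size != 0)
    return user_size;      /* -S always wins, pipe or regular file */
  if (input_is_pipe)
    return default_size;   /* no reliable size; fall back to the default */
  return stat_size;        /* regular file: size known from stat() */
}
```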
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. sort a 512MB file using "sort -S 128M -o <outfilename> <filename>" and note memory and temp file usage.
2. sort the same file using "cat <filename> | sort -S 128M -o <outfilename> -" and note the memory and temp file usage.
3. note that sort ignores the -S argument when reading from stdin.
Actual Results: sort uses 128MB of memory and creates 4 temporary files when sorting the file directly.
sort uses only 32MB of memory and creates 16 temporary files when reading the same file from stdin, and in most cases the final merge takes much longer.
Expected Results: The sort from stdin should use the amount of memory specified (128MB in the example above) for sorting.
This has been fixed in coreutils since September 2003. Either a newer version of coreutils should be released for RHEL3, or the fix needs to be backported to the coreutils-4.5.3 package.
This defect causes significant performance problems when sorting data into large (multi-gigabyte, in my case) output files. The number of temporary merge files generated causes severe I/O bottlenecks during the merge phase. For a 60GB output file, when attempting to use a 2GB sort buffer, the number of merge files goes from 30 to almost 1900.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.