Bug 164722 - sort ignores buffer-size when reading from stdin
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: coreutils
Version: 3.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
http://lists.gnu.org/archive/html/bug...
RHEL3U7NAK
Blocks: 190430
Reported: 2005-07-30 21:08 EDT by Rob Riggs
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHBA-2007-0474
Doc Type: Bug Fix
Last Closed: 2007-06-11 14:54:24 EDT

Description Rob Riggs 2005-07-30 21:08:31 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc3 Firefox/1.0.6

Description of problem:
Here is a description of the fix from the changelog of a newer version of coreutils.  See the referenced bug report to the coreutils mailing list.

2003-09-04  Paul Eggert  <eggert@twinsun.com>

	Don't ignore -S if input is a pipe.  Bug report by Michael McFarland in
	<http://mail.gnu.org/archive/html/bug-coreutils/2003-09/msg00008.html>.

	* src/sort.c (sort_buffer_size): Omit SIZE_BOUND arg.  Compute the
	size_bound ourselves.  If an input file is a pipe and the user
	specified a size, use that size instead of trying to guess the
	pipe size.  This has the beneficial side effect of avoiding the
	overhead of default_sort_size in that case.  All callers changed.
	(sort): Remove static var size; now done by sort_buffer_size.
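
The behaviour described in the changelog entry above can be illustrated with a short sketch. This is not the coreutils code (which is C, in src/sort.c); it is a simplified Python model. The function name pick_buffer_size, the constant PIPE_SIZE_GUESS and its value are hypothetical, and the handling of regular files is reduced to "sum the file sizes, capped by the user's size".

```python
import os
import stat

# Hypothetical fallback used when an input's size cannot be determined.
# The real value used by sort is not given in this report.
PIPE_SIZE_GUESS = 1024 * 1024


def pick_buffer_size(fds, user_size=None):
    """Simplified model of the fixed behaviour: honour -S for pipes.

    fds       -- open file descriptors of the inputs
    user_size -- buffer size in bytes requested with -S, or None
    """
    total = 0
    for fd in fds:
        st = os.fstat(fd)
        if stat.S_ISREG(st.st_mode):
            # Regular file: its size is known, so count it directly.
            total += st.st_size
        elif user_size is not None:
            # Pipe (or other non-regular input) and the user gave a size:
            # use that size instead of guessing.
            return user_size
        else:
            # No user-specified size: fall back to a guess.
            total += PIPE_SIZE_GUESS
    if user_size is not None:
        # Assumption: the user's size acts as an upper bound for regular files.
        return min(total, user_size)
    return total
```

With a pipe as input and a 128 MiB request, this model returns the requested 128 MiB rather than the guess, which is the outcome the bug report asks for.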


Version-Release number of selected component (if applicable):
coreutils-4.5.3-26

How reproducible:
Always

Steps to Reproduce:
1. Sort a 512MB file using "sort -S 128M -o <outfilename> <filename>" and note the memory and temporary file usage.
2. Sort the same file using "cat <filename> | sort -S 128M -o <outfilename> -" and note the memory and temporary file usage.
3. Note that sort ignores the -S argument when reading from stdin.
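
The two invocations in the steps above could be driven from a small script along these lines. This is only a sketch: it assumes GNU sort and cat are on the PATH, the helper name time_sort is hypothetical, and it measures elapsed time only. Memory and temporary file usage still have to be observed separately (for example with top and by watching the temporary directory).

```python
import subprocess
import time


def time_sort(input_path, output_path, buffer_size="128M", via_pipe=False):
    """Run sort once and return the elapsed wall-clock time in seconds."""
    cmd = ["sort", "-S", buffer_size, "-o", output_path]
    start = time.perf_counter()
    if via_pipe:
        # Equivalent to: cat <filename> | sort -S SIZE -o <outfilename> -
        with subprocess.Popen(["cat", input_path], stdout=subprocess.PIPE) as cat:
            subprocess.run(cmd + ["-"], stdin=cat.stdout, check=True)
    else:
        # Equivalent to: sort -S SIZE -o <outfilename> <filename>
        subprocess.run(cmd + [input_path], check=True)
    return time.perf_counter() - start
```

Calling time_sort(path, out) and then time_sort(path, out, via_pipe=True) on the same large file gives a rough comparison of the direct and stdin cases.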
  

Actual Results:  When sorting the file directly, sort uses 128MB of memory and creates 4 temporary files.

When reading the same file from stdin, sort uses only 32MB of memory and creates 16 temporary files, and in most cases takes much longer to perform the final merge.

Expected Results:  The sort from stdin should use the amount of memory specified (128MB in the example above) for sorting.

Additional info:

This has been fixed in coreutils since September 2003.  Either a newer version of coreutils should be released for RHEL3, or the fix needs to be backported to the coreutils-4.5.3 package.

This defect causes significant performance problems when sorting data that produces large output files (multi-gigabyte in my case).  The large number of temporary files generated causes severe I/O bottlenecks during the merge phase.  For a 60GB output file with a requested 2GB sort buffer, the number of merge files goes from 30 to almost 1900.
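
The temporary file counts quoted in this report follow from dividing the data size by the buffer size actually used. The sketch below reproduces that arithmetic under some assumptions: sizes are treated as binary units, each buffer fill yields one temporary file, and the 32MB effective buffer observed in the 512MB example also applies to the 60GB case. The function name initial_runs is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def initial_runs(data_bytes, buffer_bytes):
    """Number of buffer-sized runs needed to cover the data (ceiling division)."""
    return -(-data_bytes // buffer_bytes)


print(initial_runs(512 * MIB, 128 * MIB))  # 4    (file given directly, -S honoured)
print(initial_runs(512 * MIB, 32 * MIB))   # 16   (stdin, -S ignored)
print(initial_runs(60 * GIB, 2 * GIB))     # 30   (requested 2GB buffer)
print(initial_runs(60 * GIB, 32 * MIB))    # 1920 (same data with a 32MB buffer)
```

The last figure is in line with the "almost 1900" reported above; the exact count depends on the real data size and on per-line overhead, which this model ignores.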
Comment 11 Red Hat Bugzilla 2007-06-11 14:54:27 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0474.html
