Bug 164722 - sort ignores buffer-size when reading from stdin
Summary: sort ignores buffer-size when reading from stdin
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: coreutils
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tim Waugh
QA Contact:
URL: http://lists.gnu.org/archive/html/bug...
Whiteboard: RHEL3U7NAK
Depends On:
Blocks: 190430
Reported: 2005-07-31 01:08 UTC by Rob Riggs
Modified: 2007-11-30 22:07 UTC
CC List: 0 users

Fixed In Version: RHBA-2007-0474
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-06-11 18:54:24 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2007:0474
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: coreutils bug fix update
Last Updated: 2007-06-07 22:35:57 UTC

Description Rob Riggs 2005-07-31 01:08:31 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc3 Firefox/1.0.6

Description of problem:
Below is the description of the fix from the ChangeLog of a newer coreutils release. See the bug report it references on the bug-coreutils mailing list.

2003-09-04  Paul Eggert  <eggert>

	Don't ignore -S if input is a pipe.  Bug report by Michael McFarland in
	<http://mail.gnu.org/archive/html/bug-coreutils/2003-09/msg00008.html>.

	* src/sort.c (sort_buffer_size): Omit SIZE_BOUND arg.  Compute the
	size_bound ourselves. if an input file is a pipe and the user
	specified a size, use that size instead of trying to guess the
	pipe size.  This has the beneficial side effect of avoiding the
	overhead of default_sort_size in that case.  All callers changed.
	(sort): Remove static var size; now done by sort_buffer_size.
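The ChangeLog entry describes a decision inside sort_buffer_size: for a regular file the size is known, but for a pipe it is not, and before the fix sort guessed a size even when -S was given. The C below is a simplified model of that decision, not the coreutils source. The struct input_desc type, the function choose_buffer_size, and its pipe_guess and fixed parameters are hypothetical names for this sketch, and it omits the per-line overhead accounting the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input descriptor for this sketch (not a coreutils type). */
struct input_desc {
    bool is_pipe;      /* input is a pipe/FIFO, so its size is unknown */
    size_t file_size;  /* byte size of a regular file; unused for pipes */
};

/*
 * Simplified model of choosing sort's in-memory buffer size.
 * user_size:    value of -S in bytes, or 0 if -S was not given.
 * default_size: bound used when -S is absent (stands in for default_sort_size).
 * pipe_guess:   assumed size guessed for a pipe whose length is unknown.
 * fixed:        false models the old behavior, true the 2003-09-04 fix.
 */
size_t choose_buffer_size(const struct input_desc *in, size_t n,
                          size_t user_size, size_t default_size,
                          size_t pipe_guess, bool fixed)
{
    assert(in != NULL || n == 0);
    size_t bound = user_size != 0 ? user_size : default_size;
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        size_t term;
        if (in[i].is_pipe) {
            /* The fix: an explicit -S wins over guessing the pipe size. */
            if (fixed && user_size != 0)
                return user_size;
            term = pipe_guess;
        } else {
            term = in[i].file_size;
        }
        /* Never exceed the bound; comparing this way avoids size_t overflow. */
        if (term >= bound - total)
            return bound;
        total += term;
    }
    return total;
}
```

With a pipe as the only input, -S 128M, and an assumed 32MB guess, the unfixed model returns 32MB (the guess) while the fixed model returns 128MB, which mirrors the behavior reported below.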


Version-Release number of selected component (if applicable):
coreutils-4.5.3-26

How reproducible:
Always

Steps to Reproduce:
1. Sort a 512MB file using "sort -S 128M -o <outfilename> <filename>" and note the memory and temporary file usage.
2. Sort the same file using "cat <filename> | sort -S 128M -o <outfilename> -" and note the memory and temporary file usage.
3. Note that sort ignores the -S argument when reading from stdin.
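The only difference between steps 1 and 2, from sort's point of view, is the type of its input: in step 2 the shell connects cat to sort with a pipe, so standard input is a FIFO rather than a regular file. A minimal sketch of how a program can tell the two apart with POSIX fstat and S_ISFIFO follows; fd_is_pipe and stdin_is_pipe are hypothetical helper names, not coreutils functions.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if fd refers to a pipe or FIFO, as sort's stdin does in
   "cat <filename> | sort -". A regular file would satisfy S_ISREG instead. */
bool fd_is_pipe(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return false;  /* this sketch treats errors as "not a pipe" */
    return S_ISFIFO(st.st_mode);
}

/* Convenience wrapper for the case in this report: standard input. */
bool stdin_is_pipe(void)
{
    return fd_is_pipe(STDIN_FILENO);
}
```

Compiled into a small program, stdin_is_pipe() would be true under "cat <filename> | ./prog" and false under "./prog < <filename>".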
  

Actual Results:  sort uses 128MB of memory and creates 4 temporary files when sorting the file directly.

sort uses only 32MB of memory and creates 16 temporary files when sorting the same data from stdin and, in most cases, takes much longer to perform the final merge.

Expected Results:  The sort from stdin should use the amount of memory specified (128MB in the example above) for sorting.
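The temporary file counts above follow from dividing the input size by the buffer size and rounding up: 512MB / 128MB = 4 and 512MB / 32MB = 16. The sketch below encodes that idealized estimate; estimate_runs is a hypothetical name, and because it ignores sort's per-line bookkeeping inside the buffer, it should be read as an approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Idealized number of sorted runs (temporary files) produced when
   input_bytes of data is sorted with a buffer of buffer_bytes:
   ceil(input_bytes / buffer_bytes). */
uint64_t estimate_runs(uint64_t input_bytes, uint64_t buffer_bytes)
{
    assert(buffer_bytes > 0);
    return input_bytes / buffer_bytes + (input_bytes % buffer_bytes != 0);
}
```

Applied to the 60GB figures later in this report, estimate_runs(60ULL << 30, 2ULL << 30) gives 30.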

Additional info:

This has been fixed upstream in coreutils since September 2003. Either a newer version of coreutils should be released for RHEL3, or the fix needs to be backported to the coreutils-4.5.3 package.

This defect causes significant performance problems when sorting data that produces large output files (multi-gigabyte in my case). The number of temporary files generated causes severe I/O bottlenecks during the merge phase. For a 60GB output file sorted with a requested 2GB buffer, the number of temporary merge files grows from 30 to almost 1900.
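One way to see why the extra files hurt is to count merge passes. If sort can merge at most k files at a time, each pass divides the number of runs by k (rounding up), and in this idealized model every pass rereads and rewrites roughly the whole data set. The sketch below counts those passes; merge_passes is a hypothetical name, and the fan-in of 16 used in the usage note is an assumption to be checked against the merge limit in the coreutils-4.5.3 sort.c, not a value stated in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Idealized count of merge passes needed to reduce `runs` sorted temporary
   files to a single output when at most `fan_in` files are merged at once.
   More passes mean proportionally more disk I/O in this model. */
unsigned merge_passes(uint64_t runs, uint64_t fan_in)
{
    assert(fan_in >= 2);
    unsigned passes = 0;
    while (runs > 1) {
        runs = (runs + fan_in - 1) / fan_in;  /* ceil(runs / fan_in) */
        passes++;
    }
    return passes;
}
```

With an assumed fan-in of 16, merge_passes(30, 16) is 2 and merge_passes(1900, 16) is 3, so in this model the 60GB data set is read and written one extra time.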

Comment 11 Red Hat Bugzilla 2007-06-11 18:54:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0474.html


