Bug 178676 - Pathologically Obscure and Inconsistent sort behavior - relative key positions dependent on -t usage
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
Duplicates: 178674
Reported: 2006-01-23 02:16 EST by JW
Modified: 2007-11-30 17:11 EST (History)
Last Closed: 2006-05-15 08:29:46 EDT
Description JW 2006-01-23 02:16:49 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; MSIE 6.0; Windows; U; AIIEEEE!; Win98; Windows 98; en-US; Gecko masquerading as IE; should it matter?; rv:1.8b) Gecko/20050217

Description of problem:
When using sort to sort simple files, the separator character specified with the -t option is excluded from the sort field (as expected), but the default separator (when no -t option is used) is actually included in the field.  This makes sort extraordinarily difficult to use because it is so inconsistent.


Version-Release number of selected component (if applicable):
coreutils-5.2.1-48.1

How reproducible:
Always

Steps to Reproduce:
1. echo "x 4a 2"  >testfile
2. echo "x 47 1" >>testfile
3. sort       -k 2.1,2.2n testfile
4. sort -t' ' -k 2.1,2.2n testfile

  

Actual Results:  step 3 produces:
x 47 1
x 4a 2
step 4 produces:
x 4a 2
x 47 1


Expected Results:  The same result in both cases.


Additional info:

The reason that this bug isn't always very visible is the happy coincidence that the ASCII digits, and the decimal point, are in the same lexical order as their numerical values.

In the above example the intention is to sort on the entire 2nd numerical field.  Of course the "4a" isn't a proper number, but this has been chosen so that the bug becomes more easily discerned.

What is actually happening is that in step 4 the values "47" and "4a" are compared numerically, as expected. However, in step 3 the values " 4" and " 4" are compared, and a subsequent non-numerical comparison of "7..." and "a..." is performed. This is because without the -t option the separator is included in the field (!!).  If the "a" were a digit then it would, through the aforementioned happy coincidence, appear to sort correctly.
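
The two comparisons described above can be reproduced side by side with a small script. This is only a sketch: it assumes a POSIX shell with mktemp available, pins LC_ALL=C so that the whole-line tie-break is a plain byte comparison, and the variable names (input, default_out, explicit_out) are illustrative and not part of the original report.

```shell
#!/bin/sh
# Pin the locale so the last-resort whole-line comparison is byte-wise.
LC_ALL=C
export LC_ALL

# Temporary input file holding the two lines from the report.
input=$(mktemp)
printf 'x 4a 2\nx 47 1\n' > "$input"

# No -t: field 2 begins at the blank after "x", so characters 2.1-2.2
# are " 4" on both lines. The numeric keys tie and sort falls back to
# comparing the whole lines.
default_out=$(sort -k 2.1,2.2n "$input")

# -t ' ': the space is only a delimiter, so characters 2.1-2.2 are "4a"
# and "47", which compare numerically as 4 and 47.
explicit_out=$(sort -t ' ' -k 2.1,2.2n "$input")

printf 'without -t:\n%s\n' "$default_out"
printf 'with -t:\n%s\n' "$explicit_out"

rm -f "$input"
```

The first listing should match the step 3 order and the second the step 4 order shown under Actual Results.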

The reason for this very dire behavior is that if "-t" is used to specify the field separator then the separator is excluded from the field (which seems very sensible indeed).  But if -t isn't used then the default field separator (space) is included in the field (!&*@$%#).  It is considered to be character #1.  Goodness knows what happens if there are multiple spaces.
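
On the multiple-spaces question, the POSIX description of sort suggests that, without -t, every leading blank counts as part of the field, and that the b modifier on a key position makes sort skip those blanks before counting characters. The following is a hedged sketch of that reading, not a statement about how coreutils-5.2.1 in particular behaves. It assumes a POSIX shell with mktemp, pins LC_ALL=C, and uses illustrative variable names (input, plain_out, blank_out).

```shell
#!/bin/sh
# Pin the locale so the last-resort whole-line comparison is byte-wise.
LC_ALL=C
export LC_ALL

# Same data as in the report, but with two spaces before the second field.
input=$(mktemp)
printf 'x  4a 2\nx  47 1\n' > "$input"

# No -t, no b: characters 2.1-2.2 are the two blanks themselves, so both
# numeric keys are empty and tie; the whole-line comparison then decides.
plain_out=$(sort -k 2.1,2.2n "$input")

# b on both key positions: leading blanks are skipped before counting, so
# characters 2.1-2.2 are "4a" and "47", compared numerically as 4 and 47.
blank_out=$(sort -k 2.1b,2.2bn "$input")

printf 'without b:\n%s\n' "$plain_out"
printf 'with b:\n%s\n' "$blank_out"

rm -f "$input"
```

If that reading is right, the b form gives the same ordering as the -t' ' invocation in step 4, whatever the number of blanks between fields.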

Now how, one might wonder, could a "separator" be a separator but not actually be a "separator" - both at the same time.  It is abject nonsense to include a separator in a field because then it isn't a separator because it is also considered a part of a field.

If one has distinct fields and separators then they should be ... distinct.
Comment 1 JW 2006-01-23 02:19:05 EST
*** Bug 178674 has been marked as a duplicate of this bug. ***
Comment 2 Tim Waugh 2006-05-15 08:29:46 EDT
The default field separator is TAB.
