Bug 18766 - sort ignores case; man page implies otherwise
Status: CLOSED NOTABUG
Product: Red Hat Linux
Classification: Retired
Component: textutils
Version: 7.0
Hardware: i386 Linux
Priority: medium Severity: medium
Assigned To: Bernhard Rosenkraenzer
David Lawrence
Reported: 2000-10-09 23:33 EDT by degraaf
Modified: 2007-04-18 12:29 EDT (History)
Last Closed: 2000-10-10 09:26:10 EDT

Description degraaf 2000-10-09 23:33:37 EDT
The sort command in RH 7.0 ignores case, but the man page implies that
it doesn't.
I first noticed this when I listed all my RPMs:
        rpm -qa | sort 
and the capitalized names did not pop to the top.

Here's a list of words, then sorted two ways:

$ cat list
aaaa    1
AAAA    2
aaaa    3
bbbb    4
BBBB    5
Abbb    6
Baaa    7
aBBB    8
bAAA    9

$ sort list
aaaa    1
AAAA    2
aaaa    3
Abbb    6
aBBB    8
Baaa    7
bAAA    9
bbbb    4
BBBB    5

$ sort -f list
aaaa    1
AAAA    2
aaaa    3
Abbb    6
aBBB    8
Baaa    7
bAAA    9
bbbb    4
BBBB    5

Case is utterly ignored, so adding the -f option has no effect.
There is no complement to -f to force case to be considered.

The definitive man page from UNIX SysV says:
  Comparisons are based on one or more sort keys extracted from each
  line of input.  By default, there is one sort key, the entire line,
  and ordering is lexicographic by bytes in machine collating
  sequence.

The Linux sort(1) man page is devoid of such elegant precision and gives
no hint of which sorting method is used, although, as we all know, there
are many possibilities.  The one clue - the presence of the -f option -
implies that case matters, but the reality is that it doesn't.

This is wrong and should be fixed.  Sort is way too important a command
to get wrong.

There are (at least) two ways to fix it:

1)  Change the program to sort in ASCII collating sequence, but fold
lower-case to upper-case when -f is given.  Fix the man page to say so.

2)  Leave the program as is, but fix the man page to define what
collating sequence is used; and delete the -f option altogether.

Method 2) would be a cheap and dirty solution in my opinion, and
would severely degrade sorting capability.
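
For comparison with option 1: a current GNU sort already behaves that way
when the locale is forced to C. Plain sort compares bytes, and -f folds
lower case to upper case before comparing. The following is a minimal
sketch, assuming a GNU sort; the file name sample.txt is made up for
illustration.

```shell
# Hypothetical sample file; the name is only for illustration.
printf 'bbbb\nBBBB\naaaa\nAAAA\n' > sample.txt

# Byte-order sort: upper case (0x41-0x5A) sorts before lower case (0x61-0x7A).
bytes_sorted=$(LC_ALL=C sort sample.txt)
printf '%s\n' "$bytes_sorted"

# Folded sort: case is ignored for the comparison; lines that fold to the
# same text are then ordered by a final byte-wise comparison.
folded_sorted=$(LC_ALL=C sort -f sample.txt)
printf '%s\n' "$folded_sorted"
```

In the C locale the first command should list AAAA, BBBB, aaaa, bbbb and
the second AAAA, aaaa, BBBB, bbbb.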

BTW, here's the "right answer" from a UNIX SysV system:

$ cat list
aaaa    1
AAAA    2
aaaa    3
bbbb    4
BBBB    5
Abbb    6
Baaa    7
aBBB    8
bAAA    9

$ sort list
AAAA    2
Abbb    6
BBBB    5
Baaa    7
aBBB    8
aaaa    1
aaaa    3
bAAA    9
bbbb    4

$ sort -f list
AAAA    2
aaaa    1
aaaa    3
Abbb    6
aBBB    8
Baaa    7
bAAA    9
bbbb    4
BBBB    5

Well, it's almost right.
I can't explain or understand, with the -f option, why lines 1 and 2
were swapped while lines 4 and 5 were not.   Sigh...

Comment 1 degraaf 2000-10-10 09:26:06 EDT
P.S.:  This morning, while restlessly awakening at 0630 (why does it
always work this way?), I realized a possible explanation for the
mysterious swapping of lines 1 and 2 in the -f case.  Indeed,
an octal dump of the original list revealed an extraneous <sp>
hidden in the white space of lines 1 and 3, concealed by the <TAB>.
That would indeed promote 2 ahead of 1.
So the UNIX SysV sort really does what it says it does, after all.
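
The effect of the hidden space can be reproduced with a two-line file.
This is only a sketch: the file name dump_demo.txt is made up, and the
exact od output layout varies between systems. After folding, the two
lines first differ at the fifth byte, where <TAB> (octal 011) sorts
before <sp> (octal 040), so the line numbered 2 comes first.

```shell
# Line 1 has a space hidden before the tab; line 2 has only the tab.
printf 'aaaa \t1\nAAAA\t2\n' > dump_demo.txt

# Dump the bytes so the extra space becomes visible.
od -c dump_demo.txt

# Fold case and sort by bytes: the tab-only line is promoted to the top.
folded_first=$(LC_ALL=C sort -f dump_demo.txt | head -n 1)
printf '%s\n' "$folded_first"
```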

Comment 2 Bernhard Rosenkraenzer 2000-10-10 09:45:39 EDT
Whether sort distinguishes case depends on the locale's collation rules.
If you want plain byte-order sorting, use "LC_COLLATE=C sort".
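
A short sketch of that workaround, using part of the reporter's list (the
file name list.txt is illustrative). One caveat: if LC_ALL is set in the
environment it overrides LC_COLLATE, so setting LC_ALL=C for the command
is the more robust form.

```shell
# Recreate part of the reporter's list (tab-separated).
printf 'aaaa\t1\nAAAA\t2\nbbbb\t4\nBBBB\t5\nAbbb\t6\nBaaa\t7\n' > list.txt

# Collation as suggested in the comment; effective unless LC_ALL is set.
LC_COLLATE=C sort list.txt

# Same idea, but immune to an inherited LC_ALL.
c_sorted=$(LC_ALL=C sort list.txt)
printf '%s\n' "$c_sorted"
```

With byte-order collation the capitalized entries come first: AAAA, Abbb,
BBBB, Baaa, then aaaa, bbbb.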
