Bug 284881 - sort -n -t, does not work
Summary: sort -n -t, does not work
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 7
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-09-10 17:11 UTC by Jan "Yenya" Kasprzak
Modified: 2008-01-28 15:42 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-01-28 15:42:52 UTC
Type: ---
Embargoed:


Description Jan "Yenya" Kasprzak 2007-09-10 17:11:46 UTC
Description of problem:
sort -n -t, does not work as expected (in the default, i.e. en_US.UTF-8 locale).

Version-Release number of selected component (if applicable):
coreutils-6.9-3.fc7

How reproducible:
100%

Steps to Reproduce:
1. Keep the default locale settings (LC_COLLATE and LC_NUMERIC) at en_US.UTF-8.
2. sort -n -t, -k1 <<'EOF'
2101,:4AgE3<G4RNDP`
21012,:A0QIX6AI10gMP
2101,2IJIETPY=g<10@
21012,V8:AACI4TD925@
21014,:1MG<hEb@AIhU`
2101,4H@38`5ELC66M`
2101,4h>HM812P4820P
21014,V8:AACI4TD925@
2101,5AHBVEQW@dUGE@
EOF

Actual results:
2101,:4AgE3<G4RNDP`
21012,:A0QIX6AI10gMP
2101,2IJIETPY=g<10@
21012,V8:AACI4TD925@
21014,:1MG<hEb@AIhU`
2101,4H@38`5ELC66M`
2101,4h>HM812P4820P
21014,V8:AACI4TD925@
2101,5AHBVEQW@dUGE@

Expected results:
2101,2IJIETPY=g<10@
2101,4H@38`5ELC66M`
2101,4h>HM812P4820P
2101,5AHBVEQW@dUGE@
2101,:4AgE3<G4RNDP`
21012,:A0QIX6AI10gMP
21012,V8:AACI4TD925@
21014,:1MG<hEb@AIhU`
21014,V8:AACI4TD925@

Additional info:
Locales from glibc-common-2.6-4.

When LC_ALL=C is set, sorting works correctly.

Apparently sort ignores the comma inside the number values (though in the en_US
locale it probably should only be used as _thousands_ separator, so 2101,4
should not be a valid number in English).

Also sort does not handle the "-t," separator argument correctly, because it
seems the value after the comma is still being included in the sorting key.
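
The LC_ALL=C observation can be checked with a short script. This is only a sketch, assuming a GNU sort on the PATH; the variable name `sorted` is illustrative and not part of the report. In the C locale the comma is not a thousands separator, so numeric parsing stops at the first comma and lines with equal numbers fall back to a bytewise comparison of the whole line.

```shell
# Sketch: run the reporter's command in the C locale.
# `sorted` is an illustrative variable name, not part of the report.
sorted=$(LC_ALL=C sort -n -t, -k1 <<'EOF'
2101,:4AgE3<G4RNDP`
21012,:A0QIX6AI10gMP
2101,2IJIETPY=g<10@
21012,V8:AACI4TD925@
21014,:1MG<hEb@AIhU`
2101,4H@38`5ELC66M`
2101,4h>HM812P4820P
21014,V8:AACI4TD925@
2101,5AHBVEQW@dUGE@
EOF
)
# Print the result: the 2101 lines first, then 21012, then 21014.
printf '%s\n' "$sorted"
```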

Comment 1 Ondrej Vasik 2007-10-26 14:22:45 UTC
LC_COLLATE seems to be irrelevant; LC_NUMERIC=en_US.UTF-8 appears to be
responsible for the problem. The output is unsorted only with the en_US locales
(en_US, en_US.UTF-8 and en_US.iso885915 all behave the same) AND with the comma
separator. All other locales and separators I checked seem to be OK. I will try
to dig something out by debugging.
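
One way to check the LC_NUMERIC hypothesis is to ask the locale database which thousands separator each locale defines. The sketch below assumes the `locale` utility is available and that en_US.UTF-8 is installed; if it is not, glibc typically falls back to the C values for the second query. The variable names `c_sep` and `en_sep` are illustrative.

```shell
# Sketch: print the thousands separator that each locale defines.
# c_sep and en_sep are illustrative names, not part of the report.
c_sep=$(LC_ALL=C locale thousands_sep)
# Assumes en_US.UTF-8 is installed; warnings are discarded if it is not.
en_sep=$(LC_ALL=en_US.UTF-8 locale thousands_sep 2>/dev/null)
printf 'C: [%s]  en_US.UTF-8: [%s]\n' "$c_sep" "$en_sep"
```

On a system with en_US.UTF-8 installed, the second value is expected to be a comma, which would explain why "2101,4" can be read as a single number whenever the sort key extends past the first field.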

Comment 2 Jim Meyering 2007-10-30 11:49:07 UTC
Thanks for the report, but what you're seeing is the required behavior.
The problem is that by using -k1 you're telling it to use the entire line as the
key, when you really want to use just the first column.  Use -k1,1 instead, and
it works the way you expect.
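
The difference between -k1 and -k1,1 can be seen without any locale effects. The sketch below uses a made-up three-line sample (not taken from the report) and adds the -s option (stable sort, as provided by GNU sort) so that lines with equal keys keep their input order. The variable names are illustrative.

```shell
# Hypothetical two-column sample; not from the report.
data='b,1
a,2
a,1'

# -k1: the key runs from field 1 to the end of the line,
# so the second column takes part in the comparison.
whole=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t, -k1)

# -k1,1: the key is field 1 only; with -s, the two "a" lines
# compare equal and stay in their input order.
first=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t, -k1,1)

printf -- '-k1:\n%s\n-k1,1:\n%s\n' "$whole" "$first"
```

With -k1 the "a" lines come out as a,1 then a,2; with -k1,1 they stay as a,2 then a,1, because only the first field is compared.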

Comment 3 Jan "Yenya" Kasprzak 2007-10-30 12:00:18 UTC
OK, maybe the sort(1) manpage should be fixed then. Currently it says:
       -k, --key=POS1[,POS2]
              start a key at POS1, end it at POS2 (origin 1)
Maybe add something like "Without POS2, the whole part of the line starting at
POS1 to the end of line is used." there.

Comment 4 Ondrej Vasik 2008-01-28 15:42:52 UTC
The suggested manpage improvement has been added in RAWHIDE
coreutils-6.10-2.fc9. Closing this bug as NOTABUG; I will also backport the
patch in the next F7/F8 coreutils update.
