Bug 1055597

Summary: sort produces incorrectly ordered results
Product: [Fedora] Fedora
Reporter: Tom Hughes <tom>
Component: coreutils
Assignee: Ondrej Oprala <ooprala>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 20
CC: admiller, bugzilla, kdudka, kzak, ooprala, ovasik, pblaho, pbrady, p, twaugh
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-01-16 03:36:51 UTC
Type: Bug

Description Tom Hughes 2014-01-20 15:26:52 UTC
The sort from coreutils-8.21-20.fc20.x86_64 seems to (sometimes) produce incorrectly ordered results. Given this input:

x 1 dsfdfdsf
x2 1 dsfdfdsf
x2 2 dsfdfdsf
x 2 dsfdfdsf

"sort -k 1,1 -k 2n,2" does not seem to order correctly on the primary key, giving:

x 1 dsfdfdsf
x2 1 dsfdfdsf
x2 2 dsfdfdsf
x 2 dsfdfdsf

Removing the trailing data from each line gives the expected results, as does adding "i" to the first key, so that "sort -k 1i,1 -k 2n,2" gives the expected result:

x 1 dsfdfdsf
x 2 dsfdfdsf
x2 1 dsfdfdsf
x2 2 dsfdfdsf

The sort from coreutils-8.21-11.fc19.x86_64 in F19 does not seem to have this problem.

Comment 1 Ondrej Vasik 2014-01-20 15:44:46 UTC
This seems to be related to the i18n patch. With LC_ALL=C I get the output you expect.
(The locale in use and the --debug output are usually useful for sort reports.)
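
For anyone checking this on their own system, here is a minimal sketch of the comparison described above. It feeds the input from the description to the same sort command, once under LC_ALL=C and once under the session locale. The variable names (input, c_result, locale_result) and the temporary file are illustrative only and not part of the original report.

```shell
# Recreate the four-line input from the description in a temporary file.
input=$(mktemp)
printf '%s\n' 'x 1 dsfdfdsf' 'x2 1 dsfdfdsf' 'x2 2 dsfdfdsf' 'x 2 dsfdfdsf' > "$input"

# C locale: keys are compared byte by byte, bypassing locale collation rules.
c_result=$(LC_ALL=C sort -k 1,1 -k 2n,2 "$input")
printf '%s\n' "$c_result"

# Same command under whatever locale the session uses, for comparison.
locale_result=$(sort -k 1,1 -k 2n,2 "$input")
if [ "$c_result" != "$locale_result" ]; then
    echo "output differs between the C locale and the session locale" >&2
fi

rm -f "$input"
```

Under LC_ALL=C the lines should come out as "x 1", "x 2", "x2 1", "x2 2", matching the expected result in the description. If the warning is printed, the difference comes from locale-specific handling.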

Comment 2 Tom Hughes 2014-01-20 16:00:53 UTC
Locale is en_GB.utf8 if that helps.

Comment 3 Adri Verhoef 2014-03-28 10:24:58 UTC
Another example.

The input file is /tmp/a-a, containing four lines with three tab-separated columns, of which the middle one has these values:

AA
AAA
A
A A

$ cat /tmp/a-a 
2	AA	E
3	AAA	E
1	A	E
0	A A	E
$ cut -f2 /tmp/a-a | sort
A
AA
A A
AAA
$ cut -f2 /tmp/a-a | sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
A
_
AA
__
A A
___
AAA
___
$ cut -f2 /tmp/a-a | sort --debug -r
sort: using ‘en_US.UTF-8’ sorting rules
AAA
___
A A
___
AA
__
A
_

Everything looks normal and is properly sorted up to this point.

Now sort the three-column file, using the middle column as the sort key.
'$TAB' has the value of a real Tab character (^I).
$  TAB="	";echo x"$TAB"x
x	x
$ < /tmp/a-a sort -t "$TAB" -k 2,2 
3	AAA	E
0	A A	E
2	AA	E
1	A	E
$ < /tmp/a-a sort -t "$TAB" -k 2,2 -r
1	A	E
2	AA	E
0	A A	E
3	AAA	E
$ < /tmp/a-a sort -t "$TAB" -k 2,2 -i
1	A	E
2	AA	E
0	A A	E
3	AAA	E
$ < /tmp/a-a sort -t "$TAB" -k 2,2 --debug
sort: using ‘en_US.UTF-8’ sorting rules
3>AAA>E
  ___
_______
0>A A>E
  ___
_______
2>AA>E
  __
______
1>A>E
  _
_____

More info:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ type sort
sort is hashed (/usr/bin/sort)
$ rpm -qf /usr/bin/sort /usr/share/i18n/locales/en_US
coreutils-8.21-21.fc20.x86_64
glibc-common-2.18-12.fc20.x86_64
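
As a locale-independent baseline, the same key selection can be checked under LC_ALL=C. This is only a sketch: the variable name c_sorted is illustrative, and the C locale compares bytes, so it places "A A" before "AA". That is not the en_US.UTF-8 order shown by "cut -f2 | sort" above. It only confirms that -t and -k 2,2 pick out the middle column.

```shell
# A real tab character, built portably instead of typed literally.
TAB=$(printf '\t')

# Same four records as /tmp/a-a, sorted on the middle column in byte order.
c_sorted=$(printf '2\tAA\tE\n3\tAAA\tE\n1\tA\tE\n0\tA A\tE\n' |
    LC_ALL=C sort -t "$TAB" -k 2,2)
printf '%s\n' "$c_sorted"
```

In byte order the keys come out as A, A A, AA, AAA, so the records are printed in the order 1, 0, 2, 3.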

Comment 4 Adri Verhoef 2015-01-15 19:41:54 UTC
The problem has been resolved for me in Fedora 21 with:
$ rpm -qf /usr/bin/sort /usr/share/i18n/locales/en_US
coreutils-8.22-19.fc21.x86_64
glibc-common-2.20-7.fc21.x86_64

Comment 5 Tom Hughes 2015-01-16 00:06:28 UTC
Agreed that this seems to be correct in F21.

Comment 6 Pádraig Brady 2015-01-16 03:36:51 UTC

*** This bug has been marked as a duplicate of bug 1003544 ***