Bug 1046735 - sort command always sorts in C locale
Summary: sort command always sorts in C locale
Keywords:
Status: CLOSED DUPLICATE of bug 1001775
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 20
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-26 19:38 UTC by Krzysztof Halasa
Modified: 2013-12-26 20:18 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-26 20:18:27 UTC
Type: Bug
Embargoed:


Description Krzysztof Halasa 2013-12-26 19:38:50 UTC
Description of problem:
sort command seems to ignore locale completely

Version-Release number of selected component (if applicable):
coreutils-8.21-18.fc20.x86_64

How reproducible:
Set locale to e.g. pl_PL.UTF-8 and try to sort data with national multibyte characters.

Steps to Reproduce:
1. export LC_ALL=pl_PL.UTF-8
2. echo -e "ą\na\nb\nc\nć\nł\nz\nż\nź" | sort

Actual results:
a
b
c
z
ą
ć
ł
ź
ż

Expected results:
a
ą
b
c
ć
ł
z
ź
ż

Additional info:
Looks like a problem with the i18n coreutils patch. This part:
@@ -2689,14 +3311,6 @@ compare (struct line const *a, struct li
     diff = - NONZERO (blen);
   else if (blen == 0)
     diff = 1;
-  else if (hard_LC_COLLATE)
-    {
-      /* Note xmemcoll0 is a performance enhancement as
-         it will not unconditionally write '\0' after the
-         passed in buffers, which was seen to give around
-         a 3% increase in performance for short lines.  */
-      diff = xmemcoll0 (a->text, alen + 1, b->text, blen + 1);
-    }
   else if (! (diff = memcmp (a->text, b->text, MIN (alen, blen))))
     diff = alen < blen ? -1 : alen != blen;

This hunk removes the call to xmemcoll0(), leaving the final comparison to memcmp(), which is not locale-aware. Restoring the removed part brings back the correct default sort order for me, though I suspect it doesn't eliminate the problem completely (for example, it doesn't fix "sort -d", which erroneously ignores multibyte letters).
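
To illustrate the difference between the two comparison paths, here is a minimal sketch in plain C. It is not the coreutils code: standard strcoll() is used as a stand-in for gnulib's xmemcoll0(), and the function names compare_bytes, compare_collated and use_collation_locale are hypothetical helpers invented for this example. It also assumes the pl_PL.UTF-8 locale is installed on the system.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Byte-wise comparison, modelled on the memcmp() fallback that remains
   in the patched compare(): the locale is never consulted. */
static int compare_bytes(const char *a, const char *b)
{
    size_t alen = strlen(a);
    size_t blen = strlen(b);
    int diff = memcmp(a, b, alen < blen ? alen : blen);

    if (diff == 0)
        diff = alen < blen ? -1 : alen != blen;
    return diff;
}

/* Locale-aware comparison. strcoll() is an assumption standing in for
   xmemcoll0(); it orders strings by the current LC_COLLATE rules. */
static int compare_collated(const char *a, const char *b)
{
    return strcoll(a, b);
}

/* Hypothetical helper: select a collation locale by name.
   Returns 1 on success, 0 if the locale is not available. */
static int use_collation_locale(const char *name)
{
    return setlocale(LC_COLLATE, name) != NULL;
}
```

With these helpers, compare_bytes("z", "ą") is negative because the byte 0x7A sorts before the UTF-8 lead byte 0xC4, which matches the "Actual results" above. After a successful use_collation_locale("pl_PL.UTF-8"), compare_collated("ą", "z") should be negative instead, which matches the "Expected results".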

Comment 1 Ondrej Vasik 2013-12-26 20:18:27 UTC
Thanks for the report.
Yes, I'm aware of this. AFAIK Ondrej Oprala (who introduced this regression) already has an improvement; unfortunately he hasn't pushed the changes to git so far (I expect he will push them once he is back from vacation, in January). Closing as a duplicate, since this was already reported in #1001775 (simply bringing xmemcoll back breaks ~10 multibyte checks, so the fix has to be improved there)...

*** This bug has been marked as a duplicate of bug 1001775 ***

