1046735 – sort command always sorts in C locale

Keywords:
Status:	CLOSED DUPLICATE of bug 1001775
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	coreutils
Sub Component:
Version:	20
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Ondrej Vasik
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-12-26 19:38 UTC by Krzysztof Halasa
Modified:	2013-12-26 20:18 UTC
CC List:	7 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-12-26 20:18:27 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Krzysztof Halasa 2013-12-26 19:38:50 UTC

Description of problem:
The sort command seems to ignore the locale completely: output always comes out in C locale (byte) order.

Version-Release number of selected component (if applicable):
coreutils-8.21-18.fc20.x86_64

How reproducible:
Set locale to e.g. pl_PL.UTF-8 and try to sort data with national multibyte characters.

Steps to Reproduce:
1. export LC_ALL=pl_PL.UTF-8
2. echo -e "ą\na\nb\nc\nć\nł\nz\nż\nź" | sort

Actual results:
a
b
c
z
ą
ć
ł
ź
ż

Expected results:
a
ą
b
c
ć
ł
z
ź
ż
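
For anyone who wants to check the two orderings outside of sort(1), here is a minimal sketch in plain C. It sorts an array of strings with qsort() and the standard strcoll() under a named locale. The function names (collate_cmp, sort_lines_in_locale) are made up for illustration and are not taken from coreutils.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback: compare two strings using the current LC_COLLATE rules. */
int collate_cmp(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcoll(a, b);
}

/* Hypothetical helper: sort `lines` in place under the named locale.
   Returns 1 if the locale was applied, 0 if setlocale() failed
   (for example because the locale is not installed); in that case
   the array is left untouched. */
int sort_lines_in_locale(const char **lines, size_t n, const char *locale_name)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 0;
    qsort(lines, n, sizeof *lines, collate_cmp);
    return 1;
}
```

Called with "C" as the locale name, this gives the byte order shown under "Actual results". Called with "pl_PL.UTF-8", and assuming that locale is installed, it should give the order shown under "Expected results".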

Additional info:
Looks like a problem with the i18n coreutils patch. This part:
@@ -2689,14 +3311,6 @@ compare (struct line const *a, struct li
     diff = - NONZERO (blen);
   else if (blen == 0)
     diff = 1;
-  else if (hard_LC_COLLATE)
-    {
-      /* Note xmemcoll0 is a performance enhancement as
-         it will not unconditionally write '\0' after the
-         passed in buffers, which was seen to give around
-         a 3% increase in performance for short lines.  */
-      diff = xmemcoll0 (a->text, alen + 1, b->text, blen + 1);
-    }
   else if (! (diff = memcmp (a->text, b->text, MIN (alen, blen))))
     diff = alen < blen ? -1 : alen != blen;

removes the call to xmemcoll0(), leaving the final comparison to memcmp(), which is not locale-aware. Bringing the removed part back restores the correct default sort order for me, though I guess it doesn't eliminate the problem completely (for example, it doesn't fix "sort -d", which erroneously ignores multibyte letters).
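
To illustrate why the memcmp() fallback gives C-locale order, here is a small sketch of the two comparison styles. It is not the coreutils code: compare_bytes and compare_locale are hypothetical names, and the standard strcoll() stands in for xmemcoll0(), which is not reproduced here.

```c
#include <string.h>

/* Byte-wise comparison in the style of the memcmp() branch quoted above:
   compare the common prefix, then let the shorter string sort first. */
int compare_bytes(const char *a, const char *b)
{
    size_t alen = strlen(a), blen = strlen(b);
    size_t minlen = alen < blen ? alen : blen;
    int diff = memcmp(a, b, minlen);
    if (diff == 0)
        diff = alen < blen ? -1 : alen != blen;
    return diff;
}

/* Locale-aware comparison: the result depends on the LC_COLLATE
   category selected earlier with setlocale(). */
int compare_locale(const char *a, const char *b)
{
    return strcoll(a, b);
}
```

In UTF-8, "ą" is encoded as the bytes 0xC4 0x85, both greater than 'z' (0x7A), so compare_bytes("z", "ą") is negative and "ą" lands after "z", as in the actual results. With LC_COLLATE set to pl_PL.UTF-8 (assuming the locale is installed), compare_locale() would be expected to place "ą" between "a" and "b".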

Comment 1 Ondrej Vasik 2013-12-26 20:18:27 UTC

Thanks for the report.
Yes, I'm aware of this. AFAIK Ondrej Oprala (who introduced this regression) already has some improvement, but unfortunately he hasn't pushed the changes to git so far (I expect he will push them once he is back from vacation, in January). Closing as a duplicate, as it was already reported in #1001775. Just bringing the xmemcoll call back breaks ~10 multibyte checks, so the fix has to be improved there.

*** This bug has been marked as a duplicate of bug 1001775 ***
