Description of problem:
sort command seems to ignore locale completely

Version-Release number of selected component (if applicable):
coreutils-8.21-18.fc20.x86_64

How reproducible:
Set locale to e.g. pl_PL.UTF-8 and try to sort data with national multibyte characters.

Steps to Reproduce:
1. export LC_ALL=pl_PL.UTF-8
2. echo -e "ą\na\nb\nc\nć\nł\nz\nż\nź" | sort

Actual results:
a
b
c
z
ą
ć
ł
ź
ż

Expected results:
a
ą
b
c
ć
ł
z
ź
ż

Additional info:
Looks like a problem with the i18n coreutils patch. This part:

@@ -2689,14 +3311,6 @@ compare (struct line const *a, struct li
         diff = - NONZERO (blen);
       else if (blen == 0)
         diff = 1;
-      else if (hard_LC_COLLATE)
-        {
-          /* Note xmemcoll0 is a performance enhancement as
-             it will not unconditionally write '\0' after the
-             passed in buffers, which was seen to give around
-             a 3% increase in performance for short lines.  */
-          diff = xmemcoll0 (a->text, alen + 1, b->text, blen + 1);
-        }
       else if (! (diff = memcmp (a->text, b->text, MIN (alen, blen))))
         diff = alen < blen ? -1 : alen != blen;

removes call to xmemcoll0(), leaving the final comparison to memcmp() which is not locale-aware. Bringing the removed part back restores correct default sort order for me, though I guess it doesn't eliminate the problem completely (for example, it doesn't fix "sort -d" which erroneously ignores multibyte letters).
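A quick way to check that the actual results are plain byte order, which is what a memcmp() fallback would produce, is to compare them against a run with LC_ALL=C. The script below is only a sketch: the helper name sample and the locale-name pattern are assumptions made for illustration, and the locale-aware run is skipped when pl_PL.UTF-8 is not installed.

```shell
#!/bin/sh
# Sample data from the report, one character per line.
# (The function name "sample" is an illustrative choice.)
sample() { printf '%s\n' ą a b c ć ł z ż ź; }

# Baseline: LC_ALL=C makes sort compare raw bytes, like memcmp().
byte_order=$(sample | LC_ALL=C sort | paste -s -d ' ' -)
echo "C locale:    $byte_order"

# Locale-aware run, only if pl_PL.UTF-8 is installed. The spelling of
# the locale name varies between systems (e.g. pl_PL.utf8 on glibc).
if locale -a 2>/dev/null | grep -Eqi '^pl_PL\.utf-?8$'; then
    locale_order=$(sample | LC_ALL=pl_PL.UTF-8 sort | paste -s -d ' ' -)
    echo "pl_PL.UTF-8: $locale_order"
    if [ "$locale_order" = "$byte_order" ]; then
        echo "sort output matches byte order: collation is being ignored"
    fi
else
    echo "pl_PL.UTF-8 not installed; skipping locale-aware run"
fi
```

If the two lines printed are identical on an affected build, sort is falling back to byte comparison instead of using LC_COLLATE.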
Thanks for the report. Yes, I am aware of this. AFAIK Ondrej Oprala (who introduced this regression) already has some improvements, but unfortunately he hasn't pushed the changes to git so far (I expect he will push them once he is back from vacation, in January).

Closing as a duplicate, since this was already reported in #1001775. Just bringing xmemcoll back breaks ~10 multibyte checks, so the fix has to be improved there.

*** This bug has been marked as a duplicate of bug 1001775 ***