Bug 100938 - Incorrect collation order
Product: Fedora
Classification: Fedora
Component: glibc
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2003-07-27 13:01 EDT by Alan Cox
Modified: 2007-11-30 17:10 EST
Doc Type: Bug Fix
Last Closed: 2005-07-26 11:38:59 EDT

Attachments
CY ordering (UTF-8) (398 bytes, text/plain)
2004-09-28 09:21 EDT, Alan Cox

Description Alan Cox 2003-07-27 13:01:12 EDT
Description of problem:

In cy_GB.UTF-8 the collation order sorts unaccented symbols correctly. Accented
symbols are supposed to be sorted with the accent ignored, but this does not happen.

(That is, the symbols aeiouwy with their accented forms a^ a/ a\ a" etc.)  [I can't
put the actual symbols in because Bugzilla's form seems to be 8859-1.]
Comment 1 Ulrich Drepper 2004-09-28 00:11:55 EDT
Attach a test file, i.e., a file with different words on separate
lines, with the lines in the order in which they must appear.  These
need not be real words; arbitrary character sequences are OK.
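
Such a test file could be checked with a short script like the one below. This is only a sketch: the function names are made up for illustration, and it assumes the cy_GB.UTF-8 locale is installed on the machine running it (otherwise `locale.setlocale` raises `locale.Error`).

```python
import locale


def first_misordered(lines, compare):
    """Return the index i of the first pair (lines[i], lines[i + 1]) that
    `compare` considers out of order, or None if every pair is in order."""
    for i in range(len(lines) - 1):
        if compare(lines[i], lines[i + 1]) > 0:
            return i
    return None


def check_file(path, locale_name="cy_GB.UTF-8"):
    """Check that the lines of a UTF-8 test file are already in the
    collation order of `locale_name` (assumed to be installed)."""
    locale.setlocale(locale.LC_COLLATE, locale_name)
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    # locale.strcoll compares two strings under the current LC_COLLATE.
    return first_misordered(lines, locale.strcoll)
```

`check_file` returns None when the file is in the order the locale produces, and otherwise the index of the first line that sorts after its successor.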
Comment 2 Alan Cox 2004-09-28 09:21:57 EDT
Created attachment 104429
CY ordering (UTF-8)
Comment 3 Ulrich Drepper 2005-07-26 11:38:59 EDT
I actually tried this now.  The sorting order seems to be correct.

This is the beginning.  At the highest level, the various accented characters are
sorted along with the non-accented variant.  The only difference between this
sorting and what I think you hint at in the ordering file is that the accents
should be treated with a lower priority than the case.  But that is a choice of
the collation standard.  I do not intend to change that.  It would
mean changing the entire huge collation file.  E.g.,

<U0061> <a>;<BAS>;<MIN>;IGNORE # 198 a
<U00E1> <a>;<ACA>;<MIN>;IGNORE # 200 á
<U0041> <a>;<BAS>;<CAP>;IGNORE # 319 A
<U00C1> <a>;<ACA>;<CAP>;IGNORE # 320 Á

These are the entries for 'a' and 'á'.  If you wanted the accents to have a
lower priority, each and every character definition would need to have the second
and third fields swapped:

<U0061> <a>;<MIN>;<BAS>;IGNORE # 198 a

This is not only a lot of work (which I won't do), it also would mean that this
locale is different from any other locale in this respect.

I'm closing the bug as WORKSFORME since this is what I think it does.
