100938 – Incorrect collation order

Summary: Incorrect collation order

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	1
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-07-27 17:01 UTC by Alan Cox
Modified:	2007-11-30 22:10 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-07-26 15:38:59 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
CY ordering (UTF-8) (398 bytes, text/plain) 2004-09-28 13:21 UTC, Alan Cox	no flags

Description Alan Cox 2003-07-27 17:01:12 UTC

Description of problem:

In cy_GB.UTF-8 the collation order correctly sorts unaccented symbols. Accented
symbols are supposed to be sorted with the accent ignored, but this does not happen.

(That is, the symbols aeiouwy with the accented forms a^ a/ a\ a" etc., i.e.
circumflex, acute, grave and diaeresis.)  [I can't put the symbols in because
Bugzilla's form seems to be ISO 8859-1.]
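The expected behaviour can be illustrated with a small Python sketch of a primary-level (accent-insensitive) sort key. This is not how glibc implements collation, which works from the weight tables in the locale source; `primary_key` and the sample strings are hypothetical, and Welsh digraphs such as ch and ll are not handled.

```python
import unicodedata


def primary_key(word: str) -> str:
    """Accent-insensitive key: decompose to NFD, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", word)
    # unicodedata.combining() is nonzero for combining marks such as U+0302 (circumflex).
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


words = ["ŵa", "wb", "âc", "ab"]
print(sorted(words))                   # code-point order: ['ab', 'wb', 'âc', 'ŵa']
print(sorted(words, key=primary_key))  # accent ignored:   ['ab', 'âc', 'ŵa', 'wb']
```

In plain code-point order every accented letter lands after all of ASCII; with the accent stripped, â sorts with a and ŵ with w. Ties between accented and unaccented forms would still need a secondary level to break them.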

Comment 1 Ulrich Drepper 2004-09-28 04:11:55 UTC

Attach a test file, i.e., a file with different words on separate
lines, with the lines in the order in which they must appear.  These
need not be real words; character sequences are fine.
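A file in this format can be checked mechanically. The Python sketch below reports the first line that sorts before its predecessor under a given key; `first_misordered` is a hypothetical helper, not part of any tool mentioned in this bug.

```python
from typing import Callable, Iterable, Optional


def first_misordered(lines: Iterable[str], key: Callable[[str], str]) -> Optional[int]:
    """Return the 1-based number of the first non-blank line that sorts strictly
    before the previous non-blank line under `key`, or None if all are in order."""
    prev_key = None
    for lineno, line in enumerate(lines, start=1):
        word = line.rstrip("\r\n")
        if not word:
            continue  # ignore blank lines
        k = key(word)
        if prev_key is not None and k < prev_key:
            return lineno
        prev_key = k
    return None


print(first_misordered(["a\n", "b\n", "c\n"], key=str))  # None
print(first_misordered(["a\n", "c\n", "b\n"], key=str))  # 3
```

To check against glibc itself, one could call `locale.setlocale(locale.LC_COLLATE, "cy_GB.UTF-8")` (which raises `locale.Error` if that locale is not installed) and pass `locale.strxfrm` as `key`, feeding the attached file opened with `encoding="utf-8"`.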

Comment 2 Alan Cox 2004-09-28 13:21:57 UTC

Created attachment 104429
CY ordering (UTF-8)

Comment 3 Ulrich Drepper 2005-07-26 15:38:59 UTC

I have now actually tried this.  The sorting order seems to be correct
already (or it was fixed in the meantime):

a
A
á
Á
à
À
â
Â
ä
Ä
b
B

This is the beginning.  At the highest (primary) level, the various accented
characters sort together with the unaccented variant.  The only difference
between this sorting and what I think you hint at in the ordering file is that
the accents should be treated with a lower priority than the case.  But that is
a choice of the collation standard, and I do not intend to change it.  It would
mean changing the entire huge collation file.  E.g.,

<U0061> <a>;<BAS>;<MIN>;IGNORE # 198 a
<U00E1> <a>;<ACA>;<MIN>;IGNORE # 200
<U0041> <a>;<BAS>;<CAP>;IGNORE # 319 A
<U00C1> <a>;<ACA>;<CAP>;IGNORE # 320

These are the entries for 'a' and 'á' (lower and upper case).  If you wanted
the accents to have a lower priority, every single character definition would
need its second and third fields swapped:

<U0061> <a>;<MIN>;<BAS>;IGNORE # 198 a

This is not only a lot of work (which I won't do); it would also make this
locale different from every other locale in this respect.
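The effect of swapping the second and third fields can be modelled with a simplified multi-level sort key in Python: all primary weights are compared first, then all secondary (accent) weights, then all tertiary (case) weights. The weight table is hand-transcribed for a dozen characters in the style of the entries above, and the order BAS < ACA < GRA < CIR < REU is an assumption chosen because it reproduces the list in this comment; glibc's real table is far larger, and features such as backward ordering at a level are ignored.

```python
from typing import Dict, Tuple

# Assumed ascending order of the weight symbols (inferred from the observed output).
SECONDARY = ["BAS", "ACA", "GRA", "CIR", "REU"]  # accent level
TERTIARY = ["MIN", "CAP"]                        # case level

# Character -> (primary, accent, case), mirroring the <a>;<ACA>;<MIN> style of entry.
WEIGHTS: Dict[str, Tuple[str, str, str]] = {
    "a": ("a", "BAS", "MIN"), "A": ("a", "BAS", "CAP"),
    "á": ("a", "ACA", "MIN"), "Á": ("a", "ACA", "CAP"),
    "à": ("a", "GRA", "MIN"), "À": ("a", "GRA", "CAP"),
    "â": ("a", "CIR", "MIN"), "Â": ("a", "CIR", "CAP"),
    "ä": ("a", "REU", "MIN"), "Ä": ("a", "REU", "CAP"),
    "b": ("b", "BAS", "MIN"), "B": ("b", "BAS", "CAP"),
}


def sort_key(word: str, case_before_accent: bool = False) -> Tuple[tuple, tuple, tuple]:
    """Compare whole strings level by level. With case_before_accent=True the
    second and third fields are swapped, as in the <MIN>;<BAS> variant."""
    primary = tuple(WEIGHTS[ch][0] for ch in word)
    accent = tuple(SECONDARY.index(WEIGHTS[ch][1]) for ch in word)
    case = tuple(TERTIARY.index(WEIGHTS[ch][2]) for ch in word)
    return (primary, case, accent) if case_before_accent else (primary, accent, case)


letters = list("BbÄäÂâÀàÁáAa")
print(" ".join(sorted(letters, key=sort_key)))
# a A á Á à À â Â ä Ä b B
print(" ".join(sorted(letters, key=lambda w: sort_key(w, case_before_accent=True))))
# a á à â ä A Á À Â Ä b B
```

The first ordering matches the list above; the second groups all lower-case forms before the upper-case ones, which is what swapping the fields in every entry would produce.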

I'm closing the bug as WORKSFORME, since as far as I can tell it already behaves as intended.