Red Hat Bugzilla – Bug 460806
[bn_IN] Bengali collation/sorting order is not defined in glibc
Last modified: 2009-05-05 03:29:47 EDT
Description of problem:
Bengali sort order is not defined in /usr/share/i18n/locales/iso14651_t1_common.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Check the file /usr/share/i18n/locales/iso14651_t1_common
The Bengali sort order should be defined there.
requested by Jens Petersen (#27995)
This bug should not be assigned to Jakub. Is there an email address for the i18n team?
The i18n team follows the fedora-i18n-bugs list.
As per the discussion we had on the mailing list, we can go ahead with the collation sequence.
Can you provide me these sequences,
so I can start working on it?
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.
More information and reason for this action is here:
For the bn-IN collation order, please consider the sequence as listed under the column "Glibc" in this page:
I have written collation tables for Bengali,
available at http://pravins.fedorapeople.org/bn_IN.tar.gz
Steps for quick testing:
1) Untar the file
2) cd bn_IN
3) Run the command
LOCPATH=$PWD LC_ALL=bn_TT sort test_file > sorted_output
test_file -> unsorted data
sorted_output -> sorted output
4) I have added some entries to test_file for testing;
one can edit it and add more words for testing
Note: the following characters are not listed at http://ankur.org.in/wiki/CollationSequence
I have added them to the collation table as well; please check those too.
Please let me know your comments.
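The quick test above can also be checked programmatically. The following is a minimal Python sketch and is not part of the tarball: the helper names first_out_of_order and check_file are made up for illustration, and check_file assumes it is run with the same LOCPATH and LC_ALL settings as the sort command in step 3. It reports the first line that compares lower than its predecessor under the active LC_COLLATE rules.

```python
import locale


def first_out_of_order(lines, key=locale.strxfrm):
    """Return the index of the first line that sorts before the line
    preceding it under `key`, or None if all lines are in order."""
    for i in range(1, len(lines)):
        if key(lines[i - 1]) > key(lines[i]):
            return i
    return None


def check_file(path="sorted_output"):
    """Check a file against the collation rules taken from the environment."""
    # Pick up LC_COLLATE from the environment (LC_ALL, and LOCPATH on glibc).
    locale.setlocale(locale.LC_COLLATE, "")
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    bad = first_out_of_order(lines)
    return "in order" if bad is None else f"line {bad + 1} is out of order"
```

Calling check_file() from the bn_IN directory should agree with what sort produced; a reported line would point at a word whose position deserves a closer look.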
Thanks Pravin for getting this in place. A few notes follow here:
1. The collation sequence currently sorts as:
[...] ও ঔ ক [...] হ া ি [...] ো ৌ ং ঃ ঁ [...]
Comment: ং, ঃ and ঁ ought to sort as [...] ঔ ং ঃ ঁ ক [...] and not after ৌ
Can you please check if we are missing something here?
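To make the requested placement concrete, here is a small Python sketch. It is a single-level toy model, not glibc's multi-level collation algorithm, and its table covers only the characters quoted in this comment (the parts elided as "[...]" are deliberately left out). The names REQUESTED, RANK and sort_key are illustrative only.

```python
# Characters quoted in the comment, listed in the requested order:
# the three signs follow ঔ and precede ক.
REQUESTED = ["ও", "ঔ", "ং", "ঃ", "ঁ", "ক", "হ", "া", "ি", "ো", "ৌ"]
RANK = {ch: i for i, ch in enumerate(REQUESTED)}


def sort_key(word):
    """Map each character to its rank in REQUESTED. Characters outside
    the table sort last, by code point (an assumption of this sketch)."""
    return [(RANK.get(ch, len(RANK)), ord(ch)) for ch in word]


sample = ["ক", "ৌ", "ঁ", "ও", "ঃ", "হ", "ং", "ঔ"]

# Plain code point order puts the signs first (U+0981 to U+0983).
print(sorted(sample))
# The table-driven key puts them between ঔ and ক instead.
print(sorted(sample, key=sort_key))
```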
2. The additional characters added:
U+09E1 (duplicate reference)
U+09C3 (already present in our sequence)
The one missing from the list is: U+09BC
These additional characters have been listed, in the order your patch currently sorts them, under the column titled "Glibc (Proposed)" in the page:
Thanks a lot.
We are also working on this issue. You can download the *draft* locale files (for bn_BD) from . There are two files: one for the normal sequence (the way we were taught in our childhood :-) and a second one for the dictionary sequence.
As the Bengali (bn) collation sequence will be common to both Bangladesh (bn_BD) and India (bn_IN), we will work together.
(In reply to comment #9)
> We are also working on this issue. You can download the *draft* locale files
> (for bn_BD) from . There are two files; one for normal sequence (the way we
> taught in our childhood :-) and the second one for dictionary sequence .
But I think we can implement only one in glibc.
> As the Bengali (bn) Collation sequence will be common for both Bangladesh
> (bn_BD) and India (bn_IN), we will work together.
Now I think we can do more accurate work on Bengali collation.
It would be good if you took a look at /usr/share/i18n/locales/iso14651_t1_common. This is the common file for collation data, and it is included in the LC_COLLATE section of almost all locale files.
We should add the Bengali collation data to this file only.
The advantage of keeping the collation data in this file is that it will be available to the OS even if one selects another locale.
I saw your file. Nice work, but it would be better if you followed the iso14651_t1_common file structure.
Check my earlier comments: I have done some work on Bengali and it is in its final stage. It would be nice if you could check it once and suggest any modifications if required.
It would be nice if the Bengali collation orders for India and Bangladesh were 100% the same. Even if there are differences, it will not create any problem: both have their own locale file, and we can put the differences in the respective locale file, so they will not affect one another.
(For example, the Devanagari script is shared by the Marathi, Hindi and Sanskrit languages. Their basic sorting is in the common file and some miscellaneous sorting rules are in the respective locale file; check the reorder in mr_IN.)
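The layering described above (a shared table in iso14651_t1_common, with per-locale adjustments such as the reorder in mr_IN) can be pictured with a toy model. The Python sketch below is only an illustration under stated assumptions: it is not glibc's locale-definition syntax, the function name reorder_after is borrowed loosely from the locale keyword, and the Latin letters are placeholder symbols rather than real collation data.

```python
def reorder_after(base, anchor, moved):
    """Return a copy of `base` with the symbols in `moved` relocated,
    in the given order, to sit immediately after `anchor`."""
    if anchor in moved:
        raise ValueError("anchor cannot be one of the moved symbols")
    rest = [s for s in base if s not in moved]
    pos = rest.index(anchor) + 1
    return rest[:pos] + list(moved) + rest[pos:]


# Shared order, standing in for the common file (placeholder symbols).
COMMON = ["a", "b", "c", "d", "e"]

# Both hypothetical locales start from the shared order;
# only the second one overrides part of it.
locale_one = list(COMMON)
locale_two = reorder_after(COMMON, "d", ["b"])
```

The shared list is never modified, which mirrors the point made in the comment: a difference recorded in one locale file does not leak into another locale that uses the same common table.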
(In reply to comment #10)
> but i think we can implement only one in glibc.
Yes. And even if it astonished people at first, we implement the correct dictionary/library sorting in Western locales.
Thanks for confirming
I have updated the collation table as per comment #8.
Please test it once; see comment #7 for the testing steps.
Let me know if any modification is required; I will prepare the patch after that.
Pravin, we will test and let you know soon.
I have tested your collation rules for Bengali and it seems to be working great. I think you can prepare the patch for it now :) .
The results of the tests seem to be in line with the bn-IN proposal. We are ok to go ahead with this sequence in both:
/usr/share/i18n/locales/iso14651_t1_common, and
bn_IN locale file
Pravin, your current tarball contains only one bn file. Since we have two separate locale files for bn-IN and bn-BD would we require further testing with the actual locale files?
(In reply to comment #15)
> The result of the tests seem to be in order with the bn-IN proposal. We are ok
> to go ahead with this sequence in both:
> /usr/share/i18n/locales/iso14651_t1_common, and
> bn_IN locale file
Thanks to both of you for testing and your comments
> Pravin, your current tarball contains only one bn file. Since we have two
> separate locale files for bn-IN and bn-BD would we require further testing with
> the actual locale files?
No, that is not required: we are not changing anything in the locale files themselves, so the results will be the same for both locales.
I am only removing the '-% TODO: Bengali sorting should be added' line from the bn_BD locale, as sorting will be there now :)
Ulrich, it would be nice if you could take a look at it.
Created attachment 340873 [details]
It adds the Bengali collation tables to the iso14651_t1_common file.
The patch is now available in glibc upstream.
Bengali collation will be available with the next upstream release.
Thanks to all for information and help.