Description of problem:
Bengali sort order is not defined in /usr/share/i18n/locales/iso14651_t1_common.

Version-Release number of selected component (if applicable):
glibc-common-2.8.90-11

How reproducible:
every time

Steps to Reproduce:
1. check file /usr/share/i18n/locales/iso14651_t1_common

Actual results:
Bengali sort order should be there
requested by Jens Petersen (#27995)
This bug should not be assigned to Jakub. Is there an email address for the i18n team?
The i18n team follows the fedora-i18n-bugs list.
Runa, Sayamindu: as per the discussion we had on the mailing list, we can go with the collation sequence used by a) Sansad b) Anandabazar. Can you provide me these sequences so I can start working on it?
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Hello, For the bn-IN collation order, please consider the sequence as listed under the column "Glibc" in this page: http://ankur.org.in/wiki/CollationSequence Thanks Runa
I have written collation tables for Bengali, available at http://pravins.fedorapeople.org/bn_IN.tar.gz

Steps for quick testing:
1) Untar the file in a terminal
2) cd bn_IN
3) Run the command
   LOCPATH=$PWD LC_ALL=bn_TT sort test_file > sorted_output
   test_file -> unsorted data
   sorted_output -> sorted output
4) I have added some entries in test_file for testing; one can edit it and add more words.

Note: the following characters are not listed in http://ankur.org.in/wiki/CollationSequence. I have added them to the collation table as well, so please check them too:
U+09E0 U+098c U+09E1 U+09D7 U+09E1 U+09BD U+09C3 U+09C4

Please let me know your comments.
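For anyone who wants to check sorted_output mechanically rather than by eye, here is a minimal sketch in Python. It is not part of the tarball. The helper name misordered_pairs is hypothetical, and the commented usage assumes that Python's locale.setlocale can load the bn_TT locale through LOCPATH the same way sort does, which has not been verified here.

```python
import locale


def misordered_pairs(lines, key=locale.strxfrm):
    """Return adjacent (previous, current) pairs that are out of order under key.

    lines must be a list of strings. The default key, locale.strxfrm,
    follows whatever LC_COLLATE locale is currently active.
    """
    return [(a, b) for a, b in zip(lines, lines[1:]) if key(a) > key(b)]


# Hypothetical usage (assumes LOCPATH points at the untarred bn_IN directory
# and that the bn_TT locale loads; both are assumptions):
#
#   locale.setlocale(locale.LC_COLLATE, "bn_TT")
#   with open("sorted_output", encoding="utf-8") as f:
#       print(misordered_pairs(f.read().splitlines()))
```

An empty list means every adjacent pair is in order under the active collation; any pair returned is a place where sorted_output and the locale disagree.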
Hello,

Thanks Pravin for getting this in place. A few notes follow here:

1. The collation sequence currently sorts as:
   [...] ও ঔ ক [...] হ া ি [...] ো ৌ ং ঃ ঁ [...]
   Comment: ং, ঃ and ঁ ought to sort as
   [...] ঔ ং ঃ ঁ ক [...]
   and not after ৌ
   Ref: http://ankur.org.in/wiki/CollationSequence
   Can you please check if we are missing something here?

2. The additional characters added:
   U+09E0
   U+098c
   U+09E1
   U+09D7
   U+09E1 (duplicate reference)
   U+09BD
   U+09C3 (already present in our sequence)
   U+09C4
   The one missing from the list is: U+09BC

These additional characters have been added, in the order your patch currently sorts them, under the column titled "Glibc (Proposed)" on the page: http://ankur.org.in/wiki/CollationSequence

Thanks a lot.
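To make the difference in point 1 concrete, here is a small Python sketch. It is an illustration only: the two lists contain just the characters quoted above (the elided [...] parts are left out), the names CURRENT_ORDER, REQUESTED_ORDER and make_key are hypothetical, and the comparison is single-level, whereas the real glibc tables assign multi-level weights.

```python
# Only the characters quoted in the comment; the elided "[...]" ranges are omitted.
# O, AU, KA, HA, AA sign, I sign, O sign, AU sign, ANUSVARA, VISARGA, CANDRABINDU
CURRENT_ORDER = ["\u0993", "\u0994", "\u0995", "\u09b9", "\u09be", "\u09bf",
                 "\u09cb", "\u09cc", "\u0982", "\u0983", "\u0981"]
# O, AU, ANUSVARA, VISARGA, CANDRABINDU, KA, HA, AA sign, I sign, O sign, AU sign
REQUESTED_ORDER = ["\u0993", "\u0994", "\u0982", "\u0983", "\u0981", "\u0995",
                   "\u09b9", "\u09be", "\u09bf", "\u09cb", "\u09cc"]


def make_key(order):
    """Build a sort key from an explicit character order (single level only)."""
    rank = {ch: i for i, ch in enumerate(order)}

    def key(word):
        # Characters outside the partial order sort after the listed ones,
        # by code point (an assumption made to keep the sketch total).
        return [(rank.get(ch, len(order)), ord(ch)) for ch in word]

    return key


# KA+ANUSVARA, KA+KA, KA+AA sign
words = ["\u0995\u0982", "\u0995\u0995", "\u0995\u09be"]

print(sorted(words, key=make_key(CURRENT_ORDER)))    # anusvara word sorts last
print(sorted(words, key=make_key(REQUESTED_ORDER)))  # anusvara word sorts first
```

With the current tables the word ending in ং lands after the word ending in া; with the requested sequence it moves ahead of both other words, because ং ranks before ক.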
We are also working on this issue. You can download the *draft* locale files (for bn_BD) from [0]. There are two files: one for the normal sequence (the way we were taught in our childhood :-) and the second one for the dictionary sequence [1].

As the Bengali (bn) collation sequence will be common to both Bangladesh (bn_BD) and India (bn_IN), we will work together.

[0] http://www.ankur.org.bd/downloads/glibc_collation/
[1] http://www.ankur.org.bd/wiki/Documentation#About_Bangla_Characters
(In reply to comment #9)
> We are also working on this issue. You can download the *draft* locale files
> (for bn_BD) from [0]. There are two files; one for normal sequence (the way we
> taught in our childhood :-) and the second one for dictionary sequence [1].

But I think we can implement only one in glibc.

> As the Bengali (bn) Collation sequence will be common for both Bangladesh
> (bn_BD) and India (bn_IN), we will work together.

That's nice. Now I think we can do more accurate work on Bengali collation.

It would help if you looked at /usr/share/i18n/locales/iso14651_t1_common. This is the common file for collation data and is included in the LC_COLLATE section of almost all locale files. We should add the Bengali collation data to this file only. The advantage of keeping collation data in this file is that it will be available to the OS even if one selects another locale.

I saw your file. Nice work, but it would be better if you followed the iso14651_t1_common file structure. Check my earlier comments: I have done some work for Bengali and it is in its final stage. It would be nice if you could check it once and suggest any modifications if required.

It would be nice if the Bengali collation orders for India and Bangladesh were 100% the same, but even if there are differences it will not create any problem: each has its own locale file, and we can put the differences in the respective locale file, so they will not affect one another. (For example, the Devanagari script is shared by the Marathi, Hindi and Sanskrit languages; their basic sorting is in the common file and some miscellaneous sorting rules are in the respective locale files. Check the mr_IN reorder.)
(In reply to comment #10)
> but i think we can implement only one in glibc.

Yes. And even if it astonished people at first, we implement the correct dictionary/library sorting in Western locales.
Aha, thanks for confirming. I have updated the collation table as per comment #8. Please test it once (see comment #7 for the testing steps) and let me know if any modification is required. I will prepare the patch then.
Pravin, we will test and let you know soon.
Hello Pravin, I have tested your collation rules for Bengali and it seems they are working great. I think you can prepare the patch for it now :) .
The results of the tests seem to be in order with the bn-IN proposal. We are ok to go ahead with this sequence in both:

/usr/share/i18n/locales/iso14651_t1_common, and
bn_IN locale file

Pravin, your current tarball contains only one bn file. Since we have two separate locale files for bn-IN and bn-BD, would we require further testing with the actual locale files?

regards
Runa
(In reply to comment #15)
> The result of the tests seem to be in order with the bn-IN proposal. We are ok
> to go ahead with this sequence in both:
>
> /usr/share/i18n/locales/iso14651_t1_common, and
> bn_IN locale file

Thanks to both of you for testing and for your comments.

> Pravin, your current tarball contains only one bn file. Since we have two
> separate locale files for bn-IN and bn-BD would we require further testing with
> the actual locale files?

No, that is not required. We are not changing anything in the locale files themselves, so the results will be the same for both locales. I am only removing the '-% TODO: Bengali sorting should be added' line from the bn_BD locale, as sorting will be there now :)

Attaching the patch. Ulrich, it would be nice if you could look at it once.
Created attachment 340873
It will add Bengali collation tables to the iso14651_t1_common file.
The patch is now available in glibc upstream. Bengali collation will be available with the next upstream release. Thanks to all for the information and help.