Description of problem:
iswalpha(), iswdigit(), and iswpunct() return incorrect values for some Bangla (Bengali) characters. Specifically, iswalpha() returns false and iswpunct() returns true for 0981 (CANDRABINDU), 0982 (ANUSVARA), 0983 (VISARGA), and vowel modifiers (09BE, 09BF, etc.). iswdigit() does not return true for the Bangla digits (09E6 through 09EF).
Version-Release number of selected component (if applicable):
glibc-2.7-2
How reproducible:
Steps to Reproduce:
Compile and run the attached program (bn_ctypes.c). The output should be self-explanatory.
Actual results:
Expected results:
Additional info:
Created attachment 320304 [details] Short program to demonstrate the bug.
I went through the output. Yes, it does not give the right result for Bengali matras, vowel modifiers, or digits. Only the following types exist: LOWER, UPPER, ALPHA, DIGIT, ALNUM, PUNCT, GRAPH. The vowel modifiers U0981, U0982, U0983 and the matras U09BC to U09D7 should come under the ALPHA category only, and the digits (09E6 through 09EF) should come under the DIGIT category. I will check the other Indic scripts while resolving this bug. Mandar, what do you say?
Sounds great. Minor clarification 1: I think everything that is marked as punctuation in the range U09bc to U09e3 should actually be alpha. Minor clarification 2: they should also come under ALNUM, GRAPH and PRINT categories, but maybe this is automatically ensured by the implementation. Irrelevant point: I don't think U0981, U0982, U0983 are really vowel modifiers. In school, we'd learnt that these were consonants of the ayogbaha (অযোগবাহ) type.
(In reply to comment #3)
> Sounds great.
>
> Minor clarification 1: I think everything that is marked as punctuation in the
> range U09bc to U09e3 should actually be alpha.
>
> Minor clarification 2: they should also come under ALNUM, GRAPH and PRINT
> categories, but maybe this is automatically ensured by the implementation.
>
Yes, we will do this :) I will confirm with Ulrich exactly where to apply the fix.
> Irrelevant point: I don't think U0981, U0982, U0983 are really vowel modifiers.
> In school, we'd learnt that these were consonants of the ayogbaha (অযোগবাহ)
> type.
After a consonant, vowel, or matra character, a character can be used which modifies the vowel sound and is called a "Vowel Modifier". This can be a Chandrabindu, Anuswar, or Visarg. BIS has defined this for ISCII.
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This bug is present in Malayalam (ml_IN) too. Matra signs are not recognized as letters.
Yes, I also found that the same problem is present in all Indic scripts. I will file a bug for each language so that the respective language experts can give their suggestions.
Moving to rawhide.
% The "digit" class must only contain the BASIC LATIN digits, says ISO C 99
% (sections 7.25.2.1.5 and 5.2.1)
Because of this we can't add our Indic digits to the digit class, but we can add them to the outdigit category, as is done in fa_IR.
http://www2.open-std.org/jtc1/SC22/WG20/docs/n608.txt
b) outdigit - why is this required and how is it different from "digit"
"digit" classifies all digits, while "outdigit" classifies the values used for outputting. This keyword is needed to determine the characters used for outputting.
(In reply to comment #9)
> b) outdigit - why is this required and how is it different from
> "digit"
>
> "digit" classifies all digits, while "outdigit" classifies the
> values used for outputting. This keyword is needed to determine
> the characters used for outputting.
Just do it as the fa_IR locale does. It's the correct way:
- define the digits using outdigit
- provide an input mapping using the new map "to_inpunct"
- provide additional information like thousands separators and decimal points using the "to_outpunct" map
With that you'll automatically get support for input and output of the additional number format.
Created attachment 327320 [details] patch created against glibc devel branch
This patch adds a modified copy of the i18n CTYPE to bn_IN. Since the i18n CTYPE has the Indic matra characters in the punct group, this will solve that problem.
Unlike fa_IR, the Bengali script doesn't have separate thousands separators and decimal points (it uses the Latin ones only). If there are no issues with this, I will add the same 'bn_IN CTYPE' to the other Indic locales as well (with outdigit).
Adding Jamil Ahmed, the Fedora bn coordinator, to the bug and modifying the summary. The bn_BD locale file would also need to be modified in this case.
(In reply to comment #11)
> This patch will add modified i18n CTYPE in bn_IN, as i18n CTYPE has Indic matra
> characters in punct group, it will solve that problem
This is wrong. Don't copy i18n. You'll have to use reorder_after.
Oops, it's LC_CTYPE, forget what I wrote. If the changes affect all languages using the values you're changing, then just change the i18n file. There is no reason to assume this file is correct for non-Latin languages.
Created attachment 327634 [details] Patch created against glibc devel branch
This patch modifies the i18n CTYPE. Since the i18n CTYPE has the Indic matra characters in the punct group, this will solve that problem. It also adds the outdigit class to all Indic locales.
I've applied the patch to the upstream cvs. It'll be in the next rawhide build. This means all these similar BZs opened can then be closed, right?
(In reply to comment #17)
> I've applied the patch to the upstream cvs. It'll be in the next rawhide
> build. This means all these similar BZs opened can then be closed, right?
Yes :)
*** Bug 473888 has been marked as a duplicate of this bug. ***
*** Bug 473898 has been marked as a duplicate of this bug. ***
*** Bug 474105 has been marked as a duplicate of this bug. ***
*** Bug 474107 has been marked as a duplicate of this bug. ***
*** Bug 474117 has been marked as a duplicate of this bug. ***
*** Bug 474119 has been marked as a duplicate of this bug. ***
*** Bug 474124 has been marked as a duplicate of this bug. ***
*** Bug 474127 has been marked as a duplicate of this bug. ***
Thanks for applying the patch upstream. Feel free to reopen if there is any problem.
Pravin, I looked into the patch and found that the changes it proposed cause some locales not to work with the latest glibc build. I found the following locales not working with glibc-2.9.90-2.i386:
as_IN
bn_BD
hi_IN
hne_IN
mai_IN
The patch makes localedef report the following error:
"circular dependencies between locale definitions"
We need to fix this as soon as possible, as F11 Alpha is already missing 5 locales. Can you also look into this issue?
Created attachment 330199 [details] Will solve circular dependencies between locale definitions mr_IN, hi_IN, bn_BD, bn_IN and as_IN
Parag, thanks for reporting this. It created a big problem; the above patch will solve it.
I applied the patch upstream.