Red Hat Bugzilla – Full Text Bug Listing
|Summary:||[bn, bn_IN] Better handling for BENGALI LETTER A/E|
|Product:||[Fedora] Fedora||Reporter:||Jong Bae KO <jko>|
|Component:||harfbuzz||Assignee:||Parag Nemade <pnemade>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:|
|Version:||rawhide||CC:||ankit, i18n-bugs, mclasen, petersen, pnemade, psatpute, sayamindu, tagoh|
|Target Milestone:||---||Keywords:||FutureFeature, i18n|
|Fixed In Version:||Doc Type:||Enhancement|
|Doc Text:||Story Points:||---|
|Last Closed:||2013-06-03 04:11:50 EDT||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
Description Jong Bae KO 2006-03-16 21:26:49 EST
Description of problem:

I quote from GNOME Bugzilla #118299 (http://bugzilla.gnome.org/show_bug.cgi?id=118299):

"Trying to keep track of the four different issues in bug 113551 was pretty much impossible for me, so splitting up the comments into separate bug reports.

* unmadindu@Softhome.net (Sayamindu Dasgupta):

1. Yaphala
---------------
b. The sequence 0985 09CD 09AF 09BE (অ্যা) is not rendered properly. I quote from the Unicode Indic FAQ.

Q: What are the Bengali characters used to transcribe the sound "a" (as in English "bat") in Unicode?

A: In Bengali, the sequence "zophola" (U+09CD U+09AF) + the "aa" matra (U+09BE) is used for transcribing the English "a" in "bat". This zophola_aa can be seen as a special "composite" matra to write a new Bengali sound, imported from English. Represent these sequences using a halant (virama):

Vowel_A_zophola_AA = 0985 09CD 09AF 09BE ( a- halant ya -aa )
Vowel_E_zophola_AA = 098F 09CD 09AF 09BE ( e- halant ya -aa )

If you need to add a candrabindu or other combining mark in the sequence, represent the sequence as:

Vowel_A_zophola_AA + candrabindu = 0985 09CD 09AF 09BE 0981 ( a- halant ya -aa candrabindu )

* Additional Comments From Taneem Ahmed 2003-06-01 03:13:

Also, a very quick hack (and a bit ugly) is to set U+985 to _ct from _iv, this will fix the 1b issue. I will also upload an image with the result. There is a small side effect, but I am sure everyone can live with that, instead of pango rendering it wrong.

[ Image is http://bugzilla.gnome.org/showattachment.cgi?attach_id=17030, I don't know what the "small side effect" referred to above is - OT ]

* Additional Comments From Owen Taylor 2003-06-01 04:42:

Two quick thoughts on 1b:

Does the 'independent vowel + halant + ya + aa' combination work in Windows? The OT bengali specification strongly implies that uniscribe doesn't handle it.

It should be pretty trivial to handle by adding an extra flag to scriptFlags and writing a special case for it in indic_ot_reorder().

* Additional Comments From Taneem Ahmed 2003-06-01 04:54:

I tried what you said; 1b does not get fixed without the _ct hack. Let me explain this problem. Take the following input:

U+985 U+9CD U+9AF U+9BE

The problem with this is that U+985 is an independent vowel, and right now this input will become three syllables: (U+985) (U+9CD) (U+9AF U+9BE). This is obviously not right. Even if we somehow treat it as one syllable, we end up setting the tag blwf_p on all of them. This is a very, very special case for U+985, where it acts as a consonant instead of a vowel. If you want to deal with it properly then we will have to add quite a few checks for U+985 in the reorder code to add proper tags. But as indic-ot.c is used by all the Indic scripts, I think that will be an even bigger hack, with more risk and extra delay. As this is a pure Bengali issue, I thought it would be better to keep the hack limited to Bengali :)

The only side effect of my hack is that U+985 can now take up other independent vowels, which may actually be considered a feature :)

And I don't have access to a Windows box at home, so I don't know what Windows does. Can someone else please check?

* Additional Comments From Owen Taylor 2003-06-01 10:49:

It seems to me that the next step for 1b is to:

- Find a uniscribe enabled copy of Microsoft Windows
- See if 'U+985 U+9CD U+9AF U+9BE' renders as desired
- Try another sequence that would make sense for a consonant, but doesn't make sense for U+985, say U+985 + halant + <normal consonant>, and see how that renders.

Another approach would be simply to ask on the OpenType mailing list (http://www.microsoft.com/typography/otspec/otlist.htm) for clarification of the relationship between the Unicode Indic FAQ item and the Bengali OpenType spec.

* Additional Comments From Taneem Ahmed 2003-06-01 16:50:

I just looked at the Bengali part of chapter 9 of Unicode 4.0. It clearly states what to do for 1b.
I don't think we need to bring it up with the OpenType mailing list, unless we want to know if they are planning to add some new feature in the OT layout table. And IMHO if uniscribe does not render it properly then we need to let them know, not follow them :)

Comment #1 from Sayamindu Dasgupta (points: 8) 2003-07-25 15:35 UTC

On a related note, I think the Bengali letter E (098F) should also be considered as a consonant. This is specified in the Indic FAQ, as well as in Chapter 9 of the Unicode standard (http://www.unicode.org/book/preview/ch09.pdf).

Also, I am not very sure about this, but the sequence 09B0 09CD 098B should be allowed to form a reph with the vowel 098B. This is required for the Bengali word "Nairhit" and afaik, the latest beta of Uniscribe forms a reph (we had some discussion with Paul Nelson of Microsoft Typography on this - if you want I can forward the related emails to you) - or do I file this as yet another bug?

Comment #2 from Sayamindu Dasgupta (points: 8) 2004-02-24 18:36 UTC

Something I would like to point out here. The letter A acts as a consonant *only* when it is followed by halant + ya. In other cases, it should act as a normal vowel. I have just received a file where the user, using a version of pango with the _ct hack, wrote Bengali letter AA as A + AA vowel sign. Visually the result is the same, but it can cause problems while searching and doing other stuff. Example rendering at http://www.peacefulaction.org/sayamindu/images/garbage.png

Recently I had the chance to play around with a Microsoft Windows XP box - and they can't handle a halant ya - as Microsoft has not released an official Bengali supporting version of Uniscribe yet.

Comment #3 from Owen Taylor (pango developer, points: 25) 2004-02-24 19:30 UTC

So, is making the _ct change for A and E better than nothing or not? I can leave this bug open, but I want to know whether I should make that change for 1.4.0.

Comment #4 from Sayamindu Dasgupta (points: 8) 2004-02-25 03:48 UTC

My proposal - make the changes. Microsoft is doing the same thing with Uniscribe, and ditto with the Qt people. However, we should try to have a better way to do this in the next versions.

Comment #5 from Owen Taylor (pango developer, points: 25) 2004-02-27 19:43 UTC

Fri Feb 27 14:26:34 2004 Owen Taylor <firstname.lastname@example.org>

* modules/indic/indic-ot-class-tables.c (bengCharClasses): Mark BENGALI LETTER A (U+0985) and BENGALI LETTER E (U+098F) as consonants which gives better behavior when they are combined with halant, though it isn't exactly right. (#118299, Sayamindu Dasgupta)

(Filed as ICU bug 3626 (http://www.jtcsv.com/cgibin/icu-bugs/))"

Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:

Activity log from the external bug reference:

Who: email@example.com When: 2003-07-24 20:21:08 UTC What: OtherBugsDependingOnThis Removed: Added: 113551
Who: firstname.lastname@example.org When: 2003-07-25 15:12:26 UTC What: Target Milestone Removed: Added: 1.2.4
Who: email@example.com When: 2003-08-25 14:38:22 UTC What: Target Milestone Removed: 1.2.4 Added: 1.2.5
Who: firstname.lastname@example.org When: 2003-09-15 21:50:28 UTC What: AssignedTo Removed: email@example.com Added: firstname.lastname@example.org
Who: email@example.com When: 2003-09-15 21:51:17 UTC What: Component Removed: general Added: indic
Who: firstname.lastname@example.org When: 2004-02-23 19:47:05 UTC What: Target Milestone Removed: 1.2.5 Added: 1.4.0
Who: email@example.com When: 2004-02-27 19:43:39 UTC What: Summary(1.4.0) Removed: Treat BENGALI LETTER A as consonan(1.4.0) Added: Better handling for BENGALI LETTER A/E(future)
Who: firstname.lastname@example.org When: 2004-12-13 21:22:45 UTC What: Target Milestone Removed: future Added: Small fix

http://bugzilla.gnome.org/show_activity.cgi?id=118299
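
Illustration of the syllable-splitting argument quoted above: the following is a toy sketch in Python, not pango's actual indic-ot code. The class labels, the `split_clusters` function and the simplified cluster grammar are all assumptions made for illustration; they only loosely echo the _iv/_ct labels used in the thread. It shows why U+0985 U+09CD U+09AF U+09BE falls apart into three pieces when U+0985 is classed as an independent vowel, and stays together when it is reclassified as a consonant.

```python
# Toy character classes (hypothetical names, loosely echoing _iv/_ct in the thread).
IV, CT, VR, DV, MK = "iv", "ct", "virama", "matra", "mark"

BASE_CLASSES = {
    0x0985: IV,  # BENGALI LETTER A
    0x098F: IV,  # BENGALI LETTER E
    0x09AF: CT,  # BENGALI LETTER YA
    0x09CD: VR,  # BENGALI SIGN VIRAMA (halant)
    0x09BE: DV,  # BENGALI VOWEL SIGN AA
    0x0981: MK,  # BENGALI SIGN CANDRABINDU
}


def split_clusters(codepoints, classes):
    """Greedy toy grammar: CT (VR CT)* [DV] [MK] | IV [MK] | any single char."""
    clusters, i, n = [], 0, len(codepoints)

    def cls(k):
        # Class of the code point at index k, or None past the end.
        return classes.get(codepoints[k]) if k < n else None

    while i < n:
        start = i
        if cls(i) == CT:
            i += 1
            while cls(i) == VR and cls(i + 1) == CT:
                i += 2  # halant + following consonant joins the cluster
            if cls(i) == DV:
                i += 1
            if cls(i) == MK:
                i += 1
        elif cls(i) == IV:
            i += 1
            if cls(i) == MK:
                i += 1
        else:
            i += 1  # stray halant or matra becomes its own piece
        clusters.append(codepoints[start:i])
    return clusters


seq = [0x0985, 0x09CD, 0x09AF, 0x09BE]

as_vowel = split_clusters(seq, BASE_CLASSES)
# The "_ct hack": treat A and E as consonants.
hacked = {**BASE_CLASSES, 0x0985: CT, 0x098F: CT}
as_consonant = split_clusters(seq, hacked)

print(len(as_vowel), len(as_consonant))  # prints: 3 1
```

In the same toy model the hacked table also lets U+0985 absorb a following matra such as U+09BE, which corresponds to the A + AA vowel sign side effect described in Comment #2 of the quoted report.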
Comment 1 Matthias Clasen 2006-06-20 01:14:39 EDT
Reassigning pango bugs to Behdad.
Comment 2 Akira TAGOH 2006-11-01 08:41:08 EST
Please confirm if this still happens on fc6.
Comment 3 A S Alam 2007-11-15 06:49:41 EST
I am not able to reproduce this in Rawhide with pango-1.19.0-1.fc9.
Comment 4 Runa Bhattacharjee 2008-03-03 05:49:38 EST
Changing Language tag to [bn] as it affects both the locales.
Comment 5 Bug Zapper 2008-05-13 22:07:07 EDT
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 6 Tony Fu 2008-09-09 23:11:27 EDT
requested by Jens Petersen (#27995)
Comment 7 Bug Zapper 2009-06-09 18:07:18 EDT
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 8 Jens Petersen 2009-06-23 02:20:09 EDT
Pravin or Parag is this ok now?
Comment 9 Parag Nemade 2009-06-23 03:40:38 EDT
Runa, can you please check whether this bug still exists in F11? I think this is a very old bug and I need some updated information on it.
Comment 10 Runa Bhattacharjee 2009-06-24 23:22:05 EDT
Passing on the needinfo request to Sayamindu
Comment 11 Sayamindu Dasgupta 2009-06-25 11:23:43 EDT
This has not been fixed yet. For example, the sequence U+0985 BENGALI LETTER A + U+09C7 BENGALI VOWEL SIGN E should not combine, but in F11, it does.

Actual Result: অে
Expected Result: U+0985 BENGALI LETTER A/DOTTEDCIRCLE/U+09C7 BENGALI VOWEL SIGN E
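
For illustration only, the expected result above can be emulated at the text level with a short Python sketch. This is not how pango or any shaper does it (a shaper would insert the dotted circle glyph during layout); the function name `mark_orphan_matras` and the simplified code point ranges are assumptions, limited to the main Bengali block.

```python
DOTTED_CIRCLE = "\u25CC"

# Assumed, simplified ranges for this sketch (they include a few unassigned code points).
INDEPENDENT_VOWELS = range(0x0985, 0x0995)  # U+0985..U+0994
VOWEL_SIGNS = range(0x09BE, 0x09CD)         # U+09BE..U+09CC


def mark_orphan_matras(text):
    """Insert U+25CC before a Bengali vowel sign that directly follows an independent vowel."""
    out = []
    prev = None
    for ch in text:
        if prev is not None and ord(prev) in INDEPENDENT_VOWELS and ord(ch) in VOWEL_SIGNS:
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return "".join(out)


result = mark_orphan_matras("\u0985\u09C7")
print([f"U+{ord(c):04X}" for c in result])  # prints: ['U+0985', 'U+25CC', 'U+09C7']
```

Note that the A + halant + ya + AA sequence from the original report is left untouched by this check, because there the vowel sign follows the consonant YA rather than the independent vowel.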
Comment 12 Pravin Satpute 2009-06-26 00:04:45 EDT
Changing version to F11.
Comment 13 sandeep shedmake 2009-06-27 03:19:43 EDT
Created attachment 349635: Image showing rendering of [U+0985] + [U+09C7]

From Comment #11: does the attached image correctly render the required sequence, U+0985 (BENGALI LETTER A) + U+09C7 (BENGALI VOWEL SIGN E)?

I made a change in pango-1.24.1-1.fc11.x86_64 to get the above required sequence.
Comment 14 Sayamindu Dasgupta 2009-06-27 03:55:10 EDT
(In reply to comment #13)
> Created an attachment (id=349635)
> Image showing rendering of [U+0985] + [U+09C7]
>
> From Comment #11,
>
> Is the attached image rendering correctly the required sequence:
> U+0985 (BENGALI LETTER A) + U+09C7 (BENGALI VOWEL SIGN E) ?
>
> I made a change in pango-1.24.1-1.fc11.x86_64, to get the above required
> sequence.

Yep - this is the correct rendering.
Comment 15 Pravin Satpute 2009-07-15 09:48:33 EDT
Updated the upstream bugzilla with patches: http://bugzilla.gnome.org/show_bug.cgi?id=118299

Hopefully we will get rid of this very old bug (opened 2006-03-16), which IMO is a regression caused by some wrong changes in pango.
Comment 17 Bug Zapper 2010-03-15 07:50:05 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle. Changing version to '13'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 18 Akira TAGOH 2011-05-24 08:23:20 EDT
Does this issue still persist in f15?
Comment 20 Pravin Satpute 2011-05-25 03:51:18 EDT
Pango Indic development has almost stopped; we will try to take care of this in the harfbuzz-ng Indic development.
Comment 21 Fedora Admin XMLRPC Client 2012-01-10 10:43:53 EST
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Comment 22 Fedora Admin XMLRPC Client 2013-05-14 08:12:42 EDT
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Comment 23 Akira TAGOH 2013-06-03 02:04:25 EDT
Pravin, does this still persist on pango with harfbuzz?
Comment 24 Pravin Satpute 2013-06-03 04:11:50 EDT
The original problem for which the bug was opened is resolved.

Regarding its regression, i.e. that u0985 (অ) + u09C7 (ে) should not combine: this is now allowed in all scripts. Even in Devanagari, u0905 (अ) + u093f (ि) now connect. These combinations are specifically allowed since there are chances of such sequences in the comics world, for exaggeration. Related thread on the HarfBuzz list: http://lists.freedesktop.org/archives/harfbuzz/2013-February/002961.html

So as far as I can tell this is not a bug anymore. If anyone thinks it is, it should be debated on the harfbuzz mailing list. Vowel + Matra combination is required in such cases.

Fixed in harfbuzz-0.9.12-2.fc18.x86_64