450699 – [ta_IN] Errors in "sh" and "shrI" in Lohit Tamil font

Summary: [ta_IN] Errors in "sh" and "shrI" in Lohit Tamil font

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	lohit-fonts
Sub Component:
Version:	10
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Rahul Bhalerao
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-06-10 14:55 UTC by R (Chandra) Chandrasekhar
Modified:	2009-03-05 14:24 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-03-05 14:24:55 UTC
Type:	---
Embargoed:
Dependent Products:

Description R (Chandra) Chandrasekhar 2008-06-10 14:55:19 UTC

Description of problem: Errors in "sh" and "shrI" in Lohit Tamil font

Version-Release number of selected component (if applicable):

Kubuntu 8.04 (Hardy Heron)
ttf-indic-fonts-core version 1:0.5.0-0ubuntu1

Problems with "sh" and "shrI" in Lohit Tamil fonts:

A. The ITRANS Tamil character "sh" in Lohit Tamil is rendered as ஶ் : the dot
above or pulli should be over the character rather than beside it. Most other
consonants are rendered correctly. For example, the consonant "k" is correctly
rendered as க். I have been told that the "font must have a correct GPOS table
for pulli positioning and the rendering engine must utilise that information
when drawing the glyph" for the character to show up correctly. Of the two, I
think the font is more likely to be at fault than the rendering system,
because க் shows up correctly.

B. The character "shrI" should properly be rendered as ஸ்ரீ. This character should
correctly be Unicode 0BB6 + 0BCD + 0BB0 + 0BC0 whereas in Lohit Tamil (and
possibly other Tamil fonts) it is made from Unicode 0BB8 + 0BCD + 0BB0 + 0BC0,
which is wrong. This should be corrected to accord with "The Unicode Standard,
Revision 5.0", p 325.

How reproducible:

Use SCIM or some scheme that allows raw Unicode to be input into a Unicode-aware
text editor and see the results onscreen.

Steps to Reproduce: A
1. Ensure that locale for Tamil is set up and that Lohit Tamil is default Tamil
font.
2. Install SCIM or other method to input raw Unicode from keyboard.
3. Into a Unicode-aware text editor, enter raw Unicode so: U+0BB6 + U+0BCD.
4. Verify that inputting U+0B95 + U+0BCD gives க் as expected, which suggests
that the problem lies with the font.
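
For anyone without SCIM set up, the two test strings from steps 3 and 4 can also be built directly from their code points. This is only a minimal sketch in Python; the names SHA_PULLI, KA_PULLI and describe are made up for illustration and are not part of any font tool.

```python
import unicodedata

# Code points taken from steps 3 and 4 above.
SHA_PULLI = "\u0BB6\u0BCD"  # step 3: U+0BB6 + U+0BCD
KA_PULLI = "\u0B95\u0BCD"   # step 4: U+0B95 + U+0BCD


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, text in (("step 3", SHA_PULLI), ("step 4", KA_PULLI)):
        # Paste the printed string into an editor that uses Lohit Tamil
        # to compare how the pulli is positioned in each case.
        print(label, text, describe(text))
```

The printed strings only show what was typed; whether the pulli sits above or beside the consonant still depends on the font and the rendering engine in use.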

Actual results: For step 3: ஶ்

Expected results: For step 3: ஶ with dot (pulli) above as for க் above.

Steps to Reproduce: B
1. Ensure that locale for Tamil is set up and that Lohit Tamil is default Tamil
font.
2. Install SCIM or other scheme to input raw Unicode from keyboard.
3. Into a Unicode-aware text editor, enter raw Unicode so:
U+0BB6 + U+0BCD + U+0BB0 + U+0BC0

Actual results: For step 3: ஶ்ரீ

Expected results: For step 3: ஸ்ரீ

Additional info:

1. The character ஸ்ரீ is wrongly encoded as U+0BB8 + U+0BCD + U+0BB0 + U+0BC0 in
Lohit Tamil.
2. These errors could exist in other Tamil fonts in addition to Lohit Tamil.

Comment 1 Bug Zapper 2008-11-26 02:24:09 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and the reason for this action are here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 2 Padmanabhan V. K. 2009-01-22 20:38:35 UTC

(In reply to comment #0)
> A. The ITRANS Tamil character "sh" in Lohit Tamil is rendered as ஶ் : the dot
> above or pulli should be over the character rather than beside it. Most other
> consonants are rendered correctly. For example, the consonant "k" is correctly
> rendered as க். I have been told that the "font must have a correct GPOS table
> for pulli positioning and the rendering engine must utilise that information
> when drawing the glyph" for the character to show up correctly. Of the two, I
> think it more likely that it is the font rather than the rendering system that
> is likely to be faulty because க் shows up correctly.
> 

IMHO the rendering engine is at fault. Since you use Kubuntu, I'd assume your rendering engine is whatever Qt/KDE uses (probably that's just called Qt?). Hence this part of the issue should be reported to them.

In my case I found the same issue with Lohit Tamil with the Pango rendering engine (used in GTK+/GNOME/etc.). Downloading the source for my version of Pango and changing one of its files worked for me.

The latest version of the pango file can be viewed here:
http://svn.gnome.org/viewvc/pango/trunk/modules/indic/indic-ot-class-tables.c?revision=2684&view=markup
In the array tamlCharClasses[], entry 55 (corresponding to Unicode character U+0BB6) should be changed from _xx to _ct for "sh," "shaa," "shi," etc. to be rendered properly.
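
To illustrate why the class of U+0BB6 could matter, here is a toy model in Python. It is not Pango's code: the _ct / _xx labels come from the comment above, but the clustering rule, the function names and the three-entry table are all simplifying assumptions made up for this sketch.

```python
# Toy model only, NOT Pango's implementation.
CT, XX, VIRAMA = "_ct", "_xx", "virama"


def make_classes(sha_class):
    """Build a tiny hypothetical class table; only three code points are listed."""
    return {0x0B95: CT, 0x0BB6: sha_class, 0x0BCD: VIRAMA}


def clusters(text, classes):
    """Group code points into clusters.

    Assumed rule: a virama (pulli) joins the previous cluster only when that
    cluster starts with a character classified as a consonant (_ct).
    """
    result = []
    for ch in text:
        cls = classes.get(ord(ch), XX)
        prev_is_consonant = bool(result) and classes.get(ord(result[-1][0]), XX) == CT
        if cls == VIRAMA and prev_is_consonant:
            result[-1] += ch
        else:
            result.append(ch)
    return result


if __name__ == "__main__":
    sha_pulli = "\u0BB6\u0BCD"
    print(len(clusters(sha_pulli, make_classes(XX))))  # pulli left on its own
    print(len(clusters(sha_pulli, make_classes(CT))))  # pulli joins the consonant
```

Under this assumed rule, a U+0BB6 marked _xx leaves the pulli as a separate cluster, which would be consistent with the dot being drawn beside the letter instead of above it.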

If I can get over my laziness I'll probably file a bug for this on Pango.

> B. The character "shrI" should properly be rendered as ஸ்ரீ. This character should
> correctly be Unicode 0BB6 + 0BCD + 0BB0 + 0BC0 whereas in Lohit Tamil (and
> possibly other Tamil fonts) it is made from Unicode 0BB8 + 0BCD + 0BB0 + 0BC0,
> which is wrong. This should be corrected to accord with "The Unicode Standard,
> Revision 5.0", p 325.

I agree that Lohit Tamil should be enhanced to render 0BB6 + 0BCD + 0BB0 + 0BC0 also as the "shri" ligature. However, for backward compatibility, 0BB8 + 0BCD + 0BB0 + 0BC0 should probably also produce the "shri" ligature. There would certainly be older documents where this encoding was used for the "shri" ligature, and other fonts also seem to have used this encoding, e.g. all the fonts in Mandriva's fonts-ttf-tamil-1.1-2mdk package that are installed on the machine I am typing this from.

But Lohit Tamil is producing the ligature for 0BB7 + 0BCD + 0BB0 + 0BC0, which doesn't seem to be useful.
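
The three sequences discussed in this thread differ only in the leading consonant. A small Python sketch (the names TAIL and shri_candidates are hypothetical) lists them side by side with the Unicode name of each leading character:

```python
import unicodedata

# Shared tail of every candidate: U+0BCD + U+0BB0 + U+0BC0.
TAIL = [0x0BCD, 0x0BB0, 0x0BC0]


def shri_candidates():
    """Map the Unicode name of each leading consonant to its full sequence."""
    out = {}
    for lead in (0x0BB6, 0x0BB7, 0x0BB8):
        out[unicodedata.name(chr(lead))] = "".join(chr(cp) for cp in [lead] + TAIL)
    return out


if __name__ == "__main__":
    for name, text in shri_candidates().items():
        codes = " + ".join(f"{ord(ch):04X}" for ch in text)
        print(f"{name}: {codes} -> {text}")
```

Pasting the three printed strings into an editor shows which of them a given font turns into the ligature.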

HTH, Thanks!

Comment 3 Padmanabhan V. K. 2009-01-23 17:42:27 UTC

(In reply to comment #2)
> In my case I found the same issue with Lohit Tamil with the Pango rendering
> engine (used in GTK+/GNOME/etc.). Downloading the source for my version of
> Pango and changing one of its files worked for me.
> 
> The latest version of the pango file can be viewed here:
> http://svn.gnome.org/viewvc/pango/trunk/modules/indic/indic-ot-class-tables.c?revision=2684&view=markup
> In the array tamlCharClasses[], entry 55 (corresponding to unicode character
> 0BB6) should be changed to _ct instead of _xx for "sh," "shaa," "shi," etc. to
> be rendered properly.
> 
> If I can get over my laziness I'll probably file a bug for this on Pango.

I found that there is already a bug for this in Pango: http://bugzilla.gnome.org/show_bug.cgi?id=481203
This was also tracked here in bug 218905, which has since been closed because it was filed against an old rawhide version. http://bugzilla.gnome.org/attachment.cgi?id=127113 shows the rendering with the fix in Pango (no fix is required in Lohit Tamil for this particular issue).

Comment 4 Rahul Bhalerao 2009-03-04 14:11:58 UTC

I am patching the font file for proper combination with 0BB6, but it won't work unless the corresponding patch for Pango is committed.

Comment 5 Rahul Bhalerao 2009-03-05 14:24:55 UTC

Font fixed in lohit-fonts-2.3.8. Please wait for an update in Pango for the final resolution.
