Bug 799565 - Lohit Malayalam font does not have support for 0D4E MALAYALAM LETTER DOT REPH
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: lohit-malayalam-fonts
Version: 16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Pravin Satpute
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2012-03-02 22:46 EST by Shriramana Sharma
Modified: 2013-01-03 00:10 EST
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-01-03 00:10:57 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
Revised Lohit Malayalam for bugs #799565 and #798870 (68.20 KB, application/x-font-ttf)
2012-03-05 09:42 EST, Shriramana Sharma
ODT and PDF for test-case (14.36 KB, application/zip)
2012-03-05 09:44 EST, Shriramana Sharma
Desired rendering of dot reph and comparison of Bengali syllable structure (34.43 KB, application/x-zip-compressed)
2012-07-20 07:22 EDT, Shriramana Sharma

Description Shriramana Sharma 2012-03-02 22:46:32 EST
Description of problem:

It was announced (https://www.redhat.com/archives/lohit-devel-list/2012-February/msg00011.html) that the latest 2.5.1 release of the Lohit fonts supports the new Unicode 6.0 characters, with Malayalam specifically mentioned.

However I find that the Lohit Malayalam 2.5.1 release downloadable from https://fedorahosted.org/releases/l/o/lohit/lohit-malayalam-ttf-2.5.1.tar.gz does NOT provide support for 0D4E MALAYALAM LETTER DOT REPH.

As this is one of the three Malayalam characters encoded in Unicode 6.0 (see http://www.unicode.org/Public/UNIDATA/DerivedAge.txt and search for 0D4E), it should also be supported.

[The other two characters are provided but 0D3A has a wrong glyph which I have reported as bug 798870.]
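A quick way to confirm which of these characters a given font build actually covers is to look them up in the font's cmap. The sketch below assumes the cmap has already been read into a dict of integer code point to glyph name (for example with fontTools' `TTFont(path).getBestCmap()`, not shown here); the `missing_codepoints` helper and the toy cmap are hypothetical stand-ins, not data read from Lohit Malayalam 2.5.1.

```python
from __future__ import annotations


def missing_codepoints(cmap: dict[int, str], required: list[int]) -> list[str]:
    """Return 'U+XXXX' labels for every required code point absent from cmap.

    Hypothetical helper: `cmap` is assumed to map integer code points to
    glyph names, as fontTools' getBestCmap() does.
    """
    return [f"U+{cp:04X}" for cp in required if cp not in cmap]


# Code points discussed in this report (U+0D3A's glyph is bug 798870).
REQUIRED = [0x0D3A, 0x0D4E]

# Toy cmap mimicking the reported situation: U+0D3A mapped, U+0D4E not.
toy_cmap = {0x0D15: "U0D15", 0x0D3A: "U0D3A"}

print(missing_codepoints(toy_cmap, REQUIRED))  # ['U+0D4E']
```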

Version-Release number of selected component (if applicable):

2.5.1

Steps to Reproduce:
1. Install Lohit Malayalam 2.5.1 font. 
2. Try to use 0D4E MALAYALAM LETTER DOT REPH
  
Actual results:

This character is not available.

Expected results:

This character was encoded to support the old Malayalam orthography. As such, it should be made available for full Unicode 6.0 (or 6.1) support.

Additional info:

You might need to do some smart font programming to position the dot reph correctly. Note that this character is a special rendering character (hence the dotted box around it in the code chart http://www.unicode.org/charts/PDF/U0D00.pdf). 

The special rendering is that it should be placed on top of the character *following* it. See the original proposal, bottom of page 3 and top of page 4.

I think the e-Malayalam OTC font (http://www.aai.uni-hamburg.de/indtib/INDOLIPI/Malayalam.zip) has pre-composed glyphs with this character on top of other consonants, which might help you position it. Note that it is most often found with doubled consonants (i.e. DOT_REPH + GA + VIRAMA + GA, etc.), so you will need to be able to position this character above stacked consonant clusters.
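Because the dot may sit over a whole stack (DOT_REPH + GA + VIRAMA + GA) rather than a single letter, a renderer first has to find the span of code points the dot belongs to. A minimal sketch, assuming a simplified cluster grammar of consonant (VIRAMA consonant)* and treating U+0D15 to U+0D3A as the consonant range; `reph_target` and `is_consonant` are hypothetical names, not part of any shaping engine.

```python
from __future__ import annotations

DOT_REPH = 0x0D4E  # MALAYALAM LETTER DOT REPH
VIRAMA = 0x0D4D    # MALAYALAM SIGN VIRAMA


def is_consonant(cp: int) -> bool:
    # Simplification: KA (U+0D15) through TTTA (U+0D3A).
    return 0x0D15 <= cp <= 0x0D3A


def reph_target(cps: list[int], i: int) -> tuple[int, int]:
    """Half-open [start, end) span of the cluster C (VIRAMA C)* that the
    dot reph at index i should be drawn above."""
    if cps[i] != DOT_REPH:
        raise ValueError("index does not point at U+0D4E")
    start = end = i + 1
    if end >= len(cps) or not is_consonant(cps[end]):
        raise ValueError("dot reph is not followed by a consonant")
    end += 1
    # Extend over VIRAMA + consonant pairs (stacked / doubled consonants).
    while end + 1 < len(cps) and cps[end] == VIRAMA and is_consonant(cps[end + 1]):
        end += 2
    return start, end


# DOT_REPH + GA + VIRAMA + GA: the dot belongs over indices 1..3.
print(reph_target([0x0D4E, 0x0D17, 0x0D4D, 0x0D17], 0))  # (1, 4)
```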

I hope this is sufficient feedback for supporting this character which is important for old Malayalam orthography.
Comment 1 Pravin Satpute 2012-03-04 23:33:08 EST
Can you provide a screenshot of its rendering?
Comment 2 Shriramana Sharma 2012-03-05 01:24:56 EST
The original proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3676.pdf has many examples on pp 3-5. Is that enough?
Comment 3 Pravin Satpute 2012-03-05 04:33:43 EST
Yes, fixed in upstream. Latest TTF: http://pravins.fedorapeople.org/Lohit-Malayalam.ttf
Comment 4 Shriramana Sharma 2012-03-05 09:42:05 EST
Created attachment 567649 [details]
Revised Lohit Malayalam for bugs #799565 and #798870

There are some corrections. Please find attached a revised font. 

The correction is that the dot reph should have a positive LSB (left side bearing). Otherwise it will be positioned on the previous letter rather than on the following letter; please see the original proposal. PA DOT_REPH VA പൎവ should place the dot reph on VA, not PA. So I have now moved the dot reph to the right.
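The effect of the LSB sign can be checked with simple arithmetic. This sketch assumes the dot reph glyph has zero advance width (so its pen position is exactly where the following letter starts) and uses made-up metrics in font units; `dot_lands_on` is a hypothetical helper. A negative LSB draws the dot back over the previous letter, a positive one pushes it onto the following letter, and too large an LSB overshoots a narrow following letter.

```python
from __future__ import annotations


def dot_lands_on(lsb: int, dot_width: int, prev_advance: int, next_advance: int) -> str:
    """Say which neighbour a zero-advance dot glyph is drawn over.

    Assumed layout: the previous letter spans [0, prev_advance); the dot's
    pen position is prev_advance and it advances by 0, so the following
    letter spans [prev_advance, prev_advance + next_advance).
    """
    ink_left = prev_advance + lsb  # the outline starts LSB units from the pen
    ink_centre = ink_left + dot_width / 2
    if ink_centre < prev_advance:
        return "previous"
    if ink_centre < prev_advance + next_advance:
        return "following"
    return "past the following letter"


# Hypothetical metrics: two 1000-unit letters and a 200-unit dot.
print(dot_lands_on(lsb=-400, dot_width=200, prev_advance=1000, next_advance=1000))  # previous
print(dot_lands_on(lsb=400, dot_width=200, prev_advance=1000, next_advance=1000))   # following
```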

However, while it seems to work correctly (at least in LibreOffice) with medium-sized consonants such as ഗ and വ, it still does not look good with wide consonants such as ണ. It should ideally be centered on top of the consonant, as you can see in the proposal samples. Can you please implement proper glyph positioning? I don't know how to do that.
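Why a single fixed LSB cannot center the dot on every consonant: the LSB that centers it depends on the width of the following letter. A minimal sketch, assuming zero-advance dot, base ink centered within its advance, and hypothetical widths (1000 units for a medium consonant, 1500 for a wide one); neither figure is taken from Lohit.

```python
from __future__ import annotations


def centred_lsb(base_advance: int, dot_width: int) -> float:
    """LSB that centres a zero-advance dot over the following base glyph,
    assuming the base's ink is centred within its advance width."""
    return base_advance / 2 - dot_width / 2


def centring_error(fixed_lsb: float, base_advance: int, dot_width: int) -> float:
    """How far (font units) a fixed LSB misses the centre; negative = too far left."""
    return fixed_lsb - centred_lsb(base_advance, dot_width)


DOT_WIDTH = 200
fixed = centred_lsb(1000, DOT_WIDTH)  # tuned for the hypothetical medium consonant
print(fixed, centring_error(fixed, 1000, DOT_WIDTH), centring_error(fixed, 1500, DOT_WIDTH))
# 400.0 0.0 -250.0
```

Per-base positioning (GPOS anchors or per-cluster ligatures) is what removes this error, which is the direction the rest of the thread takes.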

I will attach ODT and PDF samples of font as currently modified by me for testing.
Comment 5 Shriramana Sharma 2012-03-05 09:44:23 EST
Created attachment 567651 [details]
ODT and PDF for test-case
Comment 6 Pravin Satpute 2012-03-06 00:40:27 EST
This is a peculiar case, where the mark comes first and then the base character. Need to check it.
Comment 7 Shriramana Sharma 2012-03-06 01:06:54 EST
Yes, as I told you, that is why the character in the Unicode chart has a dotted box around it: to indicate that it is a special rendering character.

I quote from: http://www.unicode.org/versions/Unicode6.0.0/ch09.pdf p 310:

Dot Reph. U+0D4E MALAYALAM LETTER DOT REPH is used to represent the dead consonant form of U+0D30 MALAYALAM LETTER RA, when **it is displayed as a dot over the consonant following it**. Conceptually, the dot reph is analogous to the sequence <RA, VIRAMA>, but when followed by another consonant, the Malayalam cluster <RA, VIRAMA, C2> normally assumes the C2 conjoining form. **U+0D4E MALAYALAM LETTER DOT REPH occurs first, in logical order, even though it displays as a dot above the succeeding consonant**. It has the character properties of a letter, and is not considered a combining mark.
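The character properties quoted above can be checked directly against Python's Unicode database: U+0D4E is a letter (general category Lo), has combining class 0, and in a string such as "ൎക" it is stored before the consonant it is drawn over.

```python
import unicodedata

DOT_REPH = "\u0D4E"

print(unicodedata.name(DOT_REPH))       # MALAYALAM LETTER DOT REPH
print(unicodedata.category(DOT_REPH))   # Lo (other letter, not a mark)
print(unicodedata.combining(DOT_REPH))  # 0 (no canonical combining class)

# Logical order of "ൎക": the reph precedes KA even though it renders above it.
print([f"U+{ord(c):04X}" for c in "\u0D4E\u0D15"])  # ['U+0D4E', 'U+0D15']
```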
Comment 8 Pravin Satpute 2012-03-07 02:03:13 EST
If we consider this a base character, then there is no feature in the OT spec for base-to-base positioning. We could use the dist/kern feature, but I don't think it will give the expected results.

Dunno, do we need reordering for this character? We are reordering "ra+virama" in Devanagari script.
Comment 9 Santhosh Thottingal 2012-03-08 12:55:52 EST
I used akhn for this in Meera. Positioning the dot with a positive LSB is not optimal and will result in the dot appearing in the wrong position, since the ligature below it has variable width. The dot should generally go in the top center position, but there are exceptions too. Meera has separate glyphs for all valid dot rephs, but I ran into some issues with this implementation too.
Comment 10 Shriramana Sharma 2012-03-09 11:43:21 EST
I asked about this on the Unicore list. One Microsoft engineer replied that they use reordering for this. I am guessing that they treat the dot reph character just like any other reph in any Indic script: move it to the end of the syllable. The only difference is that Malayalam has a distinct character for this, whereas in other scripts it is just RA + VIRAMA that is rendered as the reph.
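The reordering described here (logical order in, rendering order out, with the reph moved to the end of its syllable) can be modelled in a few lines. This is a simplified sketch, not HarfBuzz's or Uniscribe's algorithm: it assumes a syllable is consonant (VIRAMA consonant)* followed by any dependent vowel signs, virama, anusvara/visarga, or AU length mark, and the helper names are hypothetical.

```python
from __future__ import annotations

DOT_REPH = 0x0D4E
VIRAMA = 0x0D4D


def is_consonant(cp: int) -> bool:
    # Simplification: KA (U+0D15) through TTTA (U+0D3A).
    return 0x0D15 <= cp <= 0x0D3A


def is_syllable_mark(cp: int) -> bool:
    # Simplification: anusvara, visarga, AU length mark, vowel signs and virama.
    return cp in (0x0D02, 0x0D03, 0x0D57) or 0x0D3E <= cp <= 0x0D4D


def reorder_dot_reph(cps: list[int]) -> list[int]:
    """Move each U+0D4E that starts a syllable to the end of that syllable."""
    out: list[int] = []
    i = 0
    while i < len(cps):
        if cps[i] == DOT_REPH and i + 1 < len(cps) and is_consonant(cps[i + 1]):
            j = i + 2
            # Consume further VIRAMA + consonant pairs, then trailing marks.
            while j < len(cps):
                if cps[j] == VIRAMA and j + 1 < len(cps) and is_consonant(cps[j + 1]):
                    j += 2
                elif is_syllable_mark(cps[j]):
                    j += 1
                else:
                    break
            out.extend(cps[i + 1:j])
            out.append(DOT_REPH)
            i = j
        else:
            out.append(cps[i])
            i += 1
    return out


# PA DOT_REPH VA (പൎവ): the reph moves after VA, never onto PA.
print([hex(cp) for cp in reorder_dot_reph([0x0D2A, 0x0D4E, 0x0D35])])
```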

So probably you can just treat it like the reph of other scripts. However, as Santhosh says (and as I already said), there will be a problem with the positioning.

One solution is to use akhanda ligatures, as Santhosh says, but that is just a hack. The proper solution is to use GPOS. After all, that is what GPOS is for, isn't it? To position combining marks properly?

That said, I myself am not knowledgeable about all this GPOS-GSUB thing -- I'm more a Graphite person. So I leave it to you people to decide how to implement this.

I would only suggest that, if you do use akhanda ligatures, you use composite glyphs instead of duplicating existing outlines. That would help keep the size of the font in check.
Comment 11 Pravin Satpute 2012-03-12 07:46:57 EDT
Santhosh, I think that by reordering the dot reph to the end of the syllable we can solve this problem.

We can simply use positioning/kerning for it: once it gets reordered, we can position it the same way as the Anusvara (U+0902) in Devanagari script.
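Once the reph follows its base in glyph order, it can be positioned like any above-base mark. In OpenType mark-to-base attachment (the lookup type an 'abvm' feature typically uses), the mark is shifted so its anchor coincides with the base's anchor. The sketch below assumes the font treats the reordered reph as a zero-advance mark, and expresses the offset relative to the mark's own pen position (which already sits one base advance to the right); the `Anchor` class and all numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Anchor:
    x: int
    y: int


def mark_offset(base_anchor: Anchor, mark_anchor: Anchor, base_advance: int) -> tuple[int, int]:
    """Offset for a zero-advance mark that directly follows its base glyph.

    Attachment makes both anchors coincide; because the mark's pen position
    is already base_advance units right of the base origin, that distance is
    subtracted from the x offset.
    """
    dx = base_anchor.x - mark_anchor.x - base_advance
    dy = base_anchor.y - mark_anchor.y
    return dx, dy


# Hypothetical anchors: centre-top of a 1000-unit consonant, bottom of the dot.
print(mark_offset(Anchor(500, 1200), Anchor(100, 1300), base_advance=1000))  # (-600, -100)
```

Because each consonant glyph carries its own anchor, wide letters such as ണ get their own centering point, which a single LSB cannot provide.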

For the time being, for testing purposes, we can type U+0D4E at the end of the syllable and check.

Meanwhile, I cannot find any rules written for dot-reph ligatures in the Meera font.

Shriramana, can you ask the Microsoft engineer for a link to this in the specification? Then we can ask Behdad to make that change in HarfBuzz.
Comment 12 Shriramana Sharma 2012-03-12 12:09:29 EDT
(In reply to comment #11)
> For time being for testing purpose we can type u0d4e at the end of syllable and
> check.

Agreed. If the kerning is done first, we can later bring in reordering support from the software side.

> Shriramana, can you ask Microsoft guy regarding any link for it in
> specification, we can ask behdad to do that change in harfbuzz.

AFAIK this is not present in the Microsoft OT docs on Malayalam (http://www.microsoft.com/typography/otfntdev/malayot/shaping.aspx). The Unicode publication only says right now that it should be placed after the following consonant but clearly this is insufficient description. I will ask the Unicode people to update the wording. Hopefully Behdad can implement this as you say.
Comment 13 Behdad Esfahbod 2012-07-18 00:09:23 EDT
Hi all,

We understand the Repha now.  Will implement for Malayalam soon (e.g. tomorrow).

Cheers,
behdad
Comment 14 Pravin Satpute 2012-07-20 03:27:07 EDT
That's nice.
http://pravins.fedorapeople.org/Lohit-Malayalam.ttf Test font with an added GPOS 'abvm' feature for U+0D4E (though the positioning is not very accurate).
Comment 15 Shriramana Sharma 2012-07-20 07:22:44 EDT
Created attachment 599365 [details]
Desired rendering of dot reph and comparison of Bengali syllable structure

@Pravin/Behdad: I'm not very knowledgeable about this, but isn't the feature tag for reph characters spelled "reph" or "rphf" or something?

@Behdad: Basically, if you treat this 0D4E character as equivalent to the cluster-initial RA + VIRAMA sequences of other Indic scripts, especially Bengali, I think it would be sufficient. Why Bengali? Because, like Malayalam (ൊ ോ ൌ), it has two-part vowel signs (ো ৌ), and it also has the reph. But Bengali doesn't seem to have a post-base VA, unlike Malayalam, so you may have to look out for that.
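The parallel between the two scripts' two-part vowel signs is visible in their canonical decompositions: each splits into a part drawn before the consonant and a part drawn after it. The sketch below only reads Python's Unicode database; `split_vowel` is a hypothetical helper name.

```python
import unicodedata


def split_vowel(ch: str) -> list[str]:
    """Code point labels of a vowel sign's canonical decomposition."""
    return [f"U+{part}" for part in unicodedata.decomposition(ch).split()]


# Bengali ো ৌ and Malayalam ൊ ോ ൌ: each is a pre-base part plus a post-base part.
for ch in "\u09CB\u09CC\u0D4A\u0D4B\u0D4C":
    parts = split_vowel(ch)
    names = [unicodedata.name(chr(int(p[2:], 16))) for p in parts]
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {' + '.join(names)}")
```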

FWIW I have attached a document (ODT and PDF) showing the desired rendering of the reph (using the e-Malayalam OTC font from the "Indolipi" package [link above] which hack-renders the Malayalam RA + Virama combination as the reph) and the equivalent Bengali sequences in two standard Bengali fonts.

You might also like to see https://sites.google.com/site/jamadagni/files/utcsubmissions/12106-ed-updates.pdf §3 (on p 4) for more details on the Malayalam dot reph.
Comment 16 Pravin Satpute 2012-07-20 07:45:49 EDT
The feature tag will not come into the picture for U+0D4E. Once the reordering is done by the OTLS according to the syllable structure, we can simply apply the GPOS feature 'abvm' and get the desired positioning.

I will update Lohit as per desired positioning.
Comment 17 Pravin Satpute 2012-12-21 05:02:49 EST
(In reply to comment #13)
> Hi all,
> 
> We understand the Repha now.  Will implement for Malayalam soon (eg.
> tomorrow).
> 

I just checked with HarfBuzz NG. We are still not reordering U+0D4E to the end of the syllable.

"ൎക"  
hb-shape returns :: [uni0D4E=0|U0D15=1+1015]

Any specific plan for this?
Comment 18 Behdad Esfahbod 2012-12-21 13:45:58 EST
Ok, I'll work on this now.  Thanks for pinging, I was out of issues to fix and was getting bored... :)
Comment 19 Behdad Esfahbod 2012-12-21 15:50:00 EST
Fixed upstream.  Please test.
Comment 20 Pravin Satpute 2012-12-24 01:21:59 EST
(In reply to comment #7) 
> I quote from: http://www.unicode.org/versions/Unicode6.0.0/ch09.pdf p 310:
> 
> Dot Reph. U+0D4E MALAYALAM LETTER DOT REPH is used to represent the dead
> consonant form of U+0D30 MALAYALAM LETTER RA, when **it is displayed as a
> dot over the consonant following it**. Conceptually, the dot reph is
> analogous to the sequence <RA, VIRAMA>, but when followed by another
> consonant, the Malayalam cluster <RA, VIRAMA, C2> normally assumes the C2
> conjoining form. **U+0D4E MALAYALAM LETTER DOT REPH occurs first, in logical
> order, even though it displays as a dot above the succeeding consonant**. It
> has the character properties of a letter, and is not considered a combining
> mark.

I still don't get this output. Do I need to use a different OT feature here?
Comment 21 Behdad Esfahbod 2012-12-29 18:40:48 EST
Testing with the font in comment 14, I see the reordering happening.  Here's the hb-shape output:

$ ./hb-unicode-encode d4e,d15 | build/util/hb-shape ./Lohit-Malayalam.ttf --shaper ot
[U0D15=0+1015|uni0D4E=0@-971,-41+0]
Comment 22 Behdad Esfahbod 2012-12-29 18:46:44 EST
This is what I get with Lohit-Malayalam 2.5.2:

$ ./hb-unicode-encode d4e,d15 | build/util/hb-shape indic-fonts-lohit/malayalam.ttf
[U0D15=0+1015|uni0D4E=0@-971,-41+0]
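The before/after difference can be checked mechanically from hb-shape's text output, whose entries read glyph=cluster@x_offset,y_offset+x_advance. The sketch below parses the two outputs quoted in this thread, treating a missing offset or advance as 0 (the earlier output omits them); `parse_hb_shape` and `reph_reordered` are hypothetical helpers, not part of HarfBuzz.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Glyph(NamedTuple):
    name: str
    cluster: int
    dx: int
    dy: int
    advance: int


_GLYPH = re.compile(
    r"(?P<name>[^=|\[\]]+)=(?P<cluster>\d+)"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
    r"(?:\+(?P<adv>-?\d+))?"
)


def parse_hb_shape(line: str) -> list[Glyph]:
    """Parse one line of hb-shape text output such as '[a=0+500|b=0@-10,5+0]'."""
    glyphs = []
    for item in line.strip().strip("[]").split("|"):
        m = _GLYPH.fullmatch(item)
        if m is None:
            raise ValueError(f"unrecognised glyph entry: {item!r}")
        glyphs.append(Glyph(m["name"], int(m["cluster"]),
                            int(m["dx"] or 0), int(m["dy"] or 0), int(m["adv"] or 0)))
    return glyphs


def reph_reordered(glyphs: list[Glyph], reph: str = "uni0D4E") -> bool:
    """True if the reph glyph comes after at least one other glyph."""
    names = [g.name for g in glyphs]
    return reph in names and names.index(reph) > 0


before = parse_hb_shape("[uni0D4E=0|U0D15=1+1015]")
after = parse_hb_shape("[U0D15=0+1015|uni0D4E=0@-971,-41+0]")
print(reph_reordered(before), reph_reordered(after))  # False True
```

In the second output the reph also shares cluster 0 with KA. Since hb-shape offsets are relative to the glyph's own pen position, which is 1015 units right of KA's origin, an x offset of -971 puts the dot's origin 44 units from KA's origin.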
Comment 23 Pravin Satpute 2013-01-02 03:05:46 EST
My mistake. I messed up with git; it's working fine now.
I will make a release of Lohit with this fix.
Thanks a lot, Behdad.
Comment 24 Behdad Esfahbod 2013-01-02 03:45:55 EST
Cool.  Please close when you do.
Comment 25 Pravin Satpute 2013-01-03 00:10:57 EST
Completed GPOS in upstream, will be available with the next release of lohit-malayalam-fonts.
