Bug 1427550 - [ta_IN] Adding additional Glyphs in Tamil
Summary: [ta_IN] Adding additional Glyphs in Tamil
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: lohit-tamil-fonts
Version: 29
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: vishal vijayraghavan
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-28 15:04 UTC by Seshadri N
Modified: 2019-07-16 09:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:


Attachments
Svarita Deerga Svarita Anudatta eg (36.69 KB, image/jpeg)
2017-02-28 15:04 UTC, Seshadri N
sfd file (371.37 KB, text/plain)
2017-03-25 19:42 UTC, Seshadri N
Testing result jpg file (181.86 KB, image/jpeg)
2017-03-25 19:44 UTC, Seshadri N
Proposed Lohit Vedic Tamil sfd (446.53 KB, text/plain)
2019-07-08 16:44 UTC, Seshadri N

Description Seshadri N 2017-02-28 15:04:33 UTC
Created attachment 1258398 [details]
Svarita  Deerga Svarita  Anudatta eg

Description of problem:

This is not a bug but an enhancement request.

The Tamil script is also used to write text in other languages, such as Sanskrit, Hindi, Malayalam, Telugu and Kannada, in transliteration. However, the Tamil script and its Unicode range U+0B80-U+0BFF lack many characters that are available for these other languages, mainly those in the Devanagari Extended block (U+A8E0 series) and the Vedic Extensions block (U+1CD0 series).

Some such requests that I have come across in my FB and other groups:

1. How can I include signs such as the lines above the letters shown in the picture? They are required to show the swaram, i.e. the high or low pitch at which the letter/word is pronounced, as in Sanskrit. Lines of this kind are also used in Sanskrit.

2. I need to type musical notation in Tamil using the MS Word word processor. To indicate the octave we put a dot above or below the letter. How do I add it?

--- My answer for above point 2: dot above = anuswara, dot below = nukta, Lohit Tamil does not contain nukta, but includes anuswara (U+0B82). ---

While it is a lot easier to work around this on Linux-based systems by switching between different fonts, it is very difficult in Microsoft applications.

Hence, I am requesting that 9 such frequently used glyphs be added to the existing Lohit Tamil font. The existing glyphs from Lohit-Devanagari.ttf can be reused for this; there is no need to reinvent the wheel.

Replace:

U+0310 COMBINING CANDRABINDU  >>> replace with VEDIC TONE CANDRABINDU / anunasika U+0901, which is in the Devanagari range, to be consistent with all other scripts.

Add:

Signs: 

Visarga U+0903 ( as separate and distinct, in addition to Tamil Visarga U+0B83 )
AVAGRAHA U+093D
CANDRABINDU VIRAMA U+A8F3

Combining signs/marks with Anchoring:

NUKTA U+093C
VEDIC TONE UDATTA / SVARITA U+0951
VEDIC TONE ANUDATTA U+0952
VEDIC TONE PRENKHA U+1CD2
VEDIC TONE DOUBLE/Deerga SVARITA U+1CDA
VEDIC TONE CANDRA U+1CF4
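
For reference, the informal labels in the list above do not always match the formal Unicode character names. The following is a minimal Python sketch, assuming only the standard-library `unicodedata` module; the names `REQUESTED` and `describe` are illustrative and not part of any Lohit tooling. It prints the formal name and general category of each requested code point:

```python
import unicodedata

# Code points requested in this report (the list name is illustrative only).
REQUESTED = [
    0x0901, 0x0903, 0x093D, 0xA8F3,                   # replacement and signs
    0x093C, 0x0951, 0x0952, 0x1CD2, 0x1CDA, 0x1CF4,   # combining marks
]


def describe(cp):
    """Return (U+XXXX label, formal Unicode name, general category)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))


for label, name, category in map(describe, REQUESTED):
    print(f"{label}  {category}  {name}")
```

Category `Mn` (nonspacing mark) is the one that typically needs anchor-based mark positioning in the font; `Mc` and `Lo` characters occupy their own advance width.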


Version-Release number of selected component (if applicable):


How reproducible:

Not a bug but an enhancement request.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Seshadri N 2017-03-07 07:15:07 UTC
Hi Pravin,

After a few mail and FB exchanges with the person who needed musical notation to indicate the octave (a dot above or below the letter), and after experimenting a little with FontForge (I am still learning), I have come to the conclusion that it is better to use the following two combining diacritical marks:

U+0307 COMBINING DOT ABOVE
U+0323 COMBINING DOT BELOW

Hence please include the above two also, in addition to the 10 already requested.
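
As an illustration of how these two marks would be used in text, here is a small Python sketch. It assumes that a dot above marks the higher octave and a dot below the lower one, and it uses TAMIL LETTER SA purely as a sample note letter; the function name `with_octave` is hypothetical.

```python
import unicodedata

TAMIL_SA = "\u0BB8"    # TAMIL LETTER SA, used here only as a sample note letter
DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW


def with_octave(letter, octave):
    """Append a dot above (octave > 0) or below (octave < 0); assumed convention."""
    if octave > 0:
        return letter + DOT_ABOVE
    if octave < 0:
        return letter + DOT_BELOW
    return letter


for octave in (-1, 0, 1):
    text = with_octave(TAMIL_SA, octave)
    print(octave, text, [f"U+{ord(c):04X}" for c in text])

# Both marks are nonspacing (Mn); their combining classes differ:
# 230 means "above" and 220 means "below".
for mark in (DOT_ABOVE, DOT_BELOW):
    print(unicodedata.name(mark), unicodedata.category(mark), unicodedata.combining(mark))
```

Whether the dots render at the right height still depends on the font providing anchors for these marks on the Tamil base glyphs.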

From what little I have learned so far in FontForge, the following things are needed to take care of my request (in layman's language, as I am still not conversant with FF):

1. Move CANDRABINDU Glyph from U+0310 to U+0901 with the associated Anchor-0

2. Create a new GPOS Lookup 'blwm' Below Base Mark Lookup 1 and associated Subtable for Anchor-1 and "linkup" all the Consonant, Vowel and Combo Glyphs ( 12+18+5+2)

3. "Place" Anchor-0 or Anchor-1 "at the right place" to all the above 10+2 Mark Glyphs.

If the above is OK, I can modify the current Lohit Tamil version 2.91.0 and send you the .sfd file, which you can fine-tune. Let me know if this will work.

Warm regards
Seshadri

Comment 2 Seshadri N 2017-03-07 10:35:17 UTC
Missed one important step:

1a. Copy 9 Lohit Devanagari and 2 Arial Glyphs and Paste into Lohit Tamil font at the respective places.

Comment 3 Seshadri N 2017-03-25 19:38:06 UTC
Hi Pravin,

Attaching my "trial and learn" generated sfd file with the following changes to 2.91.0:

Added the Anchor-1 below marker and the following glyphs, with the status of each (OK or Error):

No Anchor

U+0903  DEVANAGARI VISARGA >>>> Error ???
U+093D DEVANAGARI AVAGRAHA >>>> OK
U+A8F3 VEDIC TONE CANDRABINDU VIRAMA >>>> OK

Anchor-0 with Top Marker

U+0305 COMBINING OVERLINE  >>> Not tested
U+0307 COMBINING DOT ABOVE  >>> Not tested
U+0901 VEDIC TONE CANDRABINDU / ANUNASIKA ( Moved from U+0310) >>>>Error
U+0951 VEDIC TONE UDATTA / SVARITA  >>>> OK 
U+1CD2 VEDIC TONE PRENKHA  >>>> Error
U+1CDA VEDIC TONE DOUBLE/DEERGA SVARITA >>>> Error
U+1CF4 VEDIC TONE CANDRA   >>>> Error

Anchor-1 with Below Marker

U+0323 COMBINING DOT BELOW  >>> Not tested
U+0332 COMBINING LOW LINE >>> Not tested
U+0952 VEDIC TONE ANUDATTA >>>> OK 


கँ
கஂ
கः
ऽ
ꣳ
ம॑
ப॒
ப᳒
ம᳚
த᳴
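
The test strings above can also be generated from their code points, which makes it easier to see exactly which base letter and mark each line combines. This is a minimal Python sketch; the base/mark pairs are read off the strings as they appear in this comment, and the names `CASES`, `build` and `codepoints` are illustrative.

```python
# Tamil base letters used in the test strings.
KA, MA, PA, TA = 0x0B95, 0x0BAE, 0x0BAA, 0x0BA4

# (base, sign) pairs mirroring the ten test lines; None means no base letter.
CASES = [
    (KA, 0x0901), (KA, 0x0B82), (KA, 0x0903),
    (None, 0x093D), (None, 0xA8F3),
    (MA, 0x0951), (PA, 0x0952), (PA, 0x1CD2), (MA, 0x1CDA), (TA, 0x1CF4),
]


def build(base, sign):
    """Return the test string: optional base letter followed by the sign."""
    prefix = chr(base) if base is not None else ""
    return prefix + chr(sign)


def codepoints(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]


for base, sign in CASES:
    text = build(base, sign)
    print(text, " ".join(codepoints(text)))
```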

I hope you can fix these issues or tell me how to fix these errors. I did get some error messages while generating the font and don't know how to fix them.

Thanks
Seshadri

Comment 4 Seshadri N 2017-03-25 19:42:42 UTC
Created attachment 1266409 [details]
sfd file

Sfd file

Comment 5 Seshadri N 2017-03-25 19:44:12 UTC
Created attachment 1266410 [details]
Testing result jpg file

Testing output jpg file

Comment 6 Pravin Satpute 2017-03-27 05:34:10 UTC
Nice, you moved ahead and tried to add these characters to Lohit Tamil.

I have one query: how frequently do these newly added characters get used? Are they for all users of the Tamil language or only for a specific set?

After adding Vedic accents to Lohit Devanagari, I felt that rather than making Lohit Devanagari heavy with these additional characters, it would be good to have a separate version of Lohit Devanagari, i.e. Lohit Vedic etc., where as developers we get more flexibility in adding and changing characters.

I think the same applies here: rather than adding these new characters to Lohit Tamil, add them to a new variant of the Lohit font. Did you see Lohit Tamil Classical? In the same way, there could be a Lohit Tamil Vedic etc.

Comment 7 Seshadri N 2017-03-27 15:27:28 UTC
Hi Pravin,

Thanks for taking this up, I am delighted. 

Yes, moved ahead, trying to learn new things, challenging and fun. 

On the sfd that I sent, a few things need to be corrected, as per my current 'learning'.

1. The errors are Missing Point at Extrema and Self intersecting.

I have not yet invested time in learning how to fix them; I will try in the coming days. You should be able to fix them in no time.

2. I also realised that kSha (kataml_viramataml_ssataml) is not showing up in the GPOS blwm subtable, and Anchor-1 is also NOT showing up at the leftmost side (compared with GPOS abvm).

3. I guess I also need to add another GSUB (or GPOS, or both) table for the vowel sign aa, i, ii, u, uu, ai, o, oo, au combinations, to place Anchor-0 and Anchor-1 at the 'right' place?

4. Same for kSha.

5. I have placed Anchor-0 and Anchor-1 more by guesswork, positioning them by eye on my laptop screen, than scientifically or by following the right steps.

6. I also 'Transformed' the size of the 12 newly added glyphs by guesswork.

To the best of my current understanding, the above need to be corrected/fixed.

Now coming to your questions:

Q1. How frequently these new added characters get used? Is it for all users for Tamil language or only specific set?

A1. They are used by a select set of people for writing Devanagari Vedic texts such as Pancha Suktam in Tamil, and for transliterating Devanagari slokas that contain these characters into Tamil.

Other contributors and masters like Sri Shriramana Sharma can throw more light. 

Q2. Did you see Lohit Tamil Classical?

A2. Yes, I downloaded it from here, but in the end all I see is the normal/basic Tamil font.

https://pagure.io/lohit  >> Lohit Tamil Classical >> https://releases.pagure.org/lohit/lohit-tamil-ttf-2.5.3.tar.gz

However, as an oldie who studied in Tamil medium and has gone through some of the related bugs, I know the context: the rendering of the aa, i, ii etc. vowel signs for nna, nnna, rra etc.

But note that in the Classical case it is an OR condition: either the 'old style' or the new unified/uniform computer-savvy style.

Q3. How about different versions, e.g. Lohit Tamil (basic), Lohit Tamil Vedic etc.? As developers, we get more flexibility in adding and changing characters.

A3. Yes AND NO.

Here the condition is AND, not OR as in the Classical case.

It can give flexibility, but maintaining (version-controlling) multiple Lohit Tamil fonts can be challenging. You could also have Basic and Vedic (with just the extensions) as separate 'packages' and create Tamil Vedic by '#include'ing the Vedic one. However, there are also interdependencies: for example, adding a marker needs the related anchors to be added in the Basic package.

So, IMO, it is better to have a single 'package' that takes care of these dozen or so additions; those who want to use them can do so.

Comments from others are welcome and the final approach is left to you. 

Thanks again for your help.

Warm regards
Seshadri

Comment 8 Shriramana Sharma 2017-06-05 10:59:14 UTC
The OP has pinged me many times via direct mail but being busy I wasn't able to respond to this issue till today.

First it should be understood that to keep things formally correct the OP should submit a request to Unicode to allot any characters that he desires to use with the Tamil script a ScriptExtension ("SE") of Taml (http://www.unicode.org/Public/UNIDATA/ScriptExtensions.txt). Just adding the characters to Lohit Tamil and adding appropriate OT tables may work but might not be portable.

The OP is advised to use http://www.unicode.org/reporting.html or better still submit a document as per http://www.unicode.org/pending/docsubmit.html with appropriate attestations for the usages he asks to be supported.

(In reply to Seshadri N from comment #0)
> Extensions unicode U1CD0 series.

Not all of these characters are attested to be used with Tamil.

> Some such requests that I have come across in my FB and other groups:
> 
> 1. How can I include the signs like lines above the letters as shown in the
> picture. This is required to show the swaram i.e. high or low pitch of
> pronouncing the letter/word as in Sanskrit. This kind of lines are used in
> Sanskrit also.

The characters attested to be used with Tamil are 0951, 0952 and 1CDA and they are already allotted ScriptExtensions for Tamil.


0951          ; Beng Deva Gran Gujr Guru Knda Latn Mlym Orya Shrd Taml Telu # Mn       DEVANAGARI STRESS SIGN UDATTA
0952          ; Beng Deva Gran Gujr Guru Knda Latn Mlym Orya Taml Telu # Mn       DEVANAGARI STRESS SIGN ANUDATTA
1CDA          ; Deva Knda Mlym Taml Telu # Mn       VEDIC TONE DOUBLE SVARITA
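
The three lines above follow the layout of ScriptExtensions.txt: a code point (or range), a semicolon, the script codes, then a comment. A minimal Python sketch of checking whether a character is allotted to Tamil follows; it parses only the lines quoted in this comment, and the helper names are illustrative rather than part of any Unicode tool.

```python
# Sample lines copied from this comment (ScriptExtensions.txt layout).
SAMPLE = """\
0951          ; Beng Deva Gran Gujr Guru Knda Latn Mlym Orya Shrd Taml Telu # Mn       DEVANAGARI STRESS SIGN UDATTA
0952          ; Beng Deva Gran Gujr Guru Knda Latn Mlym Orya Taml Telu # Mn       DEVANAGARI STRESS SIGN ANUDATTA
1CDA          ; Deva Knda Mlym Taml Telu # Mn       VEDIC TONE DOUBLE SVARITA
"""


def parse_script_extensions(text):
    """Map each code point to the set of script codes listed for it."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not data:
            continue                           # blank or comment-only line
        cp_field, scripts_field = (part.strip() for part in data.split(";", 1))
        first, _, last = cp_field.partition("..")   # "XXXX" or "XXXX..YYYY"
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            table[cp] = set(scripts_field.split())
    return table


def usable_with(table, cp, script="Taml"):
    """True if the table lists `script` among the extensions of `cp`."""
    return script in table.get(cp, set())


table = parse_script_extensions(SAMPLE)
for cp in (0x0951, 0x0952, 0x1CDA, 0x1CD2):
    print(f"U+{cp:04X}", "Taml" if usable_with(table, cp) else "not listed in sample")
```

Because the sample holds only three lines, "not listed in sample" says nothing about the full data file; the same functions can be run over the complete ScriptExtensions.txt.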

> 2. I need to type musical notations in Tamil Using MSword processor. To
> indicate octave we put a dot on top or bottom of the letter. how to add  ?
> 
> --- My answer for above point 2: dot above = anuswara, dot below = nukta,
> Lohit Tamil does not contain nukta, but includes anuswara (U+0B82). ---

Nuktas and anusvaras form part of Indic syllabic structure. Musical notations are outside syllable structure. Combining characters from the 03xx range should be used instead.

Note that separately single- and double-dot nukta-s are attested for Tamil and need to be added to Lohit Tamil:

http://www.unicode.org/L2/L2015/15256-tamil-nukta.pdf

Of these the double-dot nukta from Grantha is allotted SE for Tamil:

1133C         ; Gran Taml # Mn       GRANTHA SIGN NUKTA

and the single-dot nukta is in the pipeline to be encoded in the Grantha block with Script=Inherited (which means it can be used with all scripts).

> U+0310 COMBINING CANDRABINDU  >>> replace with VEDIC TONE CANDRABINDU /
> anunasika U+0901, that's in Devanagari range, to be consistent with all other
> scripts.

In fact it would seem Unicode advises to use Grantha characters for Tamil rather than Devanagari. See visarga matter below. Thus U+11301 Grantha Candrabindu would be the appropriate character.

> Visarga U+0903 ( as separate and distinct, in addition to Tamil Visarga
> U+0B83 )

UTC advises to use U+11303 Grantha Sign Visarga:
11303         ; Gran Taml # Mc       GRANTHA SIGN VISARGA

> AVAGRAHA U+093D

I suppose U+1133D Grantha Sign Avagraha would be appropriate.

> CANDRABINDU VIRAMA U+A8F3

Already in SE:
A8F3          ; Deva Taml # Lo       DEVANAGARI SIGN CANDRABINDU VIRAMA

> NUKTA U+093C
> VEDIC TONE UDATTA / SVARITA U+0951
> VEDIC TONE ANUDATTA U+0952
> VEDIC TONE DOUBLE/Deerga SVARITA U+1CDA

See above.

> VEDIC TONE PRENKHA U+1CD2
> VEDIC TONE CANDRA U+1CF4

Please provide attestation as to where these are used with Tamil.
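
To make the substitutions suggested above concrete, here is a short Python sketch that swaps the three Devanagari signs for the Grantha characters named in this comment. The mapping covers only candrabindu, visarga and avagraha; nukta is left out because the single-dot form is described above as not yet encoded. The names `DEVANAGARI_TO_GRANTHA` and `to_grantha` are illustrative.

```python
import unicodedata

# Devanagari sign -> Grantha character suggested in this comment.
DEVANAGARI_TO_GRANTHA = {
    0x0901: 0x11301,  # candrabindu
    0x0903: 0x11303,  # visarga
    0x093D: 0x1133D,  # avagraha
}


def to_grantha(text):
    """Replace the mapped Devanagari signs; leave every other character as is."""
    return text.translate(DEVANAGARI_TO_GRANTHA)


for old, new in DEVANAGARI_TO_GRANTHA.items():
    print(f"U+{old:04X} {unicodedata.name(chr(old))} -> "
          f"U+{new:04X} {unicodedata.name(chr(new), '<unnamed>')}")

sample = "\u0B95\u0903"   # TAMIL LETTER KA + DEVANAGARI SIGN VISARGA
print([f"U+{ord(c):04X}" for c in to_grantha(sample)])
```

The Vedic tone marks U+0951, U+0952 and U+1CDA are deliberately not remapped, since they already carry ScriptExtensions for Tamil.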

Comment 9 Shriramana Sharma 2017-06-05 11:01:15 UTC
(In reply to Seshadri N from comment #2)
> Missed one important step:
> 
> 1a. Copy 9 Lohit Devanagari and 2 Arial Glyphs and Paste into Lohit Tamil
> font at the respective places.

IANAL but I think Arial glyphs should not be copied to Lohit fonts because of copyright issues. Use an OFL-compatible source font for borrowed glyphs and make sure they are at least stylistically acceptable to be in line with Lohit Tamil style.

Comment 10 Krishna Babu K 2018-02-28 15:19:41 UTC
No glyphs added as per the comments.

Comment 11 Ben Cotton 2018-11-27 14:07:35 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30  Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora  'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to this bug being closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 12 Seshadri N 2019-07-08 16:35:26 UTC
Thanks for the feedback from Shriramana Sharma, appreciate it very much.

I am not a font designer; I just learnt FontForge by the 'trial & learn' method.

I am also not a Vedic scholar and I don't know Sanskrit. All that I have learnt so far is Panchasuktam in TAMIL, sadly.
But typing Panchasuktam in Tamil with Vedic svaras was the trigger for me to venture into this unknown territory.
It helped me learn a very LITTLE bit about Unicode, glyphs, fonts etc.

I have modified the original sfd to take care of the feedback points to the best of my knowledge and abilities.

I have added comments in the FONTLOG on what I have done (or rather, copy/pasted).

I find that the above and below (Vedic) symbols don't get aligned properly for pure consonants and CVCs, except for the a, u, U, e, E, ai series; I have no idea why.

I hope that the Red Hat Fedora / Lohit font team makes the final corrections and releases this Vedic Tamil font
(it could be a new font, but it would be better to update the existing Lohit Tamil font for easier upgrades, maintenance etc. in future).

Thanks

Comment 13 Seshadri N 2019-07-08 16:44:47 UTC
Created attachment 1588438 [details]
Proposed Lohit Vedic Tamil sfd

Attaching the sfd file.

Comment 14 Seshadri N 2019-07-16 09:51:17 UTC
On second thoughts, it is better to have a separate Vedic Tamil font instead of adding Vedic signs to the existing Lohit Tamil, as Vedic letters require extra ascent and descent space.

Since I have not modified the ascent and descent parameters (I don't know the logic for what to modify, how, or what values to choose), I guess some of the Vedic signs are intersecting/overlapping.

I hope V Vijay will take care of this.

