Bug 843331

Summary: [ta_IN] Please add glyphs for minority orthographies in Tamil
Product: [Fedora] Fedora Reporter: Shriramana Sharma <samjnaa>
Component: lohit-tamil-fonts Assignee: Pravin Satpute <psatpute>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 20 CC: fonts-bugs, i18n-bugs, pnemade, psatpute
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: lohit-tamil-fonts-2.91.0-2.fc21 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-11-10 06:45:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description: Glyphs required for minority orthographies in Tamil and attestations for the same
Flags: none

Description Shriramana Sharma 2012-07-26 04:17:12 UTC
Created attachment 600434
Glyphs required for minority orthographies in Tamil and attestations for the same

While the Tamil script is mainly used for writing Tamil language text, it is also attested to be used for other language text such as Sanskrit, Saurashtra, Hindi, Marathi, Telugu and Kannada in the form of transliteration. 

For these minority orthography use cases, some characters from script-neutral blocks are required:

1) The superscript digits ¹²³⁴ would be used for representing the varga consonants (actually ¹ is only used very rarely). The Unicode chapter on Tamil script documents this and recommends the characters (0xb9) 0xb2 0xb3 0x2074.

2) Not only the superscript digits but their corresponding subscript digits ₁₂₃₄ (0x2081-0x2084) are also attested as a stylistic variant choice. 

3) The modifier letter apostrophe 02BC ʼ is also seen.

4) Further, the candrabindu is sometimes seen marking nasality. Since there is no Tamil candrabindu character, the generic candrabindu ◌̐ at 0310 can be used.

5) The visarga is commonly seen, but the Tamil visarga code point 0B83 is mapped to the Tamil special letter aytam ஃ (which has three dots against the visarga's two dots), so we will have to place the two-dot visarga in the PUA. (Not ideal, I know, but Unicode is unlikely to encode a Tamil-specific two-dot visarga. It is not possible to use the Devanagari visarga codepoint 0903, since rendering engines will produce dotted circles: it is not correct to combine Devanagari codepoints with Tamil codepoints.)
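The characters named in items 1 to 5 can be cross-checked against the Unicode character database. The following is a minimal sketch using only Python's standard `unicodedata` module; the list names `REQUESTED` and `CONTRAST` and the helper `describe` are made up for illustration and do not come from the attached patch.

```python
import unicodedata

# Codepoints requested in items 1-4 of the description.
REQUESTED = [
    0x00B9, 0x00B2, 0x00B3, 0x2074,  # superscript digits 1-4
    0x2081, 0x2082, 0x2083, 0x2084,  # subscript digits 1-4
    0x02BC,                          # modifier letter apostrophe
    0x0310,                          # generic (combining) candrabindu
]

# Mentioned in item 5 for contrast only; neither is being requested.
CONTRAST = [0x0B83, 0x0903]


def describe(cp):
    """Return 'U+XXXX NAME' for a codepoint, using its official Unicode name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


for cp in REQUESTED + CONTRAST:
    print(describe(cp))
```

The output lists the official name next to each codepoint, which makes it easy to confirm that 0B83 and 0903 are both named as a visarga sign in their respective scripts, as item 5 describes.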

Attestations for these usages and the required glyphs are attached. Please add them at the appropriate script-neutral codepoints shown in the patch TTF so that Lohit Tamil (and Lohit Tamil Classical) is also useful for these minority orthographies. BTW it would be good font design policy to make the modifier apostrophe a composite of the regular apostrophe glyph, and the subscript digits composites of the superscript digits.

Comment 1 Nicolas Mailhot 2012-07-26 08:08:01 UTC
BTW can you please work with fontconfig upstream to codify this coverage knowledge in an .orth file?

Comment 2 Shriramana Sharma 2012-07-26 08:26:40 UTC
Hi, I have no experience with what "fontconfig codification of coverage knowledge in an orth file" means.

From what you say, it seems to be some mechanism by which the Linux font management system (fontconfig?) finds out which fonts have glyphs for which codepoints? I thought it automatically scanned fonts to find out their coverage.

Anyway, please point me to where to go and I will see what I can do.

Comment 3 Parag Nemade 2012-07-26 08:37:49 UTC
A fontconfig orthography file defines the minimum character coverage required for a language. In an .orth file, we write Unicode character ranges.
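The idea of an orthography file as a minimum-coverage list can be sketched roughly as follows. This is not fontconfig's own parser: it assumes a simplified format with one hex codepoint or `first-last` range per line and `#` comments, and it skips `include` lines. The function names and the sample data are hypothetical; the sample is not the real contents of ta.orth.

```python
def parse_orth(text):
    """Return the set of codepoints listed in simplified .orth-style text."""
    required = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("include"):
            continue  # blank, fully commented-out, or include line
        first, _, last = line.partition("-")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        required.update(range(lo, hi + 1))
    return required


def covers(font_codepoints, required):
    """True if the font provides every codepoint in the minimum coverage."""
    return required <= set(font_codepoints)


# Illustrative sample only, NOT the actual ta.orth.
SAMPLE = """\
# a few Tamil letters
0b85-0b8a
0b9c
#0be7-0bef   # a commented-out range is not required
"""

required = parse_orth(SAMPLE)
print(sorted(hex(cp) for cp in required))
print(covers(range(0x0B80, 0x0C00), required))
```

Under this model a font counts as supporting a language when its character set is a superset of the listed codepoints, so adding characters to the file raises the bar for every font claiming that language.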

Comment 4 Nicolas Mailhot 2012-07-26 09:26:27 UTC
http://cgit.freedesktop.org/fontconfig/tree/fc-lang

Comment 5 Shriramana Sharma 2012-07-26 11:41:08 UTC
Hi -- in that case if you are accepting these glyphs for the Lohit Tamil fonts, you only have to add:

0xb2, 0xb3, 0xb9, 0x2074
0x2081-0x2084
0x2bc
0x310
0xa789

Note about that last codepoint: I have been discussing with other users, and we have decided that since it is best to avoid PUA, we can use the codepoint A789 MODIFIER LETTER COLON for visarga. 

Most people are right now using 0x3a colon anyway for convenience, but in printed texts, the visarga is shown as two rings (see my TTF) and not as two dots which is the shape of the colon. So there is a need to differentiate the colon and visarga. So please map the visarga to the separate codepoint 0xa789 leaving 0x3a as it is.
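The codepoint list in this comment can be expanded and checked with a short script. This is only a sketch using Python's standard `unicodedata` module; the name `COMMENT_5_LIST` and the helper `expand` are made up for illustration.

```python
import unicodedata

# The list from this comment, copied as written.
COMMENT_5_LIST = """\
0xb2, 0xb3, 0xb9, 0x2074
0x2081-0x2084
0x2bc
0x310
0xa789
"""


def expand(text):
    """Expand comma/whitespace-separated hex values and ranges into codepoints."""
    codepoints = []
    for token in text.replace(",", " ").split():
        first, _, last = token.partition("-")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        codepoints.extend(range(lo, hi + 1))
    return sorted(codepoints)


codepoints = expand(COMMENT_5_LIST)
for cp in codepoints:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

# The ordinary colon stays out of the list and keeps its own identity.
print(f"U+003A {unicodedata.name(chr(0x3A))}")
```

The expanded list holds eleven codepoints, and the last one is a separate character from the ASCII colon at 0x3a, which is why the two can carry different glyph shapes in the same font.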

BTW I have also examined the file http://cgit.freedesktop.org/fontconfig/tree/fc-lang/ta.orth. It seems that even the Tamil digits and numbers are commented out. Is this a script-specific datafile or language-specific? If it is language-specific, then the above characters should not be added to it because they are *not* for Tamil language but for *other* languages written in Tamil script.

Comment 6 Nicolas Mailhot 2012-07-31 14:20:50 UTC
(In reply to comment #5)

> BTW I have also examined the file
> http://cgit.freedesktop.org/fontconfig/tree/fc-lang/ta.orth. It seems that
> even the Tamil digits and numbers are commented out. Is this a
> script-specific datafile or language-specific? If it is language-specific,
> then the above characters should not be added to it because they are *not*
> for Tamil language but for *other* languages written in Tamil script.

It is locale-specific. So it is possible to define a locale that differs from ta.orth if needed. (If you look at the list, fontconfig already knows quite a few sub-locales where the same script has started diverging in neighbouring locales.)

Comment 7 Fedora End Of Life 2013-04-03 17:09:25 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 8 Shriramana Sharma 2013-10-31 05:10:49 UTC
Hello. It seems this issue is not yet resolved.

These additional glyphs do not relate to any particular locale other than ta_IN. They are used in ta_IN only. They are not replacement glyphs for the regular Tamil letters/signs. They are additional glyphs required for supporting minority user requirements in Tamil script.

May I know what the problem is with simply adding these glyphs to the Lohit Tamil [+Classical] fonts? Thank you.

Comment 9 Pravin Satpute 2013-10-31 05:56:10 UTC
Hi Shriramana,

   Apologies, it looks like I missed this. This looks perfectly fine to me. Yes, if the Tamil script requires some characters from other scripts, we must provide them in a single font so they harmonise nicely with the other Tamil characters. We will do this in the coming month. Good to see you have provided a patch as well :)

   From the .orth file perspective, I think we should not add any more characters to the existing ta.orth, as we look for minimal character coverage to support the Tamil language. AFAIK very few Tamil fonts around have these characters.

Comment 10 Pravin Satpute 2014-09-16 06:05:13 UTC
Committed in Lohit Tamil; it will be available with the alpha release.

Comment 11 Fedora Update System 2014-10-14 05:21:37 UTC
lohit-tamil-fonts-2.91.0-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/lohit-tamil-fonts-2.91.0-1.fc20

Comment 12 Fedora Update System 2014-10-14 05:22:17 UTC
lohit-tamil-fonts-2.91.0-1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/lohit-tamil-fonts-2.91.0-1.fc21

Comment 13 Fedora Update System 2014-10-16 01:57:28 UTC
Package lohit-tamil-fonts-2.91.0-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing lohit-tamil-fonts-2.91.0-1.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-12828/lohit-tamil-fonts-2.91.0-1.fc20
then log in and leave karma (feedback).

Comment 14 Fedora Update System 2014-10-28 10:31:33 UTC
lohit-tamil-fonts-2.91.0-2.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/lohit-tamil-fonts-2.91.0-2.fc21

Comment 15 Fedora Update System 2014-11-10 06:45:33 UTC
lohit-tamil-fonts-2.91.0-2.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.