Red Hat Bugzilla – Bug 843331
[ta_IN] Please add glyphs for minority orthographies in Tamil
Last modified: 2014-11-10 01:45:33 EST
Created attachment 600434
Glyphs required for minority orthographies in Tamil and attestations for the same
While the Tamil script is mainly used for writing the Tamil language, it is also attested for other languages such as Sanskrit, Saurashtra, Hindi, Marathi, Telugu and Kannada, in the form of transliteration.
For these minority orthography use cases, some characters from script-neutral blocks are required:
1) The superscript digits ¹²³⁴ would be used for representing the varga consonants (actually ¹ is only used very rarely). The Unicode chapter on Tamil script documents this and recommends the characters (0xb9) 0xb2 0xb3 0x2074.
2) Not only the superscript digits but their corresponding subscript digits ₁₂₃₄ (0x2081-0x2084) are also attested as a stylistic variant choice.
3) The modifier letter apostrophe 02BC ʼ is also seen.
4) Further, sometimes the candrabindu is also seen for nasality. Since there is no Tamil candrabindu character, the generic candrabindu ◌̐ at 0310 can be used.
5) The visarga is commonly seen, but the Tamil visarga code point 0B83 is mapped to the Tamil special letter aytam ஃ (which has three dots against the visarga's two dots), so we will have to place the two-dot visarga in the PUA. (Not ideal, I know, but a Tamil-specific two-dot visarga is unlikely to be encoded. It is not possible to use the Devanagari visarga codepoint 0903, since rendering engines will produce dotted circles: it is not correct to combine Devanagari codepoints with Tamil codepoints.)
Attestations for these usages and required glyphs are attached. Please add them with the appropriate script-neutral codepoints shown in the patch TTF so Lohit Tamil (and Lohit Tamil Classical) is also useful for these minority orthographies. BTW it would be good font design policy to make the modifier apostrophe a composite glyph of the regular apostrophe and subscript digits as composites of superscript digits.
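For reference, the script-neutral code points named in items 1 to 5 can be listed with their Unicode character names. This is only a minimal sketch using Python's standard `unicodedata` module. The grouping labels and the `describe` helper are illustrative names, not part of the attached patch.

```python
import unicodedata

# Code points named in items 1-4 above; the group labels are illustrative only.
REQUESTED = {
    "superscript digits": [0x00B9, 0x00B2, 0x00B3, 0x2074],
    "subscript digits": [0x2081, 0x2082, 0x2083, 0x2084],
    "modifier letter apostrophe": [0x02BC],
    "generic candrabindu": [0x0310],
}


def describe(cp):
    """Return 'U+XXXX NAME' for a single code point."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


for group, cps in REQUESTED.items():
    print(group)
    for cp in cps:
        print("   ", describe(cp))

# Item 5: the existing Tamil code point 0B83 and the Devanagari visarga 0903,
# shown for comparison only.
print(describe(0x0B83))
print(describe(0x0903))
```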
BTW can you please work with fontconfig upstream to codify this coverage knowledge in an .orth file?
Hi I'm totally inexperienced as to what "fontconfig codification of coverage knowledge in an orth file" is.
From what you say, it seems to be some mechanism by which the Linux font management system (fontconfig?) finds out which fonts have glyphs for which codepoints? I thought it automatically scanned fonts to find out their coverage.
Anyway, please point me to where I should go and I will see what I can do.
fontconfig orthography (.orth) files define the minimum character coverage a font needs in order to support a language. In an .orth file, we list the required Unicode character ranges.
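As a rough illustration of "minimum coverage as Unicode ranges", the sketch below reads .orth-style text into a set of code points and reports which requested code points fall outside it. It assumes a simplified format (one hex code point or one `start-end` range per line, `#` starting a comment) and does not follow `include` lines. The `parse_orth` helper and the sample text are hypothetical and are not the real ta.orth.

```python
def parse_orth(text):
    """Collect the code points listed in .orth-style text into a set."""
    covered = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("include"):
            continue  # include lines are not followed in this sketch
        # A line is either a single hex code point or a range "start-end".
        parts = line.replace("..", "-").split("-")
        start = int(parts[0], 16)
        end = int(parts[-1], 16)
        covered.update(range(start, end + 1))
    return covered


# Hypothetical excerpt for illustration; not the actual contents of ta.orth.
SAMPLE = """\
# sample orthography
0B82-0B83
0B85-0B8A
#0BE6-0BEF  # digits, commented out
"""

requested = {0x00B2, 0x00B3, 0x00B9, 0x2074}
covered = parse_orth(SAMPLE)
missing = sorted(requested - covered)
print([f"U+{cp:04X}" for cp in missing])
```

A font's actual coverage could be compared against the same set. For example, with the third-party fontTools library, `TTFont(path).getBestCmap()` returns a mapping keyed by code point.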
Hi -- in that case if you are accepting these glyphs for the Lohit Tamil fonts, you only have to add:
0xb2, 0xb3, 0xb9, 0x2074
Note about that last codepoint: I have been discussing with other users, and we have decided that since it is best to avoid PUA, we can use the codepoint A789 MODIFIER LETTER COLON for visarga.
Most people are right now using 0x3a colon anyway for convenience, but in printed texts, the visarga is shown as two rings (see my TTF) and not as two dots which is the shape of the colon. So there is a need to differentiate the colon and visarga. So please map the visarga to the separate codepoint 0xa789 leaving 0x3a as it is.
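To make the distinction concrete, here is a small sketch using Python's standard `unicodedata` module. It shows that 0x3a and 0xa789 are separate characters with separate names, so text using one does not compare equal to text using the other. The sample syllables are just an arbitrary Tamil string chosen for illustration.

```python
import unicodedata

COLON = "\u003A"           # ordinary punctuation colon
MODIFIER_COLON = "\uA789"  # proposed code point for the visarga

for ch in (COLON, MODIFIER_COLON):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The same Tamil syllables followed by each character: visually similar in
# many fonts, but distinct strings, so a font can give them distinct glyphs.
base = "\u0BB0\u0BBE\u0BAE"  # arbitrary sample: TAMIL LETTER RA, VOWEL SIGN AA, LETTER MA
with_colon = base + COLON
with_visarga = base + MODIFIER_COLON
print(with_colon == with_visarga)
```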
BTW I have also examined the file http://cgit.freedesktop.org/fontconfig/tree/fc-lang/ta.orth. It seems that even the Tamil digits and numbers are commented out. Is this a script-specific datafile or language-specific? If it is language-specific, then the above characters should not be added to it because they are *not* for Tamil language but for *other* languages written in Tamil script.
(In reply to comment #5)
> BTW I have also examined the file
> http://cgit.freedesktop.org/fontconfig/tree/fc-lang/ta.orth. It seems that
> even the Tamil digits and numbers are commented out. Is this a
> script-specific datafile or language-specific? If it is language-specific,
> then the above characters should not be added to it because they are *not*
> for Tamil language but for *other* languages written in Tamil script.
It is locale-specific, so it is possible to define a locale that differs from ta.orth if needed. (If you look at the list, fontconfig already knows quite a few sub-locales where the same script has started diverging in neighbouring locales.)
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.
(As we did not run this process for some time, it could also affect pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
Hello. It seems this issue is not yet resolved.
These additional glyphs do not relate to any particular locale other than ta_IN. They are used in ta_IN only. They are not replacement glyphs for the regular Tamil letters/signs. They are additional glyphs required for supporting minority user requirements in Tamil script.
May I know what is the problem in just adding these glyphs to the Lohit Tamil [+Classical] fonts? Thank you.
Apologies, it looks like I missed this. This looks perfectly fine to me. Yes, if the Tamil script requires some characters from other scripts, we must provide them in a single font so that they harmonise nicely with the other Tamil characters. We will do this in the coming month. Good to see you have provided a patch as well :)
From the .orth file perspective, I think we should not add any more characters to the existing ta.orth, as we look for the minimal character coverage needed to support the Tamil language. AFAIK very few Tamil fonts have these characters.
Committed in Lohit Tamil, will be available with alpha release.
lohit-tamil-fonts-2.91.0-1.fc20 has been submitted as an update for Fedora 20.
lohit-tamil-fonts-2.91.0-1.fc21 has been submitted as an update for Fedora 21.
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing lohit-tamil-fonts-2.91.0-1.fc20'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
lohit-tamil-fonts-2.91.0-2.fc21 has been submitted as an update for Fedora 21.
lohit-tamil-fonts-2.91.0-2.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.