Bug 455981
Summary: | Missing locl romanian magic | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Nicolas Mailhot <nicolas.mailhot> |
Component: | dejavu-fonts | Assignee: | Ben Laenen <bl.bugs> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | fonts-bugs, gaburici, i18n-bugs, quantumburnz |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2008-10-27 03:20:13 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 438944 | ||
Attachments: |
Description
Nicolas Mailhot
2008-07-19 18:33:41 UTC
I've been told by a Romanian person (as I explicitly asked about this) that they don't expect to see the S/T with comma below if they type the S/T with cedilla... Romanians seem to be in disagreement :( My personal view is that it shouldn't be done. The S/T with cedilla code points are no longer unified with the Romanian letters S/T with comma below, so they should never be rendered like them anymore, and if they are, it's a bug.

It should be done because locl is an *optional* font feature. The application is free to request it or not. Unfortunately, Pango always turns locl on based on language. It should be configurable at Pango's level, preferably in a way that allows applications to modify it via Pango markup. It's okay for the default Pango settings to turn locl on for Romanian, because the Romanian Academy typographic standard requires commas, not cedillas, for Romanian text. The only possible trouble spot is a Turkish name, but that name could be marked up as being in the Turkish language in the document [well, not in plain text].

BTW, if you ever saw a Romanian document rendered with mixed cedillas and commas, you wouldn't doubt the necessity of locl. Adobe introduced ROM/locl because they (and 99% of commercial fonts) remap "t with cedilla" to "t with comma" regardless of locale, based on the assumption that "t with cedilla" is not used in any language [There's a post on Adobe forums, but I'm too tired to find it now]. Mixed diacritics look like €rap for Romanian text in the pre-Unicode 3.0 encoding, which sadly is still far more widespread, at least on the web [check with Google]. A [mild] picture of mixed diacritics is here: http://en.wikipedia.org/wiki/Romanian_alphabet#Adobe.2FLinotype.2FVista_de-facto_standard. This visual inconsistency is why Adobe Pro fonts can also map "s with cedilla" to "s with comma" when ROM/locl is turned on. FYI: Vista fonts and the Linotype fonts [you can check on their site] behave the same way.
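The Adobe/Linotype/Vista convention described above, with t-cedilla always drawn with a comma and s-cedilla switched only by ROM/locl, can be modeled as a small glyph-selection function. This is a sketch under stated assumptions. A "glyph" is represented by the character whose shape it shows, whereas real fonts use glyph IDs and GSUB lookups. The OpenType language-system tag is passed in as a plain string. `adobe_style_shape` is a hypothetical helper and is not part of any renderer's API.

```python
# Hypothetical model of the Adobe-style Romanian convention described in the thread.
# A "glyph" is represented by the character whose shape it shows; real fonts
# use glyph IDs and GSUB lookups instead.

# t with cedilla is drawn with a comma below regardless of language.
ALWAYS_COMMA = {
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
}

# s with cedilla is switched to comma below only by the ROM/MOL locl feature.
LOCL_ROM = {
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
}

ROMANIAN_LANGSYS = {"ROM", "MOL"}


def adobe_style_shape(text: str, langsys: str) -> str:
    """Return the shapes a font following this convention would display (hypothetical)."""
    out = []
    for ch in text:
        ch = ALWAYS_COMMA.get(ch, ch)       # applied for every language
        if langsys in ROMANIAN_LANGSYS:
            ch = LOCL_ROM.get(ch, ch)       # applied only when locl is active for ROM/MOL
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    legacy = "\u015F\u0163"  # "şţ" typed in the pre-Unicode 3.0 encoding
    print(adobe_style_shape(legacy, "dflt"))  # şț  (mixed: cedilla s, comma t)
    print(adobe_style_shape(legacy, "ROM"))   # șț  (both with comma below)
```

With any non-Romanian language the output mixes a cedilla s with a comma t. That is the inconsistency the comment describes, and the one ROM/locl is meant to hide.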
Now assume DejaVu, which currently doesn't honor ROM/locl, is used in a document together with a font that does honor the ROM/locl substitution, not necessarily a commercial one, e.g. one of the free SIL fonts[*]. You'd get mixed diacritics again. Granted, this is not in the same font, but it is still in the same document and it looks bad...

Footnote [*]: SIL fonts use ROM/ccmp to do the mapping, but Pango turns that on too. I'm not aware of any other fonts except those from SIL that work this way. Please don't make DejaVu work that way. I'd rather have you adopt the Adobe standard, which is used in hundreds of fonts.

Ugh, Unicode seems to have made an even bigger mess out of this than I originally thought... So, apparently both U+015E-U+015F, U+0162-U+0163, and U+0218-U+021B can still all be used for Romanian. With the extra string attached to U+0218-U+021B that they should be used when a distinct shape with comma below is needed. So you're still allowed the U+015E-U+015F, U+0162-U+0163 glyphs to write Romanian apparently. And since Unicode only cares about code points, it didn't give any clue on how fonts or renderers are supposed to know when distinct glyphs are needed. Yet Unicode expects them to clean up the mess it has made.

> It should be done because locl is an *optional* font feature.

I thought it was obligated if a language was passed to the renderer (but I may be wrong on this).

> Adobe introduced ROM/locl because they (and 99% of commercial fonts) remap
> "t with cedilla" to "t with comma" regardless of locale

That's just bad, t with cedilla _is_ used sometimes. I think it was even proposed a long time ago to be used in French for when a t sounds like /s/, like "relaţion" (didn't catch on unfortunately :-) ). Unicode itself mentions Semitic transliteration (but I guess that needs a lot of other glyphs those fonts don't have).
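Python's standard `unicodedata` module can be used to check the point that Unicode "only cares about code points". The sketch below is only an illustration. It shows that the cedilla letters decompose to base + U+0327 and the comma-below letters to base + U+0326. Because of that, no normalization form merges the two, and the choice of shape is left to fonts and renderers.

```python
import unicodedata

CEDILLA = ["\u015E", "\u015F", "\u0162", "\u0163"]      # Ş ş Ţ ţ
COMMA_BELOW = ["\u0218", "\u0219", "\u021A", "\u021B"]  # Ș ș Ț ț

for ced, com in zip(CEDILLA, COMMA_BELOW):
    # Print each code point, its Unicode name, and its canonical decomposition.
    for ch in (ced, com):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")
    # No normalization form turns a cedilla letter into its comma-below counterpart.
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        assert unicodedata.normalize(form, ced) != unicodedata.normalize(form, com)
```

The first pair prints `U+015E LATIN CAPITAL LETTER S WITH CEDILLA: 0053 0327` and `U+0218 LATIN CAPITAL LETTER S WITH COMMA BELOW: 0053 0326`. Normalization therefore cannot repair legacy text. A text converter or a font feature has to make that decision.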
So far I've only found three Adobe fonts with Romanian glyphs and two didn't have the locl rule, so it looks like Adobe doesn't do it often either. They all have indeed t with comma below in the place of t with cedilla. If you have documents with mixed diacritics you can blame it on that practice, _not_ the absence of locl rules in the font.

I've also checked the MS Vista fonts once (usually they set the de facto standard, since their fonts are the most widespread). Segoe UI and the new versions of Arial, Times New Roman etc. don't have locl rules or anything else, and have t with cedilla at U+0162-U+0163 (I think the old versions known as the corefonts were pre-Unicode 3.0). The C-fonts, which were made by another foundry, have t with comma below at U+0162-U+0163 like Adobe fonts, and have a salt (stylistic alternate) _and_ a locl feature that map the s with cedilla glyphs to s with comma below for Romanian.

Also, one thing I'm asking myself is: why doesn't Gentium have locl rules (or ccmp rules)? It's a more recent font compared to Doulos and Charis, so the SIL people seem to have changed their minds about it, and I'd like to know their reasons before changing anything in DejaVu.

So, short conclusion: how it's dealt with seems to just depend on the foundry that made the fonts, and it also seems to depend on who you ask. So far, I haven't seen enough yet to be sure that a locl rule is needed.

Also, don't always assume commercial fonts have it right. As said above, the same fonts have t with comma below in place of t with cedilla, together with an s with cedilla, which is the worst thing you can do here.

(In reply to comment #5)
> So, apparently both U+015E-U+015F, U+0162-U+0163, and U+0218-U+021B can still
> all be used for Romanian. With the extra string attached to U+0218-U+021B that
> they should be used when a distinct shape with comma below is needed. So
> you're still allowed the U+015E-U+015F, U+0162-U+0163 glyphs to write Romanian
> apparently.
Microsoft took about 7 years to include U+0218-U+021B in *some* Windows XP fonts, which happened only after Romanian got into the EU :) Some XP fonts (Georgia, Courier) still don't have the proper glyphs, even after the update [http://www.microsoft.com/downloads/details.aspx?familyid=0ec6f335-c3de-44c5-a13d-a1e7cea5ddea&displaylang=en] (google "EU font expansion update" if that ugly link doesn't work). The result is that documents using the pre-Unicode 3.0 encoding (U+015E-U+015F, U+0162-U+0163) still dominate.

> > It should be done because locl is an *optional* font feature.
>
> I thought it was obligated if a language was passed to the renderer (but I may be wrong on this).

Currently Uniscribe (the XP renderer) doesn't honor it at all, at least in XP SP3.

> > Adobe introduced ROM/locl because they (and 99% of commercial fonts) remap
> > "t with cedilla" to "t with comma" regardless of locale
>
> That's just bad, t with cedilla _is_ used sometimes. I think it was even
> proposed a long time ago to be used in French for when a t sounds like /s/,
> like "relaţion" (didn't catch on unfortunately :-) ). Unicode itself mentions
> Semitic transliteration (but I guess that needs a lot of other glyphs those
> fonts don't have).

I agree it's bad. *Very few* commercial fonts have a proper "t with cedilla"; Verdana and Tahoma are the only significant ones. Everything else follows the Adobe standard. You can check commercial fonts at Linotype's website. Below is a link that restricts the search to fonts that support the Romanian characters:
[http://www.linotype.com/featuresearch?cf[]=adobece&cf[]=euro&cf[]=latinext]
You have to enter a test string yourself, since that doesn't go in the URL. Use: aăâiîsştţ€sștț.

> So far I've only found three Adobe fonts with Romanian glyphs and two didn't
> have the locl rule, so it looks like Adobe doesn't do it often either. They
> all have indeed t with comma below in the place of t with cedilla.
> If you have
> documents with mixed diacritics you can blame it on that practice, _not_ the
> absence of locl rules in the font.

You probably looked at old fonts. All the Pro fonts they are currently shipping have complete support for Romanian: "t with cedilla" is replaced by the comma variant regardless of locale, and a ROM/locl feature *additionally* substitutes "s with cedilla" with "s with comma". Vista C-series fonts have exactly the same feature set, as you pointed out.
[http://en.wikipedia.org/wiki/Romanian_alphabet#Adobe.2FLinotype.2FVista_de-facto_standard]

> Also, one thing I'm asking myself is: why doesn't Gentium have locl rules (or
> ccmp rules)? It's a more recent font compared to Doulos and Charis, so the SIL
> people seem to have changed their minds about it, and I'd like to know their
> reasons before changing anything in DejaVu.

You need to ask them. IMHO, their implementation of the remapping via ccmp violates the OpenType 1.4 standard: ccmp should *not* depend on the language.

> So, short conclusion: how it's dealt with seems to just depend on the
> foundry that made the fonts, and it also seems to depend on who you ask. So
> far, I haven't seen enough yet to be sure that a locl rule is needed.

There are some variations, but 99% of commercial fonts follow the Adobe standard. Check on Linotype's website! Unfortunately you cannot check for locl there. But the Romanian locl issue has been debated to death on the Typophile forums, and the opinion leaders there (folks that run foundries) follow the Adobe standard, locl included.

> Also, don't always assume commercial fonts have it right. As said above, the
> same fonts have t with comma below in place of t with cedilla, together with an
> s with cedilla, which is the worst thing you can do here.

Adobe fonts look OK with locl on. Adobe assumed that Microsoft would implement locl sooner rather than later. InDesign CS3 supports locl in its own renderer.
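Mixed diacritics of the kind argued about here are easy to detect in existing text. The following is a minimal sketch. It NFC-normalizes the text first, so that S/T + combining cedilla (U+0327) or combining comma below (U+0326) are counted the same way as the precomposed letters. `diacritic_style` is a hypothetical helper name.

```python
import unicodedata

CEDILLA_FORMS = set("\u015E\u015F\u0162\u0163")  # Ş ş Ţ ţ
COMMA_FORMS = set("\u0218\u0219\u021A\u021B")    # Ș ș Ț ț


def diacritic_style(text: str) -> str:
    """Classify Romanian S/T diacritics as 'cedilla', 'comma', 'mixed' or 'none' (hypothetical helper)."""
    # NFC folds S/T + U+0327 and S/T + U+0326 into the precomposed letters.
    nfc = unicodedata.normalize("NFC", text)
    has_cedilla = any(ch in CEDILLA_FORMS for ch in nfc)
    has_comma = any(ch in COMMA_FORMS for ch in nfc)
    if has_cedilla and has_comma:
        return "mixed"
    if has_cedilla:
        return "cedilla"
    if has_comma:
        return "comma"
    return "none"


if __name__ == "__main__":
    print(diacritic_style("\u0162ar\u0103"))           # cedilla ("Ţară", legacy encoding)
    print(diacritic_style("\u021Aar\u0103 \u015Fi"))   # mixed   ("Țară şi")
    print(diacritic_style("S\u0326"))                  # comma   (combining comma below)
```

A document that reports "cedilla" is in the legacy encoding and relies on the font (or locl) for correct Romanian shapes. One that reports "mixed" will look inconsistent in any font that keeps the two forms distinct.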
OK, I guess it doesn't break anything else to add this (except your Turkish texts when reading in a Romanian locale...). But it still goes against my philosophy of "don't fix problems of the past, but make sure you don't make more problems that you need to fix in the future".

So, is this only for latn{ROM} and latn{MOL}, or are there other dialects that need it as well? If you know any, the full list of languages that can be used in OpenType is at http://www.microsoft.com/typography/otspec/languagetags.htm so you can check if it's there.

Similar issue, the s/t with cedilla code point should be canonically the same as s/t + combining cedilla. In short, that would mean that when you write such a sequence you need to get a t with comma below as well for Romanian. But I'm not entirely sure how to do that yet... Probably a "calt" (contextual alternate) feature for the combining cedilla, but that's not applied by default in Pango unfortunately, we could misuse "ccmp" (glyph composition/decomposition) for it, but I'd like to see "calt" turned on once, and this way I can use the Romanian community to push Behdad :-)

(In reply to comment #7)
>
> So, is this only for latn{ROM} and latn{MOL}, or are there other dialects that
> need it as well? If you know any, the full list of languages that can be used
> in OpenType is at http://www.microsoft.com/typography/otspec/languagetags.htm
> so you can check if it's there.

As you pointed out, Adobe's fonts do this for latn{MOL} as well. But Moldavians have their own academy (and country), so I don't know if this is appropriate or not. Wikipedia doesn't have a page on their alphabet. I guess Adobe is preparing Moldavians for an anschluss ;) No other languages should need it, or if they do, Adobe ignores them for now...

>
> Similar issue, the s/t with cedilla code point should be canonically the same
> as s/t + combining cedilla.
> In short, that would mean that when you write such
> a sequence you need to get a t with comma below as well for Romanian. But I'm
> not entirely sure how to do that yet... Probably a "calt" (contextual
> alternate) feature for the combining cedilla, but that's not applied by
> default in Pango unfortunately, we could misuse "ccmp" (glyph
> composition/decomposition) for it, but I'd like to see "calt" turned on once,
> and this way I can use the Romanian community to push Behdad :-)

Can you provide a test string for the combining business? Fontmatrix does not rely on Pango for OpenType features, so I can test it there.

U+015E-U+0163 (s and t with cedilla): Ş ş Ţ ţ
U+0218-U+021B (s and t with comma below): Ş ş Ţ ţ
S/T + U+0327 (combining cedilla): Ş ş Ţ ţ
S/T + U+0326 (combining comma below): Ș ș Ț ț

Oops, the second line was wrong. This is the correct list:

U+015E-U+0163 (s and t with cedilla): Ş ş Ţ ţ
U+0218-U+021B (s and t with comma below): Ș ș Ț ț
S/T + U+0327 (combining cedilla): Ş ş Ţ ţ
S/T + U+0326 (combining comma below): Ș ș Ț ț

(In reply to comment #6)
> But the Romanian locl issue has been debated to death on the Typophile forums, and the
> opinion leaders there (folks that run foundries) follow the Adobe standard, locl
> included.

For reference purposes, I'm linking to John Hudson's comment on Typophile: [http://www.typophile.com/node/2764#comment-22015]. John is a co-founder of Tiro Typeworks, which jointly registered the locl feature tag with Adobe: [http://www.microsoft.com/typography/otspec/features_ko.htm#locl]

There's apparently some issue with the Gagauz language as well. The Wikipedia page uses comma below, but Unicode people don't seem to know what to use: http://unicode.org/mail-arch/unicode-ml/y2002-m10/0020.html ...

(In reply to comment #12)
> There's apparently some issue with the Gagauz language as well.
> The Wikipedia
> page uses comma below, but Unicode people don't seem to know what to use:
> http://unicode.org/mail-arch/unicode-ml/y2002-m10/0020.html ...

I checked the OpenType spec: Gagauz has the language tag GAG. So, you can do something special for it, assuming you know what to do. So far I haven't seen any fonts that pay attention to it, so I'd say do nothing now. Like the email you pointed to said, let some Gagauz speakers speak up before we decide anything for them.

I think that on this subject, the current opinion of a Unicode guru such as Everson would be very valuable.

(In reply to comment #14)
> I think that on this subject, the current opinion of a Unicode guru such as
> Everson would be very valuable.

This is an OpenType issue, not a Unicode issue, but surely some expert opinion would not hurt. Everson's document about Gagauz is here: http://www.evertype.com/alphabets/gagauz.pdf

Basically he says they use comma below, but some may prefer cedilla... Full quote:

Gagauzi in Russia use Cyrillic; Gagauzi in Romania use Latin. Note that in Romania, Gagauz uses the characters S WITH COMMA BELOW and T WITH COMMA BELOW. In inferior Gagauz typography, the glyphs for these characters are sometimes drawn with CEDILLAs, but it is strongly recommended to avoid this practice. However, because Gagauz is a Turkic language, it may be left to the user to decide whether S WITH COMMA BELOW (as in Romanian) or S WITH CEDILLA (as in Turkish) is preferred.

Btw, Everson is wrong about the use of quotes in Romanian (http://www.evertype.com/alphabets/romanian.pdf), so I wouldn't take him as the ultimate guru...

Everson is a type designer, not just a Unicode expert. And what I meant was his opinion on the whole locl thing, not on Gagauz only.

OK, I quickly pushed the locl rules for S/T with cedilla in DejaVu before the freeze for the next release this weekend.
So please test the latest snapshot at http://dejavu.sourceforge.net/snapshots/ and see if it works as expected (no need to test the condensed fonts, they'll get updated as well soon). Also test whether everything else, like ligatures and mark placement (combining diacritic placement), still works for Romanian. I didn't make changes to the combining cedilla yet; I don't know how to properly tackle that yet.

I had a quick look at the Sans. Results:
• locl - OK
• mark - HM (S/s cedilla both OK, T/t cedilla still shifted)
The rest were tested with locl on:
• salt - OK (checked J)
• liga - OK (checked ff)
• mark - NO (as expected)
I also found a way to make the combining work, see next message.

To make the combining (i) look good and (ii) work with locl, you do not need a contextual substitution. A "ligature" is enough! See the Adobe feature file doc: you only need a type-4, not a type-6 substitution for this. I decided to put these in a rlig table for latn{MOL,ROM}. Of course, this rlig has to come before the locl, so locl can affect it. I'm attaching some screenshots first and some patches later (but there's still some fiddling needed with those).

Created attachment 312552 [details]
My version of Sans, no features enabled - all test glyphs use combining!
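The combining rows of the test list posted earlier in the thread can be rebuilt from explicit code points. Doing so avoids the kind of copy-paste mistake corrected there, since editors and browsers may silently normalize or substitute these characters. This is a small sketch. It uses NFC only to confirm which precomposed letter each combining sequence is canonically equivalent to.

```python
import unicodedata

CEDILLA = ["\u015E", "\u015F", "\u0162", "\u0163"]      # Ş ş Ţ ţ
COMMA_BELOW = ["\u0218", "\u0219", "\u021A", "\u021B"]  # Ș ș Ț ț
COMBINING_CEDILLA = [b + "\u0327" for b in "SsTt"]      # S/T + U+0327
COMBINING_COMMA = [b + "\u0326" for b in "SsTt"]        # S/T + U+0326

ROWS = [
    ("U+015E-U+015F, U+0162-U+0163 (s and t with cedilla)", CEDILLA),
    ("U+0218-U+021B (s and t with comma below)", COMMA_BELOW),
    ("S/T + U+0327 (combining cedilla)", COMBINING_CEDILLA),
    ("S/T + U+0326 (combining comma below)", COMBINING_COMMA),
]

for label, chars in ROWS:
    print(f"{label}: {' '.join(chars)}")

# Each combining sequence is canonically equivalent to the matching precomposed letter,
# so a font should give both spellings the same shape.
assert [unicodedata.normalize("NFC", s) for s in COMBINING_CEDILLA] == CEDILLA
assert [unicodedata.normalize("NFC", s) for s in COMBINING_COMMA] == COMMA_BELOW
```

Writing the output to a UTF-8 file, rather than copying it from a terminal, keeps the combining sequences intact for testing in a tool such as Fontmatrix.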
Created attachment 312553 [details]
The new ROM rlig feature activated. It looks much better than with mark on!
Created attachment 312554 [details]
Turning on both rlig and locl works as expected.
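The lookup order shown in these screenshots can be simulated outside a font to see why rlig has to come before locl. In that order, a type-4 ligature in rlig first turns S/T + U+0327 into the precomposed cedilla glyph, and the ROM/MOL locl single substitution then swaps it for the comma-below glyph. This is a hypothetical model. Glyph runs are Python lists of characters, and `apply_ligatures` and `apply_single` stand in for GSUB lookup types 4 and 1. Nothing here reflects how FontForge, Pango or DejaVu actually store or apply the lookups.

```python
from typing import Dict, List, Tuple

Glyphs = List[str]
ROMANIAN_LANGSYS = {"ROM", "MOL"}

# rlig (GSUB type 4, ligature): base letter + combining cedilla -> precomposed glyph.
RLIG: Dict[Tuple[str, str], str] = {
    ("S", "\u0327"): "\u015E",
    ("s", "\u0327"): "\u015F",
    ("T", "\u0327"): "\u0162",
    ("t", "\u0327"): "\u0163",
}

# locl (GSUB type 1, single substitution) for latn{ROM} and latn{MOL}.
LOCL: Dict[str, str] = {
    "\u015E": "\u0218",
    "\u015F": "\u0219",
    "\u0162": "\u021A",
    "\u0163": "\u021B",
}


def apply_ligatures(glyphs: Glyphs, table: Dict[Tuple[str, str], str]) -> Glyphs:
    """Replace each matching two-glyph sequence with its ligature, scanning left to right."""
    out: Glyphs = []
    i = 0
    while i < len(glyphs):
        if i + 1 < len(glyphs) and (glyphs[i], glyphs[i + 1]) in table:
            out.append(table[(glyphs[i], glyphs[i + 1])])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_single(glyphs: Glyphs, table: Dict[str, str]) -> Glyphs:
    """Replace each glyph found in `table` one for one."""
    return [table.get(g, g) for g in glyphs]


def shape(text: str, langsys: str, rlig_first: bool = True) -> str:
    """Run both lookups for a language system (hypothetical model, not a real shaper)."""
    if langsys not in ROMANIAN_LANGSYS:
        return text  # in this model both features exist only for latn{ROM,MOL}
    glyphs = list(text)
    if rlig_first:
        glyphs = apply_single(apply_ligatures(glyphs, RLIG), LOCL)
    else:
        glyphs = apply_ligatures(apply_single(glyphs, LOCL), RLIG)
    return "".join(glyphs)


if __name__ == "__main__":
    print(shape("t\u0327", "ROM"))                    # ț  (U+021B)
    print(shape("t\u0327", "ROM", rlig_first=False))  # ţ  (U+0163): locl ran too early
```

In the swapped order, locl sees only `t` and U+0327, which it does not cover. The ligature is formed afterwards and keeps its cedilla shape. This is the reason given in the thread for putting rlig first.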
Created attachment 312555 [details]
Giant patch ball for Romanian comma and cedilla ligatures.
Comment on attachment 312555 [details]
Giant patch ball for Romanian comma and cedilla ligatures.
You could also add a-breve and a/i-circumflex to the rlig, but I don't know
the combining Unicode code points for those...
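The combining code points asked about here can be read off the canonical decompositions with Python's `unicodedata`. This is only a lookup sketch. It does not propose adding these sequences to the font.

```python
import unicodedata

for letter in "\u0103\u00E2\u00EE\u0102\u00C2\u00CE":  # ă â î Ă Â Î
    # Each of these letters decomposes to exactly one base letter plus one combining mark.
    base, mark = unicodedata.normalize("NFD", letter)
    print(f"{letter}: {base} + U+{ord(mark):04X} {unicodedata.name(mark)}")

# ă: a + U+0306 COMBINING BREVE
# â: a + U+0302 COMBINING CIRCUMFLEX ACCENT
# î: i + U+0302 COMBINING CIRCUMFLEX ACCENT
# (the capitals follow the same pattern with A and I)
```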
No, we prefer using anchors to place diacritics. We don't want them in ccmp, liga or rlig features. We've had plenty of discussions about this in the past, and even had a lot of these as ligatures in the past, but removed them because we thought it was a bad idea. With anchors it's just much more maintainable. The problem with the T/t with cedilla is just the missing cedilla anchor in the T and t glyphs. It's easy to make it work, but that counts as a "feature", so it's something for after the release :-)

How about using abvs and blws tables for these ligatures?

Tag: 'abvs'
Friendly name: Above-base Substitutions
Registered by: Microsoft
Function: Substitutes a ligature for a base glyph and mark that's above it.
UI suggestion: This feature should be on by default.

Tag: 'blws'
Friendly name: Below-base Substitutions
Registered by: Microsoft
Function: Produces ligatures that comprise of base glyph and below-base forms.
UI suggestion: This feature should be on by default.

Btw, your suggestion to use calt for changing (presumably just) the accent when it follows S or T in Romanian seems to be a bit different from what the standard says calt is for:

Tag: 'calt'
Friendly name: Contextual Alternates
Registered by: Adobe
Function: In specified situations, replaces default glyphs with alternate forms which provide better joining behavior. Used in script typefaces which are designed to have some or all of their glyphs join.

Okay, I finished reading all the OpenType tag descriptions. The current spec doesn't have a type-6 table designated for the purpose you want (replacing diacritics). So, if you want to avoid making ligatures at all cost, Redhat would have to register a new OpenType tag...

(In reply to comment #31)
> So, if you want to avoid making ligatures at all cost, Redhat would
> have to register a new OpenType tag...

Ben is not @rh.
He's one of the top DejaVu people, and was kind enough to get an account in Fedora Bugzilla.

Yeah, I'm not even a Fedora user :-) abvs and blws are used for Indic scripts only. I think I've mentioned it before somewhere: if there's no good feature, we can misuse ccmp for it. It's common practice to misuse ccmp to replace i by dotless i before a diacritic above, and renderers also apply ccmp by default and can handle this. calt would still be more elegant though (also for the dotless i situation), but it's not applied by default (even though the specs suggest otherwise).

Well, I tried to add a calt table: I can enter what should be substituted, but when I try to click on the box where the replacement should go, FontForge segfaults. So, this feature will have to wait a little longer. Anyway, given the complete lack of support in commercial fonts for this feature, I don't think we'll see users asking about it anytime soon...

Fixed upstream in 2.26.

requested by Jens Petersen (#27995)
My understanding is that this issue is resolved. Please reopen and assign if I am wrong.