From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)

Description of problem:
ttmkfdir can't produce fonts.dir correctly for the CJK fonts in ttfonts-ja and ttfonts-ko. It seems to me that ttmkfdir2 has a bug. I made a patch against ttmkfdir2, and it works correctly for me now.

-------------8X--------------------
--- ttmkfdir2/encoding.l Thu Jan 13 07:17:46 2000
+++ ttmkfdir2.my/encoding.l Thu Sep 27 18:36:32 2001
@@ -62,7 +62,7 @@
 i2 = std::strtol (startptr, &endptr, 0);
- cur_enc->size = (startptr == endptr) ? i1 : (i1 << 8) + i2;
+ cur_enc->size = (startptr == endptr) ? i1 - 1: (i1 - 1 << 8) + i2;
 }
 <INSIDE_ENC_BLOCK>STARTMAPPING{WHITESPACES}unicode {
--------------------- 8X-----------------

I have also found an encoding map bug in "gbk-0.enc" in XFree86-4.x.x. If you change the UNDEFINE line as follows

- UNDEFINE 0 0xFFFF
+ UNDEFINE 0 0xFEFE

it works correctly with the patched ttmkfdir above.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. cp /usr/share/fonts/ko/TrueType/gulim.ttf .
2. ttmkfdir

The resulting fonts.dir has incorrect information. ttmkfdir does not work with CJK fonts.

Additional info:
*** Bug 54089 has been marked as a duplicate of this bug. ***
Leon, can you reproduce and confirm this problem?
The patch I submitted above is incorrect. Please change

+ cur_enc->size = (startptr == endptr) ? i1 - 1: (i1 - 1 << 8) + i2;

to

+ cur_enc->size = (startptr == endptr) ? i1: (i1 - 1 << 8) + i2;

It works for me.
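To make the difference between the original and the corrected expression concrete, here is a minimal stand-alone sketch. The function names and the `has_second` flag (standing in for `startptr != endptr`) are hypothetical, and the input values are purely illustrative; only the two expressions come from the patch. Note that `i1 - 1 << 8` parses as `(i1 - 1) << 8`, because `-` binds tighter than `<<` in C++.

```cpp
#include <cassert>

// Hypothetical stand-alone versions of the size computation in encoding.l.
// has_second is false when no second number could be parsed
// (the startptr == endptr case in the patch).

// Expression before the patch.
long encoding_size_original(long i1, long i2, bool has_second) {
    return has_second ? (i1 << 8) + i2 : i1;
}

// Expression after the corrected patch.
long encoding_size_patched(long i1, long i2, bool has_second) {
    // Guard: shifting a negative value would be undefined before C++20.
    assert(!has_second || i1 >= 1);
    // Explicit parentheses; equivalent to the patch's `i1 - 1 << 8`.
    return has_second ? ((i1 - 1) << 8) + i2 : i1;
}
```

For the illustrative pair `i1 = 0x7F`, `i2 = 0x7F`, the original expression yields 0x7F7F (32639) and the patched one yields 0x7E7F (32383), so the patched code expects 256 fewer characters. With a single number, both versions return `i1` unchanged.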
The bug exists in the current version of ttmkfdir for ttfonts-{ko,ja,zh,..}. ttmkfdir uses character matching to recognize the encoding. Because CJK fonts (like gulim) usually don't have the full set of characters, ttmkfdir will not recognize the encoding. There is an option, --max-missing, that sets the maximum number of missing characters allowed per encoding (default is 5). If you set it large enough, ttmkfdir is able to recognize the encoding. yshao has written a patch that adds one more option to ttmkfdir, for setting the maximum percentage of missing characters allowed per encoding. Because it is ratio-based, it will work better.
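The difference between the two kinds of threshold can be sketched as follows. This is not taken from the ttmkfdir source or from yshao's patch; the function names and parameters are hypothetical and only illustrate why a ratio scales with the size of the encoding while a fixed count does not.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical acceptance test with an absolute limit,
// in the spirit of --max-missing (default 5 per the comment above).
bool accept_by_count(std::size_t missing, std::size_t max_missing) {
    return missing <= max_missing;
}

// Hypothetical acceptance test with a percentage limit.
// total is the number of characters the encoding defines.
bool accept_by_percent(std::size_t missing, std::size_t total,
                       double max_missing_percent) {
    assert(max_missing_percent >= 0.0);
    if (total == 0) {
        return false;  // an empty encoding cannot be matched meaningfully
    }
    const double percent =
        100.0 * static_cast<double>(missing) / static_cast<double>(total);
    return percent <= max_missing_percent;
}
```

With these definitions, a font missing 40 characters of an 8000-character encoding fails `accept_by_count(40, 5)` but passes `accept_by_percent(40, 8000, 1.0)`, whereas missing 40 characters of a 200-character encoding fails both.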
OK. Currently Gulim.ttf (in ttfonts-ko) has the full set of KSC5601 (KSX1001) and does not have the full set of iso10646, so ttmkfdir has to emit 'ksc5601.1987'. (The same is true for Wadalab in ttfonts-ja for jisx-20??.) But ttmkfdir overestimates the number of characters needed for each charset and always fails on CJK TTFs. (The patch above corrects this problem.)
I made a patch. With it applied, ttmkfdir generates fonts.dir by checking the CodePage information in the OS/2 table. As far as I have checked, it seems to be OK for all of CJK. Any comments?
Created attachment 42311: fix patch
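For readers who have not seen the attachment, the idea of codepage detection from the OS/2 table can be sketched like this. This is not the attached patch. The bit positions are those of the `ulCodePageRange1` field as defined in the OpenType OS/2 table; the X encoding name paired with each bit is an assumed mapping chosen for illustration, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of the lookup table: an OS/2 ulCodePageRange1 bit and the
// X encoding name ASSUMED (for this sketch) to correspond to it.
struct CodePageEncoding {
    unsigned bit;
    const char* encoding;
};

// Returns the assumed X encodings for every codepage bit set in range1.
std::vector<std::string> encodings_from_codepage(std::uint32_t range1) {
    static const CodePageEncoding table[] = {
        {2,  "microsoft-cp1251"},  // Cyrillic (cp1251)
        {17, "jisx0208.1983-0"},   // JIS/Japan (cp932)
        {18, "gb2312.1980-0"},     // Chinese Simplified (cp936)
        {19, "ksc5601.1987-0"},    // Korean Wansung (cp949)
        {20, "big5-0"},            // Chinese Traditional (cp950)
        {21, "ksc5601.1992-3"},    // Korean Johab (cp1361)
    };
    std::vector<std::string> result;
    for (const CodePageEncoding& entry : table) {
        assert(entry.bit < 32);  // keep the shift below well defined
        if (range1 & (std::uint32_t{1} << entry.bit)) {
            result.push_back(entry.encoding);
        }
    }
    return result;
}
```

A font whose OS/2 table sets bits 19 and 21 would, under this assumed mapping, get `ksc5601.1987-0` and `ksc5601.1992-3`. The approach trusts whatever the font declares, so a font that sets a bit without carrying the glyphs still gets the entry.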
OK, I've made the change to gbk-0.enc in XFree86 4.2.0-6.21 in rawhide, and also this change:

cur_enc->size = (startptr == endptr) ? i1: (i1 - 1 << 8) + i2;

Please note that ttmkfdir is now part of the XFree86-font-utils package and no longer in the freetype package.

tagoh, your patch is not applied yet. I'd like to see whether the first two fixes resolve this, if possible. I'll need you guys to test this stuff out to ensure whatever encodings are needed are there, yada yada. Thanks.
If encodings.dir contains only jisx entries, ttmkfdir outputs jisx entries, although not jisx0201.1976-0. However, when encodings.dir is generated by mkfontdir in the xfs initscript, it does not. In the meantime, I will build ttfonts-ja without fonts.{dir,scale}.
BTW, the kochi fonts don't support the microsoft-cp1251 encoding, but it is output anyway. This is exactly the same problem as Bug#57818.
The patch does not fix ttfonts-{zh_CN,zh_TW}. Fixing the formula is good, but the main reason is that some of the TTFs we have are missing many more glyphs than ttmkfdir allows for a single encoding.

Tagoh's patch (codepage detection from the OS/2 table) does the work. However, it will have some problems with newer and older encodings for a single language. For example:

gb18030.2000-1 has over 20000 glyphs
gbk-0 has over 10000 glyphs
gb2312.1980-0 has over 2000 glyphs

If we assign all three charsets to a zh_CN TTF font which only has 2000 glyphs, that is not sane. The same goes for the Korean encodings and probably the Japanese ones. We should at least check the number of glyphs in addition to codepage detection (tagoh's patch).
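The combined check suggested above could look roughly like the following sketch. It is an illustration only, not part of any patch in this bug: the `Candidate` struct, the `min_coverage` parameter, and the use of the font's total glyph count as a proxy for coverage are all assumptions, and the glyph figures in the usage note are the rough numbers quoted in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one candidate encoding for a font.
struct Candidate {
    std::string name;
    std::size_t required_glyphs;  // rough number of glyphs the encoding needs
    bool codepage_flag;           // true if the OS/2 table claims this codepage
};

// Keeps a candidate only if the codepage is claimed AND the font has at
// least min_coverage (0 < min_coverage <= 1) of the required glyph count.
std::vector<std::string> plausible_encodings(
        const std::vector<Candidate>& candidates,
        std::size_t font_glyphs,
        double min_coverage) {
    assert(min_coverage > 0.0 && min_coverage <= 1.0);
    std::vector<std::string> result;
    for (const Candidate& c : candidates) {
        const double needed =
            min_coverage * static_cast<double>(c.required_glyphs);
        if (c.codepage_flag && static_cast<double>(font_glyphs) >= needed) {
            result.push_back(c.name);
        }
    }
    return result;
}
```

Given candidates gb18030.2000-1 (20000), gbk-0 (10000) and gb2312.1980-0 (2000), all with the codepage flag set, a font with 2000 glyphs and `min_coverage = 0.9` keeps only gb2312.1980-0. A real implementation would have to count the glyphs that actually fall inside each encoding rather than the font's total.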
> Currently Gulim.ttf (in ttfonts-ko) has full set of KSC5601(KSX1001)
> and does not have full set of iso10646.

No single font has all the glyphs necessary for ISO 10646 (the number of glyphs necessary is larger than the number of code points in ISO 10646). Anyway, does gulim.ttf in ttfonts-ko have the full set of 11,172 precomposed Hangul syllables as defined in ISO 10646/Unicode? If it does, ttmkfdir should emit not only a ksc5601.1987-0 entry but also entries for iso10646-1 and for ksc5601.1992-3 (JOHAB, as used on Solaris 8 and supported by Mozilla), the encoding file for which is included in XF86 4.2.

IMHO, having only a fraction of the glyphs for a given character set should not always prevent ttmkfdir from producing an entry for that character set for a given font. For instance, a TrueType font with only the Latin, Cyrillic, and Greek alphabets and other characters used in European writing systems (e.g. genuine punctuation marks, as opposed to crippled ASCII counterparts) is very useful for European users and should be presented as an iso10646-1 font, although it may have only 10% or less of the glyphs for the full iso10646-1. My point is that ttmkfdir needs a kind of 'useful Unicode subset' concept and should produce an iso10646-1 entry for a font which has all the glyphs for that subset, even though its repertoire is limited to a small fraction of iso10646-1.
I'm very sorry for my comment, which turned out to be all but irrelevant, especially because the iso10646-1 patch has already been applied. I should have tried the newest ttmkfdir from rawhide.

However, there's one problem that needs to be addressed. Even the newest ttmkfdir doesn't produce a ksc5601.1992-3 entry for the TTF fonts in the ttfonts-ko package unless I use a very high value (~30000) for the maximum number of missing glyphs. After looking at the ttmkfdir source a little, I found that the problem can be solved by changing the UNDEFINE line in the ksc5601.1992-3.enc file from

UNDEFINE 0x8431 0xF9FE

to

UNDEFINE 0 0xF9FE

When I made the file and submitted it to XF86, I didn't know how ttmkfdir interprets the UNDEFINE entry. With this change, ttmkfdir generates ksc5601.1992-3 entries as well as ksc5601.1987-0 and iso10646-1 entries for all three fonts in the ttfonts-ko package. Note that there's no need to specify the '-m' option, because all three fonts in question have all the necessary glyphs for ksc5601.1992-3 (11,172 Hangul syllables, ~4800 Hanja and ~950 symbols). I'll submit the patch to XF86.
Closing bug as CURRENTRELEASE