54087 – ttmkfdir error with CJK fonts

Summary: ttmkfdir error with CJK fonts

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	freetype
Sub Component:
Version:	9
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Carl Worth (Ampere)
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	54089
Depends On:
Blocks:	57818 61769

Reported:	2001-09-27 10:07 UTC by Won-kyu Park
Modified:	2013-01-14 14:54 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2003-08-30 10:38:51 UTC
Embargoed:

Attachments
fix patch (1.71 KB, patch) 2002-01-11 12:10 UTC, Akira TAGOH

Description Won-kyu Park 2001-09-27 10:07:25 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)

Description of problem:
ttmkfdir can't produce fonts.dir correctly with the CJK fonts in ttfonts-ja and
ttfonts-ko.

It seems to me that ttmkfdir2 has a bug.

I made a patch against ttmkfdir2; it works correctly for me now.

-------------8X--------------------
--- ttmkfdir2/encoding.l	Thu Jan 13 07:17:46 2000
+++ ttmkfdir2.my/encoding.l	Thu Sep 27 18:36:32 2001
@@ -62,7 +62,7 @@
     
     i2 = std::strtol (startptr, &endptr, 0);
 
-    cur_enc->size = (startptr == endptr) ? i1 : (i1 << 8) + i2;
+    cur_enc->size = (startptr == endptr) ? i1 - 1: (i1 - 1 << 8) + i2;
 }
 
 <INSIDE_ENC_BLOCK>STARTMAPPING{WHITESPACES}unicode {
--------------------- 8X-----------------

I have also found an encoding map bug in "gbk-0.enc"
in XFree86-4.x.x.

If you change the UNDEFINE line as follows

- UNDEFINE 0 0xFFFF
+ UNDEFINE 0 0xFEFE

it works correctly with the patched ttmkfdir above.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. cp /usr/share/fonts/ko/TrueType/gulim.ttf .
2. ttmkfdir

fonts.dir has incorrect information!

ttmkfdir does not work with CJK fonts.
	

Additional info:

Comment 1 Won-kyu Park 2001-09-27 10:39:44 UTC

*** Bug 54089 has been marked as a duplicate of this bug. ***

Comment 2 Mike A. Harris 2001-10-04 12:08:42 UTC

Leon, can you reproduce and confirm this problem?

Comment 3 Won-kyu Park 2001-10-04 16:14:29 UTC

The patch I submitted above is incorrect.

Please change from ...

+    cur_enc->size = (startptr == endptr) ? i1 - 1: (i1 - 1 << 8) + i2;

to...

+    cur_enc->size = (startptr == endptr) ? i1: (i1 - 1 << 8) + i2;



It works for me
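
The effect of the corrected expression can be sketched in isolation. This is a minimal illustration, not the ttmkfdir2 source: the function names are hypothetical, and it assumes the two numbers parsed in encoding.l are a row count and a column count, so that the last row index is rows - 1. Note that in C++ `i1 - 1 << 8` parses as `(i1 - 1) << 8`, because subtraction binds tighter than the shift.

```cpp
#include <cassert>

// Size as computed before the patch: the row count is shifted as-is,
// which counts one full row (256 codes) too many.
long size_before(long rows, bool has_cols, long cols) {
    return has_cols ? (rows << 8) + cols : rows;
}

// Size as computed by the corrected patch: only the two-number case
// changes, the single-number case is left alone.
long size_after(long rows, bool has_cols, long cols) {
    assert(!has_cols || rows >= 1);  // a two-number size needs at least one row
    return has_cols ? ((rows - 1) << 8) + cols : rows;
}
```

With hypothetical values of 0x7F rows and 0x7F columns, the old expression yields 0x7F7F and the corrected one 0x7E7F, a difference of exactly one row of 256 codes.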

Comment 4 Leon Ho 2001-10-05 00:37:22 UTC

The bug exists in the current version of ttmkfdir for ttfonts-{ko,ja,zh,..}.

ttmkfdir uses character matching to recognize the encoding. Because CJK fonts
(like gulim) usually don't have the full set of characters, ttmkfdir will not
recognize the encoding. There is an option, --max-missing, that sets the maximum
number of missing characters allowed per encoding (default is 5). If you set it
large enough, ttmkfdir is able to recognize the encoding.

yshao has written a patch that adds one more option to ttmkfdir, for setting the
maximum percentage of missing characters allowed per encoding. Because it is
ratio based, it should work better.
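
The difference between the two rules can be sketched as follows. This is only an illustration of the idea, not yshao's patch or the ttmkfdir source; the function names and the numbers in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Absolute rule, in the spirit of --max-missing: accept the encoding
// if at most max_missing required characters have no glyph.
bool accept_by_count(std::size_t required, std::size_t present,
                     std::size_t max_missing) {
    assert(present <= required);
    return required - present <= max_missing;
}

// Ratio rule: accept the encoding if the missing share of required
// characters does not exceed max_missing_percent.
bool accept_by_percent(std::size_t required, std::size_t present,
                       double max_missing_percent) {
    assert(present <= required);
    if (required == 0) return true;  // nothing is required, nothing is missing
    double missing_percent =
        100.0 * static_cast<double>(required - present) /
        static_cast<double>(required);
    return missing_percent <= max_missing_percent;
}
```

For a large charset with 8000 required characters of which 7900 are present, the absolute rule with the default of 5 rejects the font, while a 2% ratio rule accepts it (1.25% missing). The same 2% rule still rejects a small charset of 200 characters with 10 missing.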

Comment 5 Won-kyu Park 2001-10-05 07:35:07 UTC

Ok.

Currently Gulim.ttf (in ttfonts-ko) has full set of KSC5601(KSX1001)

and does not have full set of iso10646.

So ttmkfdir has to emit 'ksc5601.1987' (the same is true for Wadalab in
ttfonts-ja for jisx-20??).

But ttmkfdir overestimates the number of characters needed for each charset
and always fails on CJK TTFs.
(The patch above corrects this problem.)

Comment 6 Akira TAGOH 2002-01-11 12:09:41 UTC

I made a patch. With it applied, ttmkfdir generates fonts.dir by checking the
CodePage from the OS/2 table. As far as I have checked, it seems to be OK for
all of CJK. Any comments?
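
As a rough sketch of what codepage detection from the OS/2 table can look like (this is not the attached patch): the bit positions below are the CJK bits of the `ulCodePageRange1` field as defined in the OpenType OS/2 table, while the X encoding names paired with them and the names `CodePageBit`, `kCjkBits`, and `cjk_encodings` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct CodePageBit {
    unsigned bit;          // bit index in ulCodePageRange1
    const char* encoding;  // assumed X encoding name for that code page
};

// CJK bits of OS/2 ulCodePageRange1. The encoding names are an
// illustrative pairing, not taken from ttmkfdir.
const CodePageBit kCjkBits[] = {
    {17, "jisx0208.1983-0"},  // JIS/Japan (cp932)
    {18, "gb2312.1980-0"},    // Chinese Simplified (cp936)
    {19, "ksc5601.1987-0"},   // Korean Wansung (cp949)
    {20, "big5-0"},           // Chinese Traditional (cp950)
    {21, "ksc5601.1992-3"},   // Korean Johab (cp1361)
};

// Return the assumed encodings whose code page bit is set.
std::vector<std::string> cjk_encodings(std::uint32_t code_page_range1) {
    std::vector<std::string> out;
    for (const CodePageBit& e : kCjkBits) {
        assert(e.bit < 32);
        if (code_page_range1 & (std::uint32_t{1} << e.bit)) {
            out.push_back(e.encoding);
        }
    }
    return out;
}
```

A font that sets only bits 19 and 21 would be offered the two Korean encodings and nothing else, regardless of how many glyphs it is missing.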

Comment 7 Akira TAGOH 2002-01-11 12:10:26 UTC

Created attachment 42311
fix patch

Comment 8 Mike A. Harris 2002-02-05 18:03:04 UTC

Ok, I've made the change to gbk-0.enc in XFree86 4.2.0-6.21 in rawhide,
and also this change too:

cur_enc->size = (startptr == endptr) ? i1: (i1 - 1 << 8) + i2;


Please note now that ttmkfdir is part of the XFree86-font-utils package
and no longer in freetype package.

tagoh - your patch is not applied yet.  I'd like to see if the first
two fixes fix this if possible.  I'll need you guys to test this stuff out
to ensure whatever encodings are needed are there, yada yada.

Thanks.

Comment 9 Akira TAGOH 2002-02-12 10:04:00 UTC

If encodings.dir has only jisx entries, ttmkfdir outputs jisx entries, but not
jisx0201.1976-0.
However, when encodings.dir is generated by mkfontdir in the xfs initscript, it doesn't.
In the meantime I will build ttfonts-ja without fonts.{dir,scale}.

Comment 10 Akira TAGOH 2002-02-12 10:24:02 UTC

BTW, the kochi fonts don't support the microsoft-cp1251 encoding, but it is
output anyway. This problem is exactly the same as Bug#57818.

Comment 11 Leon Ho 2002-02-13 23:54:39 UTC

The patch does not fix ttfonts-{zh_CN,zh_TW}. Fixing the formula is good, but the
main reason is that some of the TTFs we have are missing many more glyphs than
is allowed for a single encoding.


Tagoh's patch (codepage detection from the OS/2 table) does the work. However, it
will have some problems with newer and older encodings for a single language.

For example:
gb18030.2000-1 has over 20000 glyphs
gbk-0 has over 10000 glyphs
gb2312.1980-0 has over 2000 glyphs

If we assign all three charsets to a zh_CN ttf font which only has 2000
glyphs, that is not sane.

The same goes for kr encodings and probably jp. We should at least do some
checking of how many glyphs there are as well as codepage detection (tagoh's patch).
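
One way to combine the two checks is sketched below. It is an assumption about how such a filter could look, not code from any of the patches: the `Candidate` struct, `select_encodings`, and the coverage threshold are hypothetical, and the candidates are taken to come from a prior codepage detection step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Candidate {
    std::string encoding;  // encoding suggested by codepage detection
    std::size_t required;  // characters the encoding expects
    std::size_t present;   // how many of those the font has glyphs for
};

// Keep only the candidates whose glyph coverage reaches min_coverage
// (a fraction between 0 and 1).
std::vector<std::string> select_encodings(
        const std::vector<Candidate>& candidates, double min_coverage) {
    assert(min_coverage >= 0.0 && min_coverage <= 1.0);
    std::vector<std::string> out;
    for (const Candidate& c : candidates) {
        if (c.required == 0) continue;  // nothing to check against
        double coverage = static_cast<double>(c.present) /
                          static_cast<double>(c.required);
        if (coverage >= min_coverage) out.push_back(c.encoding);
    }
    return out;
}
```

With the figures above rounded to 20000, 10000, and 2000 required characters and a font that supplies 2000 glyphs, a 90% coverage threshold keeps only gb2312.1980-0 and drops the two larger charsets.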

Comment 12 Jungshik Shin 2002-03-23 01:02:08 UTC

> Currently Gulim.ttf (in ttfonts-ko) has full set of KSC5601(KSX1001)
> and does not have full set of iso10646.

  No single font has all the glyphs necessary for ISO 10646
(the number of glyphs necessary is larger than the number of code
points in ISO 10646). Anyway, does gulim.ttf in ttfonts-ko
have the full set of 11,172 precomposed Hangul syllables as
defined in ISO 10646/Unicode? If it does, ttmkfdir
should emit not only a ksc5601.1987-0 entry but also
entries for ksc5601.1992-3 (JOHAB, as used on Solaris 8 and
supported by Mozilla, the encoding for which is included
in XF86 4.2) and iso10646-1.

  IMHO, having only a fraction of the glyphs for a given
character set should not always prevent ttmkfdir from producing an
entry for that character set for a given font. For instance,
a TrueType font with only the Latin, Cyrillic, and Greek alphabets
and other characters (e.g. genuine punctuation marks, as opposed to
their crippled ASCII counterparts) used in European writing systems
is very useful for European users and should be
presented as an iso10646-1 font although it may have only 10% or less
of the glyphs for the full iso10646-1. My point is that
ttmkfdir needs to have a kind of 'useful Unicode subset' concept
and produce an iso10646-1 entry for a font which has all the
glyphs for that subset even though its repertoire is limited
to a small fraction of iso10646-1.

Comment 13 Jungshik Shin 2002-03-24 09:21:23 UTC

I'm very sorry for my comment, which turned out to be all but
irrelevant, especially because the iso10646-1 patch has already been
applied.
I should have tried the newest ttmkfdir from rawhide.
However, there's one problem that needs to be addressed.
Even the newest ttmkfdir doesn't produce a ksc5601.1992-3 entry for
the ttf fonts in the ttfonts-ko package
unless I use a very high value (~ 30000) for the maximum
number of missing glyphs. After looking at the ttmkfdir source a little,
I found that the problem can be solved by changing the UNDEFINE line
in the ksc5601.1992-3.enc file
from
from

UNDEFINE 0x8431 0xF9FE

to 

UNDEFINE 0   0xF9FE

When I made and submitted the file to XF86, I didn't know
the way ttmkfdir interprets the UNDEFINE entry. With this
change, ttmkfdir generates ksc5601.1992-3 entries as
well as ksc5601.1987-0 and iso10646-1 entries for all three
fonts in the ttfonts-ko package. Note that there's no need
to specify the '-m' option because all three fonts in
question have all the necessary glyphs for ksc5601.1992-3
(11,172 Hangul syllables, ~4800 Hanjas and ~950 symbols).
 
I'll submit the patch to XF86.
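
Why the UNDEFINE range matters can be sketched with a small model. It assumes, as the behaviour described above suggests, that every code below the encoding size counts as required unless an UNDEFINE range removes it, and that explicitly mapped codes are added back. The function `count_required`, the size 0xF9FF, and the three mapped codes are hypothetical and not taken from ttmkfdir or the .enc file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Range {
    std::size_t first;
    std::size_t last;  // inclusive
};

// Count the codes a font would be expected to cover under the assumed
// model: all codes start out required, UNDEFINE ranges clear them, and
// explicitly mapped codes are set again.
std::size_t count_required(std::size_t size,
                           const std::vector<Range>& undefine,
                           const std::vector<std::size_t>& mapped) {
    std::vector<bool> required(size, true);
    for (const Range& r : undefine) {
        assert(r.first <= r.last);
        for (std::size_t c = r.first; c <= r.last && c < size; ++c) {
            required[c] = false;
        }
    }
    for (std::size_t c : mapped) {
        if (c < size) required[c] = true;
    }
    std::size_t n = 0;
    for (bool b : required) {
        if (b) ++n;
    }
    return n;
}
```

Under this model, UNDEFINE 0x8431 0xF9FE leaves the 0x8431 (33841) codes below the range counted as required, which is in the same ballpark as the ~30000 missing glyphs mentioned above, while UNDEFINE 0 0xF9FE leaves only the explicitly mapped codes.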

Comment 14 Mike A. Harris 2003-08-30 10:38:51 UTC

Closing bug as CURRENTRELEASE
