Bug 195436

Summary: JIS X 0213:2004 Font Coverage
Product: [Fedora] Fedora
Reporter: Jong Bae KO <jko>
Component: distribution
Assignee: Akira TAGOH <tagoh>
Status: CLOSED CANTFIX
QA Contact: Bill Nottingham <notting>
Severity: medium
Priority: medium
Version: rawhide
CC: eng-i18n-bugs, jonstanley, ltroan, mmatsuya, nicolas.mailhot, petersen, rvokal
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-05-12 20:52:20 UTC
Bug Depends On: 195437
Bug Blocks: 150223

Description Jong Bae KO 2006-06-15 07:49:23 UTC
Description of problem:
JIS X 0213:2004 glyphs are required in order to support the new de facto
standard in Japan.  It contains an additional 1100 glyphs.

Comment 1 Jong Bae KO 2006-06-15 08:01:03 UTC
----- Additional Comments From wtogami  2006-06-12 10:39 EST -------
Hi, Larry.

 I am sending an explanation of the problems with Japanese language
support on Linux.

 If you have any questions or comments, let me know.

 Please note that my explanation may include mistakes,
because this problem is too complicated to understand fully.

Regards,

Satoshi

----------------------------------------------------------

Japanese Language Problem

1. JIS X 0213:2004 character set

 We need JIS X 0213:2004 character set support on
RHEL3/4/5.

 Microsoft announced that the Meiryo font, which supports the
JIS X 0213:2004 character set, will be the default font of Windows
Vista Japanese edition. In addition, the Meiryo font will be released
not only for Windows Vista but also for Windows XP via Windows Update.

 The JIS X 0213:2004 character set includes about 1100
additional glyphs. Currently the Sazanami font doesn't
support the additional JIS X 0213:2004 glyphs, so we
cannot use these characters in OpenOffice.org,
Firefox, etc.

 Microsoft has also announced that their IME will support
input of these additional characters at the same
time as the Meiryo font release. So after the Meiryo font is
released, there will be a lot of data that includes
them.

2. Knowledge base for legacy encoding conversion

 We need a knowledge base on the legacy encoding conversion
described below, if RHEL doesn't support euc-jp
and sjis.

 The reason why customers who already have
data cannot move to the utf-8 encoding is that there is
no easy way to convert legacy data to utf-8.

 In general, people believe that there are 4 types of
character encoding for the Japanese language: sjis, euc-jp,
jis (iso-2022-jp) and utf-8. But this is not true, or at least
not the whole picture.

I explain below.

2.1 Problem of the sjis encoding

 In the MS-DOS era, each hardware vendor supported its own
character set in addition to the jis standard character set,
because glyphs were loaded in the ROM of each vendor's hardware.
These additional characters are called 'machine dependent
characters'.

 When MS-DOS version 5 (so-called DOS/V) was released,
Microsoft supported software glyphs, which means glyphs
without ROM. They decided that DOS/V would continue to support
the IBM and NEC machine dependent characters, because
these 2 vendors' hardware was the most popular. Thus the jis
character set plus the IBM and NEC machine dependent character
sets was named the Windows Standard Character Set by Microsoft,
and the sjis encoding of the Windows Standard Character Set was
named Code Page 932 by Microsoft. In general, the sjis
encoding that people have in mind is in fact CP932.

 Problem-1 is that IBM's and NEC's sets of glyphs are
almost the same but have different code points. So Microsoft
assigned 2 different code points to 1 glyph for these machine
dependent glyphs.

 This is the first reason why the round trip conversion
sjis->utf-8->sjis does not preserve the data. Some glyphs
have 2 code points in sjis, but when they are converted to utf-8
they get the same code point, and when converted back to sjis,
one of them comes back with the other's code point.
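
 The round trip loss described above can be sketched in Python as
follows. This is only an illustration, assuming that Python's built-in
cp932 codec follows Microsoft's mapping. The byte pair 0xED40 / 0xFA5C
(the NEC-selected copy and the IBM copy of the same kanji) is chosen
here as an example and is not taken from the mail; round_trip and
survives_round_trip are hypothetical helper names.

```python
# Two different CP932 byte sequences for the same kanji (U+7E8A):
# one from the NEC-selected IBM extension area, one from the IBM extension area.
NEC_SELECTED = b"\xed\x40"
IBM_EXTENSION = b"\xfa\x5c"


def round_trip(data: bytes) -> bytes:
    """Convert cp932 -> utf-8 -> cp932 and return the resulting bytes."""
    utf8 = data.decode("cp932").encode("utf-8")
    return utf8.decode("utf-8").encode("cp932")


def survives_round_trip(data: bytes) -> bool:
    """True when the bytes come back unchanged after the round trip."""
    return round_trip(data) == data


if __name__ == "__main__":
    for original in (NEC_SELECTED, IBM_EXTENSION):
        text = original.decode("cp932")
        status = "preserved" if survives_round_trip(original) else "changed"
        print(original.hex(), "->", f"U+{ord(text):04X}", "->",
              round_trip(original).hex(), f"({status})")
```

 Both inputs decode to one Unicode code point, so the conversion back
to cp932 can return only one byte sequence, and at most one of the two
originals can be preserved.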

2.2 Problem of euc-jp

 For a long time, euc-jp supported only the jis character set
and did not support the CP932 character set,
so unix users could not read CP932's machine dependent characters.
But some applications, such as mule (the multi-language version of
emacs), perl, ruby, nkf, firefox, etc., try to support
these characters in their own way. This is the 2nd problem, and the
most difficult one.

 Since there was no formal document on supporting CP932 characters
in euc-jp, each developer at first implemented it in their own way.
But currently, we can classify these implementations into 3 types.

 The first one is so-called euc-jp-ms. This one is the most widely used.
The problem is that there is no formal document on this
implementation and there are a few application dependent
code points.

 The second one is called CP51932. There is a formal document
for this encoding, because Microsoft publishes it and
people can read it on the web. But only a few applications support
CP51932.

 The third one is the mozilla implementation. This implementation is
not compatible with the two above, and it has a big problem. When you
input a CP932 character via a mozilla application into a blog or similar
site whose pages are written in euc-jp, and the web server converts it
to utf-8 for the blog tools, you cannot read it with Internet Explorer.

Problem-2 is that all these documents are labeled as euc-jp and we
don't have an easy way to identify which encoding is used. You have
to find out which application was used to generate them.

Problem-3 is that there is no good tool for converting these
extended euc-jp variants to utf-8. The Legacy-Encoding-Project,
founded by Miracle Linux in Japan, is trying to solve a part of this
problem.

2.3 ISO-2022-JP problem

 MUAs must use the ISO-2022-JP encoding for e-mail exchange. But
a number of MUAs support CP932 glyphs. This encoding is sometimes
called ISO-2022-JP-2, but that name is not popular. Basically such
mail is labeled as ISO-2022-JP.
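
 As a rough illustration of the gap between CP932 and the standard
encodings, the sketch below checks one machine dependent character
against several of Python's built-in codecs. The choice of U+2460
(circled digit one) and the helper name can_encode are assumptions
made for this example; the MUAs and applications mentioned above have
their own conversion tables, which may behave differently.

```python
# U+2460 (circled digit one) is a CP932 machine dependent character
# that is not part of the jis standard character set (JIS X 0208).
CIRCLED_ONE = "\u2460"


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Print, for each codec, whether the character is representable.
    for codec in ("cp932", "euc_jp", "iso2022_jp", "utf-8"):
        print(codec, can_encode(CIRCLED_ONE, codec))
```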


3. Deadline of transition to utf-8

The JIS X 0213:2004 character set will not be supported in euc-jp
or sjis. So systems that must support it have to switch
to the utf-8 encoding.
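
A minimal sketch of what this means for existing data, using Python's
built-in codecs: the sample string, the choice of U+20B9F as a JIS X 0213
kanji, and the helper name unencodable are assumptions made for this
illustration only.

```python
# U+20B9F is a kanji found in JIS X 0213 but not in the older jis standard;
# U+308B is an ordinary hiragana that the legacy encodings also contain.
SAMPLE = "\U00020B9F\u308B"


def unencodable(text: str, codec: str) -> list:
    """List the characters of text that the codec cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Show which characters of the sample each codec would lose.
    for codec in ("cp932", "euc_jp", "utf-8"):
        lost = [f"U+{ord(ch):04X}" for ch in unencodable(SAMPLE, codec)]
        print(codec, "cannot encode:", lost)
```

Data containing such a character cannot be stored in the legacy
encodings without loss, while utf-8 keeps it.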

As I mentioned in section 1, Microsoft will release a new IME when
Windows Vista is released. Systems that receive documents from the
internet are the first candidates. And the Government announced that
the kanji available for personal names have been expanded to include
some JIS X 0213 glyphs. So database systems that must store people's
names must switch to utf-8.

But currently, our customers are very afraid because there is no
knowledge base and no good reference for switching to utf-8.
And this problem is too complicated to understand easily.


Comment 2 Jong Bae KO 2006-06-15 08:11:23 UTC
------- Additional Comments From notting  2006-06-12 14:24 EST -------
Are these unicode characters? Are these characters in DejaVu?

Comment 3 Jong Bae KO 2006-06-15 08:13:14 UTC
------- Additional Comments From wtogami  2006-06-12 14:44 EST -------
(NOTE: The pasted mail above is from Satoshi Oshima, on-site partner engineer
from Hitachi to Larry Troan, our Japanese partner manager.)

Comment 4 Jong Bae KO 2006-06-15 08:13:48 UTC
------- Additional Comments From wtogami  2006-06-12 17:40 EST -------
AFAIK DejaVu doesn't contain any Asian characters.

JIS X 0213:2004 uses the Unicode private area.  Does anyone know if it
overlaps at all with GB 18030?

Comment 5 Nicolas Mailhot 2006-07-02 10:25:07 UTC
However, if there are any Japanese font designers looking to create a JIS X
0213:2004 font, I'm sure the DejaVu project will welcome them.

They have already set up all the infrastructure for a font project, and a single
font pool means you can reference existing glyphs instead of recreating them
from scratch (Japanese probably needs Latin for mixed text like everyone else).

(DejaVu has Vietnamese support; I don't know if it counts as Asian characters or not.)

Comment 6 Jens Petersen 2007-10-02 03:58:12 UTC
For the record, yesterday the Japanese IPA fonts were released under a new, less
restrictive but nevertheless not completely free license (no modifications
allowed).  They do cover JIS X 0213:2004, however.

http://ossipedia.ipa.go.jp/ipafont/ (Japanese)

Comment 7 Jon Stanley 2008-05-12 20:07:16 UTC
This bug hasn't been touched in a while.  Is there still no free font that meets
these requirements?

Comment 8 Bill Nottingham 2008-05-12 20:52:20 UTC
At this point, if any such fonts exist, people are welcome to submit them for
review. If not, it's not something we, as Fedora, are going to put specific
resources into creating.