Red Hat Bugzilla – Bug 195436
JIS X 0213:2004 Font Coverage
Last modified: 2014-03-16 23:00:09 EDT
Description of problem:
JIS X 0213:2004 glyphs are required in order to support the new de facto
standard in Japan. It contains about 1,100 additional glyphs.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
------- Additional Comments From firstname.lastname@example.org 2006-06-12 10:39 EST -------
I am sending an explanation of the problems with Japanese language support.
If you have any questions or comments, let me know.
Please note that my explanation may include mistakes,
because this problem is complicated to understand.
Japanese Language Problem
1. JIS X 0213:2004 character set
We need JIS X 0213:2004 character set support on RHEL.
Microsoft announced that the Meiryo font, which supports the
JIS X 0213:2004 character set, will be the default font of the Windows
Vista Japanese edition. In addition, the Meiryo font will be released
not only for Windows Vista but also for Windows XP via Windows Update.
The JIS X 0213:2004 character set includes about 1,100
additional glyphs. Currently the Sazanami font doesn't
support the JIS X 0213:2004 additional glyphs, so we
cannot use these characters in OpenOffice.org or other applications.
But Microsoft announced that their IME will support
these additional characters for input at the same
time as the Meiryo font release. So after the Meiryo font is
released, there will be a lot of data that includes these characters.
2. knowledge base of legacy encoding conversion
We need a knowledge base for the legacy encoding conversion issues
described below if RHEL does not support euc-jp.
The reason the customers who already have
data cannot move to the utf-8 encoding is that there is
no easy way to convert legacy data to utf-8.
In general, people believe that there are 4 types of
character encoding for the Japanese language: sjis, euc-jp,
jis (iso-2022-jp), and utf-8. But this is not entirely accurate,
as I explain below.
2.1 problem of sjis encoding
In the MS-DOS era, each hardware vendor supported its own
character set in addition to the jis standard character set,
because glyphs were loaded from the ROM of each machine.
These additional characters are called 'machine dependent characters'.
When MS-DOS version 5 (so-called DOS/V) was released,
Microsoft supported software glyphs, which means glyphs rendered
without ROM. And they decided DOS/V would continue to support the
IBM and NEC machine dependent characters, because
these 2 vendors' machines were the most popular. Thus, the jis
character set plus the IBM and NEC machine dependent character
sets were named the Windows Standard Character Set by Microsoft.
And the sjis encoding of the Windows Standard Character Set was
named Code Page 932 by Microsoft. In general, the sjis
encoding that people believe in is indeed CP932.
Problem-1 is that IBM's and NEC's sets of glyphs are
almost the same but have different code points, so Microsoft
assigned 2 different code points to 1 glyph for these machine
dependent characters.
This is the first reason why the round trip conversion
sjis->utf-8->sjis is not preserved: some glyphs
have 2 code points, but when they are converted to utf-8
they map to the same code point, and when converted back to sjis,
one of them comes back as the other code point.
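As a minimal sketch of this, Python's standard-library cp932 codec can
show the problem: the Roman numeral I (U+2160) has both an NEC code
point (0x87 0x54) and an IBM extension code point (0xFA 0x4A) in CP932.

```python
# Sketch of the non-preserved round trip, using Python's cp932 codec.
# The Roman numeral I (U+2160) has two CP932 byte sequences:
nec = b'\x87\x54'   # NEC "machine dependent" code point
ibm = b'\xfa\x4a'   # IBM extension code point

# Both byte sequences decode to the same Unicode character ...
assert nec.decode('cp932') == ibm.decode('cp932') == '\u2160'

# ... so converting the IBM bytes to utf-8 and back yields the NEC
# bytes: the sjis -> utf-8 -> sjis round trip changes the data.
utf8 = ibm.decode('cp932').encode('utf-8')
back = utf8.decode('utf-8').encode('cp932')
print(back == nec, back == ibm)
```

The same pattern applies to the other duplicated NEC/IBM glyphs.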
2.2 problem of euc-jp
For a long time, euc-jp supported only the jis character set
and did not support the CP932 character set,
so unix users could not read CP932's machine dependent characters.
But some applications, such as mule (the multi-language version of
emacs), perl, ruby, nkf, firefox, etc., try to support
these characters in their own way. This is the 2nd and most
difficult problem.
Since there was no formal document on supporting CP932 characters
in euc-jp, each developer at first implemented it in their own way.
But currently we can classify these implementations into 3 types.
The first one is the so-called euc-jp-ms. This one is the most widely
used. The problem is that there is no formal document on this
implementation, and there are a few application-dependent variations.
The second one is called CP51932. There is a formal document
for this encoding, because Microsoft published it and
people can read it on the web. But only a few applications support it.
The third one is the mozilla implementation. This implementation has
no compatibility with the above two, and it has a big problem: if you
input a CP932 character via a mozilla application into a blog or similar,
and the page is written in euc-jp but the webserver converts it to utf-8
for the blog tools, then you cannot read it with Internet Explorer.
Problem-2 is that all of this data is labeled as euc-jp, and we
have no easy way to identify which encoding was actually used. You have
to find out which application was used to generate it.
Problem-3 is that there is no good tool for converting these
expanded euc-jp variants to utf-8. The Legacy-Encoding-Project, which
was founded by Miracle Linux in Japan, tries to solve a part of this problem.
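The gap between plain euc-jp and the CP932 character set can be seen
with Python's standard-library codecs (a sketch; note that the stdlib
ships no euc-jp-ms or CP51932 codec at all, which itself illustrates
problem-3):

```python
# The circled digit 1 (U+2460) is a CP932 machine dependent character.
# Plain euc-jp (JIS X 0208 / 0212 only) cannot represent it ...
try:
    '\u2460'.encode('euc_jp')
except UnicodeEncodeError:
    print('not representable in plain euc-jp')

# ... while the JIS X 0213 based euc variant can:
data = '\u2460'.encode('euc_jis_2004')
assert data.decode('euc_jis_2004') == '\u2460'

# Python's stdlib has no euc-jp-ms or CP51932 codec, so data written
# in those expanded euc-jp variants cannot be converted here at all.
```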
2.3 ISO-2022-JP problem
MUAs must use the ISO-2022-JP encoding for e-mail exchange, but
a number of MUAs support CP932 glyphs. This encoding is sometimes
called ISO-2022-JP-2, but this name is not popular; such mail is
basically labeled as ISO-2022-JP.
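A sketch of the same limitation for mail encodings, using Python's
standard-library codecs (iso2022_jp is the strict JIS X 0208 encoding;
iso2022_jp_2004 is the JIS X 0213 revision):

```python
# Strict ISO-2022-JP cannot carry a CP932 glyph such as the
# circled digit 1 (U+2460) ...
try:
    '\u2460'.encode('iso2022_jp')
except UnicodeEncodeError:
    print('rejected by strict iso-2022-jp')

# ... but the JIS X 0213 revision of the encoding can. Mail in the
# wild that uses such extensions is still labeled plain ISO-2022-JP,
# so the receiver cannot tell which variant it is looking at.
data = '\u2460'.encode('iso2022_jp_2004')
assert data.decode('iso2022_jp_2004') == '\u2460'
```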
3 deadline of transition to utf-8
The JIS X 0213:2004 character set will not be supported in euc-jp
or sjis, so systems that must support it have to switch
to the utf-8 encoding.
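As an illustration with Python's standard-library codecs (the kanji
U+29E3D is one of the non-BMP additions in JIS X 0213), the legacy
encodings reject the new characters while utf-8 and the JIS X 0213
codecs accept them:

```python
hokke = '\U00029e3d'  # a kanji added in JIS X 0213 (outside the BMP)

# The legacy sjis and euc-jp encodings cannot represent it ...
for enc in ('shift_jis', 'euc_jp'):
    try:
        hokke.encode(enc)
    except UnicodeEncodeError:
        print(enc, 'cannot encode it')

# ... but utf-8 (and the JIS X 0213 codecs) can:
assert hokke.encode('utf-8') == b'\xf0\xa9\xb8\xbd'
assert hokke.encode('shift_jis_2004').decode('shift_jis_2004') == hokke
```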
As I mentioned in section 1, Microsoft will release a new IME when
Windows Vista is released. Systems that receive documents from the
internet are the first candidates. And the government announced that
the kanji available for personal names are expanded to include some
JIS X 0213 glyphs, so database systems that must store people's names
must switch to utf-8.
But currently our customers are very afraid, because there is no
knowledge base and no good reference for switching to utf-8,
and this problem is complicated to understand.
------- Additional Comments From email@example.com 2006-06-12 14:24 EST -------
Are these Unicode characters? Are these characters in DejaVu?
------- Additional Comments From firstname.lastname@example.org 2006-06-12 14:44 EST -------
(NOTE: The pasted mail above is from Satoshi Oshima, on-site partner engineer
from Hitachi to Larry Troan, our Japanese partner manager.)
------- Additional Comments From email@example.com 2006-06-12 17:40 EST -------
AFAIK DejaVu doesn't contain any Asian characters.
JIS X 0213:2004 uses the Unicode private area. Does anyone know if it
overlaps at all with GB 18030?
However, if there are any Japanese font designers looking to create a JIS X
0213:2004 font, I'm sure the DejaVu project will welcome them.
They have already set up all the infrastructure for a font project, and a single
font pool means you can reference existing glyphs instead of recreating them
from scratch (Japanese probably needs Latin for mixed text like everyone else).
(DejaVu has Vietnamese support; I don't know if it counts as Asian characters or not.)
For the record, yesterday the Japanese IPA fonts were released under a new,
less restrictive but nevertheless not completely free license (no modifications
allowed). They do cover JIS X 0213:2004, however.
This bug hasn't been touched in a while. Is there still no free font that meets
this need? At this point, if any such fonts exist, people are welcome to submit
them for review. If not, it's not something we, as Fedora, are going to put
specific resources into creating.