Bug 195436
| Summary: | JIS X 0213:2004 Font Coverage | | |
| --- | --- | --- | --- |
| Product: | [Fedora] Fedora | Reporter: | Jong Bae KO <jko> |
| Component: | distribution | Assignee: | Akira TAGOH <tagoh> |
| Status: | CLOSED CANTFIX | QA Contact: | Bill Nottingham <notting> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | rawhide | CC: | eng-i18n-bugs, jonstanley, ltroan, mmatsuya, nicolas.mailhot, petersen, rvokal |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-05-12 20:52:20 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 195437 | | |
| Bug Blocks: | 150223 | | |
Description
Jong Bae KO
2006-06-15 07:49:23 UTC
------- Additional Comments From wtogami 2006-06-12 10:39 EST -------

Hi, Larry. I am sending an explanation of the problems with Japanese language support on Linux. If you have any questions or comments, let me know. Please note that my explanation may contain mistakes, because this problem is very complicated to understand. Regards, Satoshi

----------------------------------------------------------

Japanese Language Problem

1. JIS X 0213:2004 character set

We need JIS X 0213:2004 character set support in RHEL3/4/5. Microsoft announced that the Meiryo font, which supports the JIS X 0213:2004 character set, will be the default font of the Japanese edition of Windows Vista. In addition, Meiryo will be released not only for Windows Vista but also for Windows XP via Windows Update. The JIS X 0213:2004 character set includes about 1,100 additional glyphs. Currently the Sazanami font does not cover these additional glyphs, so we cannot use those characters in OpenOffice.org, Firefox, etc. Microsoft also announced that their IME will support input of these additional characters at the same time as the Meiryo release, so once Meiryo ships there will be a lot of data that contains them.

2. Knowledge base for legacy encoding conversion

If RHEL does not support euc-jp and sjis, we need a knowledge base for the legacy encoding conversions described below. The reason customers who already have data cannot move to the utf-8 encoding is that there is no easy way to convert legacy data to utf-8. People generally believe there are four character encodings for Japanese: sjis, euc-jp, jis (iso-2022-jp), and utf-8. But this is not true, or at least not the whole story, as I explain below.

2.1 The sjis encoding problem

In the MS-DOS era, each hardware vendor supported its own character set in addition to the JIS standard character set, because glyphs were loaded from ROM on each machine. These additional characters are called "machine dependent characters".
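The distinction between the JIS standard set and the machine dependent characters can be seen directly with Python's bundled codecs (a minimal sketch; the `cp932` and `shift_jis` codec names are Python's, not anything referenced in this bug):

```python
# Sketch: a "machine dependent" NEC character in CP932.
# U+2460 (CIRCLED DIGIT ONE) sits in NEC row 13 at CP932 bytes 0x87 0x40,
# which is outside the plain JIS X 0208 repertoire.
raw = b"\x87\x40"

print(raw.decode("cp932"))         # Windows' CP932 decodes it: '①'
try:
    raw.decode("shift_jis")        # strict JIS X 0208 Shift_JIS rejects it
except UnicodeDecodeError:
    print("0x8740 is not a plain Shift_JIS sequence")
```

The same bytes are accepted by one decoder and rejected by the other, which is exactly the "additional characters beyond the JIS standard set" situation described above.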
When MS-DOS version 5 (so-called DOS/V) was released, Microsoft supported software glyphs, meaning glyphs rendered without ROM, and decided that DOS/V would continue to support the IBM and NEC machine dependent characters, because those two vendors' hardware was the most popular. This combination, the JIS character set plus the IBM and NEC machine dependent character sets, was named the Windows Standard Character Set by Microsoft, and the sjis encoding of that set was named Code Page 932 (CP932). The sjis encoding that people generally have in mind is in fact CP932.

Problem 1 is that IBM's and NEC's sets of glyphs are almost the same but sit at different code points, so Microsoft assigned two different code points to one glyph for these machine dependent glyphs. This is the first reason the round trip conversion sjis -> utf-8 -> sjis is not preserved: some glyphs have two code points, but when they are converted to utf-8 they collapse to the same code point, and on conversion back to sjis one of them comes out at the other code point.

2.2 The euc-jp problem

For a long time euc-jp supported only the JIS character set, not the CP932 character set, so Unix users could not read CP932's machine dependent characters. Some applications, such as Mule (the multilingual version of Emacs), Perl, Ruby, nkf, and Firefox, tried to support these characters in their own ways. This is the second, and most difficult, problem. Since there was no formal document on supporting the CP932 characters in euc-jp, each developer at first implemented it in their own way, but today these implementations can be classified into three types. The first is so-called euc-jp-ms, which is the most widely used; the problem is that there is no formal document for it, and there are a few application dependent code points. The second is called CP51932; a formal document for this encoding exists, because Microsoft published one that can be read on the web, but only a few applications support CP51932. The third is the Mozilla implementation.
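The round-trip loss described in section 2.1 can be demonstrated mechanically. This sketch (using only Python's bundled `cp932` codec, an assumption on my part rather than anything cited in the bug) scans every double-byte CP932 sequence and flags the ones that come back at a different code point:

```python
# Sketch: find CP932 double-byte sequences whose bytes -> Unicode -> bytes
# round trip does not return the original bytes. These are the duplicated
# NEC/IBM code points: two byte sequences decode to the same Unicode
# character, but only one of them is produced on the way back.
mismatches = {}
for lead in range(0x81, 0x100):
    for trail in range(0x40, 0xFD):
        raw = bytes([lead, trail])
        try:
            text = raw.decode("cp932")
        except UnicodeDecodeError:
            continue                  # not a valid CP932 sequence
        if len(text) != 1:
            continue                  # e.g. two single-byte halfwidth katakana
        if text.encode("cp932") != raw:
            mismatches[raw] = text

print(f"{len(mismatches)} CP932 sequences do not survive the round trip")
```

Every flagged sequence still decodes to a perfectly good Unicode character; the data loss is only visible when the text is converted back to sjis, which is the scenario the mail describes.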
The Mozilla implementation is not compatible with either of the above, and it has a big problem: if you input CP932 characters through a Mozilla application into a blog on a page written in euc-jp, and the web server converts the text to utf-8 for the blog tools, the result cannot be read in Internet Explorer. Problem 2 is that all of these encodings are labeled as euc-jp, and there is no easy way to identify which one was actually used; you have to find out which application generated the data. Problem 3 is that there is no good tool for converting these extended euc-jp variants to utf-8. The Legacy Encoding Project, founded by Miracle Linux in Japan, is trying to solve part of this problem.

2.3 The ISO-2022-JP problem

MUAs must use the ISO-2022-JP encoding for e-mail exchange, but a number of MUAs also support the CP932 glyphs. That variant is sometimes called ISO-2022-JP-2, but the name is not popular; it is usually just labeled ISO-2022-JP.

3. Deadline for the transition to utf-8

The JIS X 0213:2004 character set will not be supported in euc-jp or sjis, so systems that must support it have to switch to the utf-8 encoding. As I mentioned in section 1, Microsoft will release a new IME when Windows Vista is released, so systems that receive documents from the Internet are the first candidates. The government has also announced that the set of kanji permitted in personal names is being expanded to include some JIS X 0213 glyphs, so database systems that store people's names must switch to utf-8. But our customers are currently very afraid, because there is no knowledge base and no good reference for switching to utf-8, and this problem is too complicated to understand.

------- Additional Comments From notting 2006-06-12 14:24 EST -------

Are these Unicode characters? Are these characters in DejaVu?

------- Additional Comments From wtogami 2006-06-12 14:44 EST -------

(NOTE: The pasted mail above is from Satoshi Oshima, on-site partner engineer from Hitachi, to Larry Troan, our Japanese partner manager.)
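On the section 3 point that JIS X 0213 characters fall outside the common legacy encodings, Python's codec set makes this easy to check (a sketch under my own assumptions; `shift_jis_2004` is Python's JIS X 0213-aware codec and is not something mentioned in this bug):

```python
# Sketch: '鷗' (U+9DD7) is a JIS X 0213 character outside JIS X 0208,
# so plain Shift_JIS cannot encode it, while a JIS X 0213-aware codec
# and utf-8 both can.
ch = "\u9dd7"  # 鷗
for codec in ("shift_jis", "shift_jis_2004", "utf-8"):
    try:
        print(codec, ch.encode(codec).hex())
    except UnicodeEncodeError:
        print(codec, "cannot encode", ch)
```

This is the mail's transition argument in miniature: data containing such characters simply has no representation in the legacy encodings that deployed systems speak, so those systems must move to utf-8.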
------- Additional Comments From wtogami 2006-06-12 17:40 EST -------

AFAIK DejaVu doesn't contain any Asian characters. JIS X 0213:2004 uses the Unicode private area. Does anyone know if it overlaps at all with GB 18030?

However, if there are any Japanese font designers looking to create a JIS X 0213:2004 font, I'm sure the DejaVu project will welcome them. They have already set up all the infrastructure for a font project, and a single font pool means you can reference existing glyphs instead of recreating them from scratch (Japanese probably needs Latin for mixed text, like everyone else). (DejaVu has Vietnamese support; I don't know whether that counts as Asian characters or not.)

For the record, yesterday the Japanese IPA fonts were released under a new, less restrictive, but nevertheless not completely free license (no modifications allowed). They do cover JIS X 0213:2004, however. http://ossipedia.ipa.go.jp/ipafont/ (Japanese)

This bug hasn't been touched in a while. Is there still no free font that meets these requirements? At this point, if any such fonts exist, people are welcome to submit them for review. If not, it's not something we, as Fedora, are going to put specific resources into creating.