Red Hat Bugzilla – Bug 75829
needs to have UTF-8 locales for CJK
Last modified: 2008-05-01 11:38:04 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021008
Description of problem:
It's very unfortunate that RH 8 doesn't have UTF-8 locales for CJK by default.
Even if there are some problems with CJK UTF-8 locales, it'd be nice
to give users a choice by adding them to /etc/X11/gdm/locale.alias.
As Solaris and AIX do, RH8 can have multiple locales with different codesets
for the same language. For instance, the locale.alias file for gdm can have two entries
for the same language, one per codeset.
There are a couple of things to change if CJK UTF-8 locales are added.
First, the XLC_LOCALE file for each CJK UTF-8 locale has to be customized
per locale instead of reusing the XLC_LOCALE for en_US.UTF-8. I'll
attach XLC_LOCALE files for ja_JP.UTF-8, ko_KR.UTF-8
and zh_CN.UTF-8.
With this change, KDE3 works very well out of the box under ko_KR.UTF-8.
I'm less sure of Gnome2 because nautilus in Gnome2 is really sluggish on my
machine (taking up 400 MB of memory). However, nautilus is as sluggish under
ko_KR.EUC-KR as under ko_KR.UTF-8, so the UTF-8 locale is not the culprit
for this problem.
Steps to Reproduce:
1. At gdm login screen, click on Language
2. Choose either Japanese or Korean locale
3. After logging in, check the locale (e.g. echo $LANG)
Actual Results: The locale codeset is a legacy encoding instead of UTF-8.
Expected Results: gdm should offer Korean(UTF-8) and Japanese(UTF-8).
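Step 3 checks $LANG by eye. As a minimal sketch, assuming the usual language_TERRITORY.codeset@modifier naming and treating spellings such as "UTF-8" and "utf8" as the same codeset, the codeset part of the locale name can be extracted and classified. The helper names split_locale_name and is_utf8_codeset are hypothetical, not part of gdm or glibc:

```python
import os
from typing import Optional, Tuple


def split_locale_name(name: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'language_TERRITORY.codeset@modifier' into (base, codeset, modifier).

    Assumes the common POSIX/glibc naming convention; names such as "C" or
    "POSIX" have no codeset part, so None is returned for it.
    """
    modifier = None
    if "@" in name:
        name, modifier = name.split("@", 1)
    codeset = None
    if "." in name:
        name, codeset = name.split(".", 1)
    return name, codeset, modifier


def is_utf8_codeset(codeset: Optional[str]) -> bool:
    """Treat 'UTF-8', 'utf8', 'UTF8' etc. as the same codeset (an assumption)."""
    if codeset is None:
        return False
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    lang = os.environ.get("LANG", "C")
    base, codeset, _ = split_locale_name(lang)
    kind = "UTF-8" if is_utf8_codeset(codeset) else "legacy or unspecified"
    print(f"LANG={lang}: {base}, codeset {codeset!r} ({kind})")
```

This only inspects the name stored in $LANG. The codeset the C library actually ends up using is what nl_langinfo(CODESET) reports after setlocale(); in Python that is locale.nl_langinfo(locale.CODESET) once locale.setlocale(locale.LC_ALL, "") has been called.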
Korean(UTF-8) was the first language (along with English) for which
UTF-8 locale was supported on both Solaris 7/8 and AIX 4.x.
That's not without reason. The EUC-KR codeset for Korean has a
critical problem, and virtually every Korean wants to move to a UTF-8 locale
as soon as possible to solve that problem.
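One well-known limitation of EUC-KR, and presumably the problem meant here, is its repertoire: KS X 1001, the character set behind EUC-KR, assigns codes to only 2,350 precomposed Hangul syllables, while modern Korean orthography allows 11,172. The Python sketch below uses the algorithmic Hangul composition defined in the Unicode Standard to show where the 11,172 figure comes from; it illustrates the arithmetic only and does not model EUC-KR itself:

```python
from typing import Tuple

# Unicode composes each modern Hangul syllable from three jamo indices:
# 19 leading consonants, 21 vowels and 28 trailing slots (index 0 = none).
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant


def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Build a precomposed syllable from jamo indices."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= tail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)


def decompose(syllable: str) -> Tuple[int, int, int]:
    """Return the (lead, vowel, tail) jamo indices of a precomposed syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < L_COUNT * N_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    return s_index // N_COUNT, (s_index % N_COUNT) // T_COUNT, s_index % T_COUNT


if __name__ == "__main__":
    total = L_COUNT * V_COUNT * T_COUNT
    # prints "11172 modern syllables, U+AC00..U+D7A3"
    print(f"{total} modern syllables, U+{S_BASE:04X}..U+{S_BASE + total - 1:04X}")
    # prints (18, 0, 4): leading ㅎ, vowel ㅏ, trailing ㄴ
    print(decompose("한"))
```

A UTF-8 locale can represent every one of these syllables directly, which is the practical point of the request.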
Created attachment 80224
XLC_LOCALE for ko_KR.UTF-8
Created attachment 80225
XLC_LOCALE for ja_JP.UTF-8
Created attachment 80226
locale.dir (X11) and locale.alias(gdm) : diff
Created attachment 80227
a sample gtkrc file for ko_KR.UTF-8 (perhaps not necessary for gtk2-based RH8)
It might be argued that UTF-8 locales for CJK are not yet "mature" enough
for prime time in some respects. Then I can't help wondering why
RH decided to include SC (GB18030) while not including Korean(UTF-8)
and Japanese(UTF-8). Well, the PRC requires that all operating systems shipped in
the PRC support GB 18030, and probably RedHat didn't have any other choice.
Whatever the rationale for including SC (GB18030) in RH was,
I think it's a strong indication that Korean(UTF-8) and Japanese(UTF-8)
could well be included as alternatives to Korean(EUC) and Japanese(EUC),
because GB 18030 is nothing other than another UTF (Unicode Transformation
Format), just like UTF-8.
It has to be noted that I'm NOT saying that UTF-8 has to be the *only* codeset
for Japanese and Korean. Rather, I think UTF-8 should be offered
to those who want to use it *in addition to* Korean and Japanese locales
based on legacy encodings (EUC-KR and EUC-JP).
If RedHat is so concerned about potential problems with CJK UTF-8
locales, it can warn its users in a prominent place in the release notes
that CJK UTF-8 locale support is experimental and that the legacy
encoding-based CJK locales are recommended.
For the record, I have been using ko_KR.UTF-8 for about half a year and
have yet to find a single aspect in which ko_KR.UTF-8 locale is worse than
ko_KR.EUC-KR. My comments in bug 75832 give a couple of compelling
reasons why Koreans need a UTF-8 locale right now.
We don't (in general) think it is appropriate to offer users a choice
of encodings when logging in; encodings are something that should
"just work"; the user shouldn't have to think about them.
We didn't actually offer a choice between GB18030 and GB2312 in the
user interface in 8.0 - if you picked simplified chinese, you got
We are hoping that soon we can just use UTF-8 locales for CJK as we
do for all other languages; at that point, we won't offer a choice
between UTF-8 and the traditional encoding; when you select the
language in gdm, redhat-config-languages, or the installer, you'll
just get the UTF-8 locale.
To pick alternate encodings, you can put "LANG=ko_KR.UTF-8" in
your ~/.i18n file.
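Assuming, as the suggestion implies, that the login scripts read ~/.i18n as a list of shell-style variable assignments, a minimal Python sketch can show which LANG such a file would select. The parser below is a hypothetical simplification: it handles only plain KEY=value lines (optionally quoted or prefixed with "export"), whereas the real scripts use the shell itself:

```python
from pathlib import Path
from typing import Dict, Optional


def parse_i18n(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines; comments and blank lines are skipped."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key] = value
    return settings


def user_lang(home: Optional[Path] = None) -> Optional[str]:
    """Return LANG from ~/.i18n, or None if the file or the key is missing."""
    path = (home if home is not None else Path.home()) / ".i18n"
    if not path.is_file():
        return None
    return parse_i18n(path.read_text(encoding="utf-8", errors="replace")).get("LANG")


if __name__ == "__main__":
    print(user_lang())
```

With a ~/.i18n containing the single line LANG="ko_KR.UTF-8", user_lang() would return "ko_KR.UTF-8".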
I'm assigning this bug to XFree86 since the XLC_LOCALE patches are
the substantive part of the attachments. It would be good to submit
them upstream as well, however.
[Note that locale_config is an obsolete configuration tool, replaced
by redhat-config-languages; it is not Red Hat's locale configuration :-)]
> We don't (in general) think it is appropriate to offer users a choice
> of encodings when logging in; encodings are something that should
> "just work"; the user shouldn't have to think about them
I agree that in general that's a sound policy. The key phrase here
is 'in general', though, as I expect you'll agree :-) In this particular
case of CJK UTF-8 locales, it might be necessary to do things
a bit differently as an interim measure.
> We didn't actually offer a choice between GB18030 and GB2312 in the
> user interface in 8.0 - if you picked simplified chinese, you got
Sure, you didn't. I guess you couldn't even if you wanted to, because the PRC government
wouldn't let you sell RH 8.0 in China otherwise :-) My point in mentioning SC (GB18030)
was, if it's not clear, that if SC (GB18030) is supported, there should be
little reason Korean(UTF-8) and Japanese(UTF-8) cannot be offered, because
GB18030 and UTF-8 are just two different UTFs of a single coded character set.
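The claim that GB18030 and UTF-8 are two encodings of the same coded character set can be illustrated with Python's built-in "utf-8" and "gb18030" codecs: each one round-trips Korean, Japanese and Chinese text, even though the byte sequences differ. The sample strings are arbitrary choices for illustration:

```python
SAMPLES = ["한국어", "日本語", "简体中文", "똠방각하"]


def round_trips(text: str, encoding: str) -> bool:
    """True if text survives an encode/decode cycle through the codec unchanged."""
    return text.encode(encoding).decode(encoding) == text


if __name__ == "__main__":
    for text in SAMPLES:
        utf8 = text.encode("utf-8")
        gb = text.encode("gb18030")
        print(f"{text}: UTF-8 {utf8.hex()} ({len(utf8)} bytes), "
              f"GB18030 {gb.hex()} ({len(gb)} bytes), "
              f"round-trip ok: {round_trips(text, 'utf-8') and round_trips(text, 'gb18030')}")
```

A legacy encoding such as EUC-KR, by contrast, covers only a subset of Unicode, which is why it cannot simply be swapped for either of these.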
> We are hoping that soon we can just use UTF-8 locales for CJK as we
> do for all other languages; at that point, we won't offer a choice
> between UTF-8 and the traditional encoding; when you select the
> language in gdm, redhat-config-languages, or the installer, you'll
> just get the UTF-8 locale.
Yup, that's definitely the way to go. Goodbye to old encodings forever!!!
And Linux will be on par with MS Windows 2k/XP in Korean support.
> I'm assigning this bug to XFree86
Thanks for assigning it to a more appropriate component. I had a hard
time picking a component for this bug. There's a locale-ja component, but
there's no component 'locale', so I ended up with 'locale-config'.
> since the XLC_LOCALE patches are
> the substantive part of the attachments. It would be good to submit
> them upstream as well, however.
I'll do that. It's very frustrating to wait several months for the XF86 team to accept
my patches ;-) I hope this patch will get accepted in a timely manner,
but in the meantime it'd be nice if RedHat could just go ahead with it.
>> since the XLC_LOCALE patches are
>> the substantive part of the attachments. It would be good to submit
>> them upstream as well, however.
> I'll do that.
I've submitted my patch and was given sequence number 5421.
In addition to Ami (patched for UTF-8), there's a gtk input module
for Korean, 'imhangul' by CHOI Hwan-jin (http://imhangul.kldp.net/). It
fully supports UTF-8 input for Korean and works really well!
Ami (with UTF-8 patch) is still necessary for non-gtk applications, though.
Anyway, the existence of 'imhangul' makes my case for Korean UTF-8
locale even stronger.
429. Add ko_KR.UTF-8 and ja_JP.UTF-8 XLC_LOCALE files (#5421, Jungshik Shin).
Your patches were accepted into XFree86 CVS a while back. I'll have CVS
builds going into Rawhide as RPMs soon, so I'm closing this as resolved.
When I submitted my patch to XF86, I forgot to include the XI18N_OBJS files for
the ko_KR.UTF-8 and ja_JP.UTF-8 locales (I had them on my machine).
They're identical to the one for en_US.UTF-8.
However, without them in the ko_KR.UTF-8 and ja_JP.UTF-8 directories,
Xlib complains that the two locales are not supported.
Perhaps you've already taken care of it. Anyway, I'm going to send a new patch
to XFree86 to include them.
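A quick way to catch this kind of omission is to compare the locales listed in the X locale database's locale.dir against the files present in each locale directory. The Python sketch below is hypothetical: it assumes locale.dir lines of the form "ko_KR.UTF-8/XLC_LOCALE<whitespace>ko_KR.UTF-8" (tolerating a ":" after the path, as in later X.Org releases) and assumes an XFree86-style root such as /usr/X11R6/lib/X11/locale. It mirrors the symptom reported above; it is not how Xlib itself decides whether a locale is supported:

```python
from pathlib import Path
from typing import Dict, List, Set


def parse_locale_dir(text: str) -> Dict[str, str]:
    """Map each locale name in locale.dir text to its locale subdirectory."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        path, name = parts[0].rstrip(":"), parts[1]
        mapping[name] = path.split("/", 1)[0]
    return mapping


def missing_i18n_objs(locale_dir: Dict[str, str],
                      files: Dict[str, Set[str]]) -> List[str]:
    """Return locale names whose directory has no XI18N_OBJS file."""
    return sorted(name for name, directory in locale_dir.items()
                  if "XI18N_OBJS" not in files.get(directory, set()))


def check_tree(root: Path) -> List[str]:
    """Apply the check to an installed locale tree (root path is an assumption)."""
    mapping = parse_locale_dir((root / "locale.dir").read_text(encoding="latin-1"))
    files = {d: {p.name for p in (root / d).iterdir()}
             for d in set(mapping.values()) if (root / d).is_dir()}
    return missing_i18n_objs(mapping, files)


if __name__ == "__main__":
    print(check_tree(Path("/usr/X11R6/lib/X11/locale")))
```

A locale directory that has an XLC_LOCALE but no XI18N_OBJS, as in the case described above, would show up in the returned list.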
As for the imhangul module you mentioned, it works well in GNOME2;
however, it doesn't work with GNOME1 applications such as XChat 1.8 (1.9 is a
development version), XMMS, and so on.
That is why it hasn't been included.