75829 – needs to have UTF-8 locales for CJK

Summary: needs to have UTF-8 locales for CJK

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	XFree86
Sub Component:
Version:	8.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Mike A. Harris
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-10-13 12:24 UTC by Jungshik Shin
Modified:	2008-05-01 15:38 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2002-10-15 14:24:29 UTC
Embargoed:

Attachments
XLC_LOCALE for ko_KR.UTF-8 (2.26 KB, text/plain) 2002-10-13 12:35 UTC, Jungshik Shin
XLC_LOCALE for ja_JP.UTF-8 (2.26 KB, text/plain) 2002-10-13 12:36 UTC, Jungshik Shin
locale.dir (X11) and locale.alias(gdm) : diff (1.17 KB, patch) 2002-10-13 12:42 UTC, Jungshik Shin
a sample gtkrc file for ko_KR.UTF-8 (perhaps not necessary for gtk2-based RH8) (266 bytes, text/plain) 2002-10-13 12:45 UTC, Jungshik Shin

Description Jungshik Shin 2002-10-13 12:24:51 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021008

Description of problem:
It's very unfortunate that RH 8 doesn't have UTF-8 locales for CJK by default.
Even if there are some problems with the CJK UTF-8 locales, it'd be nice
to give users a choice by adding them to /etc/X11/gdm/locale.alias.
As Solaris and AIX do, RH8 can offer multiple locales with different codesets
for CJK.
For instance, the locale.alias file for gdm can have two entries
for Korean:

   Korean(EUC)  ko_KR
   Korean(UTF-8)  ko_KR.UTF-8
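
As a rough illustration of the entry format shown above, the following Python sketch parses such lines into a mapping from display name to locale names. It assumes each entry is a display name, then whitespace, then a comma-separated locale list, and that lines starting with `#` are comments. `parse_locale_alias` is a hypothetical helper, not part of gdm.

```python
def parse_locale_alias(text):
    """Map each display name to its list of locale names.

    Assumed format (not taken from gdm's documentation): one entry per
    line, a display name, whitespace, then a comma-separated locale list.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.rsplit(None, 1)  # split at the last run of whitespace
        if len(parts) != 2:
            continue  # no locale column, so ignore the line
        name, locales = parts
        entries[name.strip()] = locales.split(",")
    return entries


sample = """
Korean(EUC)  ko_KR
Korean(UTF-8)  ko_KR.UTF-8
"""
print(parse_locale_alias(sample))
```

With both entries present, a chooser built on this mapping would list the EUC and UTF-8 variants side by side.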

There are a couple of things to change if CJK UTF-8 locales are added.
First, the XLC_LOCALE file for each CJK UTF-8 locale has to be customized
per locale instead of reusing the XLC_LOCALE for en_US.UTF-8. I'll
attach XLC_LOCALE files for ja_JP.UTF-8, ko_KR.UTF-8
and zh_CN.UTF-8.

 
With this change, KDE3 works very well out of the box under ko_KR.UTF-8.
I'm less sure of Gnome2 because nautilus in Gnome2 is really sluggish on my
machine (taking up 400M of memory). However, nautilus is as sluggish under
ko_KR.EUC-KR as under ko_KR.UTF-8, so the UTF-8 locale is not the culprit
for this problem.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. At gdm login screen, click on Language 
2. Choose either Japanese or Korean locale
3. After log-in, check out the locale (e.g. echo $LANG)
	

Actual Results:  The locale codeset is a legacy encoding instead of
UTF-8

Expected Results:  gdm should offer Korean(UTF-8) and Japanese(UTF-8)
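
The check in step 3 can also be scripted. The sketch below is only an illustration: it splits a locale name such as the value of $LANG into its parts and reports whether the codeset is UTF-8. The helper names `split_locale` and `is_utf8` are hypothetical, and a name with no codeset part (for example plain `ko_KR`) is treated as "not UTF-8" because the codeset is then decided elsewhere.

```python
import os


def split_locale(name):
    """Split 'ko_KR.UTF-8@modifier' into (language_territory, codeset, modifier)."""
    base, _, modifier = name.partition("@")
    lang, _, codeset = base.partition(".")
    return lang, codeset or None, modifier or None


def is_utf8(name):
    """True if the locale name explicitly names a UTF-8 codeset."""
    codeset = split_locale(name)[1]
    # Accept common spellings such as 'UTF-8' and 'utf8'.
    return codeset is not None and codeset.replace("-", "").lower() == "utf8"


lang = os.environ.get("LANG", "")
print(repr(lang), "UTF-8" if is_utf8(lang) else "not UTF-8 (or codeset unspecified)")
```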

Additional info:

Korean(UTF-8) was the first language (along with English) for which a
UTF-8 locale was supported on both Solaris 7/8 and AIX 4.x.
That's not without reason. The EUC-KR codeset for Korean has a very
critical problem, and virtually every Korean wants to move to a UTF-8 locale
as soon as possible to solve that problem.

Comment 1 Jungshik Shin 2002-10-13 12:35:11 UTC

Created attachment 80224 [details]
XLC_LOCALE for ko_KR.UTF-8

Comment 2 Jungshik Shin 2002-10-13 12:36:07 UTC

Created attachment 80225 [details]
XLC_LOCALE for ja_JP.UTF-8

Comment 3 Jungshik Shin 2002-10-13 12:42:56 UTC

Created attachment 80226 [details]
locale.dir (X11) and locale.alias(gdm) : diff

Comment 4 Jungshik Shin 2002-10-13 12:45:51 UTC

Created attachment 80227 [details]
a sample gtkrc file for ko_KR.UTF-8 (perhaps not necessary for gtk2-based RH8)

Comment 5 Jungshik Shin 2002-10-14 06:04:46 UTC

It might be argued that UTF-8 locales for CJK are not yet "mature" enough
for prime time in some respects. Then I can't help wondering why
RH decided to include SC (GB18030) while not including Korean(UTF-8)
and Japanese(UTF-8). Well, the PRC requires that all OSes shipped in
the PRC support GB 18030, and RedHat probably didn't have any other choice.
Whatever the rationale for including SC(GB18030) in RH was,
I think that's a strong indication that Korean(UTF-8) and Japanese(UTF-8)
could well be included as alternatives to Korean(EUC) and Japanese(EUC),
because GB 18030 is nothing other than another UTF (Unicode Transformation
Format), just like UTF-8.
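
To make the comparison concrete, here is a small Python sketch showing the same mixed Korean/Japanese/English string round-tripping through both encodings. It uses Python's built-in `utf-8` and `gb18030` codecs; the sample string and the helper `round_trips` are made up for illustration and are not from the attachments.

```python
def round_trips(text, codec):
    """True if text survives an encode/decode cycle in the given codec."""
    return text.encode(codec).decode(codec) == text


sample = "한국어 日本語 English"

for codec in ("utf-8", "gb18030"):
    encoded = sample.encode(codec)
    # Both codecs can represent every character in the sample string.
    print(codec, len(encoded), "bytes, round trip ok:", round_trips(sample, codec))
```

The byte sequences differ between the two codecs, but the decoded text is identical, which is the sense in which both act as encodings of one coded character set.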

It has to be noted that I'm NOT saying that UTF-8 has to be the *only* codeset
for Japanese and Korean. Rather, I think UTF-8 should be offered
to those who want to use it  *in addition to* Korean and Japanese locales
based on legacy encodings (EUC-KR and EUC-JP).

If RedHat is so concerned about potential problems with CJK UTF-8
locales, it can warn its users, in a prominent place in the release notes,
that CJK UTF-8 locale support is experimental and that the legacy-encoding
based CJK locales are recommended.
 
For the record, I have been using ko_KR.UTF-8 for about half a year and
have yet to find a single aspect in which ko_KR.UTF-8 locale is worse than
ko_KR.EUC-KR. My comments in bug 75832 have a couple of compelling
reasons why Koreans need UTF-8 locale right now.

Comment 6 Owen Taylor 2002-10-14 16:40:49 UTC

We don't (in general) think it is appropriate to offer users a choice
of encodings when logging in; encodings are something that should
"just work"; the user shouldn't have to think about them.

We didn't actually offer a choice between GB18030 and GB2312 in the
user interface in 8.0 - if you picked simplified chinese, you got
GB18030.

We are hoping that soon we can just use UTF-8 locales for CJK as we
do for all other languages; at that point, we won't offer a choice
between UTF-8 and the traditional encoding; when you select the 
language in gdm, redhat-config-languages, or the installer, you'll
just get the UTF-8 locale.

To pick alternate encodings, you can put "LANG=ko_KR.UTF-8" in
your ~/.i18n file.
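
For anyone who wants to script that override, the following Python sketch rewrites the text of an i18n-style file so that it carries a single LANG assignment. It assumes the file consists of plain shell-style `KEY=value` lines; `set_lang` is a hypothetical helper, and reading or writing the actual ~/.i18n file is left to the caller.

```python
def set_lang(i18n_text, locale_name):
    """Return i18n_text with exactly one LANG= line set to locale_name."""
    new_line = "LANG=%s" % locale_name
    out = []
    replaced = False
    for line in i18n_text.splitlines():
        if line.strip().startswith("LANG="):
            if not replaced:
                out.append(new_line)  # replace the first LANG assignment
                replaced = True
            # any further LANG= lines are dropped
        else:
            out.append(line)  # keep unrelated settings untouched
    if not replaced:
        out.append(new_line)  # no LANG assignment yet, so append one
    return "\n".join(out) + "\n"


print(set_lang("LANG=ko_KR.eucKR\n", "ko_KR.UTF-8"), end="")
```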

I'm assigning this bug to XFree86 since the XLC_LOCALE patches are
the substantive part of the attachments.. it would be good to submit
them upstream as well, however.

[Note that locale_config is an obsolete configuration tool, replaced
by redhat-config-languages, not Red Hat's locale configuration :-)]

Comment 7 Jungshik Shin 2002-10-14 17:04:06 UTC

> We don't (in general) think it is appropriate to offer users a choice
> of encodings when logging in; encodings are something that should
> "just work"; the user shouldn't have to think about them

  I agree that in general that's a sound policy. The keyword here
is 'in general', though, as I expect you to agree :-) In this particular
case of CJK UTF-8 locales, it might be necessary to do things
a bit differently as an interim measure.

> We didn't actually offer a choice between GB18030 and GB2312 in the
> user interface in 8.0 - if you picked simplified chinese, you got
> GB18030.

  Sure you don't. I guess you couldn't even if you wanted to, because the PRC
government wouldn't let you sell RH 8.0 in China otherwise :-) My point in
mentioning SC(GB18030) was, if it's not clear, that if SC(GB18030) is supported,
there should be little reason Korean(UTF-8) and Japanese(UTF-8) cannot be
offered, because GB18030 and UTF-8 are just two different UTFs of a single
coded character set, ISO 10646/Unicode.

> We are hoping that soon we can just use UTF-8 locales for CJK as we
> do for all other languages; at that point, we won't offer a choice
> between UTF-8 and the traditional encoding; when you select the 
> language in gdm, redhat-config-languages, or the installer, you'll
> just get the UTF-8 locale.

  Yup, that's definitely the way to go. Goodbye to old encodings forever!
And Linux will be on par with MS Windows 2k/XP in Korean support.

> I'm assigning this bug to XFree86 

  Thanks for assigning it to a more appropriate component. I had a hard
time picking a component for this bug. There's a locale-ja, but
there's no component 'locale', so I ended up with 'locale-config'.

> since the XLC_LOCALE patches are
> the substantive part of the attachments.. it would be good to submit
> them upstream as well, however.

  I'll do. It's very frustrating to wait several months for the XF86 team to
accept my patches ;-). I hope this patch will get accepted in a timely manner,
but in the meantime it'd be nice if RedHat could just go ahead with it.

Comment 8 Jungshik Shin 2002-10-14 19:08:26 UTC

>> since the XLC_LOCALE patches are
>> the substantive part of the attachments.. it would be good to submit
>> them upstream as well, however.

  > I'll do.
I've submitted my patch and was given seq. 5421.

Comment 9 Jungshik Shin 2002-10-15 14:24:22 UTC

In addition to Ami (patched for UTF-8), there's a gtk input module
for Korean, 'imhangul' by CHOI Hwan-jin (http://imhangul.kldp.net/). It
fully supports UTF-8 input for Korean and works really well!
Ami (with the UTF-8 patch) is still necessary for non-gtk applications, though.
Anyway, the existence of 'imhangul' makes my case for a Korean UTF-8
locale even stronger.

Comment 10 Mike A. Harris 2002-11-03 10:33:40 UTC

 429. Add ko_KR.UTF-8 and ja_JP.UTF-8 XLC_LOCALE files (#5421, Jungshik Shin).

Your patches were accepted into XFree86 CVS a while back.  I've got CVS
builds going into rawhide as RPMs soon, so I'm closing this as resolved
in RAWHIDE.

Thanks.

Comment 11 Jungshik Shin 2002-12-16 05:31:56 UTC

When I submitted my patch to XF86, I forgot to include the XI18N_OBJS files for
the ko_KR.UTF-8 and ja_JP.UTF-8 locales (I had them on my machine).
They're identical to the one for en_US.UTF-8.
However, without them in the ko_KR.UTF-8 and ja_JP.UTF-8 directories,
the X11 library complains that the two locales are not supported by Xlib.
Perhaps you've already taken care of it. Anyway, I'm going to send a new patch
to XFree86 to include them.
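
A quick way to spot this kind of omission is to check each locale directory for the files Xlib needs. The Python sketch below is a hedged illustration only: it assumes the two files are named XLC_LOCALE and XI18N_OBJS on disk, and the locale root passed in is up to the caller (the path in the usage line is an assumption about a typical XFree86 install, not something stated in this report). `missing_locale_files` is a hypothetical helper.

```python
import os

# Assumed file names inside each X11 locale directory.
REQUIRED_FILES = ("XLC_LOCALE", "XI18N_OBJS")


def missing_locale_files(locale_root, locale_name, required=REQUIRED_FILES):
    """Return the required files that are absent from locale_root/locale_name."""
    locale_dir = os.path.join(locale_root, locale_name)
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(locale_dir, name))
    ]


# Example call; the root path is an assumption and may differ per system.
for loc in ("ko_KR.UTF-8", "ja_JP.UTF-8"):
    print(loc, missing_locale_files("/usr/X11R6/lib/X11/locale", loc))
```

An empty list for a locale means both files are present in its directory.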

Comment 12 David Joo 2002-12-16 06:29:31 UTC

The imhangul module that you mentioned works well in GNOME2;
however, it doesn't work with GNOME1 applications such as XChat 1.8 (1.9 is
the developer version), XMMS, and so on.
That is why it hasn't been implemented.
