Bug 144707 - non-CJK text broken by default for Western locale
Summary: non-CJK text broken by default for Western locale
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: emacs
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Chip Coldwell
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-01-10 21:03 UTC by James Ralston
Modified: 2007-11-30 22:10 UTC (History)
1 user

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-11-06 16:32:00 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
screenshot of Emacs failing to save a buffer with Japanese text (9.63 KB, image/png)
2005-02-03 20:45 UTC, James Ralston

Description James Ralston 2005-01-10 21:03:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040922

Description of problem:
I have two machines running FC2.  On those machines, I have no problems with emacs and Japanese text (hiragana, katakana, kanji).  I can visit files containing Japanese text encoded in UTF-8; emacs autodetects UTF-8 encoding and displays the text properly.  I can save buffers containing Japanese text using the utf-8 coding system.

In short, on the FC2 machines, Japanese text in emacs just works.

I just loaded FC3 on a third machine.  On the FC3 machine, I can't get emacs Japanese text support working.  If I visit a buffer containing Japanese text encoded in UTF-8, I just get a bunch of gibberish (backslash octal characters and empty boxes).  I can paste Japanese text (copied from another application) into Emacs buffers, and it displays properly, but then if I attempt to save the buffer, I receive this message:

> These default coding systems were tried:
>   utf-8
> However, none of them safely encodes the target text.
> 
> Select one of the following safe coding systems:
>   euc-jp shift_jis iso-2022-jp iso-2022-jp-2 x-ctext
>   japanese-iso-7bit-1978-irv iso-2022-7bit raw-text emacs-mule
>   no-conversion iso-2022-7bit-lock-ss2 ctext-no-compositions
>   iso-2022-8bit-ss2 iso-2022-7bit-lock iso-2022-7bit-ss2
>   tibetan-iso-8bit-with-esc thai-tis620-with-esc lao-with-esc
>   korean-iso-8bit-with-esc hebrew-iso-8bit-with-esc
>   greek-iso-8bit-with-esc iso-latin-9-with-esc iso-latin-8-with-esc
>   iso-latin-5-with-esc iso-latin-4-with-esc iso-latin-3-with-esc
>   iso-latin-2-with-esc iso-latin-1-with-esc
>   in-is13194-devanagari-with-esc cyrillic-iso-8bit-with-esc
>   chinese-iso-8bit-with-esc japanese-iso-8bit-with-esc

I cannot see how this message can be correct, because UTF-8 encodes *everything*.

Emacs Japanese text support worked just fine on FC2, but now it appears to be broken on FC3.  All other FC3 applications I've used have worked just fine; it only seems to be emacs that is broken.  Does anyone know what's going on and how to fix it?

(I did an "Everything" install when I loaded my FC3 machine, so I am hoping that the problem is something simple, like I accidentally included some bogus "backwards-compatibility" package designed for Japanese text support in the days before UTF-8.)


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run emacs and try to use Japanese text.


Actual Results:  See above.


Expected Results:  See above.


Additional info:

Comment 1 James Ralston 2005-01-10 21:05:22 UTC
[Bah, I didn't realize the Bugzilla beta doesn't wrap lines anymore.
The report above is the same text reposted without wrapped lines.]

Comment 2 Jens Petersen 2005-01-11 07:00:48 UTC
It seems to work fine for me.

Have you tried with "emacs -q"?

Comment 3 James Ralston 2005-01-12 06:27:08 UTC
Running "emacs -q" yields the same results.

I will go post on gnu.emacs.help and see if anyone there has any ideas.


Comment 4 James Ralston 2005-02-03 20:45:26 UTC
Created attachment 110618 [details]
screenshot of Emacs failing to save a buffer with Japanese text

I posted to both gnu.emacs.help and gnu.emacs.bug:

http://groups-beta.google.com/group/gnu.emacs.help/browse_thread/thread/2bc4eb72af963da3/c85c2cf0be051ef6

http://groups-beta.google.com/group/gnu.emacs.bug/browse_thread/thread/1f5edc323f8c05d9/035de9a8da1d3782


The single reply that I received to my gnu.emacs.help post doesn't seem to be
relevant.

I am open to the possibility that I am doing something wrong.  However, all
available evidence indicates that this is a bug with Emacs, especially since
everything worked under FC2.

I've attached a self-explanatory screenshot that describes the problem.  If you
could hit some of the Emacs developers over the head with this, I'd appreciate
it.

Comment 5 Jens Petersen 2005-04-08 01:56:44 UTC
Well this is the first report of this problem I've heard
so I suspect something is wrong on your side.

Are there any local modifications to the site config files for emacs?

Does "rpm -V emacs emacs-common" output anything?


Comment 6 James Ralston 2006-04-20 19:17:04 UTC
Peter Salvi finally clued me in to the solution back in August 2005.

The problem is that Unicode support isn't loaded by default.  You need to
specifically load Unicode support by adding the following line to your .emacs file:

(require 'un-define)

The above line is all that's necessary, but from Googling, I also found this
suggestion:

;;; Load Unicode support.
(when (locate-library "un-define")
  (require 'un-define)
  (require 'unicode)
  ;; requiring unidata is optional:
  (require 'unidata))

In either case, this is *completely* unintuitive.  I spent weeks researching
this problem, and I never once encountered *any* mention that Unicode support
had to be specifically enabled.

It's 2006.  The fact that a program that is Unicode-capable requires specific
contortions to enable Unicode support is mind-boggling.

I'm guessing the only reason Emacs doesn't load Unicode support by default is
because it dramatically increases Emacs' startup time (try launching Emacs with
and without having the un-define line in your .emacs file and you'll see what I
mean).  But that's no excuse--Unicode support should be enabled by default, and
if people complain about poor startup times, then the solution is to fix the
startup time with Unicode support enabled, not break Unicode support.


Comment 7 Jens Petersen 2006-04-21 01:59:30 UTC
Well, that is not strictly correct: Emacs supports utf-8 for non-Asian
characters by default, but not for Asian characters.  So yes, if you're
in a European/Western locale then you need to load un-define yourself,
since it slows down Emacs startup significantly for users who don't
need Asian language support.  un-define is set to load at startup when
Emacs is started in an Asian locale.  So if you want to make use of that,
you can set LC_CTYPE=ja_JP.UTF-8 for Emacs, for example, or borrow the
setup code in lang-coding-systems-init.el for your .emacs if you prefer
not to do that.  (BTW, in Emacs 22 un-define will no longer be necessary,
so that will be a big win.)


Comment 8 Jens Petersen 2006-04-21 02:37:53 UTC
I suggest adding this to dotemacs.el

Comment 9 Jens Petersen 2006-04-21 03:00:06 UTC
;;; uncomment for CJK utf-8 support for non-Asian users
;; (require 'un-define)


However the problem here really is that emacs22 hasn't been released yet.

Comment 10 Chip Coldwell 2006-11-06 16:32:00 UTC
devel: fixed in 21.4-18

