Bug 124849 - Application message is scrambled in Japanese
Product: Fedora
Classification: Fedora
Component: lynx
i386 Linux
Priority: medium, Severity: medium
Assigned To: Tim Waugh
Brian Brock
: i18n
Reported: 2004-05-31 06:53 EDT by Nakai
Modified: 2007-11-30 17:10 EST (History)

Fixed In Version: 2.8.5-20
Doc Type: Bug Fix
Last Closed: 2004-12-30 08:11:06 EST

Attachments
Screenshot with the kind arrow of where is wrong (52.96 KB, image/png)
2004-05-31 06:57 EDT, Nakai
comparison of screenshots (15.18 KB, image/png)
2004-06-17 20:41 EDT, Thomas E. Dickey
This bug report in lynx, showing the bad/desired strings. (15.38 KB, image/png)
2004-06-20 18:28 EDT, Thomas E. Dickey

Description Nakai 2004-05-31 06:53:54 EDT
Description of problem:
Application message is scrambled in Japanese.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run lynx to view HTML documents.
2. Wow
Actual results:
The Japanese version of "-- press space for next page --" is scrambled.

Expected results:
The Japanese message is displayed correctly.

Additional info:
Comment 1 Nakai 2004-05-31 06:57:50 EDT
Created attachment 100714
Screenshot with the kind arrow of where is wrong
Comment 2 Eido Inoue 2004-06-01 15:06:43 EDT
More helpful report for those whose native language isn't Japanese:

Version-Release number of selected component (if applicable):

1. Make sure you're running an FC system with JP fonts/etc installed
2. Make sure all locale categories are set to UTF-8 Japanese (set LANG
and/or LC_ALL to "ja_JP.UTF-8", confirm with the "locale(1)" command), and
make sure the gnome-terminal character encoding is set to UTF-8.
3. View a multi-page, plain text file (the GPL in /usr/share/doc is good)

Actual results:
-- 妾쮡妾 -
(multi-language gibberish)

Expected results:
-- 次ページはスペースキーで --
("Press space for the next page")

Additional Info:
The last two lines of help info are correct (although the character
spacing could use some work; but that's not a lynx bug). Messages that
are in colors (yellow foreground, blue background) seem to be
consistently scrambled.

Also, non-trivial HTML pages (e.g. www.yahoo.co.jp) in Japanese appear
to segfault lynx.
Comment 3 Thomas E. Dickey 2004-06-17 20:41:58 EDT
Created attachment 101232
comparison of screenshots
Comment 4 Thomas E. Dickey 2004-06-18 18:39:56 EDT
The screenshot I made was for uxterm on Debian, using a copy of
lynx which I built myself (linked with ncursesw of course).
It would be nice to know if this screenshot looks correct
(I don't read Japanese; it does look different from both,
but that could be a font issue).
Comment 5 Nakai 2004-06-19 20:24:27 EDT
Was the screenshot in comment #3 taken in the ja_JP.UTF-8 locale?

I have now confirmed that this bug happens on both gnome-terminal and uxterm in FC2.
Comment 6 Thomas E. Dickey 2004-06-19 21:09:32 EDT
yes, I used ja_JP.UTF-8 (to make lynx use the corresponding
ja.po contents).
Comment 7 Thomas E. Dickey 2004-06-20 18:20:31 EDT
Do the two (gnome-terminal and uxterm) produce the same result
in FC2, or are they different from each other?  
Comment 8 Thomas E. Dickey 2004-06-20 18:28:09 EDT
Created attachment 101280
This bug report in lynx, showing the bad/desired strings.

In this screenshot, the text is displayed without color (since it
is part of the regular text).  Is that example uncorrupted?  I notice
that the characters shown are different from the screenshot I made with
them in the blue/yellow message line.
Comment 9 Thomas E. Dickey 2004-06-20 18:30:06 EDT
If the second screenshot displays the characters properly,
I may have enough information to debug this.  (It's hard
to debug this without knowing what the result should look like.)
Comment 10 Nakai 2004-06-21 00:01:59 EDT
gnome-terminal and uxterm have the same result in FC2.

To comment #8:
That is the expected result. The broken string was pasted into this
bug report to show the displayed result.

The screenshot in comment #3 shows correct Japanese.
I can read it.
Comment 11 Thomas E. Dickey 2004-06-28 17:59:13 EDT
My concern in comment #8 was that, since I'm viewing it with Opera,
the browser might render it differently from lynx.
So the #3 screenshot is the correct one (thanks for the confirmation).
Comment 12 Thomas E. Dickey 2004-06-30 17:37:01 EDT
It is not related to color; the data returned by gettext
is displayed by lynx+ncursesw just as if I echoed the
string directly to the terminal.

It occurs to me that the problem is that the message file
ja.po uses EUC-JP charset.  This issue came up in another
message file (though it seems odd to me that gettext would
have this limitation).  I'll investigate further.
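
To see what an EUC-JP message looks like on a UTF-8 terminal, one can encode the expected string as EUC-JP and decode the bytes as UTF-8. The following is a minimal Python sketch for illustration only, not lynx code; the helper name show_as is made up here, and the string is the expected message from comment #2.

```python
# Illustration only: the same Japanese message as bytes in two encodings.
MESSAGE = "-- 次ページはスペースキーで --"


def show_as(raw: bytes, terminal_encoding: str) -> str:
    """Decode raw bytes the way a terminal using terminal_encoding would.

    Bytes that are invalid in that encoding become U+FFFD.
    """
    return raw.decode(terminal_encoding, errors="replace")


euc_bytes = MESSAGE.encode("euc_jp")    # what an EUC-JP catalog would hold
utf8_bytes = MESSAGE.encode("utf-8")    # what a UTF-8 terminal expects

garbled = show_as(euc_bytes, "utf-8")   # EUC-JP bytes on a UTF-8 terminal
correct = show_as(utf8_bytes, "utf-8")  # matching encodings

# Count the replacement characters produced by the mismatch.
print(garbled.count("\ufffd"), "undecodable bytes in the mismatched case")
```

Most EUC-JP bytes are not valid UTF-8, so the mismatched case is largely replacement characters, possibly mixed with a few unrelated letters where two bytes happen to form a valid sequence.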
Comment 13 Thomas E. Dickey 2004-07-03 10:20:30 EDT
Comment on attachment 101232
comparison of screenshots

This screenshot shows lynx on Debian/testing.
The message in blue corresponds to
"-- press space for next page --"
Comment 14 Thomas E. Dickey 2004-07-03 10:23:17 EDT
Comment on attachment 101280
This bug report in lynx, showing the bad/desired strings.

Shows lynx displaying this bug report. Compare with attachment #2, which shows
a different message.  Also note that the expected message begins with "press",
not "Press".
Comment 15 Thomas E. Dickey 2004-07-03 10:28:23 EDT
Displaying ja.po in EUC-JP shows me the same text as I see in
attachment #2.  I compared that against the text in the oldest
version of ja.po which I have (noting that the translator
worked on it over the following 2 years), and see the same text.
But I do not read Japanese, and cannot evaluate the quality of
the translation. There are two questions: whether the Red Hat
package & configuration is correct, and whether the translation
is correct.  If we can get an answer to the second question, we
can focus on the first one.
Comment 16 John Thacker 2004-12-27 18:00:10 EST
I know a little bit more about this problem, which occurs in FC3 as well.  It's
not a limit of gettext exactly.  The translation is perfect, BTW.

The current version of lynx does not convert Japanese characters back and forth
between the legacy encodings (EUC-JP, Shift-JIS, etc.) and Unicode, although it
can convert back and forth between the legacy encodings.  (It hasn't properly
been rewritten to work with iconv()).  Therefore, we have multiple lynx.cfg
files located in /etc:  lynx.cfg, lynx.cfg.ja, and others.  When the user has no
.lynxrc files in his home directory, we select the lynx.cfg.ja configuration
file if the user's locale begins with ja, and the lynx.cfg file in most other cases.

The lynx.cfg.ja file explicitly sets EUC-JP as the character set used.  This is
better for browsing in Japanese, and works perfectly in the ja_JP.EUC-JP locale,
but runs into problems in the ja_JP.UTF-8 locale, causing precisely the problems
reported in this bug.

If I copy the /etc/lynx.cfg file to .lynxrc in my home directory, and then run
lynx with LANG=ja_JP.UTF-8, all works perfectly fine.  The Japanese application
messages appear exactly as they should.  This is because it does not read the
/etc/lynx.cfg.ja, and try to set the preferred character set to EUC-JP.

Therefore, we only want to use /etc/lynx.cfg.ja, which uses EUC-JP instead of
UTF-8, in ja_JP.EUC-JP (and eucjp, etc.) locales, but NOT in ja_JP.UTF-8. 
Editing the lynx-284-i18ncfg.patch included in the src RPM may fix the problem.
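
The selection rule proposed above can be sketched as follows. This is a hypothetical Python illustration, not the contents of lynx-284-i18ncfg.patch (which are not shown in this report); the function names parse_locale and pick_lynx_cfg are invented for the example, and only the two file paths come from the comment.

```python
def parse_locale(name: str) -> tuple:
    """Split a locale name such as 'ja_JP.EUC-JP' into (language, codeset).

    The codeset is normalised so 'EUC-JP', 'eucJP' and 'eucjp' all compare equal.
    """
    base = name.split("@", 1)[0]               # drop any @modifier
    lang_part, _, codeset = base.partition(".")
    language = lang_part.split("_", 1)[0]      # 'ja_JP' -> 'ja'
    normalised = codeset.replace("-", "").replace("_", "").lower()
    return language, normalised


def pick_lynx_cfg(locale_name: str) -> str:
    """Use the EUC-JP configuration only for Japanese EUC-JP locales."""
    language, codeset = parse_locale(locale_name)
    if language == "ja" and codeset == "eucjp":
        return "/etc/lynx.cfg.ja"
    # Assumption: a bare 'ja_JP' without an explicit codeset also lands here.
    return "/etc/lynx.cfg"


for name in ("ja_JP.UTF-8", "ja_JP.EUC-JP", "ja_JP.eucjp", "en_US.UTF-8"):
    print(name, "->", pick_lynx_cfg(name))
```

How a bare "ja_JP" should be treated is left open here; a real fix would need to decide that case explicitly.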
Comment 17 John Thacker 2004-12-27 18:44:09 EST
Another note:

lynx has an option which allows it to determine the proper charset from the
locale, added in lynx2-8-5pre3.  The ability to use this is enabled at
compilation with --enable-locale-charset, though that appears to be the default
at least in the lynx2-8-6dev releases.  At run time the option is off by
default, and can be turned on by putting LOCALE_CHARSET:TRUE in lynx.cfg.

I recommend updating lynx, setting LOCALE_CHARSET:TRUE in /etc/lynx.cfg, and
getting rid of all the existing Red Hat patches for running a special lynx.cfg
depending on the locale.
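
For readers unfamiliar with the idea of taking the charset from the locale: on POSIX systems a program can ask the C library for the codeset of the current locale. The Python sketch below shows that general mechanism only; it is not a description of how lynx implements LOCALE_CHARSET, and the function name charset_from_locale is made up for the example.

```python
import locale


def charset_from_locale() -> str:
    """Return the codeset of the current locale, e.g. 'UTF-8' or 'EUC-JP'."""
    try:
        # Adopt the user's LANG / LC_ALL settings for character handling.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The requested locale is not installed; keep the current one.
        pass
    # nl_langinfo(CODESET) is available on Unix-like systems only.
    return locale.nl_langinfo(locale.CODESET)


print(charset_from_locale())
```

The value printed depends on the environment the script runs in, so a single configuration file could serve both ja_JP.UTF-8 and ja_JP.EUC-JP users.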
Comment 18 Thomas E. Dickey 2004-12-27 19:44:10 EST
That (the lynx.cfg.ja) hadn't occurred to me, though I did
notice it in the rpm.

Starting with 2.8.6dev.6, I've modified lynx to work better
with multibyte locales (though I have some reports that say
it's not working that well, I've not been able to see that on
my configuration).  Any constructive feedback on that would
be useful.  (I'm currently working on bug reports for lynx, to
get dev.9 out this week).
Comment 19 John Thacker 2004-12-27 20:27:45 EST
Well, since you asked for feedback on lynx and multibyte locales:

It would be nice to be able to translate EUC-JP and Shift-JIS *into* UTF-8, as
well as the recently added ability to translate UTF-8 into EUC-JP and Shift-JIS
(via --enable-japanese-utf8).  Shouldn't iconv() allow this?

If we could get this, it would help immensely.
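
As a rough picture of the conversion being asked for, the Python sketch below converts bytes from a legacy Japanese encoding to UTF-8 and back, much as an iconv()-based path would. It uses Python's built-in codecs rather than iconv(), and the helper name transcode is an assumption for illustration; it says nothing about how lynx would do this internally.

```python
def transcode(raw: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert raw bytes from the source encoding to the target encoding."""
    return raw.decode(source).encode(target)


text = "次ページはスペースキーで"

# Legacy encodings into UTF-8 (the direction requested in this comment).
from_euc = transcode(text.encode("euc_jp"), "euc_jp")
from_sjis = transcode(text.encode("shift_jis"), "shift_jis")

# UTF-8 back into a legacy encoding (the direction that already exists).
back_to_euc = transcode(text.encode("utf-8"), "utf-8", "euc_jp")

print(from_euc == from_sjis == text.encode("utf-8"))
```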
Comment 20 Thomas E. Dickey 2004-12-27 20:44:17 EST
Yes, iconv should help, but I'm not quite ready to do
that part.  In my mind, it would build upon the changes
I made a few months ago to use the wide-character support
in ncurses.  One of the issues I see is that iconv can
also be slow unless some of the results are saved - which
is also an issue with the changes I've already made, since
I've no nice/simple interface to take a multibyte string
and say how many columns it will use (it's a 2-3 step process).
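
The two concerns mentioned here, counting the columns a multibyte string occupies and saving results so they are not recomputed, can be sketched in Python as below. This is an approximation for illustration, not lynx or ncurses code: it relies on Unicode's East Asian Width property, ignores control characters, and the function name display_columns is invented. In C the closest standard call would be wcswidth() on the converted wide string.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=1024)
def display_columns(text: str) -> int:
    """Estimate how many terminal columns text occupies.

    Wide and fullwidth characters count as 2, combining marks as 0,
    everything else as 1. Results are cached per string.
    """
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the previous character's cell
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        columns += 2 if wide else 1
    return columns


print(display_columns("abc"))
print(display_columns("次ページ"))
```

Caching by string suits a status line that is redrawn often with the same few messages; whether that matches lynx's access pattern is an assumption.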
Comment 22 Tim Waugh 2004-12-30 08:11:06 EST
Fixed package is 2.8.5-20.
