Bug 124849

Summary: Application message is scrambled in Japanese
Product: [Fedora] Fedora
Component: lynx
Version: 2
Hardware: i386
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Reporter: Nakai <ynakai>
Assignee: Tim Waugh <twaugh>
QA Contact: Brian Brock <bbrock>
CC: dickey, eng-i18n-bugs, fedora-ja-list, johnthacker
Keywords: i18n
Fixed In Version: 2.8.5-20
Doc Type: Bug Fix
Last Closed: 2004-12-30 13:11:06 UTC
Attachments:
  100714: Screenshot with the kind arrow of where is wrong (flags: none)
  101232: comparison of screenshots (flags: none)
  101280: This bug report in lynx, showing the bad/desired strings. (flags: none)

Description Nakai 2004-05-31 10:53:54 UTC
Description of problem:
Application message is scrambled in Japanese.

Version-Release number of selected component (if applicable):


How reproducible:
Every time.

Steps to Reproduce:
1. Run lynx to view HTML documents.
2. Wow
3.
  
Actual results:
The Japanese version of "-- press space for next page --" is scrambled.

Expected results:
The Japanese message is shown correctly.

Additional info:

Comment 1 Nakai 2004-05-31 10:57:50 UTC
Created attachment 100714
Screenshot with the kind arrow of where is wrong

Comment 2 Eido Inoue 2004-06-01 19:06:43 UTC
More helpful report for those whose native language isn't Japanese:

Version-Release number of selected component (if applicable):
lynx-2.8.5-15

1. Make sure you're running an FC system with Japanese fonts etc. installed
correctly.
2. Make sure all locale categories are in UTF-8 Japanese (set LANG
and/or LC_ALL to "ja_JP.UTF-8", confirm with the "locale(1)" command), and
make sure the gnome-terminal character encoding is set to UTF-8.
3. View a multi-page, plain text file (the GPL in /usr/share/doc is good)

Actual results:
-- 妾쮡妾 -
(multi-language gibberish)

Expected results:
-- 次ページはスペースキーで --
("Press space for the next page")

Additional Info:
The last two lines of help info are correct (although the character
spacing could use some work; but that's not a lynx bug). Messages that
are in colors (yellow foreground, blue background) seem to be
consistently scrambled.

Also, non-trivial HTML pages (e.g. www.yahoo.co.jp) in Japanese appear
to segfault lynx.


Comment 3 Thomas E. Dickey 2004-06-18 00:41:58 UTC
Created attachment 101232
comparison of screenshots

Comment 4 Thomas E. Dickey 2004-06-18 22:39:56 UTC
The screenshot I made was for uxterm on Debian, using a copy of
lynx which I built myself (linked with ncursesw of course).
It would be nice to know if this screenshot looks correct
(I don't read Japanese, but it does look different - from
both - but that could be a font issue).

Comment 5 Nakai 2004-06-20 00:24:27 UTC
Was the comment #3 screenshot taken in the ja_JP.UTF-8 locale?

I have now confirmed that this bug happens on both gnome-terminal and uxterm in FC2.

Comment 6 Thomas E. Dickey 2004-06-20 01:09:32 UTC
yes, I used ja_JP.UTF-8 (to make lynx use the corresponding
ja.po contents).

Comment 7 Thomas E. Dickey 2004-06-20 22:20:31 UTC
Do the two (gnome-terminal and uxterm) produce the same result
in FC2, or are they different from each other?  

Comment 8 Thomas E. Dickey 2004-06-20 22:28:09 UTC
Created attachment 101280
This bug report in lynx, showing the bad/desired strings.

In this screenshot, the text is displayed without color (since it
is part of the regular text).  Is that example uncorrupted?  I notice
that the characters shown are different from the screenshot I made with
them in the blue/yellow message line.

Comment 9 Thomas E. Dickey 2004-06-20 22:30:06 UTC
If the second screenshot displays the characters properly,
I may have enough information to debug this.  (It's hard
to debug this without knowing what the result should look
like).

Comment 10 Nakai 2004-06-21 04:01:59 UTC
gnome-terminal and uxterm have the same result in FC2.

To comment #8:
That is the expected result. The broken string was pasted into this
bug report to show what is actually displayed.

Comment #3 screenshot is the correct Japanese.
I can read it.

Comment 11 Thomas E. Dickey 2004-06-28 21:59:13 UTC
My concern in comment #8 was that, since I'm viewing it with Opera,
the browser might render it differently from lynx.  So the #3
screenshot is the reference (thanks for the confirmation).

Comment 12 Thomas E. Dickey 2004-06-30 21:37:01 UTC
It is not related to color; the data returned by gettext
is displayed by lynx+ncursesw just as if I echoed the
string directly to the terminal.

It occurs to me that the problem is that the message file
ja.po uses EUC-JP charset.  This issue came up in another
message file (though it seems odd to me that gettext would
have this limitation).  I'll investigate further.
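
One quick way to check the charset hypothesis above is to read the charset declared in the message catalog's header. This is only a minimal sketch: it assumes the usual gettext "Content-Type: text/plain; charset=..." header line, and both the helper name po_charset and the sample header are made up for illustration (they are not taken from lynx's ja.po).

```python
import re

def po_charset(po_text):
    """Return the charset declared in a .po header, or None if absent."""
    # The gettext header entry normally carries a line such as
    #   "Content-Type: text/plain; charset=EUC-JP\n"
    match = re.search(r"Content-Type:[^\n]*charset=([A-Za-z0-9_.:-]+)", po_text)
    return match.group(1) if match else None

# Hypothetical header, shaped like a typical .po file header.
sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Project-Id-Version: example\\n"\n'
    '"Content-Type: text/plain; charset=EUC-JP\\n"\n'
)
print(po_charset(sample))
```

If the declared charset differs from the terminal's encoding, something between the catalog and the screen has to convert the text.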

Comment 13 Thomas E. Dickey 2004-07-03 14:20:30 UTC
Comment on attachment 101232
comparison of screenshots

This screenshot shows lynx on Debian/testing.
The message in blue corresponds to
"-- press space for next page --"

Comment 14 Thomas E. Dickey 2004-07-03 14:23:17 UTC
Comment on attachment 101280
This bug report in lynx, showing the bad/desired strings.

Shows lynx displaying this bug report. Compare with attachment #2, which shows
a different message.  Also note that the expected message begins with "press",
not "Press".

Comment 15 Thomas E. Dickey 2004-07-03 14:28:23 UTC
Displaying ja.po in EUC-JP shows me the same text as I see in
attachment #2.  I compared that against the oldest version of
ja.po which I have (noting that the translator worked on it over
the following 2 years) and see the same text.
But I do not read Japanese, and cannot evaluate the quality of
the translation. There are two questions: whether the Redhat
package & configuration is correct, and whether the translation
is correct.  If we can get an answer to the second question, we
can focus on the first one.

Comment 16 John Thacker 2004-12-27 23:00:10 UTC
I know a little bit more about this problem, which occurs in FC3 as well.  It's
not a limit of gettext exactly.  The translation is perfect, BTW.

The current version of lynx does not convert Japanese characters back and forth
between the legacy encodings (EUC-JP, Shift-JIS, etc.) and Unicode, although it
can convert back and forth between the legacy encodings.  (It hasn't properly
been rewritten to work with iconv()).  Therefore, we have multiple lynx.cfg
files located in /etc:  lynx.cfg, lynx.cfg.ja, and others.  When the user has no
.lynxrc files in his home directory, we select the lynx.cfg.ja configuration
file if the user's locale begins with ja, and the lynx.cfg file in most other cases.

The lynx.cfg.ja file explicitly sets EUC-JP as the character set used.  This is
better for browsing in Japanese, and works perfectly in the ja_JP.EUC-JP locale,
but runs into problems in the ja_JP.UTF-8 locale, causing precisely the problems
witnessed.
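
The general failure mode can be reproduced outside lynx. The following is a sketch of what happens when EUC-JP bytes reach a terminal that expects UTF-8; it is not a trace of what lynx does internally, and the sample phrase is chosen for illustration rather than taken from lynx's message catalog.

```python
# A sample Japanese phrase (illustrative; not lynx's actual message).
message = "次のページ"

# Bytes a program would emit if its display charset were set to EUC-JP.
euc_bytes = message.encode("euc_jp")

# A UTF-8 terminal interprets those same bytes as UTF-8; sequences that
# are not valid UTF-8 are shown here as U+FFFD replacement characters.
seen_in_utf8_terminal = euc_bytes.decode("utf-8", errors="replace")

# Interpreted in the charset they were written in, the bytes round-trip.
seen_in_eucjp_terminal = euc_bytes.decode("euc_jp")

print(repr(seen_in_utf8_terminal))
print(seen_in_eucjp_terminal == message)
```

The bytes are the same in both cases; only the interpretation differs, which would explain why the EUC-JP locale works while the UTF-8 locale shows gibberish.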

If I copy the /etc/lynx.cfg file to .lynxrc in my home directory, and then run
lynx with LANG=ja_JP.UTF-8, all works perfectly fine.  The Japanese application
messages appear exactly as they should.  This is because it does not read the
/etc/lynx.cfg.ja, and try to set the preferred character set to EUC-JP.

Therefore, we only want to use /etc/lynx.cfg.ja, which uses EUC-JP instead of
UTF-8, in ja_JP.EUC-JP (and eucjp, etc.) locales, but NOT in ja_JP.UTF-8. 
Editing the lynx-284-i18ncfg.patch included in the src RPM may fix the problem.
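
The selection rule proposed here can be sketched as follows. This is an illustration of the rule only, not the contents of lynx-284-i18ncfg.patch (which is not shown in this report); the function name pick_system_config is hypothetical, and treating a bare "ja_JP" with no codeset as non-EUC-JP is an assumption.

```python
def pick_system_config(locale_name):
    """Choose a system config path for a locale name such as 'ja_JP.UTF-8'."""
    # Drop any '@modifier', then split 'ja_JP.eucJP' into language and codeset.
    base = locale_name.split("@", 1)[0]
    lang, _, codeset = base.partition(".")
    # Normalise spellings such as 'EUC-JP', 'eucJP' and 'eucjp'.
    codeset = codeset.replace("-", "").replace("_", "").lower()
    # Only Japanese locales that explicitly use EUC-JP get the EUC-JP config.
    # Assumption: a bare 'ja_JP' (no codeset) falls through to the default.
    if lang.startswith("ja") and codeset == "eucjp":
        return "/etc/lynx.cfg.ja"
    return "/etc/lynx.cfg"

for name in ("ja_JP.UTF-8", "ja_JP.EUC-JP", "ja_JP.eucJP", "en_US.UTF-8"):
    print(name, "->", pick_system_config(name))
```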

Comment 17 John Thacker 2004-12-27 23:44:09 UTC
Another note:

lynx has an option which allows it to determine the proper charset from the
locale, added in lynx2-8-5pre3.  Support for it is compiled in with
--enable-locale-charset, which appears to be the configure default at least in
the lynx2-8-6dev releases.  At run time the option is off by default, and can be
turned on by putting LOCALE_CHARSET:TRUE in lynx.cfg.

I recommend updating lynx, setting LOCALE_CHARSET:TRUE in /etc/lynx.cfg, and
getting rid of all the existing Red Hat patches for running a special lynx.cfg
depending on the locale.
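
For reference, "determining the charset from the locale" roughly means the following. This sketch only models the standard POSIX precedence of the environment variables (LC_ALL, then LC_CTYPE, then LANG); how lynx implements LOCALE_CHARSET is not shown in this report, and both function names are hypothetical. A C program would normally call setlocale() and nl_langinfo(CODESET) instead of parsing the name itself.

```python
def effective_ctype_locale(env):
    """Return the locale name governing character handling."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    # Empty values count as unset.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"

def charset_from_env(env):
    """Return the codeset part of the effective locale, or None if absent."""
    base = effective_ctype_locale(env).split("@", 1)[0]
    _, dot, codeset = base.partition(".")
    return codeset if dot else None

print(charset_from_env({"LANG": "ja_JP.UTF-8"}))
print(charset_from_env({"LANG": "en_US.UTF-8", "LC_ALL": "ja_JP.eucJP"}))
```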

Comment 18 Thomas E. Dickey 2004-12-28 00:44:10 UTC
That (the lynx.cfg.ja) hadn't occurred to me, though I did
notice it in the rpm.

Starting with 2.8.6dev.6, I've modified lynx to work better
with multibyte locales (though I have some reports that say
it's not working that well, I've not been able to see that on
my configuration).  Any constructive feedback on that would
be useful.  (I'm currently working on bug reports for lynx, to
get dev.9 out this week).

Comment 19 John Thacker 2004-12-28 01:27:45 UTC
Well, since you asked for feedback on lynx and multibyte locales:

It would be nice to be able to translate EUC-JP and Shift-JIS *into* UTF-8, in
addition to the recently added ability to translate UTF-8 into EUC-JP and
Shift-JIS (via --enable-japanese-utf8).  Shouldn't iconv() allow this?

If we could get this, it would help immensely.
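
As an illustration of the requested direction of conversion, here is a sketch using Python's built-in codecs in place of iconv(). It shows the conversion itself, not how it would be wired into lynx; the function name to_utf8 and the sample string are made up for the example.

```python
def to_utf8(data, source_charset):
    """Transcode bytes from a legacy charset into UTF-8 bytes."""
    # Decode to Unicode first, then encode. This spells out iconv's
    # "from charset" -> "to charset" conversion as two explicit steps.
    return data.decode(source_charset).encode("utf-8")

sample = "日本語"  # illustrative text
for charset in ("euc_jp", "shift_jis"):
    legacy = sample.encode(charset)
    converted = to_utf8(legacy, charset)
    print(charset, converted == sample.encode("utf-8"))
```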

Comment 20 Thomas E. Dickey 2004-12-28 01:44:17 UTC
Yes, iconv should help, but I'm not quite ready to do
that part.  In my mind, it would build upon the changes
I made a few months ago to use the wide-character support
in ncurses.  One of the issues I see is that iconv can
also be slow unless some of the results are saved - which
is also an issue with the changes I've already made, since
I've no nice/simple interface to take a multibyte string
and say how many columns it will use (it's a 2-3 step
process).
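
The multi-step width calculation, together with the idea of saving results, can be sketched like this. It is an approximation built on Unicode's East Asian Width property rather than on the C library's wcwidth() data, and the name display_columns is hypothetical. In C the analogous steps would be mbstowcs() followed by wcswidth().

```python
import unicodedata
from functools import lru_cache

@lru_cache(maxsize=1024)  # remember results so repeated strings are cheap
def display_columns(data, charset):
    """Approximate how many terminal columns a multibyte string occupies."""
    text = data.decode(charset)      # step 1: bytes -> characters
    columns = 0
    for ch in text:                  # step 2: add up per-character widths
        if unicodedata.combining(ch):
            continue                 # combining marks take no column
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        columns += 2 if wide else 1
    return columns

print(display_columns("次のページ".encode("utf-8"), "utf-8"))
print(display_columns(b"-- next --", "ascii"))
```

Because bytes objects are hashable, the cache is keyed on the raw string and charset, so a status-line message that is redrawn often is only measured once.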

Comment 22 Tim Waugh 2004-12-30 13:11:06 UTC
Fixed package is 2.8.5-20.