Bug 124849
Summary: | Application message is scrambled in Japanese
---|---
Product: | [Fedora] Fedora
Component: | lynx
Version: | 2
Status: | CLOSED RAWHIDE
Severity: | medium
Priority: | medium
Hardware: | i386
OS: | Linux
Keywords: | i18n
Reporter: | Nakai <ynakai>
Assignee: | Tim Waugh <twaugh>
QA Contact: | Brian Brock <bbrock>
CC: | dickey, eng-i18n-bugs, fedora-ja-list, johnthacker
Fixed In Version: | 2.8.5-20
Doc Type: | Bug Fix
Last Closed: | 2004-12-30 13:11:06 UTC
Description
Nakai
2004-05-31 10:53:54 UTC
Created attachment 100714
Screenshot with an arrow marking where the text is wrong
More helpful report for those whose native language isn't Japanese:

Version-Release number of selected component (if applicable): lynx-2.8.5-15

1. Make sure you're running an FC system with Japanese fonts etc. installed correctly.
2. Make sure all locale categories are in UTF-8 Japanese (set LANG and/or LC_ALL to "ja_JP.UTF-8" and confirm with the locale(1) command), and make sure the gnome-terminal character encoding is set to UTF-8.
3. View a multi-page, plain-text file (the GPL in /usr/share/doc is a good choice).

Actual results:
-- 妾쮡妾 - (multi-language gibberish)

Expected results:
-- 次ページはスペースキーで -- ("Press space for the next page")

Additional Info:
The last two lines of help info are correct (although the character spacing could use some work; but that's not a lynx bug). Messages that are in colors (yellow foreground, blue background) seem to be consistently scrambled. Also, non-trivial HTML pages in Japanese (e.g. www.yahoo.co.jp) appear to make lynx segfault.

Created attachment 101232
comparison of screenshots
The screenshot I made was for uxterm on Debian, using a copy of lynx which I built myself (linked with ncursesw, of course). It would be nice to know whether this screenshot looks correct (I don't read Japanese, but it does look different from both, which could be a font issue).

Was the comment #3 screenshot taken in the ja_JP.UTF-8 locale? I have now confirmed that this bug happens on both gnome-terminal and uxterm in FC2.

Yes, I used ja_JP.UTF-8 (to make lynx use the corresponding ja.po contents). Do the two (gnome-terminal and uxterm) produce the same result in FC2, or are they different from each other?

Created attachment 101280
This bug report in lynx, showing the bad/desired strings.
In this screenshot, the text is displayed without color (since it
is part of the regular text). Is that example uncorrupted? I notice
that the characters shown are different from the screenshot I made with
them in the blue/yellow message line.
If the second screenshot displays the characters properly, I may have enough information to debug this. (It's hard to debug without knowing what the result should look like.)

gnome-terminal and uxterm produce the same result in FC2. To comment #8: that is the expected result. The broken string was pasted into this bug report to show the displayed result. The comment #3 screenshot shows correct Japanese; I can read it.

My concern in comment #8 was that (since I'm viewing it with Opera) the browser might render it differently from lynx. So the comment #3 screenshot is correct (thanks for the confirmation). It is not related to color; the data returned by gettext is displayed by lynx+ncursesw just as if I echoed the string directly to the terminal. It occurs to me that the problem is that the message file ja.po uses the EUC-JP charset. This issue came up in another message file (though it seems odd to me that gettext would have this limitation). I'll investigate further.

Comment on attachment 101232
comparison of screenshots
This screenshot shows lynx on Debian/testing.
The message in blue corresponds to
"-- press space for next page --"
Comment on attachment 101280
This bug report in lynx, showing the bad/desired strings.

Shows lynx displaying this bug report. Compare with attachment #2, which shows a different message. Also note that the expected message begins with "press", not "Press".

Displaying ja.po in EUC-JP shows me the same text as I see in attachment #2. I compared that against the text for the oldest version of ja.po which I have (noting that the translator worked on it over the following 2 years) and see the same text. But I do not read Japanese, and cannot evaluate the quality of the translation. There are two questions: whether the Red Hat package and configuration are correct, and whether the translation is correct. If we can get an answer to the second question, we can focus on the first one.
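The two points above (ja.po is stored in EUC-JP, and viewing it as EUC-JP shows the expected text) can be illustrated with a short Python sketch. The catalog excerpt is hypothetical, `declared_charset` and `decode_po` are illustrative helpers rather than gettext or lynx functions, and lynx's real path through gettext and ncursesw is not modeled. It only shows that the same bytes read correctly with the declared charset and come out scrambled when taken as UTF-8.

```python
import re

# Hypothetical excerpt of a ja.po catalog (real headers carry more fields).
# The Japanese msgstr is assumed to read "次ページはスペースキーで".
SAMPLE_PO = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=EUC-JP\\n"\n'
    '\n'
    'msgid "-- press space for next page --"\n'
    'msgstr "-- 次ページはスペースキーで --"\n'
)

# The charset parameter of the PO header's Content-Type entry.
CHARSET_RE = re.compile(r"charset=([A-Za-z0-9_.:-]+)")


def declared_charset(po_text: str, default: str = "ASCII") -> str:
    """Return the charset named in the PO header, or `default` if none is found."""
    match = CHARSET_RE.search(po_text)
    return match.group(1) if match else default


def decode_po(raw: bytes) -> str:
    """Decode a .po file with the charset its own header declares."""
    # latin-1 maps every byte to a character, so it never fails; the
    # ASCII header text survives intact and can be searched.
    charset = declared_charset(raw.decode("latin-1"))
    return raw.decode(charset)


raw = SAMPLE_PO.encode("euc_jp")                  # the file as stored on disk
print(declared_charset(raw.decode("latin-1")))    # EUC-JP
print(decode_po(raw) == SAMPLE_PO)                # True: read with its declared charset
# The same bytes taken as UTF-8 (as a ja_JP.UTF-8 terminal would) do not match.
print(raw.decode("utf-8", errors="replace") == SAMPLE_PO)  # False
```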
I know a little bit more about this problem, which occurs in FC3 as well. It's not exactly a limitation of gettext. The translation is perfect, BTW.

The current version of lynx does not convert Japanese characters back and forth between the legacy encodings (EUC-JP, Shift-JIS, etc.) and Unicode, although it can convert between the legacy encodings themselves. (It hasn't properly been rewritten to work with iconv().) Therefore, we have multiple lynx.cfg files located in /etc: lynx.cfg, lynx.cfg.ja, and others. When the user has no .lynxrc file in his home directory, we select the lynx.cfg.ja configuration file if the user's locale begins with ja, and the lynx.cfg file in most other cases. The lynx.cfg.ja file explicitly sets EUC-JP as the character set. This is better for browsing in Japanese and works perfectly in the ja_JP.EUC-JP locale, but in the ja_JP.UTF-8 locale it causes precisely the problems witnessed here.

If I copy the /etc/lynx.cfg file to .lynxrc in my home directory and then run lynx with LANG=ja_JP.UTF-8, everything works fine: the Japanese application messages appear exactly as they should. This is because lynx then does not read /etc/lynx.cfg.ja and so does not set the preferred character set to EUC-JP. Therefore, we only want to use /etc/lynx.cfg.ja, which uses EUC-JP instead of UTF-8, in ja_JP.EUC-JP (and eucjp, etc.) locales, but NOT in ja_JP.UTF-8. Editing the lynx-284-i18ncfg.patch included in the source RPM may fix the problem.

Another note: lynx has an option, added in lynx2-8-5pre3, which allows it to determine the proper charset from the locale. Support for it is enabled at compile time with --enable-locale-charset, which appears to be the default at least in the lynx2-8-6dev releases. At run time it is off by default and can be turned on by putting LOCALE_CHARSET:TRUE in lynx.cfg.
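The selection rule proposed above (use /etc/lynx.cfg.ja only for EUC-JP Japanese locales, never for ja_JP.UTF-8) can be sketched in Python. `pick_lynx_cfg` and the alias set are hypothetical; the actual lynx-284-i18ncfg.patch is not shown in this report and may work differently.

```python
# Hypothetical helper (not the Red Hat patch): choose the system-wide lynx
# configuration from a locale name such as "ja_JP.UTF-8" or "ja_JP.eucJP".
# The alias set below is an assumption and may be incomplete.

EUC_JP_CODESETS = {"eucjp", "ujis"}


def normalize_codeset(codeset: str) -> str:
    """Lower-case and drop punctuation so "EUC-JP", "eucJP" and "euc_jp" compare equal."""
    return "".join(ch for ch in codeset.lower() if ch.isalnum())


def pick_lynx_cfg(locale_name: str) -> str:
    """Use /etc/lynx.cfg.ja only for Japanese locales with an EUC-JP codeset."""
    # Locale names have the form language[_territory][.codeset][@modifier].
    base = locale_name.split("@", 1)[0]
    lang_territory, _, codeset = base.partition(".")
    language = lang_territory.split("_", 1)[0]
    if language == "ja" and normalize_codeset(codeset) in EUC_JP_CODESETS:
        return "/etc/lynx.cfg.ja"
    return "/etc/lynx.cfg"


for name in ("ja_JP.EUC-JP", "ja_JP.eucJP", "ja_JP.UTF-8", "en_US.UTF-8"):
    print(name, "->", pick_lynx_cfg(name))
```

A locale name without an explicit codeset (for example a bare ja_JP) falls through to /etc/lynx.cfg in this sketch; a real implementation would more likely ask the C library for the active codeset with nl_langinfo(CODESET) after setlocale(). The LOCALE_CHARSET:TRUE approach avoids per-locale configuration files altogether.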
I recommend updating lynx, setting LOCALE_CHARSET:TRUE in /etc/lynx.cfg, and getting rid of all the existing Red Hat patches for selecting a special lynx.cfg depending on the locale.

That (the lynx.cfg.ja) hadn't occurred to me, though I did notice it in the RPM. Starting with 2.8.6dev.6, I've modified lynx to work better with multibyte locales (though I have some reports that say it's not working that well, I've not been able to see that on my configuration). Any constructive feedback on that would be useful. (I'm currently working on bug reports for lynx, to get dev.9 out this week.)

Well, since you asked for feedback on lynx and multibyte locales: it would be nice to be able to translate EUC-JP and Shift-JIS *into* UTF-8, in addition to the recently added ability to translate UTF-8 into EUC-JP and Shift-JIS (via --enable-japanese-utf8). Shouldn't iconv() allow this? If we could get this, it would help immensely.

Yes, iconv should help, but I'm not quite ready to do that part. In my mind, it would build upon the changes I made a few months ago to use the wide-character support in ncurses. One of the issues I see is that iconv can also be slow unless some of the results are saved, which is also an issue with the changes I've already made, since I have no nice, simple interface to take a multibyte string and say how many columns it will use (it's a 2-3 step process).

Fixed package is 2.8.5-20.
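The two pieces discussed in the last exchange, converting legacy Japanese encodings into UTF-8 and working out how many terminal columns a multibyte string occupies, can be sketched in Python. Python's codecs stand in for iconv() here and the width rule is a simplification of wcwidth(); neither is lynx code. The `lru_cache` decorator is one possible way to save results, as suggested for iconv.

```python
import unicodedata
from functools import lru_cache

# Python's codecs stand in for iconv() here; this is not lynx code.


def to_utf8(raw: bytes, charset: str) -> bytes:
    """Convert bytes in a legacy charset such as "euc_jp" or "shift_jis" to UTF-8."""
    return raw.decode(charset).encode("utf-8")


@lru_cache(maxsize=1024)
def display_columns(text: str) -> int:
    """Approximate terminal columns: wide/fullwidth = 2, combining marks = 0, else 1.

    A simplification of wcwidth(); results are cached because the same
    strings (menu items, status messages) tend to be measured repeatedly.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


message = "次ページはスペースキーで"
for charset in ("euc_jp", "shift_jis"):
    legacy = message.encode(charset)
    print(charset, to_utf8(legacy, charset).decode("utf-8") == message)  # True for both

print(len(message), display_columns(message))  # 12 24
```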