Bug 143787

Summary: Lynx won't display Japanese Unicode when in Japanese locales
Product: [Fedora] Fedora
Reporter: John Thacker <johnthacker>
Component: lynx
Assignee: Ivana Varekova <varekova>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: rawhide
CC: dickey, mattdm
Target Milestone: ---
Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://bythebay.web.infoseek.co.jp/pc/mojibake61.html
Doc Type: Bug Fix
Last Closed: 2006-10-25 11:07:39 UTC

Description John Thacker 2004-12-27 23:12:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041020

Description of problem:
lynx currently does not support translating Japanese from legacy character sets (Shift-JIS, EUC-JP, etc.) into UTF-8.  Therefore, we include a hack that causes lynx to use EUC-JP instead when run in Japanese locales, since most Japanese web content is still in one of the legacy character sets.

The included version of lynx does not support translating Japanese UTF-8 into EUC-JP, so UTF-8 webpages display as gibberish.  However, as of version 2.8.6dev.4 this functionality has been added, so long as lynx is compiled with --enable-japanese-utf8.  Unfortunately, lynx still does not translate the legacy charsets INTO UTF-8, but updating would at least allow Japanese users to view Japanese UTF-8 webpages.
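
To make the failure concrete, here is a minimal Python sketch of the byte-level effect. It is an illustration only, not lynx's code: the sample string is hypothetical, and Python's built-in "euc_jp" codec stands in for whatever conversion tables lynx uses.

```python
# Illustration only: this is not lynx's code, just the byte-level effect.
text = "日本語のページ"  # hypothetical sample of Japanese page content

utf8_bytes = text.encode("utf-8")  # what a UTF-8 web page sends

# Without conversion, a EUC-JP display treats the UTF-8 bytes as EUC-JP.
# Invalid byte sequences are replaced, which is the gibberish described above.
garbled = utf8_bytes.decode("euc_jp", errors="replace")

# With conversion, the bytes are decoded as UTF-8 and re-encoded as EUC-JP.
converted = utf8_bytes.decode("utf-8").encode("euc_jp")

print(garbled)                     # unreadable characters
print(converted.decode("euc_jp"))  # the original text
```

The second path is the direction of translation that --enable-japanese-utf8 is described as adding in 2.8.6dev.4.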

Version-Release number of selected component (if applicable):
lynx-2.8.5-18

How reproducible:
Always

Steps to Reproduce:
1. rm ~/.lynxrc, if it exists
2. export LANG=ja_JP.EUC-JP
3. lynx http://bythebay.web.infoseek.co.jp/pc/mojibake61.html
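
Step 2 matters because the codeset part of LANG selects the display character set. As a rough sketch, assuming the usual POSIX locale name layout language[_territory][.codeset][@modifier], the codeset can be pulled out as below. The function name is hypothetical, and this report does not show how lynx or the Fedora patch actually reads the locale.

```python
from typing import Optional


def codeset_from_locale(name: str) -> Optional[str]:
    """Return the codeset part of a POSIX locale name, or None if absent."""
    # Drop an optional "@modifier" suffix first.
    base = name.split("@", 1)[0]
    if "." not in base:
        return None
    # Everything after the first "." is the codeset.
    return base.split(".", 1)[1]


print(codeset_from_locale("ja_JP.EUC-JP"))
print(codeset_from_locale("en_US.UTF-8"))
```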
  

Actual Results:  Gibberish, mojibake, corrupted characters, whatever you want to call it

Expected Results:  Legible Japanese should have been displayed.  In fact, leaving LANG=en_US.UTF-8 and running lynx http://bythebay.web.infoseek.co.jp/pc/mojibake61.html should give the proper result, assuming Japanese fonts are installed.

Additional info:

The fix is pretty easy: get a recent version of lynx, at least 2.8.6dev.4 (2.8.6dev.7 is available, stable, and has other fixes), and add --enable-japanese-utf8 to the %configure line in lynx.spec.

Comment 1 Thomas E. Dickey 2004-12-28 00:38:07 UTC
2.8.6dev.8 is current.

Comment 2 John Thacker 2004-12-28 01:30:02 UTC
Ah yes, you're right of course.  Misthought that.  In any case, it would be a
nice improvement.

I think that Red Hat/Fedora policy is in general to avoid "development" versions,
but 2.8.6dev.8 is extremely stable IMO and worth updating to.  Do you agree?

Comment 3 Thomas E. Dickey 2004-12-28 01:40:14 UTC
My patches tend to be pretty stable.  But I am concerned
about the newer code for multibyte support.  Briefly:  I
set out to reimplement the UTF-8 logic in lynx using the
wide-character support in ncurses.  The options-menu and info-page
are parts that use this (a good built-in self-test).  Some
of the Debian users say the options menu isn't working well
in that configuration (and I've not found where our configs
differ).  Here's a screen shot I made (they say the links
are being offset by one cell):
   ftp://invisible-island.net/temp/lynx286devf5.png

Comment 4 John Thacker 2004-12-28 23:56:57 UTC
Well, I built 2.8.6dev.8 (with --enable-japanese-utf8), and the options menu
looks perfectly fine for me with LANG=ja_JP.utf8  It doesn't look quite like
your screenshot, though, since your screenshot to me looks like the links are
offset.  (Notice how the ãï¼ is duplicated at the end in yours.)

I do have a problem if I resize my terminal.  I have to quit lynx and restart it
for it to work properly; reloading doesn't help, nor does loading a new webpage.
But that's a different, long-standing problem in lynx that happens in all
charsets and locales.

Comment 5 Thomas E. Dickey 2004-12-29 00:31:02 UTC
I didn't notice the duplication (will look into that).
The only problem I had noticed was with the file
   test/utf-8-demo.html
near the end: the line-length for wrapping is off by one.
I suspect this is a different case.  The reports I had about
the offset were that all of the highlighted links were one
column to the left (something like that).

Resizing is a different problem - the configure script looks
for resizeterm() and ncurses.  If there's something wrong
with the ifdef's it won't compile-in the code for that.



Comment 6 Tim Waugh 2004-12-30 12:04:56 UTC
I'll hold off upgrading the RPM until it's settled down a bit then.  I'd much
prefer to upgrade to a stable release than a development one.

Comment 7 Thomas E. Dickey 2004-12-30 12:34:20 UTC
That sounds ok.  I just marked 2.8.6dev.9, which has a
number of fixes other than this area (and will be working
on xterm).  My current plan for 2.8.6dev.10 is to work on
this area (to fix the two items mentioned above, at least).
I'm thinking about releasing 2.8.6 around late February,
provided that I can iron out this area.

Comment 8 John Thacker 2005-10-23 17:19:50 UTC
As a slight update to this issue, 2.8.6dev.14 was released recently, and
"extend[s] experimental option --enable-japanese-utf8, allowing lynx to convert
EUC-JP and Shift_JIS strings to UTF-8"

It's a rather nice update that means that, with that option enabled, Japanese
users can browse in a Unicode Japanese locale as well and still view basically
all webpages.  In addition, non-Japanese users can use lynx when in a
non-Japanese UTF-8 based locale and still read basically all Japanese webpages.
As it is currently, most Japanese webpages seem to still be using EUC-JP or
(especially) Shift_JIS.
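
As a rough illustration of what converting EUC-JP and Shift_JIS strings to UTF-8 means at the byte level, here is a small Python sketch. It is not lynx's implementation: the helper name and sample text are hypothetical, and Python's built-in "euc_jp" and "shift_jis" codecs stand in for lynx's own tables.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Transcode bytes from a legacy charset to UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


sample = "日本語"  # hypothetical page text

# The same text has different byte sequences in each legacy charset.
euc_jp_page = sample.encode("euc_jp")
sjis_page = sample.encode("shift_jis")

# After conversion, both pages yield the same UTF-8 bytes for display.
print(to_utf8(euc_jp_page, "euc_jp") == to_utf8(sjis_page, "shift_jis"))
```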

However, I don't know whether there are any big blockers left in 2.8.6dev.14. I
know upgrades to development releases are discouraged, but this is a pretty huge
fix for reading Japanese webpages.

Comment 9 Thomas E. Dickey 2005-10-23 23:16:34 UTC
There are no big blockers - my available time's been reduced,
so it's taking longer to do things.  There are a few new
(since 2.8.5) display problems that I've been working to resolve.
Those get done concurrently with new bug reports, but basically
it's the new display problems that are why 2.8.6's not yet released.

Comment 10 Matthew Miller 2006-07-10 22:36:07 UTC
Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.

Thank you!


Comment 11 John Thacker 2006-07-10 23:00:04 UTC
Still a problem with FC5 and FC6 test1.
It's an issue with lynx 2.8.5.  It's essentially fixed in 2.8.6 (by passing
--enable-japanese-utf8 when building), but that's been in .dev releases for
almost two and a half years.  The 2.8.6dev releases, such as dev.18, are very
stable and have long reached the point in my mind where the bugfixes and added
features outweigh any possible regressions.  But 2.8.6 isn't officially out,
though I'm not sure what important bugs are blocking it.

The bug also hangs around as a reminder to add the build option.  This would be
a very nice thing to fix, since we switched to UTF-8 for the default Japanese
locale some time ago.

Comment 12 Thomas E. Dickey 2006-07-10 23:58:52 UTC
There's one annoying regression (returning to a page after
following a link with a "#" doesn't always return to the
expected point).  I think I've dealt with the other issues
blocking a release, and intend to work on that once I've got
past a current set of changes to xterm...

Comment 13 Thomas E. Dickey 2006-10-14 22:33:15 UTC
I released lynx 2.8.6 last week - updating the package should
resolve this bug report.

Comment 14 Ivana Varekova 2006-10-24 11:49:29 UTC
Fixed in lynx-2.8.6-1.

Comment 15 John Thacker 2006-10-24 13:12:52 UTC
Not quite, actually.  The "--enable-japanese-utf8" option still needs to be
added to configure in order to solve the original bug report, and I see that
according to CVS, it hasn't been yet.  Thanks!

Comment 16 Ivana Varekova 2006-10-25 11:07:39 UTC
Thank you for the quick notice; you are right, lynx-2.8.6-1 does not fix your
problem.
lynx-2.8.6-2 uses the --enable-japanese-utf8 option and should fix the bug.