Bug 474069

Summary: links -dump can no longer deal with iso-8859-1-encoded files
Product: [Fedora] Fedora
Reporter: Alexandre Oliva <oliva>
Component: elinks
Assignee: Ondrej Vasik <ovasik>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 10
CC: kdudka, ovasik
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-12-05 13:04:52 UTC
Attachments:
File encoded in iso-8859-1, that triggers the bug on a utf-8 system (flags: none)

Description Alexandre Oliva 2008-12-01 23:29:14 UTC
Created attachment 325317
File encoded in iso-8859-1, that triggers the bug on a utf-8 system

Description of problem:
The gnus Emacs-based mail reader uses (as the default option) "links -dump file" to decode text/html parts.  links as in elinks-0.11.4-1.fc9 would correctly decode such files as the attached, whereas version 0.12-0.6.pre2.fc10 will make a mess with it.

While I agree gnus should ideally convert the file to the system encoding (originally in iso-8859-1 over quoted-printable in my case), it doesn't, and it never had to.  Any chance of bringing back the charset auto-detection?

Version-Release number of selected component (if applicable):
elinks-0.12-0.6.pre2.fc10

How reproducible:
Every time

Steps to Reproduce:
1. links -dump attached-file
  
Actual results:
   Oi Oliv

   ��o temos

od -c:

0000000               O   i       O   l   i   v  \n  \n             357
0000020 277 275 357 277 275   o       t   e   m   o   s  \n


Expected results:
   Oi Olivão

   Não temos

0000000               O   i       O   l   i   v 343   o  \n  \n        
0000020       N 343   o       t   e   m   o   s  \n
0000033
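
The two dumps above can be related with a short Python sketch. It is only an illustration, not elinks code: it assumes the attachment contains the byte 0xE3 (octal 343, "ã" in iso-8859-1) as the expected dump shows, and it does not reproduce the exact elinks output, which also drops some characters. It shows that 0xE3 on its own is not valid utf-8, and that the replacement character U+FFFD encodes to the octal bytes 357 277 275 seen in the actual dump.

```python
# Second line of the attachment, as shown in the expected od dump:
# 'N', 0xE3 (octal 343), 'o', ' temos', newline.
raw = b"N\xe3o temos\n"

# Interpreted as iso-8859-1, the byte 0xE3 is the letter "ã".
as_latin1 = raw.decode("iso-8859-1")

# Interpreted as utf-8, 0xE3 starts a multi-byte sequence that is never
# completed, so a lenient decoder substitutes U+FFFD for it.
as_utf8 = raw.decode("utf-8", errors="replace")

# U+FFFD encoded as utf-8 is EF BF BD, i.e. octal 357 277 275.
replacement_octal = [oct(b) for b in "\ufffd".encode("utf-8")]

print(repr(as_latin1))
print(repr(as_utf8))
print(replacement_octal)
```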


Additional info:

Comment 1 Kamil Dudka 2008-12-05 13:04:52 UTC
(In reply to comment #0)
Thank you for the report.

> Created an attachment (id=325317)
> File encoded in iso-8859-1, that triggers the bug on a utf-8 system
This file lacks information about the charset used. Since it is an HTML document, try wrapping it in an html tag and inserting an HTML header before the body, like this:
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>

> Description of problem:
> The gnus Emacs-based mail reader uses (as the default option) "links -dump
> file" to decode text/html parts.  links as in elinks-0.11.4-1.fc9 would
> correctly decode such files as the attached, whereas version 0.12-0.6.pre2.fc10
> will make a mess with it.
The 0.12 branch of elinks became utf-8 ready, and it assumes utf-8 (nowadays a popular charset on the Internet) for documents with an unspecified charset. I have no idea how the gnus Emacs-based mail reader works, but each non-ASCII mail carries information about the charset in its header, like this:
Content-Type: text/plain; charset="utf-8"

I think passing this to elinks in the HTML header is the way to go.
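
A minimal sketch of that suggestion in Python, for illustration only. The helper name wrap_with_charset is hypothetical, and the sketch assumes the mail part is a bare HTML fragment without its own html, head, or body tags. It prepends the header shown above and leaves the original bytes untouched, so the declared charset matches the data.

```python
def wrap_with_charset(fragment: bytes, charset: str) -> bytes:
    """Wrap a bare HTML fragment in a document that declares its charset.

    Hypothetical helper: assumes `fragment` has no <html>/<head>/<body>
    tags of its own and that `charset` is the name from the MIME header.
    """
    head = (
        "<html>\n"
        "<head>\n"
        '  <meta http-equiv="Content-Type" '
        f'content="text/html; charset={charset}" />\n'
        "</head>\n"
        "<body>\n"
    ).encode("ascii")
    tail = b"\n</body>\n</html>\n"
    # The fragment bytes are passed through unchanged; only ASCII markup
    # is added around them.
    return head + fragment + tail


doc = wrap_with_charset(b"N\xe3o temos", "iso-8859-1")
print(doc.decode("iso-8859-1"))
```

The resulting bytes could then be written to a file and passed to "links -dump file" as before.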

> While I agree gnus should ideally convert the file to the system encoding
> (originally in iso-8859-1 over quoted-printable in my case), it doesn't, and it
> never had to.  Any chance of bringing back the charset auto-detection?
What kind of auto-detection are you talking about? It uses the charset from either the HTTP or the (X)HTML header. Is there something like static analysis of the document text?

Closing NOTABUG. Feel free to reopen if this is really a bug.

Comment 2 Alexandre Oliva 2008-12-10 03:27:30 UTC
Thanks for your insights.  From what you say, it appears to me that what I mistook for auto-detection was merely a pass-through of encodings, which is precisely what gnus expected. That would be nice to preserve, but I can imagine this would be hard.  I agree it makes sense for it to include charset information taken from MIME headers.  Right now, it doesn't.  FWIW, I've since installed w3m-emacs and w3m, which gnus prefers over links, and this is working just fine.
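
For reference, taking the charset from the MIME headers could look roughly like the following Python sketch. This is not gnus code. The sample message is made up for illustration, and the sketch uses only the standard email module. It reads the declared charset, undoes the quoted-printable transfer encoding, and re-encodes the text as utf-8.

```python
import email
from email import policy

# Made-up sample message: an iso-8859-1 HTML part sent as quoted-printable.
raw_message = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/html; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<p>N=E3o temos</p>\r\n"
)

msg = email.message_from_bytes(raw_message, policy=policy.default)

# Charset declared in the Content-Type header (returned in lower case).
charset = msg.get_content_charset()

# decode=True undoes the quoted-printable transfer encoding and returns
# the body as raw bytes in the declared charset.
payload = msg.get_payload(decode=True)

# Convert to text using the declared charset, then to utf-8 bytes.
text = payload.decode(charset)
utf8_bytes = text.encode("utf-8")

print(charset)
print(text.strip())
```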