Bug 474069 - links -dump can no longer deal with iso-8859-1-encoded files
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: elinks
Version: 10
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-12-01 18:29 EST by Alexandre Oliva
Modified: 2008-12-09 22:27 EST
CC List: 2 users

Last Closed: 2008-12-05 08:04:52 EST

Attachments
File encoded in iso-8859-1, that triggers the bug on a utf-8 system (42 bytes, application/octet-stream)
2008-12-01 18:29 EST, Alexandre Oliva

Description Alexandre Oliva 2008-12-01 18:29:14 EST
Created attachment 325317
File encoded in iso-8859-1, that triggers the bug on a utf-8 system

Description of problem:
The gnus Emacs-based mail reader uses (as the default option) "links -dump file" to decode text/html parts. The links from elinks-0.11.4-1.fc9 correctly decoded files such as the attached one, whereas the one from version 0.12-0.6.pre2.fc10 garbles them.

While I agree gnus should ideally convert the file to the system encoding (in my case the part was originally iso-8859-1 over quoted-printable), it doesn't, and it never had to. Any chance of bringing back the charset auto-detection?
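For reference, the conversion mentioned above can be sketched in Python. This is an illustration under stated assumptions, not how gnus works: the helper name `to_system_encoding` is hypothetical, and it assumes the caller already knows the part's charset from its MIME headers and has already undone the quoted-printable transfer encoding.

```python
import locale
from typing import Optional


def to_system_encoding(data: bytes, source_charset: str,
                       target: Optional[str] = None) -> bytes:
    """Hypothetical helper: re-encode bytes from source_charset into target.

    If target is None, the locale's preferred encoding is used (UTF-8 on a
    UTF-8 system). Characters the target cannot represent become "?".
    """
    if target is None:
        target = locale.getpreferredencoding(False)
    text = data.decode(source_charset)  # bytes -> str using the declared charset
    return text.encode(target, errors="replace")


# "Não temos" in iso-8859-1 (0xE3 is "ã") becomes UTF-8 (0xC3 0xA3 for "ã").
converted = to_system_encoding(b"N\xe3o temos", "iso-8859-1", "utf-8")
```

This only helps if the renderer then interprets the file in that same target encoding; on the reporter's UTF-8 system, that would be UTF-8.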

Version-Release number of selected component (if applicable):
elinks-0.12-0.6.pre2.fc10

How reproducible:
Every time

Steps to Reproduce:
1. links -dump attached-file
  
Actual results:
   Oi Oliv

   ��o temos

od -c:

0000000               O   i       O   l   i   v  \n  \n             357
0000020 277 275 357 277 275   o       t   e   m   o   s  \n

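In the od -c listing above, non-ASCII bytes are printed in octal. The Python sketch below, for illustration only, checks that the repeated sequence 357 277 275 is the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER. It also shows how a UTF-8 decoder produces that character when it meets the iso-8859-1 byte 0xE3 ("ã") followed by an ASCII "o". It does not reproduce elinks' exact output, which differs in detail.

```python
# od -c prints non-ASCII bytes in octal; 357 277 275 appears twice in the
# actual output.
ACTUAL_SEQ = bytes([0o357, 0o277, 0o275])

# That sequence is U+FFFD REPLACEMENT CHARACTER encoded as UTF-8 (EF BF BD).
is_replacement_char = ACTUAL_SEQ.decode("utf-8") == "\ufffd"

# In iso-8859-1, "ã" is the single byte 0xE3 (octal 343). Read as UTF-8, 0xE3
# announces a three-byte sequence, but the following "o" is not a
# continuation byte, so the decoder substitutes U+FFFD for the invalid byte.
decoded = b"N\xe3o".decode("utf-8", errors="replace")
```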

Expected results:
   Oi Olivão

   Não temos

0000000               O   i       O   l   i   v 343   o  \n  \n        
0000020       N 343   o       t   e   m   o   s  \n
0000033
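By contrast, the expected listing is plain iso-8859-1, and decoding those bytes as Latin-1 gives back the Portuguese text. The Python sketch below is for illustration; its byte string is transcribed by hand from the od -c listing, which ends at offset 0000033 (octal), that is, 27 bytes.

```python
# Expected `links -dump` output, transcribed byte for byte from the od -c
# listing (343 octal == 0xE3 == "ã" in iso-8859-1).
EXPECTED = b"   Oi Oliv\xe3o\n\n   N\xe3o temos\n"

# iso-8859-1 maps every byte to the code point with the same value,
# so this decode can never fail.
text = EXPECTED.decode("iso-8859-1")
```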


Additional info:
Comment 1 Kamil Dudka 2008-12-05 08:04:52 EST
(In reply to comment #0)
Thank you for the report.

> Created an attachment (id=325317)
> File encoded in iso-8859-1, that triggers the bug on a utf-8 system
This file lacks information about the charset used. Since it is an HTML document, try surrounding it with an <html> tag and inserting an HTML header before the body, like this:
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
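This fix can also be applied mechanically before calling `links -dump`. A minimal Python sketch follows. It assumes the input is a body fragment without its own `<html>`/`<head>` elements, and the helper name `add_charset_meta` is hypothetical.

```python
def add_charset_meta(fragment: bytes, charset: str) -> bytes:
    """Hypothetical helper: wrap an HTML body fragment so it declares its charset.

    The fragment is kept as raw bytes and is NOT re-encoded, because the
    <meta> tag tells the renderer how to interpret exactly these bytes.
    """
    meta = ('<meta http-equiv="Content-Type" '
            f'content="text/html; charset={charset}" />')
    head = f"<html>\n<head>\n  {meta}\n</head>\n<body>\n".encode("ascii")
    return head + fragment + b"\n</body>\n</html>\n"


# The result would then be written to a file and passed to `links -dump`.
wrapped = add_charset_meta(b"<p>N\xe3o temos</p>", "iso-8859-1")
```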

> Description of problem:
> The gnus Emacs-based mail reader uses (as the default option) "links -dump
> file" to decode text/html parts.  links as in elinks-0.11.4-1.fc9 would
> correctly decode such files as the attached, whereas version 0.12-0.6.pre2.fc10
> will make a mess with it.
The 0.12 branch of elinks became UTF-8 ready, and it uses UTF-8 (nowadays a popular charset on the Internet) for documents with an unspecified charset. I have no idea how the gnus Emacs-based mail reader works, but each non-ASCII mail carries information about its charset in the header, like this:
Content-Type: text/plain; charset="utf-8"

I think passing this to elinks in an HTML header is the way to go.
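Along those lines, a caller that still has the MIME part could read the charset from its Content-Type header and declare it to elinks. The following is a hedged Python sketch using the standard-library `email` package. The sample message and the helper name `html_with_mime_charset` are hypothetical, and this is not gnus code (gnus is written in Emacs Lisp).

```python
from email import message_from_bytes

# Hypothetical single-part message: "Não temos" in iso-8859-1, sent as
# quoted-printable (=E3 encodes the byte 0xE3, "ã").
RAW = (b'Content-Type: text/html; charset="iso-8859-1"\r\n'
       b"Content-Transfer-Encoding: quoted-printable\r\n"
       b"\r\n"
       b"<p>N=E3o temos</p>\r\n")


def html_with_mime_charset(raw: bytes) -> bytes:
    """Decode a text/html MIME part and declare its MIME charset in a <meta> tag."""
    part = message_from_bytes(raw)
    # get_content_charset() strips the quotes and lower-cases the value;
    # fall back to US-ASCII, the MIME default for text/* (RFC 2046).
    charset = part.get_content_charset("us-ascii")
    # decode=True undoes the quoted-printable transfer encoding and returns
    # bytes that are still in the part's own charset.
    body = part.get_payload(decode=True)
    meta = ('<meta http-equiv="Content-Type" '
            f'content="text/html; charset={charset}" />').encode("ascii")
    return b"<html><head>" + meta + b"</head><body>" + body + b"</body></html>"


document = html_with_mime_charset(RAW)
```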

> While I agree gnus should ideally convert the file to the system encoding
> (originally in iso-8859-1 over quoted-printable in my case), it doesn't, and it
> never had to.  Any chance of bringing back the charset auto-detection?
What kind of auto-detection are you talking about? It takes the charset from either the HTTP header or the (X)HTML header. Do you mean something like static analysis of the document text?
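For illustration only, the kind of static analysis asked about could be as simple as the hypothetical heuristic below. If the bytes are valid UTF-8, it assumes UTF-8. Otherwise it falls back to iso-8859-1, which accepts every byte sequence. This is not elinks behaviour. It cannot tell iso-8859-1 from other single-byte charsets such as windows-1252, and it reports pure ASCII as UTF-8.

```python
def guess_charset(data: bytes) -> str:
    """Hypothetical heuristic: UTF-8 if the bytes decode cleanly, else iso-8859-1."""
    try:
        data.decode("utf-8")  # strict mode: raises on any invalid sequence
    except UnicodeDecodeError:
        # Every byte value is a valid iso-8859-1 character, so this fallback
        # always "succeeds", even when it is the wrong guess.
        return "iso-8859-1"
    return "utf-8"
```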

Closing NOTABUG. Feel free to reopen if this is really a bug.
Comment 2 Alexandre Oliva 2008-12-09 22:27:30 EST
Thanks for your insights. From what you say, it appears that what I mistook for auto-detection was merely a pass-through of encodings, which is precisely what gnus expected. That would be nice to preserve, but I can imagine it would be hard. I agree it makes sense for gnus to include charset information taken from the MIME headers; right now, it doesn't. FWIW, I've since installed w3m-emacs and w3m, which gnus prefers over links, and this is working just fine.
