Red Hat Bugzilla – Bug 474069
links -dump can no longer deal with iso-8859-1-encoded files
Last modified: 2008-12-09 22:27:30 EST
Created attachment 325317
File encoded in iso-8859-1, that triggers the bug on a utf-8 system
Description of problem:
The gnus Emacs-based mail reader uses (as the default option) "links -dump file" to decode text/html parts. links as in elinks-0.11.4-1.fc9 would correctly decode files like the one attached, whereas version 0.12-0.6.pre2.fc10 makes a mess of them.
While I agree gnus should ideally convert the file to the system encoding (originally in iso-8859-1 over quoted-printable in my case), it doesn't, and it never had to. Any chance of bringing back the charset auto-detection?
Version-Release number of selected component (if applicable):
elinks-0.12-0.6.pre2.fc10
Steps to Reproduce:
1.links -dump attached-file
Actual results:
0000000 O i O l i v \n \n 357
0000020 277 275 357 277 275 o t e m o s \n

Expected results:
0000000 O i O l i v 343 o \n \n
0000020 N 343 o t e m o s \n
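For the record, the bytes in the first dump are what a UTF-8 decoder emits when it hits raw ISO-8859-1 bytes; a minimal Python sketch of the mechanism:

```python
# 0xE3 is "a" with a tilde in ISO-8859-1, but it is not valid UTF-8 on its
# own, so a UTF-8 decoder replaces it with U+FFFD, which re-encodes as the
# bytes EF BF BD (octal 357 277 275 in the od dump above).
raw = b"N\xe3o"                                   # "Não" in ISO-8859-1
mangled = raw.decode("utf-8", errors="replace")   # 'N\ufffdo'
print(mangled.encode("utf-8"))                    # b'N\xef\xbf\xbdo'
```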
(In reply to comment #0)
Thank you for the report.
> Created an attachment (id=325317)
> File encoded in iso-8859-1, that triggers the bug on a utf-8 system
This file lacks information about the charset used. Since it is an HTML document, try wrapping it in an <html> tag and inserting an HTML header before the body, like this:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
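A minimal sketch of such wrapping (wrap_with_charset is a hypothetical helper for illustration, not anything gnus or elinks provides):

```python
def wrap_with_charset(body: bytes, charset: str = "iso-8859-1") -> bytes:
    """Wrap a bare text/html part in a minimal document that declares its
    charset, so the renderer does not have to guess the encoding.
    (Hypothetical helper; names and structure are assumptions.)"""
    head = ('<html><head><meta http-equiv="Content-Type" '
            f'content="text/html; charset={charset}" /></head><body>\n')
    return head.encode("ascii") + body + b"\n</body></html>\n"

# Example: wrap the attached file's bytes before handing them to links -dump.
wrapped = wrap_with_charset(b"N\xe3o temos")
```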
> Description of problem:
> The gnus Emacs-based mail reader uses (as the default option) "links -dump
> file" to decode text/html parts. links as in elinks-0.11.4-1.fc9 would
> correctly decode such files as the attached, whereas version 0.12-0.6.pre2.fc10
> will make a mess with it.
The 0.12 branch of elinks became UTF-8 ready, and it uses the UTF-8 charset for documents with an unspecified charset (nowadays a popular charset on the internet). I have no idea how the gnus Emacs-based mail reader works, but every non-ASCII mail carries information about the charset in its header, like this:
Content-Type: text/plain; charset="utf-8"
I think passing this to elinks in an HTML header is the way to go.
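A sketch of pulling the charset out of such a mail header with Python's standard email module (assuming the mail client has the raw headers at hand):

```python
from email import message_from_string

# Parse a minimal message whose Content-Type declares the charset.
msg = message_from_string('Content-Type: text/plain; charset="utf-8"\n\nbody')
charset = msg.get_content_charset()
print(charset)  # 'utf-8'
```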
> While I agree gnus should ideally convert the file to the system encoding
> (originally in iso-8859-1 over quoted-printable in my case), it doesn't, and it
> never had to. Any chance of bringing back the charset auto-detection?
What kind of auto-detection are you talking about? elinks uses the charset from either the HTTP or the (X)HTML header. Is there something like a static analysis of the document text?
Closing NOTABUG. Feel free to reopen if this is really a bug.
Thanks for your insights. From what you say, it appears to me that what I mistook for auto-detection was merely a pass-through of encodings, which is precisely what gnus expected. That would be nice to preserve, but I can imagine it would be hard. I agree it makes sense for gnus to include charset information taken from the MIME headers; right now, it doesn't. FWIW, I've since installed emacs-w3m and w3m, which gnus prefers over links, and this is working just fine.