Bug 725203 - Lynx assumes iso-8859-1 for local files
Summary: Lynx assumes iso-8859-1 for local files
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: lynx
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kamil Dudka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-07-24 05:08 UTC by Benjamin Blanco
Modified: 2011-07-25 16:19 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-25 16:19:36 UTC
Type: ---


Attachments
Small test document (74 bytes, text/html)
2011-07-24 05:08 UTC, Benjamin Blanco

Description Benjamin Blanco 2011-07-24 05:08:14 UTC
Created attachment 514898
Small test document

Description of problem:
Lynx assumes an encoding of iso-8859-1 for local files, so files which use another encoding (most files on a modern system, right?) may not display correctly.

Version-Release number of selected component (if applicable):
1.8.7-7.fc15

How reproducible:
Always

Steps to Reproduce:
1. Open a local utf-8 encoded file with lynx
2. Observe how certain characters are displayed
3. Open the same file with the following magic incantation:
     lynx -assume_local_charset=utf-8 somefile.html
4. Observe how certain characters are displayed differently
  
Actual results:
The wrong document encoding is assumed, so certain characters,
such as ←, are garbled, like this: â†�
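The garbling can be reproduced outside lynx. The following minimal sketch encodes "←" as UTF-8 and then decodes those bytes with single-byte legacy encodings. The choice of Windows-1252 (`cp1252`) for the second decode is an assumption made to match the exact string shown above; the glyphs lynx actually prints depend on its display character set and on the terminal.

```python
# Reproduce the garbling described above: UTF-8 bytes decoded with a
# single-byte legacy encoding instead of UTF-8.
arrow = "\u2190"  # "←" LEFTWARDS ARROW
raw = arrow.encode("utf-8")  # three bytes for one character: E2 86 90

# Strict ISO-8859-1 maps every byte to a code point: 0xE2 becomes "â",
# while 0x86 and 0x90 become invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

# Windows-1252 (assumed here to match the report) maps 0x86 to "†" and
# leaves 0x90 undefined; errors="replace" turns it into U+FFFD ("�").
as_cp1252 = raw.decode("cp1252", errors="replace")

print(raw)
print(repr(as_latin1))
print(as_cp1252)
```

One character becomes three because UTF-8 spends three bytes on U+2190, and a single-byte decoder turns each byte into its own character.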

Expected results:
Local documents should have their encoding detected rather than assumed, or lynx should assume that local documents are utf-8 encoded.
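As an illustration of what "detected" could mean without frequency analysis, here is a minimal sketch of a common lightweight heuristic: try a strict UTF-8 decode and fall back to ISO-8859-1 if it fails. The function names `guess_local_charset` and `read_local_file` are hypothetical. This is not how lynx behaves, and the heuristic cannot tell apart the many single-byte encodings.

```python
from pathlib import Path


def guess_local_charset(data: bytes) -> str:
    """Hypothetical heuristic, not lynx's actual behaviour.

    Legacy 8-bit text containing non-ASCII bytes rarely forms valid UTF-8 by
    accident, so a strict UTF-8 decode that succeeds is a strong hint.
    ISO-8859-1 is the fallback because every byte sequence is valid in it.
    Pure ASCII is reported as UTF-8, which is compatible with it.
    """
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


def read_local_file(path: str) -> str:
    """Read a local file and decode it with the guessed charset."""
    data = Path(path).read_bytes()
    return data.decode(guess_local_charset(data))
```

The ISO-8859-1 fallback always succeeds, so the reader never raises on decoding. The trade-off is that a file in some other legacy encoding may still be shown with the wrong characters.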

Additional info:
Attached is a small utf-8 encoded html document. Download it to your computer before opening it to test lynx.

Comment 1 Kamil Dudka 2011-07-25 16:19:36 UTC
lynx is not supposed to perform frequency analysis on the given text and guess the encoding from the result. You need to provide some metadata specifying which encoding to use, something as simple as:

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    ...
  ...
...
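A declaration like the one above is easy for a program to find. The following minimal sketch looks for `charset=` inside a `<meta>` tag within the first 1024 bytes of a document, roughly in the spirit of the HTML standard's encoding "prescan". It is a simplified stand-in: the regular expression and the function name `declared_charset` are hypothetical, and the sketch does not implement the full prescan algorithm or lynx's parser.

```python
import re
from typing import Optional

# Simplified stand-in for an HTML encoding prescan: find charset=... inside
# a <meta> tag. The real algorithm also handles comments, attribute parsing
# rules and encoding-label aliases, which this regex ignores.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def declared_charset(data: bytes) -> Optional[str]:
    """Return the lower-cased charset declared in a <meta> tag, or None."""
    match = _META_CHARSET.search(data[:1024])  # only look near the start
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

The function matches both the `http-equiv="Content-Type"` form shown above and the shorter `<meta charset="...">` form. If nothing is declared, it returns `None`, and a reader would then fall back to an assumed default.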

