Bug 725203

Summary: Lynx assumes iso-8859-1 for local files
Product: [Fedora] Fedora Reporter: Benjamin Blanco <benjo316_2003>
Component: lynx Assignee: Kamil Dudka <kdudka>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 15 CC: kdudka
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-07-25 16:19:36 UTC Type: ---
Attachments:
Description Flags
Small test document none

Description Benjamin Blanco 2011-07-24 05:08:14 UTC
Created attachment 514898
Small test document

Description of problem:
Lynx assumes an encoding of iso-8859-1 for local files, so files which use another encoding (most files on a modern system, right?) may not display correctly.

Version-Release number of selected component (if applicable):
1.8.7-7.fc15

How reproducible:
Always

Steps to Reproduce:
1. Open a local utf-8 encoded file with lynx
2. Observe how certain characters are displayed
3. Open the same file with the following magic incantation:
     lynx -assume_local_charset=utf-8 somefile.html
4. Observe how certain characters are displayed differently
  
Actual results:
Wrong document encoding assumed, thus certain characters,
such as ←, are garbled, like this: â†�
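The garbling above can be reproduced outside lynx. The following is a minimal Python sketch, not a description of lynx internals. It assumes the UTF-8 bytes of the arrow end up interpreted as Windows-1252 (cp1252), a single-byte Western encoding closely related to iso-8859-1; that choice is an assumption made here because it yields the † character seen in the report.

```python
# UTF-8 bytes of LEFTWARDS ARROW (U+2190), read back with the wrong charset.
arrow = "\u2190"
raw = arrow.encode("utf-8")          # three bytes: 0xE2 0x86 0x90

# Assumption: the bytes are shown as cp1252. 0x90 is unassigned there,
# so errors="replace" substitutes U+FFFD (the replacement character).
garbled = raw.decode("cp1252", errors="replace")

# Decoding with the correct charset restores the original character.
restored = raw.decode("utf-8")

print(garbled)
print(restored)
```

Under that assumption, one character turns into three: â, †, and a replacement character.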

Expected results:
Local documents should have their encoding detected rather than assumed, or lynx should assume that local documents are utf-8 encoded.
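For illustration only, one simple form of detection is a strict UTF-8 validity check with a fallback. This is a hedged Python sketch: the function name guess_encoding and the fallback parameter are hypothetical, and nothing here reflects how lynx behaves or is meant to behave.

```python
def guess_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Return "utf-8" if data decodes cleanly as UTF-8, else the fallback.

    Hypothetical helper for illustration; pure-ASCII input also counts
    as valid UTF-8, so it is reported as "utf-8".
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding("\u2190".encode("utf-8")))  # valid UTF-8 input
    print(guess_encoding(b"caf\xe9"))                # lone 0xE9 is not valid UTF-8
```

A check like this cannot distinguish between different single-byte encodings, which is one reason explicit metadata remains the more reliable option.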

Additional info:
Attached is a small utf-8 encoded html document. Download it to your computer before opening it to test lynx.

Comment 1 Kamil Dudka 2011-07-25 16:19:36 UTC
lynx is not supposed to run a frequency analysis on the given text and guess the encoding from the result.  You need to provide some metadata specifying which encoding to use, something as simple as:

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    ...
  ...
...
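To check whether a local file already carries such metadata, something like the following Python sketch could be used. The names MetaCharsetFinder and declared_charset are hypothetical helpers introduced here; the sketch only looks at meta tags and makes no claim about how lynx itself parses them.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the first charset declared in a <meta> tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if "charset" in attr:
            # Short form: <meta charset="utf-8">
            self.charset = attr["charset"].strip().lower()
        elif attr.get("http-equiv", "").lower() == "content-type":
            # Long form: content="text/html; charset=utf-8"
            for part in attr.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset" and value:
                    self.charset = value.strip().strip("\"'").lower()


def declared_charset(html_bytes: bytes):
    """Return the declared charset as a lowercase string, or None."""
    finder = MetaCharsetFinder()
    # Tag and attribute names are ASCII, so non-ASCII bytes can be replaced.
    finder.feed(html_bytes.decode("ascii", errors="replace"))
    return finder.charset


if __name__ == "__main__":
    sample = (b'<html><head><meta http-equiv="Content-Type" '
              b'content="text/html; charset=utf-8" /></head></html>')
    print(declared_charset(sample))
```

A file for which this returns None has no charset declaration in a meta tag, which is the situation described in this report.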