Bug 107185 - national characters mismatch
Status: CLOSED WORKSFORME
Product: Red Hat Linux
Classification: Retired
Component: basesystem
Version: 9
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Bill Nottingham
QA Contact: Mike McLean
Depends On:
Blocks:
Reported: 2003-10-15 14:14 EDT by ketil vestby
Modified: 2014-03-16 22:39 EDT

Doc Type: Bug Fix
Last Closed: 2005-01-21 18:08:37 EST

Description ketil vestby 2003-10-15 14:14:26 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
After installing / upgrading from 7.3 to 9, programs like w3m, Lynx and 
Konqueror have lost the ability to show the Norwegian characters æ, 
ø and å correctly in web pages.  There also seem to be problems when 
looking at, or moving, files with national characters in the 
filenames, especially if I move them to and from computers running 
other Linux distros (Debian and SuSE).

This seems to be a problem with the initial setup; it affects programs 
that are able to show Unicode.

For an example, take a look at http://ketil.homeunix.net/5th/test.html 
in the three browsers

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Open the browsers Konqueror, w3m or Lynx
2. Look at http://ketil.homeunix.net/5th/test.html
3. Notice the difference between the first and second set of national characters
    

Actual Results:  On Konqueror and Lynx, the first set is garbled and the second
displays correctly.
On w3m it is the opposite: the first set displays correctly and the second is garbled.

Expected Results:  Both sets of characters should display correctly.

Additional info:

This is a Red Hat-specific bug, affecting all countries with national 
character sets. In my opinion, and probably that of many others in those 
countries, it blocks all serious use of all Red Hat distros.

The problem did not happen in any version of Red Hat before 
RH8, and it is nonexistent in competing distros like SuSE, Mandrake or 
Debian.
Comment 1 ketil vestby 2003-10-15 14:17:57 EDT
The missing lines in this bug report are due to Mozilla's inability to send
anything written in text boxes that contains the national characters.

I find it hard to believe that this bug has not been entered before, but I did
not find anything remotely related to it when searching for the word "national" in
Bugzilla.
Comment 2 Bill Nottingham 2003-10-15 15:17:45 EDT
Your web page doesn't define the charset in use. Without that, there's no way to
guess what you're intending.
Comment 3 ketil vestby 2003-10-15 17:53:42 EDT
The charset in use is the default charset I got when installing the distro. In 
addition, the first tag, <?xml version="1.0" encoding="utf-8"?>, states the 
charset used. If you change the URL from .../5th/... to .../4th/... you will get the 
page with a start tag declaring iso-8859-1 (and a slightly different layout).

The charset I wrote the examples with is Red Hat's default charset. I want
to get the same functionality I had in the pre-8 distros: the ability to use the
Norwegian chars.

What I am trying to say is that in RH8 and 9 there is a mismatch somewhere. The 
missing line in the initial report is just one of the symptoms; I think the 
problem is in a really fundamental place.

It looks as if there is a US charset lurking somewhere, causing problems.

I want to be able to use RH9 for simple things such as viewing web pages with w3m or 
Konqueror, or downloading files with gFTP, without having conversion 
problems because I read, write and save files using Norwegian chars.

The other distros can do it, and RH could do it previously.

To put it in small portions:
This problem is not related to a specific product; it is related to a charset
mismatch or misconfiguration in the 8 and 9 versions of Red Hat. If I use
Norwegian chars in filenames, the name gets destroyed if I try to transfer the
file to or from any computer running Red Hat 8 or 9.
The problem I tried to illustrate in HTML is related to a problem in handling
and viewing information in these two distros. I got a document from a Mac
yesterday, by mail, and all words with Norwegian chars in them were messed up
because of this problem.

The differences between the two examples in the HTML document were meant to
illustrate the symptom.

On the line below are the three Norwegian chars:
æøå
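
The filename symptom described above can be sketched in a few lines of Python. This is only an illustration, under the assumption that the other machines write filenames as ISO-8859-1 bytes while Red Hat 8 and 9 expect UTF-8; the helper name `to_utf8_name` and the sample filename are hypothetical and not taken from the report.

```python
def to_utf8_name(raw: bytes) -> bytes:
    """Return raw unchanged if it is already valid UTF-8; otherwise
    assume it was written as ISO-8859-1 and re-encode it as UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1").encode("utf-8")


# Hypothetical filename, as it would be stored under each convention.
latin1_name = "blåbær.txt".encode("iso-8859-1")  # 1 byte per Norwegian char
utf8_name = "blåbær.txt".encode("utf-8")         # 2 bytes per Norwegian char

print(latin1_name)
print(utf8_name)
print(to_utf8_name(latin1_name) == utf8_name)
```

The check is a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so a real migration would need to be reviewed by hand.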
Comment 4 Bill Nottingham 2003-10-15 18:28:29 EDT
I don't think xml charset tags are relevant to HTML files; you need meta charset
tags in the <head> section of the document. For example:

<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>

The default charset in Red Hat Linux 8 and later is UTF-8. If you try to treat
UTF-8 encoded text, or filenames, or whatever, as ISO8859-1, it *will* look
different.
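
As a rough illustration of the point above, the following Python sketch encodes the three Norwegian characters both ways and then decodes each with the wrong charset. It is not taken from the test page; it only assumes that one set of characters there is stored as UTF-8 and the other as ISO-8859-1.

```python
text = "æøå"

utf8_bytes = text.encode("utf-8")         # two bytes per character
latin1_bytes = text.encode("iso-8859-1")  # one byte per character

# UTF-8 bytes read as ISO-8859-1: every character turns into two.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as UTF-8: the bytes do not form valid sequences,
# so each one is shown as the replacement character U+FFFD.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_read_as_latin1)
print(latin1_read_as_utf8)
```

Whichever charset a program assumes, only the set of characters stored in that charset displays correctly, which would be consistent with one browser garbling the first set and another garbling the second.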
Comment 5 ketil vestby 2003-10-16 00:52:49 EDT
If that were true, as Red Hat states in the release notes and manuals, adding 
the tag would make both the upper and lower example of HTML look the same.  
It doesn't. I have added the tag to the .../5th/... example, but it still looks 
as bad as before. 
 
I still believe that there seems to be another charset, or a limitation of 
the utf-8 charset, lurking somewhere. If not, the two examples would not 
differ. 
Comment 6 Bill Nottingham 2003-10-16 01:00:11 EDT
$ telnet ketil.homeunix.net 80
Trying 193.217.190.197...
Connected to ketil.homeunix.net (193.217.190.197).
Escape character is '^]'.
HEAD http://ketil.homeunix.net/5th/test.html HTTP/1.0
 
HTTP/1.1 200 OK
Date: Thu, 16 Oct 2003 04:59:28 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux mod_layout/3.0.3
Last-Modified: Wed, 15 Oct 2003 17:57:18 GMT
ETag: "180a-12e-3f8d8a7e"
Accept-Ranges: bytes
Content-Length: 302
Connection: close
Content-Type: text/html; charset=iso-8859-1


Your web server is adding the charset, it appears.
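
A charset sent in the HTTP Content-Type header generally takes precedence over a meta tag inside the document, which would explain why adding the tag changed nothing. The Python sketch below shows that rule using only the standard library; the function names are hypothetical, and it parses a header string rather than contacting the server.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, or None if absent


def effective_charset(header_value: str, meta_charset: str) -> str:
    """Prefer the charset from the HTTP header; fall back to the meta tag."""
    return charset_from_content_type(header_value) or meta_charset.lower()


# Header value copied from the response above; the meta tag says UTF-8.
print(effective_charset("text/html; charset=iso-8859-1", "UTF-8"))
# Without a charset in the header, the meta tag decides.
print(effective_charset("text/html", "UTF-8"))
```

With the header shown in the response above, the page is treated as ISO-8859-1 no matter what the meta tag says, so the fix would be either to change the server's default charset or to save the page in the charset the server announces.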
