Red Hat Bugzilla – Bug 107185
national characters mismatch
Last modified: 2014-03-16 22:39:33 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225
Description of problem:
After installing or upgrading from 7.3 to 9, programs like w3m, Lynx and
Konqueror have lost the ability to show the Norwegian characters æ, ø
and å correctly in web pages. There also seem to be problems when
viewing or moving files with national characters in their filenames,
especially when I move them to and from computers running other Linux
distros (Debian and SuSE).
This seems to be a problem with the initial setup; it affects programs
that are able to display Unicode.
For example, take a look at http://ketil.homeunix.net/5th/test.html
in all three browsers.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Open Konqueror, w3m or Lynx.
2. Look at http://ketil.homeunix.net/5th/test.html
3. Notice the difference between the first and second sets of national characters.
Actual Results: In Konqueror and Lynx, the first set is garbled and the second
shows fine. In w3m it is the opposite: the first set shows fine and the second is garbled.
Expected Results: Both sets should display correctly.
This is a Red Hat-specific bug, affecting all countries with national
character sets. In my opinion it blocks all serious use of Red Hat distros
for me, and probably for many others in all countries with national character sets.
The problem did not occur in any version of Red Hat before RH8, and it
does not exist in competing distros like SuSE, Mandrake or
The missing lines in this bug report are due to Mozilla's inability to submit
anything typed into text boxes that contains national characters.
I find it hard to believe that this bug has not been reported before, but I
found nothing remotely related to it when searching for the word "national" in
Your web page doesn't define the charset in use. Without that, there's no way to
guess what you're intending.
The charset in use is the default charset I got when installing the distro. In
addition, the first tag, <?xml version="1.0" encoding="utf-8"?>, declares the
charset used. If you change the URL from .../5th/... to .../4th/..., you will get
a version of the page whose opening tag declares iso-8859-1 (and a slightly
different layout). I wrote the examples in Red Hat's default charset. What I want
is the same functionality I had in the pre-8 distros: the ability to use the
What I am trying to say is that there is a mismatch somewhere in RH8 and 9. The
missing lines in the initial report are just one of the symptoms; I think the
problem lies somewhere really fundamental.
It looks as if a US charset is lurking somewhere and causing problems.
I want to be able to use RH9 for simple things such as viewing web pages with w3m or
Konqueror, or downloading files with gFTP, without conversion problems just
because I read, write and save files using Norwegian characters.
Other distros can do it, and Red Hat could do it previously.
To break it down:
This problem is not tied to a specific product; it stems from a charset
mismatch or misconfiguration in versions 8 and 9 of Red Hat. If I use
Norwegian characters in filenames, the names get mangled when I transfer the
files to or from any computer running Red Hat 8 or 9.
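Filenames on Linux are plain byte strings with no charset attached, so a name written under one locale's charset gets reinterpreted under another's. Below is a minimal Python 3 sketch, assuming the other machine stores ISO-8859-1 filenames while Red Hat 8/9 uses UTF-8 (assumptions about the reporter's setup, not confirmed in the thread); the filename `blåbær.txt` is made up for illustration:

```python
# Hypothetical filename containing Norwegian letters.
name = "blåbær.txt"

# Bytes a system with an ISO-8859-1 locale would store on disk.
latin1_bytes = name.encode("iso-8859-1")
# Bytes a system with a UTF-8 locale would store on disk.
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'bl\xe5b\xe6r.txt'
print(utf8_bytes)    # b'bl\xc3\xa5b\xc3\xa6r.txt'

# A UTF-8 system listing the ISO-8859-1 name: the lone high bytes are not
# valid UTF-8, so each Norwegian letter becomes U+FFFD.
print(latin1_bytes.decode("utf-8", errors="replace"))

# An ISO-8859-1 system listing the UTF-8 name: each letter becomes two characters.
print(utf8_bytes.decode("iso-8859-1"))  # blÃ¥bÃ¦r.txt
```

Since ISO-8859-1 maps every byte to some character, the second kind of damage is only cosmetic: the bytes on disk are unchanged and can be decoded correctly once both sides agree on the charset.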
The problem I tried to illustrate in HTML relates to how these two releases
handle and display information: yesterday I received a document from a Mac
by mail, and every word containing Norwegian characters was garbled
because of this problem.
The difference between the two examples in the HTML document was meant to
illustrate the symptom.
On the line below are the three Norwegian characters:
I don't think XML charset declarations are relevant to HTML files; you need meta charset
tags in the <head> section of the document. For example:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
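To check which charset a page actually declares in its <head>, one could use a small parser like the minimal Python 3 sketch below, built on the standard `html.parser` module. The class name `MetaCharsetFinder` is hypothetical, and the sketch only looks at `<meta http-equiv="content-type">` tags; it deliberately ignores an XML declaration such as the one the test page starts with, matching the advice above.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Records the charset from the first
    <meta http-equiv="content-type" content="...; charset=..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # The content attribute looks like "text/html; charset=UTF-8".
        for part in attributes.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset" and value:
                self.charset = value.strip().lower()
                return


# The XML declaration is a processing instruction; this parser skips it.
page = ('<?xml version="1.0" encoding="utf-8"?>'
        '<html><head>'
        '<meta http-equiv="content-type" content="text/html; charset=UTF-8">'
        '</head><body></body></html>')
finder = MetaCharsetFinder()
finder.feed(page)
print(finder.charset)  # utf-8
```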
The default charset in Red Hat Linux 8 and later is UTF-8. If you try to treat
UTF-8 encoded text, or filenames, or whatever, as ISO-8859-1, it *will* look wrong.
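The effect can be made concrete with a minimal Python 3 sketch (Python is chosen only for illustration) that encodes æ, ø and å in one charset and decodes the bytes as the other, as a program guessing the wrong charset would. The helper `misread` is hypothetical:

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one charset, then decode the bytes as another,
    mimicking a program that guesses the wrong charset."""
    return text.encode(written_as).decode(read_as, errors="replace")


norwegian = "æøå"

# UTF-8 text read as ISO-8859-1: each letter becomes two characters.
print(misread(norwegian, "utf-8", "iso-8859-1"))  # Ã¦Ã¸Ã¥

# ISO-8859-1 text read as UTF-8: the single bytes are invalid UTF-8,
# so this prints three U+FFFD replacement characters.
print(misread(norwegian, "iso-8859-1", "utf-8"))
```

Both kinds of garbling show up in the same page when different programs make different guesses, which would be consistent with Konqueror/Lynx and w3m garbling opposite sets.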
If that were true, as Red Hat states in the release notes and manuals, adding
the tag would make both the upper and lower HTML examples look the same.
It doesn't. I have added the tag to the .../5th/... example, but it still looks
as bad as before.
I still believe there is another charset, or a limitation of the UTF-8
charset, lurking somewhere. If not, the two examples would not differ.
$ telnet ketil.homeunix.net 80
Connected to ketil.homeunix.net (18.104.22.168).
Escape character is '^]'.
HEAD http://ketil.homeunix.net/5th/test.html HTTP/1.0
HTTP/1.1 200 OK
Date: Thu, 16 Oct 2003 04:59:28 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux mod_layout/3.0.3
Last-Modified: Wed, 15 Oct 2003 17:57:18 GMT
Content-Type: text/html; charset=iso-8859-1
It appears your web server is adding the charset itself.
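This header would explain why adding the meta tag changed nothing: under HTML 4.01's rules for determining a document's character encoding, a charset parameter in the HTTP Content-Type header takes priority over a <meta http-equiv> declaration inside the document, so the server's iso-8859-1 beats the page's UTF-8 tag. Below is a minimal Python 3 sketch of that priority order. The function name `effective_charset` is hypothetical, and the iso-8859-1 fallback (HTTP/1.1's default for text/* types when no charset is sent) is an assumption; browsers of the time could also apply user settings or heuristics.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type_header: Optional[str],
                      meta_charset: Optional[str],
                      default: str = "iso-8859-1") -> str:
    """Choose a charset in HTML 4.01 priority order:
    HTTP header first, then the document's <meta> tag, then a fallback."""
    if content_type_header:
        # Reuse the stdlib MIME header parser to read the charset parameter.
        msg = Message()
        msg["Content-Type"] = content_type_header
        header_charset = msg.get_content_charset()  # lower-cased, or None
        if header_charset:
            return header_charset
    if meta_charset:
        return meta_charset.lower()
    return default


# The header from the HEAD request above, plus the UTF-8 meta tag.
print(effective_charset("text/html; charset=iso-8859-1", "UTF-8"))  # iso-8859-1
```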