Red Hat Bugzilla – Bug 135821
FC4: update default charset comment and remove AddCharsets
Last modified: 2007-11-30 17:10:51 EST
Description of problem:
The httpd.conf as distributed with FC2 contains a number of errors in
1. It sets the default character set to UTF-8:
This violates the W3C standard (as indicated in the comment above it).
2. It defines aliases for ISO-8859-1, -2, -3, and so on, as Latin1, 2,
3, and so on. Although this is valid for some, this does not hold for
all. In particular, Latin9 is equivalent to ISO-8859-15, not -9.
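To illustrate the second point, the mismatch looks roughly like this in AddCharset terms. The lines below are a sketch, not the exact contents of the shipped httpd.conf; the file extensions are assumed for illustration and only the pairing of charset and alias matters.

```apache
# Sketch only: extensions are illustrative, not copied from the FC2 file.
# Problematic pairing: the .latin9 alias attached to ISO-8859-9.
#AddCharset ISO-8859-9   .iso8859-9   .latin9

# Consistent pairing: ISO-8859-9 is Latin-5, ISO-8859-15 is Latin-9.
AddCharset ISO-8859-9   .iso8859-9   .latin5
AddCharset ISO-8859-15  .iso8859-15  .latin9
```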
1. Using a default charset of UTF-8 does not violate any standard.
The comment just states that if you do *not* specify a charset,
browsers are required to presume the charset is ISO-8859-1.
2. Yes, thanks for the report, this was fixed upstream recently.
There was discussion about removing these lines completely, too: do
you actually use (or want to use) files named .latinX, etc?
1. Right, although the comment "merely stating the obvious" gives me
the impression that the intention was to specify iso-8859-1 instead.
2. No, I wouldn't. I'd prefer to use a suitable <meta
http-equiv="content-type" content="text/html; charset=iso8859-15">
(and have it transferred as a real 'Content-Type' header) instead.
Thanks for the good work!
OK, thanks. We're past the point of making changes to the default
config for FC3 now, and this is not really critical, so this change
will be deferred to FC4.
So for FC4 we will
1. update the comment above AddDefaultCharset to be less confusing
w.r.t the UTF-8 default
2. remove or comment-out all the AddCharset lines which nobody really
I'm afraid there's more to it.
As W3C states, an HTML document must be assumed to be ISO-8859-1 unless
specified otherwise. With the "AddDefaultCharset UTF8" setting, such a
document will be served by Apache with a "Content-Type: text/html;
charset=utf8" header. In other words, Apache forces it to be
interpreted as a UTF8 document, which I think is not in accordance
with the W3C guidelines.
Moreover, browsers seem to give priority to the server Content-Type
header over a <meta http-equiv="content-type" ...>. This makes it
impossible for a document to specify its own character set. In the
current situation, all plain .html documents are treated as UTF8, and I
really think this is not correct.
Therefore I think that _IF_ a default charset is used, it can only be
ISO-8859-1, _AND_ if a document specifies a charset in a <meta> tag,
this tag must override the default charset, which implies it must be
'promoted' to a real Content-Type header (otherwise browsers will
ignore the <meta> tag due to the explicit Content-Type header).
Another option is to remove the AddDefaultCharset setting completely,
in which case the Content-Type header will be "text/html" (without a
charset) and everything works as it should.
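A minimal sketch of that last option, assuming the stock AddDefaultCharset line lives in the main httpd.conf. Setting the directive to Off should have the same effect as removing the line, since Off is Apache's built-in default:

```apache
# Do not append a charset parameter to text/html and text/plain responses.
# The browser then falls back to a <meta> declaration or its own default.
AddDefaultCharset Off
```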
You seem to be arguing that there is a W3C spec which states that it
is incorrect to send "Content-Type: text/xml; charset=utf-8" by
default. This is not the case; please find a specific spec reference
to back this up.
Not sending a default charset can allow some cross-site-scripting
attacks, see http://www.cert.org/tech_tips/malicious_code_mitigation.html;
that is why we must present a default charset.
Given that
1) we must present a default charset, and
2) applications by default run in a UTF-8 locale in Fedora Core, and
hence newly created content is UTF-8,
the only sane AddDefaultCharset setting for httpd.conf in Fedora Core
is UTF-8.
Yes, browsers are required to honour the charset in the Content-Type
header over a META tag; that's in the HTML spec. This means you need
to change the charset in the Content-Type header to match your content,
either globally in httpd.conf or locally in an .htaccess file.
The current default charset will of course be wrong if your content is
all encoded as ISO-8859-1. And a default charset of ISO-8859-1 will
be wrong if your content is all UTF-8. That's why it's configurable.
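As a sketch of the local override mentioned above, a directory whose pages are all ISO-8859-1 could carry an .htaccess file like the following. This assumes the enclosing configuration permits it with AllowOverride FileInfo; the charset value is only an example and should match the actual encoding of the content.

```apache
# .htaccess in a directory of ISO-8859-1 pages (hypothetical location).
# Overrides the server-wide AddDefaultCharset UTF-8 for this directory tree.
AddDefaultCharset ISO-8859-1
```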
I've just been bitten by this. And I must say I don't like it (CERT
tip or not). I have a bunch of ISO-8859-2 pages, all of which were using
the meta tag to specify the charset. When I moved them to my FC2 machine,
all of a sudden things were broken. By sheer luck I found that
Apache was configured to force the charset of all pages to UTF-8.
Argh!!!!! Needless to say, the line was commented out immediately.
IMHO, it is up to the developer of the program that delivers dynamic
content to specify the charset, not for the server to enforce it on
$SUBJECT is done now for FC4.