Bug 135821 - FC4: update default charset comment and remove AddCharsets
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: httpd
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Joe Orton
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: FC4Target
Reported: 2004-10-15 09:19 UTC by Johan Vromans
Modified: 2007-11-30 22:10 UTC (History)
CC List: 0 users

Fixed In Version: 2.0.53-6
Clone Of:
Environment:
Last Closed: 2005-03-29 13:06:57 UTC
Type: ---
Embargoed:



Description Johan Vromans 2004-10-15 09:19:37 UTC
Description of problem:

The httpd.conf as distributed with FC2 contains a number of errors in
character sets.

1. It sets the default character set to UTF-8:

   AddDefaultCharset UTF-8

This violates the W3C standard (as indicated in the comment above it).

2. It defines aliases for ISO-8859-1, -2, -3, and so on, as Latin1, 2,
3, and so on. Although this is valid for some, this does not hold for
all. In particular, Latin9 is equivalent to ISO-8859-15, not -9.
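For reference, the Latin-N names and the ISO-8859 part numbers only coincide for the first four. A sketch of what corrected aliases could look like, assuming file extensions of the form .latinN (the extension names are illustrative, not a copy of the shipped httpd.conf):

```apache
# Latin-1 through Latin-4 share their number with the ISO-8859 part.
AddCharset ISO-8859-1   .latin1
AddCharset ISO-8859-2   .latin2
AddCharset ISO-8859-3   .latin3
AddCharset ISO-8859-4   .latin4
# From Latin-5 onwards the numbers diverge.
AddCharset ISO-8859-9   .latin5
AddCharset ISO-8859-10  .latin6
AddCharset ISO-8859-13  .latin7
AddCharset ISO-8859-14  .latin8
AddCharset ISO-8859-15  .latin9
```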

Comment 1 Joe Orton 2004-10-15 09:58:48 UTC
1. Using a default charset of UTF-8 does not violate any standard. 
The comment just states that if you do *not* specify a charset,
browsers are required to presume the charset is ISO-8859-1.

2. Yes, thanks for the report, this was fixed upstream recently. 
There was discussion about removing these lines completely, too: do
you actually use (or want to use) files named .latinX, etc?


Comment 2 Johan Vromans 2004-10-15 14:55:43 UTC
1. Right, although the comment "merely stating the obvious" gives me
the impression that the intention was to specify iso-8859-1 instead.

2. No, I wouldn't. I'd prefer to use a suitable <meta
http-equiv="content-type" content="text/html; charset=iso8859-15">
(and have it transferred as a real 'Content-Type' header) instead.

Thanks for the good work!

Comment 3 Joe Orton 2004-10-15 15:04:38 UTC
OK, thanks.  We're past the point of making changes to the default
config for FC3 now, and this is not really critical, so this change
will be deferred to FC4.

So for FC4 we will 

1. update the comment above AddDefaultCharset to be less confusing
w.r.t. the UTF-8 default
 
2. remove or comment out all the AddCharset lines, which nobody really
needs anyway.

Comment 4 Johan Vromans 2004-10-16 07:56:50 UTC
I'm afraid there's more to it.

As the W3C states, an HTML document must be assumed to be ISO-8859-1 unless
specified otherwise. With the "AddDefaultCharset UTF-8" setting, such a
document will be served by Apache with a "Content-Type: text/html;
charset=UTF-8" header. In other words, Apache forces it to be
interpreted as a UTF-8 document, which I think is not in accordance
with the W3C guidelines.

Moreover, browsers seem to give priority to the server Content-Type
header over a <meta http-equiv="content-type" ...>. This makes it
impossible for a document to specify its own character set. In the
current situation, all plain .html documents are treated as UTF-8, and I
really think this is not correct.

Therefore I think that _IF_ a default charset is used, it can only be
ISO-8859-1, _AND_ if a document specifies a charset in a <meta> tag,
this tag must override the default charset, which implies it must be
'promoted' to a real Content-Type header (otherwise browsers will
ignore the <meta> tag due to the explicit Content-Type header).

Another option is to remove the AddDefaultCharset setting completely,
in which case the Content-Type header will be "text/html" (without a
charset) and everything works as it should.
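A minimal sketch of this option, assuming the stock httpd.conf is edited directly: the directive can either be deleted or switched off explicitly, which is equivalent.

```apache
# Disable the server-wide default. Responses are then sent as
# "Content-Type: text/html" with no charset parameter, unless one
# is set elsewhere (for example by AddCharset or by a script).
AddDefaultCharset Off
```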

Comment 5 Joe Orton 2004-10-17 22:23:53 UTC
You seem to be arguing that there is a W3C spec which states that it
is incorrect to send "Content-Type: text/html; charset=utf-8" by
default.  This is not the case; please find a specific spec reference
to back this up.

Not sending a default charset can allow some cross-site-scripting
attacks; see
http://www.cert.org/tech_tips/malicious_code_mitigation.html, that is
why we must present a default charset.

Given that:

1) we must present a default charset, and
2) applications by default run in a UTF-8 locale in Fedora Core, and
hence newly created content is UTF-8

the only sane AddDefaultCharset setting for httpd.conf in Fedora Core
is UTF-8.

Yes, browsers are required to honour the charset in the Content-Type
header over a META tag, that's in the HTML spec.  This means you need
to change the charset content-type header to match your content,
either globally in httpd.conf or locally in an .htaccess file.

The current default charset will of course be wrong if your content is
all encoded as ISO-8859-1.  And a default charset of ISO-8859-1 will
be wrong if your content is all UTF-8.  That's why it's configurable.
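The two approaches mentioned above could look roughly like this. It is a sketch that assumes ISO-8859-1 content; the directory path is a hypothetical example, not taken from the Fedora configuration.

```apache
# Globally, in httpd.conf: label text/html and text/plain responses
# that carry no charset as ISO-8859-1 instead of UTF-8.
AddDefaultCharset ISO-8859-1

# Or for one subtree only (the path is a hypothetical example).
<Directory "/var/www/html/legacy">
    AddDefaultCharset ISO-8859-1
    # Let .htaccess files in this tree set AddDefaultCharset themselves.
    AllowOverride FileInfo
</Directory>
```

With AllowOverride FileInfo in effect, an .htaccess file in that tree needs only the single line "AddDefaultCharset ISO-8859-1" to achieve the same result locally.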

Comment 6 Aleksandar Milivojevic 2004-12-25 09:00:54 UTC
I've just been bitten by this.  And I must say I don't like it (CERT
tip or not).  I have a bunch of ISO-8859-2 pages, all of which were using the
meta tag to specify the charset.  When I moved them to my FC2 machine,
all of a sudden things were broken.  By sheer luck I found that
Apache was configured to force the charset of all pages to UTF-8.
Argh!!!!!  Needless to say, the line was commented out immediately.

IMHO, it is up to the developer of the program that delivers dynamic
content to specify the charset.  It is not for the server to enforce it on
*everything*.

Comment 7 Joe Orton 2005-03-29 13:06:57 UTC
$SUBJECT is done now for FC4.

