Red Hat Bugzilla – Bug 126921
xmlto (docbook) doesn't generate UTF8 compatible output
Last modified: 2007-11-30 17:07:02 EST
Description of problem:
When Apache serves HTML generated by xmlto (DocBook), non-ASCII
characters in the output appear as black blocks in the browser
(Firefox 0.8). No fancy styles, just
xmlto html realmkit.xml
Where my DTD is
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
When Apache is configured with "AddDefaultCharset ISO-8859-1" I get the
expected behavior (I see whitespace in the HTML served by Apache).
With the default setting of "UTF8" I get stuff like:
<td width="20%" align="right">ï¿½<a accesskey="n"
So I'm thinking that xmlto or the DocBook DTDs are not generating
files that conform to UTF-8.
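One way to test the suspicion above is to check whether the generated
bytes decode as UTF-8 at all. The following is only a rough sketch: the
helper name first_invalid_utf8_offset is made up for illustration, and
the sample bytes assume the whitespace is a Latin-1 non-breaking space
(byte 0xA0), which this report does not confirm.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None.

    Hypothetical helper for illustration; not part of xmlto.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte.
        return exc.start
    return None


# Assumed sample: a lone 0xA0 byte (Latin-1 non-breaking space) in markup.
latin1_sample = b'<td>\xa0<a accesskey="n"'
# The same character encoded as UTF-8 is the two bytes 0xC2 0xA0.
utf8_sample = '<td>\u00a0<a accesskey="n"'.encode("utf-8")

print(first_invalid_utf8_offset(latin1_sample))
print(first_invalid_utf8_offset(utf8_sample))
```

To run it against a real file, read the file in binary mode and pass
the bytes to the helper.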
Version-Release number of selected component (if applicable):
Please try changing line 124 of /usr/bin/xmlto from:
if [ -x /usr/bin/locale ]
Does that help?
It does. The output looks proper with UTF8. However, now there are
similar issues with the Latin1 encoding.
Ã Ã Ã Ã Ã Ã Ã Ã </p></div></div></div>
Can you explain in more detail the issue you now see with Latin1? The
output you get from running xmlto (with the above change) in a UTF-8
locale should be UTF-8, and marked as such with an encoding tag. Is
that not the case?
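To see what encoding a generated page declares for itself, something
like the sketch below could be used. It is an assumption-laden example:
the helper name declared_charset is invented here, the regular
expression only handles the common meta-tag forms, and the sample tag
is not copied from actual xmlto output.

```python
import re
from typing import Optional

# Matches charset=... inside a <meta ...> tag, with or without quotes.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def declared_charset(html: bytes) -> Optional[str]:
    """Return the charset named in the first <meta> tag, lowercased, or None.

    Hypothetical helper; only the first 4096 bytes are searched.
    """
    match = _META_CHARSET.search(html[:4096])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Assumed sample tag, for illustration only.
sample = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
print(declared_charset(sample))
```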
The patched xmlto output is marked as
It seems that if the above charset doesn't match the
'AddDefaultCharset' Apache configuration option I get weird characters
displayed in my browser.
Apparently the HTTP header charset specification overrules the actual
HTML content (!). So don't use 'AddDefaultCharset'. :-)
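The effect of such a mismatch can be reproduced without Apache. This is
a minimal sketch, assuming the character involved is a non-breaking
space (U+00A0); the function name view_with_wrong_charset is invented
for the example.

```python
def view_with_wrong_charset(text: str, written_as: str, read_as: str) -> str:
    """Encode text one way and decode it another, as a browser would when
    the HTTP header charset disagrees with the file's real encoding.

    Hypothetical helper for illustration. Undecodable bytes become
    U+FFFD, the replacement character.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


NBSP = "\u00a0"  # assumed character; the report does not name it

# Latin-1 file served as UTF-8: the lone 0xA0 byte is invalid UTF-8.
as_utf8 = view_with_wrong_charset(NBSP, "iso-8859-1", "utf-8")

# UTF-8 file served as Latin-1: the bytes 0xC2 0xA0 become two characters.
as_latin1 = view_with_wrong_charset(NBSP, "utf-8", "iso-8859-1")

print(ascii(as_utf8))
print(ascii(as_latin1))
```

Either direction of mismatch corrupts the display, which fits with the
advice above to let the document's own declaration stand.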
I'll make that change in Fedora development (package built as
xmlto-0.0.18-4) and check that it doesn't cause other problems (it
caused bug #80732 before), and then hopefully it will appear in a
future version of Red Hat Enterprise Linux.