Description of problem:
When Apache serves HTML generated by xmlto (DocBook), non-ASCII characters are inserted that appear as black blocks in the browser (Firefox 0.8). No fancy styles are involved, just:
    xmlto html realmkit.xml
where my DTD is:
    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
When Apache is configured with "AddDefaultCharset ISO-8859-1" I get the expected behavior (I see whitespace in the HTML served by Apache). With the default setting of "UTF-8" I get output like:
    <td width="20%" align="right">�<a accesskey="n" href="pr01.html">Next</a></td>
So I'm thinking that xmlto or the DocBook DTDs are not generating files that conform to UTF-8.

Version-Release number of selected component (if applicable):
    httpd-2.0.46-32.ent
    xmlto-0.0.14-3
    docbook-utils-0.6.13-5
    docbook-dtds-1.0-17.2
    docbook-style-xsl-1.61.2-2
    docbook-utils-pdf-0.6.13-5
    docbook-style-dsssl-1.76-8
Please try changing line 124 of /usr/bin/xmlto from:
    if false;
to:
    if [ -x /usr/bin/locale ]
Does that help?
It does. The output looks correct with UTF-8. However, there are now similar issues with the Latin-1 encoding:
    Ã Ã Ã Ã Ã Ã Ã Ã </p></div></div></div>
Can you explain in more detail the issue you now see with Latin1? The output you get from running xmlto (with the above change) in a UTF-8 locale should be UTF-8, and marked as such with an encoding tag. Is that not the case?
The patched xmlto output is marked as:
    content="text/html; charset=UTF-8"
It seems that if this charset doesn't match the Apache 'AddDefaultCharset' configuration option, I get weird characters displayed in my browser.
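The two symptoms in this report can be reproduced at the byte level. The following is only a sketch, and it assumes the stray character is a no-break space (U+00A0), which the report does not confirm. It shows what a decoder produces when the bytes on disk and the charset announced by the server disagree.

```python
# Assumption: the character in question is a no-break space (U+00A0).
NBSP = "\u00a0"

latin1_bytes = NBSP.encode("iso-8859-1")  # one byte: 0xA0
utf8_bytes = NBSP.encode("utf-8")         # two bytes: 0xC2 0xA0

# Latin-1 file announced as UTF-8: a lone 0xA0 is not valid UTF-8,
# so a lenient decoder substitutes U+FFFD (shown as a block or "?" glyph).
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 file announced as Latin-1: every byte becomes its own character,
# so 0xC2 shows up as an extra accented capital A before the space.
as_latin1 = utf8_bytes.decode("iso-8859-1")

print(ascii(as_utf8))    # '\ufffd'
print(ascii(as_latin1))  # '\xc2\xa0'
```

Either direction of mismatch corrupts the display, which is consistent with the problem moving from the UTF-8 setting to the Latin-1 setting once xmlto started emitting UTF-8.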
Apparently the charset specified in the HTTP header overrides the one declared in the HTML content itself (!). So don't use 'AddDefaultCharset'. :-) I'll make that change in Fedora development (package built as xmlto-0.0.18-4) and check that it doesn't cause other problems (it previously caused bug #80732). Then hopefully it will appear in a future version of Red Hat Enterprise Linux.
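The precedence described above can be sketched roughly as follows. This is a simplified illustration, not how any particular browser is implemented: the helper names (header_charset, meta_charset, effective_charset) are hypothetical, and real browsers also consider byte order marks and other heuristics that are ignored here.

```python
import re
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def meta_charset(html_bytes):
    """Return the first charset=... value found in the markup, or None."""
    match = re.search(rb'charset=["\']?([A-Za-z0-9_-]+)', html_bytes)
    return match.group(1).decode("ascii") if match else None


def effective_charset(content_type, html_bytes):
    """The HTTP header wins; the in-document declaration is only a fallback."""
    return header_charset(content_type) or meta_charset(html_bytes)


page = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'

# With AddDefaultCharset ISO-8859-1, the header carries a charset and wins.
print(effective_charset("text/html; charset=ISO-8859-1", page))  # ISO-8859-1

# Without a charset in the header, the document's own declaration is used.
print(effective_charset("text/html", page))  # UTF-8
```

This is why removing 'AddDefaultCharset' lets the charset tag written by xmlto take effect.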