Bug 126921 - xmlto (docbook) doesn't generate UTF8 compatible output
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: xmlto
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
Reported: 2004-06-28 22:45 EDT by Jack Neely
Modified: 2007-11-30 17:07 EST
Fixed In Version: 0.0.18-4
Doc Type: Bug Fix
Last Closed: 2004-07-01 07:26:08 EDT

Description Jack Neely 2004-06-28 22:45:36 EDT
Description of problem:
When Apache serves HTML generated by xmlto (DocBook), non-ASCII
characters are inserted that appear as black blocks in the browser
(Firefox 0.8).  No fancy styles, just

   xmlto html realmkit.xml

Where my DTD is

   <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"

When Apache is configured with "AddDefaultCharset ISO-8859-1" I get the
expected behavior (I see whitespace in the HTML served by Apache).
With the default setting of "UTF-8" I get output like:

<td width="20%" align="right">�<a accesskey="n"

So I'm thinking that xmlto or the DocBook DTDs are not generating
files that conform to UTF-8.
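
One way to test that suspicion is to check whether the generated file decodes cleanly as UTF-8. The following is a minimal sketch, assuming `iconv` is installed; the helper name `is_valid_utf8` and the file name `index.html` are illustrative and not part of xmlto:

```sh
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# iconv exits non-zero when it meets a byte sequence that is invalid
# in the source encoding.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example use (the file name is an assumption):
#   is_valid_utf8 index.html && echo "valid UTF-8" || echo "not valid UTF-8"
```

A lone 0xA0 byte (the ISO-8859-1 non-breaking space) is not valid UTF-8, so a failure here would be consistent with the replacement character shown above if the file was written as Latin-1.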

Version-Release number of selected component (if applicable):
Comment 1 Tim Waugh 2004-06-29 04:14:58 EDT
Please try changing line 124 of /usr/bin/xmlto from:

if false;

to:

if [ -x /usr/bin/locale ]

Does that help?
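
To see which character set such a check would have to work with, it may help to print the current locale's charmap. This is only a sketch: it assumes the guarded branch relies on `locale charmap`, which is not shown in this report, and `current_charmap` is a hypothetical helper rather than anything in xmlto:

```sh
# Hypothetical helper, not taken from xmlto itself.
# Prints the charmap of the current locale, or "unknown" if
# /usr/bin/locale is missing or fails.
current_charmap() {
    charmap=""
    if [ -x /usr/bin/locale ]; then
        charmap=$(/usr/bin/locale charmap 2>/dev/null) || charmap=""
    fi
    printf '%s\n' "${charmap:-unknown}"
}

# Example use:
#   current_charmap
```

Running it under different `LANG` or `LC_ALL` settings shows how the reported charmap follows the locale.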
Comment 2 Jack Neely 2004-06-29 10:01:08 EDT
It does.  The output looks proper with UTF8.  However, now there are
similar issues with the Latin1 encoding.

Comment 3 Tim Waugh 2004-06-29 10:05:03 EDT
Can you explain in more detail the issue you now see with Latin1?  The
output you get from running xmlto (with the above change) in a UTF-8
locale should be UTF-8, and marked as such with an encoding tag.  Is
that not the case?
Comment 4 Jack Neely 2004-06-29 10:17:05 EDT
The patched xmlto output is marked as
   content="text/html; charset=UTF-8"
It seems that if the above charset doesn't match the
'AddDefaultCharset' Apache configuration option I get weird characters
displayed in my browser.
Comment 5 Tim Waugh 2004-07-01 07:26:08 EDT
Apparently the HTTP header charset specification overrules the actual
HTML content (!).  So don't use 'AddDefaultCharset'. :-)
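
To compare the two declarations, one could extract the charset from the HTTP `Content-Type` header and from the HTML `meta` tag. A rough sketch follows; `charset_of` is a hypothetical helper, and the header value is an assumed example of what `AddDefaultCharset ISO-8859-1` would produce:

```sh
# Hypothetical helper: print the value after "charset=" in a string.
charset_of() {
    printf '%s\n' "$1" |
        sed -n 's/.*[Cc][Hh][Aa][Rr][Ss][Ee][Tt]=\([A-Za-z0-9_-]*\).*/\1/p'
}

header='Content-Type: text/html; charset=ISO-8859-1'   # assumed server header
meta='content="text/html; charset=UTF-8"'              # from the generated HTML

# Warn when the server and the document disagree about the encoding.
if [ "$(charset_of "$header")" != "$(charset_of "$meta")" ]; then
    echo "charset mismatch: the browser follows the HTTP header"
fi
```

Against a live server, the header line could be taken from the output of `curl -sI` for the page in question.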

I'll make that change in Fedora development (package built as
xmlto-0.0.18-4) and check that it doesn't cause other problems
(previously it caused bug #80732), and then hopefully it will appear
in a future version of Red Hat Enterprise Linux.
