Bug 126921 - xmlto (docbook) doesn't generate UTF8 compatible output
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: xmlto   
Version: 3.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Tim Waugh
QA Contact:
Depends On:
Reported: 2004-06-29 02:45 UTC by Jack Neely
Modified: 2007-11-30 22:07 UTC
Fixed In Version: 0.0.18-4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2004-07-01 11:26:08 UTC

Description Jack Neely 2004-06-29 02:45:36 UTC
Description of problem:
When Apache serves HTML generated by xmlto (docbook), non-ASCII
characters are inserted that appear as black blocks in the browser
(Firefox 0.8).  No fancy styles, just

   xmlto html realmkit.xml

Where my DTD is

   <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"

When Apache is configured with "AddDefaultCharset ISO-8859-1" I get the
expected behavior (I see whitespace in the HTML served by Apache).
With the default setting of "UTF8" I get stuff like:

<td width="20%" align="right">�<a accesskey="n"

So I'm thinking that xmlto or the DocBook DTDs are not generating
files that conform to UTF-8.
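
A minimal Python sketch of how a single byte could produce this symptom. It assumes the stray character is a non-breaking space written as the Latin-1 byte 0xA0; the report does not confirm which byte is actually in the file, so treat this as an illustration only.

```python
# Assumption: the character before <a> is a non-breaking space stored
# as the single Latin-1 byte 0xA0 (not confirmed in this report).
raw = b'<td width="20%" align="right">\xa0<a accesskey="n"'

# Read as ISO-8859-1, byte 0xA0 is a valid non-breaking space (U+00A0),
# which a browser renders as whitespace.
as_latin1 = raw.decode("iso-8859-1")

# Read as UTF-8, a lone 0xA0 is a continuation byte without a lead byte,
# so a lenient decoder substitutes U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

print(repr(as_latin1))
print(repr(as_utf8))
```

Browsers typically draw U+FFFD as a block or a question-mark glyph, which would match the "black blocks" described above.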

Version-Release number of selected component (if applicable):

Comment 1 Tim Waugh 2004-06-29 08:14:58 UTC
Please try changing line 124 of /usr/bin/xmlto from:

if false;

to:

if [ -x /usr/bin/locale ]

Does that help?

Comment 2 Jack Neely 2004-06-29 14:01:08 UTC
It does.  The output looks proper with UTF8.  However, now there are
similar issues with the Latin1 encoding.

à à à à à à à à </p></div></div></div>

Comment 3 Tim Waugh 2004-06-29 14:05:03 UTC
Can you explain in more detail the issue you now see with Latin1?  The
output you get from running xmlto (with the above change) in a UTF-8
locale should be UTF-8, and marked as such with an encoding tag.  Is
that not the case?
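
The idea of deriving the output encoding from the locale can be sketched in Python as follows. This is not xmlto's code: `output_encoding` and `current_charmap` are hypothetical helpers, and the US-ASCII fallback is an assumption made for the example.

```python
import locale


def output_encoding(charmap):
    """Pick an output encoding name from a locale charmap name.

    Hypothetical helper; falling back to US-ASCII is an assumption.
    """
    if not charmap:
        return "US-ASCII"
    return charmap


def current_charmap():
    """Return the charmap of the current locale, or None if unavailable."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return None
    # On POSIX systems this is the value that `locale charmap` prints.
    return locale.nl_langinfo(locale.CODESET)


encoding = output_encoding(current_charmap())
print(f'content="text/html; charset={encoding}"')
```

Run in a UTF-8 locale, this would be expected to report UTF-8, which is the behaviour described in this comment.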

Comment 4 Jack Neely 2004-06-29 14:17:05 UTC
The patched xmlto output is marked as
   content="text/html; charset=UTF-8"
It seems that if the above charset doesn't match the
'AddDefaultCharset' Apache configuration option I get weird characters
displayed in my browser.
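
A small Python sketch of what such a mismatch does to the bytes, assuming the affected character is a non-breaking space (U+00A0); the sample string is made up for illustration.

```python
# A non-breaking space (U+00A0) encoded as UTF-8 is two bytes: 0xC2 0xA0.
text = 'right">\u00a0<a'
page = text.encode("utf-8")

# Declared and actual encodings agree: the text round-trips unchanged.
matched = page.decode("utf-8")

# The server announces ISO-8859-1 instead: every byte becomes its own
# character, so the lead byte 0xC2 shows up as a stray "Â".
mismatched = page.decode("iso-8859-1")

print(repr(matched))
print(repr(mismatched))
```

The file itself is valid UTF-8 in both cases; only the label the browser is told to trust differs.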

Comment 5 Tim Waugh 2004-07-01 11:26:08 UTC
Apparently the charset specified in the HTTP header overrules the one
declared in the HTML content itself (!).  So don't use
'AddDefaultCharset'. :-)

I'll make that change in Fedora development (package built as
xmlto-0.0.18-4) and check that it doesn't cause other problems
(previously it caused bug #80732), and then hopefully it will appear
in a future version of Red Hat Enterprise Linux.
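
The precedence described in this comment can be sketched in Python. This is a simplified model, not a browser's actual algorithm: it ignores byte-order marks and user overrides, and both function names are hypothetical.

```python
import re


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', value, re.IGNORECASE)
    return match.group(1) if match else None


def effective_charset(http_content_type, meta_content):
    """Simplified precedence: the HTTP header wins over the <meta> value."""
    return (charset_from_content_type(http_content_type)
            or charset_from_content_type(meta_content))


# Server adds a charset to the header; the page's own UTF-8 label is ignored.
print(effective_charset("text/html; charset=ISO-8859-1",
                        "text/html; charset=UTF-8"))

# No charset in the header; the <meta> declaration is used.
print(effective_charset("text/html", "text/html; charset=UTF-8"))
```

With 'AddDefaultCharset' set, Apache adds a charset parameter to the Content-Type header of text/html responses, which corresponds to the first case; with it off, the page's own declaration takes effect.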
