Bug 126921 - xmlto (docbook) doesn't generate UTF8 compatible output
Summary: xmlto (docbook) doesn't generate UTF8 compatible output
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: xmlto
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tim Waugh
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-06-29 02:45 UTC by Jack Neely
Modified: 2007-11-30 22:07 UTC (History)
0 users

Fixed In Version: 0.0.18-4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-07-01 11:26:08 UTC
Target Upstream Version:
Embargoed:



Description Jack Neely 2004-06-29 02:45:36 UTC
Description of problem:
When Apache displays HTML generated by xmlto (DocBook), non-ASCII
characters are inserted that appear as black blocks in the browser
(Firefox 0.8).  No fancy styles, just 

   xmlto html realmkit.xml

Where my DTD is

   <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
   "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"

When Apache is configured with "AddDefaultCharset ISO-8859-1" I get the
expected behavior (I see whitespace in the HTML served by Apache). 
With the default setting of "UTF8" I get stuff like:

<td width="20%" align="right">�<a accesskey="n"
href="pr01.html">Next</a></td>

So I'm thinking that xmlto or the DocBook DTDs are not generating
files that conform to UTF-8.
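
A minimal sketch of why a single stray byte could produce this symptom.
It assumes the byte in question is 0xA0, the ISO-8859-1 encoding of a
non-breaking space; the report does not say which byte was actually
emitted, so treat that as a guess.

```python
# Assumption: the stray byte is 0xA0 (NO-BREAK SPACE in ISO-8859-1).
NBSP_LATIN1 = b"\xa0"

# Read as ISO-8859-1, the byte is a valid non-breaking space (whitespace).
as_latin1 = NBSP_LATIN1.decode("iso-8859-1")

# Read as UTF-8, a lone 0xA0 is a continuation byte with no lead byte,
# so strict decoding fails.
try:
    NBSP_LATIN1.decode("utf-8")
    utf8_valid = True
except UnicodeDecodeError:
    utf8_valid = False

# A lenient decoder (as a browser would be) substitutes U+FFFD instead.
as_utf8_lenient = NBSP_LATIN1.decode("utf-8", errors="replace")

print(repr(as_latin1), utf8_valid, repr(as_utf8_lenient))
```

This would be consistent with seeing whitespace under ISO-8859-1 and a
replacement glyph under UTF-8.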

Version-Release number of selected component (if applicable):
httpd-2.0.46-32.ent
xmlto-0.0.14-3
docbook-utils-0.6.13-5
docbook-dtds-1.0-17.2
docbook-style-xsl-1.61.2-2
docbook-utils-pdf-0.6.13-5
docbook-style-dsssl-1.76-8

Comment 1 Tim Waugh 2004-06-29 08:14:58 UTC
Please try changing line 124 of /usr/bin/xmlto from:

if false;

to:

if [ -x /usr/bin/locale ]

Does that help?

Comment 2 Jack Neely 2004-06-29 14:01:08 UTC
It does.  The output looks proper with UTF8.  However, now there are
similar issues with the Latin1 encoding.

à à à à à à à à </p></div></div></div>

Comment 3 Tim Waugh 2004-06-29 14:05:03 UTC
Can you explain in more detail the issue you now see with Latin1?  The
output you get from running xmlto (with the above change) in a UTF-8
locale should be UTF-8, and marked as such with an encoding tag.  Is
that not the case?
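
As a rough illustration of picking the output encoding from the locale,
here is a sketch that asks `locale charmap` for the current charset when
/usr/bin/locale is executable. It is not taken from the xmlto script;
the function name and the ISO-8859-1 fallback are hypothetical.

```python
import os
import subprocess


def locale_charset(locale_bin="/usr/bin/locale", fallback="ISO-8859-1"):
    """Return the charset reported by `locale charmap`, or `fallback`.

    Hypothetical helper: the name and the fallback value are assumptions.
    """
    # Mirrors the `[ -x /usr/bin/locale ]` test: only run it if executable.
    if not os.access(locale_bin, os.X_OK):
        return fallback
    try:
        result = subprocess.run(
            [locale_bin, "charmap"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return fallback
    # `locale charmap` prints a single name such as UTF-8.
    return result.stdout.strip() or fallback


print(locale_charset())
```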

Comment 4 Jack Neely 2004-06-29 14:17:05 UTC
The patched xmlto output is marked as

   content="text/html; charset=UTF-8"

It seems that if the above charset doesn't match the
'AddDefaultCharset' Apache configuration option I get weird characters
displayed in my browser.

Comment 5 Tim Waugh 2004-07-01 11:26:08 UTC
Apparently the HTTP header charset specification overrules the actual
HTML content (!).  So don't use 'AddDefaultCharset'. :-)
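
A small sketch of that precedence, assuming a simplified model in which
the charset parameter of the HTTP Content-Type header wins and the meta
tag is only a fallback. The helper name and the regular expressions are
illustrative, not how any particular browser implements it.

```python
import re

CHARSET_RE = re.compile(r'charset=["\']?([\w.:-]+)', re.IGNORECASE)
CHARSET_RE_BYTES = re.compile(rb'charset=["\']?([\w.:-]+)', re.IGNORECASE)


def effective_charset(content_type_header, html_bytes):
    """Hypothetical helper: HTTP header charset first, then the meta tag."""
    match = CHARSET_RE.search(content_type_header or "")
    if match:
        return match.group(1)
    # Only look near the start of the document for a meta declaration.
    match = CHARSET_RE_BYTES.search(html_bytes[:1024])
    if match:
        return match.group(1).decode("ascii")
    return None


# A UTF-8 page, correctly labelled in its meta tag, containing an nbsp.
body = (
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    "\u00a0Next"
).encode("utf-8")

# With AddDefaultCharset ISO-8859-1 the header overrides the meta tag,
# so the two-byte UTF-8 nbsp is shown as two Latin-1 characters.
charset = effective_charset("text/html; charset=ISO-8859-1", body)
print(charset, repr(body.decode(charset)[-6:]))
```

With no charset in the header, the same helper falls back to the UTF-8
declared in the document and the text decodes cleanly.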

I'll make that change in Fedora development (package built as
xmlto-0.0.18-4) and check that it doesn't cause other problems (it
previously caused bug #80732), and then hopefully it will appear in a
future version of Red Hat Enterprise Linux.

