Bug 143323 - Set charset (UTF-8 or another, according to the user's locale) in HTML generated by xmlto
Product: Fedora
Classification: Fedora
Component: xmlto
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
Keywords: FutureFeature
Reported: 2004-12-18 22:53 EST by Shun Fukuzawa
Modified: 2007-11-30 17:10 EST (History)

Doc Type: Enhancement
Last Closed: 2004-12-23 05:02:13 EST

Attachments
displayed by a browser. (8.99 KB, image/png)
2004-12-23 02:55 EST, Tadashi Jokagi

Description Shun Fukuzawa 2004-12-18 22:53:11 EST
Description of problem:
xmlto always generates HTML files with the iso-8859-1 charset from XML.
In non-iso-8859-1 environments, however, we usually write XML in UTF-8 or
another charset.
Please correct the XSLT so that the HTML gets a charset suited to the XML.
Also, non-ASCII characters are replaced by character entities when using
xmlto. Please correct that as well.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Write XML using a non-iso-8859-1 charset (e.g. Japanese).
2. Generate HTML from that XML with xmlto.
Actual results:
The HTML has the wrong charset,
and all non-ASCII characters become character entities.

Expected results:
Set a suitable charset (UTF-8 or another) in the HTML.

Additional info:
xmlto uses docbook-style-xsl.
Comment 1 Shun Fukuzawa 2004-12-18 22:56:15 EST
> character entities.
-> character entity references
Comment 2 Tim Waugh 2004-12-20 07:41:10 EST
So the character encoding is correct for the document in that it accurately
describes the character encoding it contains, but it's not the best (most
efficient or appropriate) encoding to use -- is that right?
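The point about the declared encoding being accurate can be illustrated with a small Python sketch. It does not use xmlto or the DocBook stylesheets; the sample string is only an assumed example. Encoding Japanese text to ISO-8859-1 with numeric character references yields pure ASCII bytes, so an iso-8859-1 declaration stays correct, but the result is much longer than the UTF-8 form.

```python
# Assumed sample text for illustration only.
text = "製作著作 2003"

# ISO-8859-1 cannot represent Japanese directly, so each such character
# is replaced with a numeric character reference (&#NNNNN;).
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# The same text written directly as UTF-8.
utf8_bytes = text.encode("utf-8")

print(latin1_bytes)
print(len(latin1_bytes), "bytes as ISO-8859-1 with references")
print(len(utf8_bytes), "bytes as UTF-8")
```

Both byte strings describe the same characters; the difference is size and how readable the HTML source is.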
Comment 3 Shun Fukuzawa 2004-12-22 11:31:26 EST

I agree with you. I am hoping for output like the example below:
<?xml version = '1.0' encoding = 'UTF-8'?>

  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Comment 4 Tadashi Jokagi 2004-12-23 02:53:09 EST
No good.
The following URL is a translation of fedora-docs.
Japanese text becomes numeric character references when xmlto is used.

class="copyright">&#35069;&#20316;&#33879;&#20316; ゥ 2003 Red Hat, 
Inc., Tammy Fox</p></d
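As a quick check of what those references stand for, here is a minimal Python sketch that decodes the four numeric references from the fragment above using the standard `html` module. It only shows that the references map back to Japanese characters; it says nothing about how xmlto produced them.

```python
import html

# The numeric character references copied from the fragment above.
fragment = "&#35069;&#20316;&#33879;&#20316;"

# html.unescape turns each &#NNNNN; reference back into its character.
decoded = html.unescape(fragment)

print(decoded)
print([hex(ord(ch)) for ch in decoded])
```

A browser performs the same decoding, so the page renders correctly, but the HTML source is hard to read and edit by hand.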

Comment 5 Tadashi Jokagi 2004-12-23 02:55:25 EST
Created attachment 109065
displayed by a browser.
Comment 7 Tim Waugh 2004-12-23 05:02:13 EST
Tadashi: I think we are all in agreement here.  Comment #3 describes the desired
output charset.

However, just because a document is encoded in UTF-8 for input should not
dictate the output charset: I think LC_CTYPE should be used for that.

In fact, in later releases, it is: see bug #126921.
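To make the LC_CTYPE idea concrete, here is a rough Python sketch of picking an output charset from the locale environment. It is not how xmlto does it; `codeset_of`, `output_charset`, and `meta_tag` are hypothetical helpers, and the only assumption borrowed from POSIX is the usual precedence LC_ALL, then LC_CTYPE, then LANG.

```python
import os


def codeset_of(locale_name, default="US-ASCII"):
    """Return the codeset part of a locale name such as 'ja_JP.UTF-8'.

    Hypothetical helper. Names without a codeset (for example 'C')
    fall back to `default`.
    """
    base = locale_name.split("@", 1)[0]      # drop a modifier such as @euro
    _, dot, codeset = base.partition(".")
    return codeset if dot and codeset else default


def output_charset(env, default="US-ASCII"):
    """Pick the charset from the first variable set: LC_ALL, LC_CTYPE, LANG."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return codeset_of(value, default)
    return default


def meta_tag(charset):
    """Build a Content-Type meta element like the one in comment #3."""
    return ('<meta http-equiv="Content-Type" '
            f'content="text/html; charset={charset}">')


# Caveat: locale codeset names (e.g. 'eucJP') are not always valid HTML
# charset labels (e.g. 'EUC-JP'); a real tool would need a mapping step.
print(meta_tag(output_charset(os.environ)))
```

With this approach a UTF-8 input document processed under a `ja_JP.eucJP` locale would be labelled according to the locale, not according to the input encoding.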
