Bug 1018659 - Stray characters in DocBook 5 output
Summary: Stray characters in DocBook 5 output
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Publican
Classification: Community
Component: publican
Version: future
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jeff Fearn 🐞
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-10-14 06:07 UTC by Ruediger Landmann
Modified: 2013-12-19 02:46 UTC (History)

Fixed In Version: 4.0.0
Clone Of:
Environment:
Last Closed: 2013-12-19 02:46:26 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1018024 0 unspecified CLOSED Cannot build Indic-language PDFs with Publican 3.99 2021-02-22 00:41:40 UTC

Internal Links: 1018024

Description Ruediger Landmann 2013-10-14 06:07:32 UTC
Description of problem:
DocBook 5 books contain multiple stray "Â" characters in html and html-single output

Version-Release number of selected component (if applicable):
publican-3.9.9-0.fc19.t6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a DocBook 5 book
2. Build html, html-single, pdf, and epub versions


Actual results:
html and html-single versions contain stray "Â" characters around where DocBook gentext has been used, for example:

1. Document Conventions

1.1. Typographic Conventions

Copyright © 2013 | 


Expected results:
No stray "Â" characters 

Additional info:
This doesn't affect the PDF or EPUB

Comment 1 Jeff Fearn 🐞 2013-10-15 00:58:00 UTC
Link to source I can test with?

Comment 2 Ruediger Landmann 2013-10-15 03:23:12 UTC
(In reply to Jeff Fearn from comment #1)
> Link to source I can test with?

This appeared in a brand-new book; so:

$ publican create_book --title "DB5 Test Book" --dtdver 5
$ cd DB5_Test_Book/
$ publican build --formats html-single,html,pdf,epub --langs en-US

Comment 3 Jeff Fearn 🐞 2013-10-16 04:46:35 UTC
On Fedora 19 using the 1.78.1 docbook styles the combination of using xhtml5 output and setting the html.ext param triggers an issue.

If html.ext is set to '.html', the '.' appears to trigger an issue where a broken UTF-8 character is introduced into the output stream.

Switching from the xhtml5 styles to the xhtml-1_1 styles does not change the output, it's still broken.

Switching from the xhtml5 styles to the xhtml styles does change the output, it now renders correctly. The output is HTML4 instead of HTML5.

When using xhtml5 if you set html.ext to 'html', i.e. simply drop the '.', then the output renders correctly. The file names are invalid.

On RHEL6 using the 1.78.1 style sheets this issue is not present, and the combination of xhtml5 and setting html.ext to '.html' works correctly.

This is likely caused somewhere in the libxslt stack on Fedora; it could take a considerable time to debug.
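
For context, a stray "Â" is the usual signature of UTF-8 bytes being read as ISO-8859-1. The sketch below only illustrates that general mechanism. It assumes the affected gentext contains a no-break space (U+00A0) or a copyright sign (U+00A9), which this report does not confirm, and it says nothing about where in the libxslt stack the problem actually arises.

```python
# UTF-8 encodes U+00A0 (no-break space) and U+00A9 (copyright sign)
# as two bytes each, and the first byte is 0xC2 in both cases.
nbsp_bytes = "\u00a0".encode("utf-8")
copy_bytes = "\u00a9".encode("utf-8")

# Reading those bytes as ISO-8859-1 maps 0xC2 to "Â" (U+00C2),
# so each character gains a stray "Â" in front of it.
nbsp_misread = nbsp_bytes.decode("iso-8859-1")
copy_misread = copy_bytes.decode("iso-8859-1")
print(repr(nbsp_misread))
print(repr(copy_misread))

# Reading the same bytes with the encoding they were written in is lossless.
assert nbsp_bytes.decode("utf-8") == "\u00a0"
assert copy_bytes.decode("utf-8") == "\u00a9"
```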

Comment 4 HSS Product Manager 2013-10-16 04:47:31 UTC
HSS-QE has reviewed and declined this request. QE for this bug will be handled by IED.

Comment 5 Jeff Fearn 🐞 2013-10-16 05:13:55 UTC
It appears that setting html.ext to anything besides '.html' or '.htm' will avoid triggering this issue on Fedora 19.

Tested successfully with all these: '.xhtml' '.gtml' '.ht' '.htmll' '.1234' '.wtf'

Comment 6 Ruediger Landmann 2013-10-17 07:02:59 UTC
Retested with publican-3.9.9-0.fc19.t11.noarch with perl-XML-TreeBuilder-5.0_1-0.fc19.noarch -- stray characters still appear :(

Comment 7 Jeff Fearn 🐞 2013-10-18 06:17:57 UTC
Setting Content-Type in a meta tag resolves this ...


To ssh://git.fedorahosted.org/git/publican.git
   181eacd..f8b7701  devel -> devel
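
To check whether a built page carries an encoding declaration of this kind, something like the following could be used. It is a minimal sketch: `CharsetFinder`, `declared_charsets`, and the sample `page` string are hypothetical names and data, and the exact meta tag added by the commit above is not shown in this report.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Collect encodings declared in <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("charset"):
            # HTML5 short form: <meta charset="UTF-8">
            self.charsets.append(attr["charset"].lower())
        elif (attr.get("http-equiv") or "").lower() == "content-type":
            # Long form: <meta http-equiv="Content-Type" content="...; charset=...">
            content = (attr.get("content") or "").lower()
            if "charset=" in content:
                self.charsets.append(content.split("charset=", 1)[1].strip())


def declared_charsets(html_text):
    """Return the list of encodings named in the document's meta tags."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charsets


# Sample input for illustration only, not taken from Publican output.
page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=UTF-8"/></head><body/></html>'
)
print(declared_charsets(page))
```

An empty list would mean the page leaves the encoding to whatever reads it to guess.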

Comment 8 Ruediger Landmann 2013-10-19 00:12:09 UTC
Fixed in publican-3.9.9-0.fc19.t14.noarch

