Bug 1018659

Summary: Stray characters in DocBook 5 output
Product: [Community] Publican Reporter: Ruediger Landmann <rlandman>
Component: publican Assignee: Jeff Fearn 🐞 <jfearn>
Status: CLOSED CURRENTRELEASE QA Contact: tools-bugs <tools-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: future CC: aigao, rlandman
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.0.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-12-19 02:46:26 UTC Type: Bug

Description Ruediger Landmann 2013-10-14 06:07:32 UTC
Description of problem:
DocBook 5 books contain multiple stray "Â" characters in html and html-single output

Version-Release number of selected component (if applicable):
publican-3.9.9-0.fc19.t6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a DocBook 5 book
2. Build html, html-single, pdf, and epub versions


Actual results:
html and html-single versions contain stray "Â" characters around where DocBook gentext has been used, for example:

1. Document Conventions

1.1. Typographic Conventions

Copyright © 2013 | 


Expected results:
No stray "Â" characters 

Additional info:
This doesn't affect the PDF or EPUB
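
The symptom above matches a well-known encoding mix-up: a non-breaking space (U+00A0) written as UTF-8 but read back as Latin-1 shows up as "Â" followed by a space-like character. The report does not confirm that this is the mechanism, so the following is only a minimal Python sketch under that assumption; the heading string and the use of U+00A0 as the gentext separator are illustrative, not taken from Publican's output.

```python
# Assumption (not confirmed in this report): the gentext separator between
# the label and the title is a non-breaking space, U+00A0.
NBSP = "\u00a0"
heading = f"1.{NBSP}Document Conventions"

# Encoded as UTF-8, the NBSP becomes the two bytes 0xC2 0xA0.
utf8_bytes = heading.encode("utf-8")

# A reader that treats those bytes as Latin-1 maps each byte to its own
# character: 0xC2 is "Â" and 0xA0 is again a non-breaking space.
misread = utf8_bytes.decode("latin-1")

print(repr(heading))
print(repr(misread))
```

Decoding the same bytes as UTF-8 returns the original heading, which is why output formats that carry their own encoding information would not show the stray character.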

Comment 1 Jeff Fearn 🐞 2013-10-15 00:58:00 UTC
Link to source I can test with?

Comment 2 Ruediger Landmann 2013-10-15 03:23:12 UTC
(In reply to Jeff Fearn from comment #1)
> Link to source I can test with?

This appeared in a brand-new book; so:

$ publican create_book --title "DB5 Test Book" --dtdver 5
$ cd DB5_Test_Book/
$ publican build --formats html-single,html,pdf,epub --langs en-US

Comment 3 Jeff Fearn 🐞 2013-10-16 04:46:35 UTC
On Fedora 19 with the 1.78.1 DocBook styles, the combination of xhtml5 output and setting the html.ext param triggers an issue.

If html.ext is set to '.html', the '.' appears to trigger an issue where a broken UTF-8 character is introduced into the output stream.

Switching from the xhtml5 styles to the xhtml-1_1 styles does not change the output; it's still broken.

Switching from the xhtml5 styles to the xhtml styles does change the output: it now renders correctly, but the output is HTML4 instead of HTML5.

When using xhtml5, setting html.ext to 'html' (i.e. simply dropping the '.') makes the output render correctly, but the file names are then invalid.

On RHEL6 with the 1.78.1 stylesheets this issue is not present, and the combination of xhtml5 and setting html.ext to '.html' works correctly.

This is likely caused somewhere in the libxslt stack on Fedora; it could take considerable time to debug.

Comment 4 HSS Product Manager 2013-10-16 04:47:31 UTC
HSS-QE has reviewed and declined this request. QE for this bug will be handled by IED.

Comment 5 Jeff Fearn 🐞 2013-10-16 05:13:55 UTC
It appears that setting html.ext to anything besides '.html' or '.htm' will avoid triggering this issue on Fedora 19.

Tested successfully with all of these: '.xhtml', '.gtml', '.ht', '.htmll', '.1234', '.wtf'

Comment 6 Ruediger Landmann 2013-10-17 07:02:59 UTC
Retested with publican-3.9.9-0.fc19.t11.noarch and perl-XML-TreeBuilder-5.0_1-0.fc19.noarch -- stray characters still appear :(

Comment 7 Jeff Fearn 🐞 2013-10-18 06:17:57 UTC
Setting Content-Type in a meta tag resolves this ...


To ssh://git.fedorahosted.org/git/publican.git
   181eacd..f8b7701  devel -> devel
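
The exact markup added by this commit is not shown in the report. As a rough way to check whether a built page declares its encoding in a meta tag, here is a hedged Python sketch using only the standard library; `CharsetFinder`, `declared_charset`, and the two sample pages are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the charset declared by the first matching <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if "charset" in attr:
            # HTML5 short form: <meta charset="utf-8">
            self.charset = attr["charset"].strip().lower()
        elif attr.get("http-equiv", "").lower() == "content-type":
            # Long form: <meta http-equiv="Content-Type" content="...; charset=...">
            content = attr.get("content", "").lower()
            if "charset=" in content:
                self.charset = content.split("charset=", 1)[1].split(";")[0].strip()


def declared_charset(html_text):
    """Return the charset named in a meta tag, or None if none is declared."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


# Hypothetical sample pages, not actual Publican output.
with_meta = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=UTF-8" /></head><body></body></html>'
)
without_meta = "<html><head><title>t</title></head><body></body></html>"

print(declared_charset(with_meta))
print(declared_charset(without_meta))
```

A page for which this returns None leaves the encoding for the browser to guess, which is one plausible way for UTF-8 bytes to be displayed as Latin-1.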

Comment 8 Ruediger Landmann 2013-10-19 00:12:09 UTC
Fixed in publican-3.9.9-0.fc19.t14.noarch