Bug 1018659 - Stray characters in DocBook 5 output
Product: Publican
Classification: Community
Component: publican
Assigned To: Jeff Fearn
Reported: 2013-10-14 02:07 EDT by Ruediger Landmann
Modified: 2013-12-18 21:46 EST
Fixed In Version: 4.0.0
Doc Type: Bug Fix
Last Closed: 2013-12-18 21:46:26 EST
Type: Bug

Description Ruediger Landmann 2013-10-14 02:07:32 EDT
Description of problem:
DocBook 5 books contain multiple stray "Â" characters in html and html-single output

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a DocBook 5 book
2. Build html, html-single, pdf, and epub versions

Actual results:
html and html-single versions contain stray "Â" characters around where DocBook gentext has been used, for example:

1. Document Conventions

1.1. Typographic Conventions

Copyright © 2013 | 

Expected results:
No stray "Â" characters 

Additional info:
This doesn't affect the PDF or EPUB
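
The symptom described above matches a common encoding pattern: a UTF-8 non-breaking space (U+00A0, bytes C2 A0) read as Latin-1 shows up as "Â" followed by a non-breaking space. The sketch below only illustrates that pattern. That the gentext separator is U+00A0, and that the viewer decodes the page as Latin-1, are assumptions and are not confirmed in this report. The helper name `count_stray` is hypothetical.

```python
# Assumption: the gentext separator between label and title is U+00A0.
NBSP = "\u00a0"
heading = f"1.{NBSP}Document Conventions"

# Encoding as UTF-8 turns the NBSP into the two bytes C2 A0.
utf8_bytes = heading.encode("utf-8")

# Reading those bytes as Latin-1 maps each byte to one character:
# C2 becomes "Â" (U+00C2) and A0 becomes a non-breaking space again.
misread = utf8_bytes.decode("latin-1")


def count_stray(text: str) -> int:
    """Count "Â" + NBSP pairs, the signature of a misread UTF-8 NBSP."""
    return text.count("\u00c2\u00a0")


print(repr(misread))
print(count_stray(heading), count_stray(misread))
```

A check like `count_stray` could be run over the text of a built page to confirm whether the stray characters are present, independent of how a particular browser renders them.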
Comment 1 Jeff Fearn 2013-10-14 20:58:00 EDT
Link to source I can test with?
Comment 2 Ruediger Landmann 2013-10-14 23:23:12 EDT
(In reply to Jeff Fearn from comment #1)
> Link to source I can test with?

This appeared in a brand-new book; so:

$ publican create_book --title "DB5 Test Book" --dtdver 5
$ cd DB5_Test_Book/
$ publican build --formats html-single,html,pdf,epub --langs en-US
Comment 3 Jeff Fearn 2013-10-16 00:46:35 EDT
On Fedora 19, using the 1.78.1 docbook styles, the combination of xhtml5 output and setting the html.ext param triggers an issue.

If html.ext is set to '.html', the '.' appears to trigger an issue where a broken UTF-8 character is introduced into the output stream.

Switching from the xhtml5 styles to the xhtml-1_1 styles does not change the output, it's still broken.

Switching from the xhtml5 styles to the xhtml styles does change the output, it now renders correctly. The output is HTML4 instead of HTML5.

When using xhtml5 if you set html.ext to 'html', i.e. simply drop the '.', then the output renders correctly. The file names are invalid.

On RHEL6, using the 1.78.1 stylesheets, this issue is not present and the combination of xhtml5 and setting html.ext to '.html' works correctly.

This is likely caused somewhere in the libxslt stack on Fedora; it could take considerable time to debug.
Comment 4 HSS Product Manager 2013-10-16 00:47:31 EDT
HSS-QE has reviewed and declined this request. QE for this bug will be handled by IED.
Comment 5 Jeff Fearn 2013-10-16 01:13:55 EDT
It appears that setting html.ext to anything besides '.html' or '.htm' will avoid triggering this issue on Fedora 19.

Tested successfully with all these: '.xhtml' '.gtml' '.ht' '.htmll' '.1234' '.wtf'
Comment 6 Ruediger Landmann 2013-10-17 03:02:59 EDT
Retested with publican-3.9.9-0.fc19.t11.noarch with perl-XML-TreeBuilder-5.0_1-0.fc19.noarch -- stray characters still appear :(
Comment 7 Jeff Fearn 2013-10-18 02:17:57 EDT
Setting Content-Type in a meta tag resolves this ...

To ssh://git.fedorahosted.org/git/publican.git
   181eacd..f8b7701  devel -> devel
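
Why a Content-Type meta tag would help can be sketched with a simplified model of how a reader of the page picks an encoding. This is not the change made in the commit above, and it is not how any real browser is implemented. The function `decode_page`, the Latin-1 fallback, and the exact meta tag text are all assumptions made for illustration.

```python
import re

# Looks for a charset declaration such as: content="text/html; charset=UTF-8"
META_CHARSET = re.compile(rb'charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def decode_page(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode HTML bytes with the declared charset, else a legacy fallback.

    Simplified stand-in for encoding detection (assumption): only the
    first 1024 bytes are searched for a charset declaration.
    """
    match = META_CHARSET.search(raw[:1024])
    encoding = match.group(1).decode("ascii") if match else fallback
    return raw.decode(encoding)


# Hypothetical page fragment containing a UTF-8 non-breaking space.
body = "<h1>1.\u00a0Document Conventions</h1>".encode("utf-8")
# Hypothetical meta tag; the tag actually added by the fix is not shown here.
meta = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>'

without_meta = decode_page(body)         # falls back to Latin-1
with_meta = decode_page(meta + body)     # honours the declared UTF-8

print("\u00c2" in without_meta, "\u00c2" in with_meta)
```

Under these assumptions the same bytes produce a stray "Â" only when no charset is declared, which is consistent with the observation that the PDF and EPUB outputs were unaffected.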
Comment 8 Ruediger Landmann 2013-10-18 20:12:09 EDT
Fixed in publican-3.9.9-0.fc19.t14.noarch
