Red Hat Bugzilla – Bug 1018659
Stray characters in DocBook 5 output
Last modified: 2013-12-18 21:46:26 EST
Description of problem:
DocBook 5 books contain multiple stray "Â" characters in html and html-single output.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a DocBook 5 book
2. Build html, html-single, pdf, and epub versions
Actual results:
html and html-single versions contain stray "Â" characters around where DocBook gentext has been used, for example:
1.Â Document Conventions
1.1.Â Typographic Conventions
Copyright Â© 2013 |
Expected results:
No stray "Â" characters
Additional info:
This doesn't affect the PDF or EPUB output.
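One possible explanation for the symptom, offered as an assumption and not a confirmed diagnosis: DocBook gentext separates labels from titles with a no-break space (U+00A0), which UTF-8 encodes as the bytes 0xC2 0xA0. If those bytes are decoded as ISO-8859-1, 0xC2 is displayed as "Â". The sketch below reproduces that effect with printf and iconv; the sample string and variable names are illustrative only.

```shell
# Assumption: the stray character comes from UTF-8 bytes being read as
# ISO-8859-1. The UTF-8 encoding of U+00A0 (no-break space) is 0xC2 0xA0.
nbsp_utf8=$(printf '\302\240')

# Treat the UTF-8 bytes as ISO-8859-1 and re-encode them as UTF-8,
# which is what a viewer does when it guesses the wrong charset.
mojibake=$(printf '1.%sDocument Conventions' "$nbsp_utf8" \
    | iconv -f ISO-8859-1 -t UTF-8)

# The result shows a stray "Â" after the label, as in the examples above.
printf '%s\n' "$mojibake"
```

The copyright sign (U+00A9, UTF-8 bytes 0xC2 0xA9) would be affected the same way, which is consistent with the "Â©" seen in the copyright line.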
Link to source I can test with?
(In reply to Jeff Fearn from comment #1)
> Link to source I can test with?
This appeared in a brand-new book; so:
$ publican create_book --title "DB5 Test Book" --dtdver 5
$ cd DB5_Test_Book/
$ publican build --formats html-single,html,pdf,epub --langs en-US
On Fedora 19, using the 1.78.1 DocBook stylesheets, the combination of xhtml5 output and setting the html.ext param triggers the issue.
If html.ext is set to '.html', the '.' appears to trigger an issue where a broken UTF-8 character is introduced into the output stream.
Switching from the xhtml5 styles to the xhtml-1_1 styles does not change the output; it is still broken.
Switching from the xhtml5 styles to the xhtml styles does change the output; it now renders correctly, but the output is HTML4 instead of HTML5.
When using xhtml5, if you set html.ext to 'html' (i.e. simply drop the '.'), the output renders correctly, but the resulting file names are invalid.
On RHEL6, using the 1.78.1 stylesheets, this issue is not present, and the combination of xhtml5 and setting html.ext to '.html' works correctly.
This is likely caused somewhere in the libxslt stack on Fedora; it could take considerable time to debug.
HSS-QE has reviewed and declined this request. QE for this bug will be handled by IED.
It appears that setting html.ext to anything besides '.html' or '.htm' will avoid triggering this issue on Fedora 19.
Tested successfully with all of these: '.xhtml', '.gtml', '.ht', '.htmll', '.1234', '.wtf'
Retested with publican-3.9.9-0.fc19.t11.noarch with perl-XML-TreeBuilder-5.0_1-0.fc19.noarch -- stray characters still appear :(
Setting Content-Type in a meta tag resolves this ...
181eacd..f8b7701 devel -> devel
Fixed in publican-3.9.9-0.fc19.t14.noarch
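For anyone verifying the fix, a minimal sketch of a check that built HTML declares its charset in a meta tag, which is the change described above as resolving the issue. The function name has_charset_meta is hypothetical, and the output path in the usage note is an assumption about where Publican writes its builds.

```shell
# has_charset_meta FILE
# Exit status 0 if FILE contains a <meta> tag carrying a charset declaration,
# either <meta charset="..."> or
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
# Hypothetical helper, not part of Publican.
has_charset_meta() {
    grep -Eiq '<meta[^>]*charset=' "$1"
}
```

Possible usage, assuming the html-single build lands in tmp/en-US/html-single/:

    for f in tmp/en-US/html-single/*.html; do
        has_charset_meta "$f" || echo "no charset declared: $f"
    done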