Bug 461375

Summary: Publican's xsl stylesheet adds a weird ASCII character to section headings
Product: [Community] Publican
Reporter: Jared Smith <jsmith.fedora>
Component: publican
Assignee: Jeff Fearn <jfearn>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: 1.6
CC: jsmith.fedora, mmcallis, publican-list, stickster
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: 0.37
Doc Type: Bug Fix
Last Closed: 2008-09-08 02:16:31 EDT

Description Jared Smith 2008-09-06 18:59:38 EDT
Description of problem:

Publican's XSL stylesheet adds a stray non-ASCII character to section headings in HTML documents, and the browser renders it visibly.

Version-Release number of selected component (if applicable):


How reproducible:

Every time

Steps to Reproduce:
1. Render an HTML page from DocBook using Publican (in my case, I made an article)
2. Go to a section heading in the document
3. Notice that the character between the section number and the section title is not a plain space. In vim, typing "ga" on it reports decimal 160, hex 00a0, octal 240, which is the non-breaking space (U+00A0) rather than an ASCII character. (I'm still not sure why the browser renders it as a capital A with a circumflex, "Â".) Shouldn't that be converted to "&nbsp;" anyway?
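A likely explanation for the "Â", assuming the browser decoded the page as ISO-8859-1 or Windows-1252 instead of UTF-8 (the report doesn't say which encoding the page was served or labelled with): U+00A0 is stored in UTF-8 as the two bytes 0xC2 0xA0, and a Latin-1 decoder turns 0xC2 into "Â". A minimal Python sketch:

```python
# U+00A0 NO-BREAK SPACE: the code point vim's "ga" reports (160, hex 00a0, octal 240).
nbsp = "\u00a0"
print(ord(nbsp), hex(ord(nbsp)), oct(ord(nbsp)))  # 160 0xa0 0o240

# Encoded as UTF-8, this single character becomes two bytes.
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'

# Decoding those bytes as ISO-8859-1 (an assumed browser fallback) yields two
# characters: 0xC2 -> "Â" (U+00C2) and 0xA0 -> a non-breaking space.
mojibake = utf8_bytes.decode("iso-8859-1")
print(repr(mojibake))  # 'Â\xa0'
```

This is consistent with the 0xC2 byte in front of the 0xA0 noted in Comment 1.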
Actual results:

The stray character shows up in the rendered HTML document, making the section headings look broken.

Expected results:

No stray character between the section number and the section title.

Additional info:

Looking further, I see that line 1011 of pdf.xsl describes itself as a "dirty, dirty hack" that sets the character &#xA0;, but I still haven't figured out how this comes into play in the HTML version.
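For context on how an &#xA0; in a stylesheet relates to the bytes in the output: in XML, &#xA0; is a character reference that the parser replaces with the single character U+00A0, and only the serializer's output encoding decides whether it is written as the raw bytes 0xC2 0xA0 or as a numeric reference. XML predefines no &nbsp; entity (only amp, lt, gt, apos and quot), so an XML serializer falls back to a numeric reference such as &#160;. A minimal sketch, using Python's standard-library ElementTree purely as a stand-in for the XSLT processor (it is not what Publican runs):

```python
import xml.etree.ElementTree as ET

# The parser resolves the character reference into one U+00A0 character.
title = ET.fromstring("<title>1.&#xA0;Introduction</title>")
assert title.text == "1.\u00a0Introduction"

# The output encoding decides how that character is written out.
as_utf8 = ET.tostring(title, encoding="utf-8")      # raw UTF-8 bytes
as_ascii = ET.tostring(title, encoding="us-ascii")  # non-ASCII -> numeric reference
print(as_utf8)   # b'<title>1.\xc2\xa0Introduction</title>'
print(as_ascii)  # b'<title>1.&#160;Introduction</title>'
```

With ASCII output, as in the second case, no raw 0xC2 0xA0 bytes would reach the browser at all.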
Comment 1 Paul W. Frields 2008-09-06 19:01:54 EDT
Looks like the 0xC2 before the 0xA0 is not getting clipped out by the self-admittedly ugly hack being used in pdf.xsl.
Comment 2 Paul W. Frields 2008-09-06 19:51:18 EDT

I don't know if this makes a difference, but this is the first line of the temporary output in the document's tmp/$LANG/xml/DocumentName.xml:

<?xml version='1.0'?>

Should there be an encoding there?  Does this have anything to do with /usr/bin/xmlClean?
Comment 3 Paul W. Frields 2008-09-06 20:16:24 EDT
Looks like Publican doesn't add an encoding declaration by default. I'll file that as a separate bug.
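On the encoding question: per the XML 1.0 specification, a document with no encoding declaration and no byte-order mark is read as UTF-8, so a missing declaration is harmless as long as the file really is UTF-8; a declaration that names the wrong encoding is what produces the extra character. A minimal Python sketch with the standard-library parser (the title text is a hypothetical example):

```python
import xml.etree.ElementTree as ET

# The same two bytes, 0xC2 0xA0, between the section number and the title.
body = b"<title>1.\xc2\xa0Introduction</title>"

# No encoding declaration: the parser defaults to UTF-8 and reads one U+00A0.
undeclared = ET.fromstring(b"<?xml version='1.0'?>" + body).text

# A declaration naming the wrong encoding makes the parser read each byte separately.
mislabelled = ET.fromstring(b"<?xml version='1.0' encoding='ISO-8859-1'?>" + body).text

print(repr(undeclared))   # '1.\xa0Introduction'
print(repr(mislabelled))  # '1.Â\xa0Introduction'
```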
Comment 4 Jeff Fearn 2008-09-08 02:16:31 EDT
This was very odd: chunked HTML worked fine for me, but html-single had this error.

saxon with the 1.72.0 style sheets did not produce this error in any format.

xsltproc with any style sheets did not produce this error in any format. 

Adding omit-xml-declaration="no" to the xsl:output element in html-single.xsl "fixed" this problem for saxon with html-single on the current style sheets.
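Setting omit-xml-declaration="no" asks the XSLT processor to write an XML declaration at the top of the output. As a rough standard-library analogue (not Saxon's or Publican's code), ElementTree's xml_declaration flag shows what the declaration adds: the body bytes are unchanged, but the output now carries an explicit encoding label in front of them.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<h2>1.\u00a0Introduction</h2>")

# Analogue of omit-xml-declaration="yes": UTF-8 bytes, nothing names the encoding.
bare = ET.tostring(title, encoding="utf-8")

# Analogue of omit-xml-declaration="no": same bytes, preceded by a declaration
# that names the encoding (the xml_declaration keyword needs Python 3.8+).
declared = ET.tostring(title, encoding="utf-8", xml_declaration=True)

print(bare)
print(declared)
```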