Bug 1344078 - Narrow-no-break-space replaced with # in pdf by publican build
Summary: Narrow-no-break-space replaced with # in pdf by publican build
Keywords:
Status: NEW
Alias: None
Product: Publican
Classification: Community
Component: publican
Version: future
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Nobody
QA Contact: Ruediger Landmann
Reported: 2016-06-08 17:04 UTC by jaaf64
Modified: 2022-04-26 19:23 UTC

Attachments
publican build log example (12.50 KB, application/octet-stream)
2016-06-10 04:38 UTC, jaaf64

Description jaaf64 2016-06-08 17:04:58 UTC
Description of problem: After building a translated book (in this case fr), all narrow-no-break-spaces (Unicode U+202F) in the PDF file are replaced with a hash character (#).
This problem does not occur in HTML books.


Version-Release number of selected component (if applicable): 4.3.2


How reproducible: always


Steps to Reproduce:
1. In bookdir: git clone -b f24 https://pagure.io/<project> (in my case <project> is release-notes, but all its narrow-no-break-spaces have since been replaced with no-break-spaces)
2. cd <project>
3. publican trans_drop
4. publican update_pot
5. publican update_po --langs=fr
6. In release-notes/fr, replace the po files with the translated po files from Zanata (ensure there are some narrow-no-break-spaces in the translation)
7. publican build --formats=html,pdf --langs=fr
8. Look at the PDF in the tmp dir.

Actual results:
This is an example#;

Expected results:
This is an example ; (with a narrow-no-break-space, U+202F, before the semicolon)

Additional info:
A lot of fonts lack this character, so on the web it is generally rendered as a regular (not narrow) no-break-space.
On paper it generally matters, and authors often choose a font that displays this character properly, i.e. as a narrow space.
In French, narrow-no-break-spaces are required before ?, ! and ; and between a number and its unit. They may also be used between a program name and its version number.

Comment 1 Jeff Fearn 🐞 2016-06-08 22:46:15 UTC
Are you using FOP? If so, that is the problem: FOP doesn't do font fallback very well, so you have to force it to use a specific font that has the required characters.

We don't carry things like this for FOP anymore so the onus is on brands to carry them or move over to using wkhtmltopdf to get proper font handling.

Basically you add something like this to the brand's pdf.xsl file.

<xsl:template name="pickfont-sans">
	<!-- Language-specific fonts to try before the default sans-serif fonts -->
	<xsl:variable name="font">
		<!--xsl:call-template name="pickfont"/-->
		<xsl:choose>
			<xsl:when test="$l10n.gentext.language = 'ar-SA' or $l10n.gentext.language = 'ar'">
				<xsl:text>KacstBook,</xsl:text>
			</xsl:when>
			<xsl:when test="$l10n.gentext.language = 'ja-JP' or $l10n.gentext.language = 'ja'">
				<xsl:text>IPAPGothic,Sazanami Gothic,</xsl:text>
			</xsl:when>
		</xsl:choose>
	</xsl:variable>

	<xsl:copy-of select="$font"/><xsl:text>Liberation Sans,sans-serif</xsl:text>

</xsl:template>

<xsl:template name="pickfont-mono">
	<!-- Language-specific fonts to try before the default monospace fonts -->
	<xsl:variable name="font">
		<!--xsl:call-template name="pickfont"/-->
		<xsl:choose>
			<xsl:when test="$l10n.gentext.language = 'ar-SA' or $l10n.gentext.language = 'ar'">
				<xsl:text>KacstScreen,</xsl:text>
			</xsl:when>
			<xsl:when test="$l10n.gentext.language = 'ja-JP' or $l10n.gentext.language = 'ja'">
				<xsl:text>IPAGothic,Sazanami Gothic,</xsl:text>
			</xsl:when>
		</xsl:choose>
	</xsl:variable>

	<xsl:copy-of select="$font"/><xsl:text>Liberation Mono,monospace</xsl:text>

</xsl:template>


<xsl:param name="title.font.family">
	<xsl:call-template name="pickfont-sans"/>
</xsl:param>

<xsl:param name="body.font.family">
	<xsl:call-template name="pickfont-sans"/>
</xsl:param>

<xsl:param name="monospace.font.family">
	<xsl:call-template name="pickfont-mono"/>
</xsl:param>

<xsl:param name="sans.font.family">
	<xsl:call-template name="pickfont-sans"/>
</xsl:param>


/me shudders

XSL is just evil compared to CSS :)

Comment 2 jaaf64 2016-06-09 03:55:40 UTC
I don't know whether I am using FOP or not. The only thing I can say is that FOP is installed on my computer.

Could you confirm that the pdf.xsl file you are speaking of is this one:

/usr/share/publican/xsl/pdf.xsl 
?

I added the lines you suggested at the bottom of the file (just before </xsl:stylesheet>), but it had no effect at all (the # char is still in place of the narrow-no-break-space).

Comment 3 Jeff Fearn 🐞 2016-06-09 22:43:07 UTC
(In reply to jaaf64 from comment #2)
> I don't know whether I am using FOP or not. The only thing I can say is that
> FOP is installed on my computer.

Can you dump the entire build log in to a file and attach it?

> Could you confirm that the pdf.xsl file you are speaking of is this one:
> 
> /usr/share/publican/xsl/pdf.xsl 
> ?

That's the Publican one, it should probably be done in a brand, but that should work if you are using fop.

> I added the lines you suggested at the bottom of the file (just before
> </xsl:stylesheet>), but it had no effect at all (the # char is still in
> place of the narrow-no-break-space).

I don't know if the font in the example has the necessary character, it's just an example of what to do if you know of a font that has them.

Comment 4 jaaf64 2016-06-10 04:35:21 UTC
(In reply to Jeff Fearn from comment #3)
> (In reply to jaaf64 from comment #2)
> > I don't know whether I am using FOP or not. The only thing I can say is that
> > FOP is installed on my computer.
> 
> Can you dump the entire build log in to a file and attach it?
I returned the pdf.xsl file to its original form and after that rebuilt the book from freshly downloaded files.

Here is the build log (attachment publican.log). Unfortunately it's in French.
The red messages mainly say something like this, e.g.:
WARNING : missing message in PO files, please consider updating POT and PO files

"As always, Fedora continues to develop (<ulink url=\"https://fedoraproject.org/wiki/Red_Hat_contributions\">Red Hat contributions</ulink>) and integrate the latest free and open source software (<ulink url=\"https://fedoraproject.org/wiki/Releases/24/ChangeSet\">Fedora &PRODVER; Features)</ulink>. The following sections provide a brief overview of major changes from the last release of Fedora."

In the case of this particular message (and the following one), I completely blanked the message and retyped it to ensure that no unwanted invisible character was present in it.

The result is the message you see in the log file and no translation in the PDF. Nonetheless, the HTML book includes the translation, which leads me to think the po file is correct.

Another thing to note: I also have the # issue with a character from another European language, an accented c (ć), which we have in a family name.
> 
> > Could you confirm that the pdf.xsl file you are speaking of is this one:
> > 
> > /usr/share/publican/xsl/pdf.xsl 
> > ?
> 
> That's the Publican one, it should probably be done in a brand, but that
> should work if you are using fop.

I am not an expert in Publican, but if it should be done in a brand, should I install something to get the brand required to build the release-notes guide?
> 
> > I added the lines you suggested at the bottom of the file (just before
> > </xsl:stylesheet>), but it had no effect at all (the # char is still in
> > place of the narrow-no-break-space).
> 
> I don't know if the font in the example has the necessary character, it's
> just an example of what to do if you know of a font that has them.
I will take the time to find a suitable font and retry this.

Comment 5 jaaf64 2016-06-10 04:38:18 UTC
Created attachment 1166474
publican build log example

in addition to comment 4

Comment 6 jaaf64 2016-06-10 05:06:27 UTC
I made a mistake in comment 4: in fact, the messages that are untranslated in the PDF are also untranslated in the HTML. Mea culpa.

Comment 7 jaaf64 2016-06-10 06:08:28 UTC
Point 1: Adding only these lines to pdf.xsl solved the narrow-no-break-space and ć problems.
<xsl:param name="title.font.family">
	<xsl:text>FreeSans,Liberation Sans,sans-serif</xsl:text>
</xsl:param>

<xsl:param name="body.font.family">
	<xsl:text>FreeSans,Liberation Sans,sans-serif</xsl:text>
</xsl:param>

<xsl:param name="monospace.font.family">
	<xsl:text>Liberation Mono,monospace</xsl:text>
</xsl:param>

<xsl:param name="sans.font.family">
	<xsl:text>FreeSans,Liberation Sans,sans-serif</xsl:text>
</xsl:param>

At first I did not include FreeSans, and that solved only the ć problem. This made me think something is wrong in your first proposal, maybe in the <xsl:choose> structure.
Adding FreeSans then solved the narrow-no-break-space problem.
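
A possible way to combine the two approaches, as a sketch only: keep the per-language pickfont-sans template from comment 1 and add a French branch that puts FreeSans first. The 'fr-FR' and 'fr' language codes are assumptions about what $l10n.gentext.language holds for this build, and FreeSans is used only because it is the font that worked above. None of this has been tested in a brand.

```xml
<!-- Sketch only: French branch added to the pickfont-sans template from comment 1 -->
<xsl:template name="pickfont-sans">
	<xsl:variable name="font">
		<xsl:choose>
			<!-- Assumption: French builds report 'fr-FR' or 'fr' as the language -->
			<xsl:when test="$l10n.gentext.language = 'fr-FR' or $l10n.gentext.language = 'fr'">
				<!-- FreeSans is the font reported above to render U+202F -->
				<xsl:text>FreeSans,</xsl:text>
			</xsl:when>
		</xsl:choose>
	</xsl:variable>

	<!-- Every language falls back to the default sans-serif fonts -->
	<xsl:copy-of select="$font"/><xsl:text>Liberation Sans,sans-serif</xsl:text>
</xsl:template>

<xsl:param name="body.font.family">
	<xsl:call-template name="pickfont-sans"/>
</xsl:param>
```

With this, only French builds get FreeSans prepended and other languages keep the Liberation Sans default. title.font.family and sans.font.family could call the same template, as in comment 1.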

Point 2: I think the problem with the missing messages is totally different and probably needs a separate bug report. What do you think?

Comment 8 Jeff Fearn 🐞 2016-06-19 21:57:49 UTC
> Point 2: I think the problem with the missing messages is totally different
> and probably needs a separate bug report. What do you think?

Is the message in the PO file and not being detected, or is the message simply not in there?

Comment 9 jaaf64 2016-06-20 06:04:22 UTC
I have trouble answering you, as things seem to have evolved.

At the time I posted this bug report, a message translated on Zanata could appear untranslated in the po files downloaded from Zanata, meaning that the message appeared twice in English (in both msgid and msgstr) rather than once in English (msgid) and once translated (msgstr). Even manually retyping the translation in Zanata didn't change anything.

Now things are different: the publican build tool may warn about a missing message, but looking at the po file downloaded from Zanata, things seem correct: msgid in English and msgstr translated.
Moreover, a lot of such messages (declared missing) are messages whose translation (msgstr) is the same as the original (msgid) because they don't need translation, e.g. commands. I say a lot, but not all of them.

