Bug 1101050

Summary: Entities in CDATA are resolved in translations
Product: [Community] Publican
Component: publican
Version: 4.1
Status: CLOSED CURRENTRELEASE
Reporter: Lee Newson <lnewson>
Assignee: Jeff Fearn 🐞 <jfearn>
QA Contact: Bruce Reeler <breeler>
CC: aigao, ddomingo, jito, lnewson, rlandman
Severity: unspecified
Priority: unspecified
Target Milestone: 4.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 4.2.0
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-09-01 03:40:28 UTC
Attachments: Example Book (flags: none)

Description Lee Newson 2014-05-26 00:45:14 UTC
Created attachment 899135
Example Book

Description of problem:
Entity examples inside CDATA sections are not being escaped properly when the strings are extracted for translation. This means that the translated content will differ from the source content.


Version-Release number of selected component (if applicable):
4.1.3

How reproducible:
Always

Steps to Reproduce:
1. Download the attached book
2. Run publican update_pot
3. Notice that the bare ampersand is extracted as "&amp;" (correct), but the entity example "&amp;" is extracted unchanged when it should be "&amp;amp;".

Actual results:

"<![CDATA[& is represented as &amp;]]>" is pulled out as "&amp; is represented as &amp;"

Expected results:

"<![CDATA[& is represented as &amp;]]>" is pulled out as "&amp; is represented as &amp;amp;"

Additional information:

Note: For this bug I've only used the ampersand as an example; however, it's reproducible with any entity.
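A minimal Python sketch of the expected behavior (an illustration only; Publican itself is written in Perl and does not use this code): the CDATA body is literal text, so escaping it with the standard library's `xml.sax.saxutils.escape` yields the expected msgid, and parsing that msgid inside an element gives back the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Literal text of <![CDATA[& is represented as &amp;]]>; none of it is markup.
cdata_text = "& is represented as &amp;"

# Escape "&", "<" and ">" so the text survives outside a CDATA section.
msgid = escape(cdata_text)
print(msgid)  # &amp; is represented as &amp;amp;

# Parsing the escaped msgid as element content returns the source text unchanged.
roundtrip = ET.fromstring("<programlisting>" + msgid + "</programlisting>").text
assert roundtrip == cdata_text
```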

Comment 1 Lee Newson 2014-05-26 00:45:58 UTC
This is similar to BZ#958343

Comment 2 Jeff Fearn 🐞 2014-05-26 03:44:32 UTC
CDATA is not XML and is specifically ignored by the XML parser. Is this causing an issue anywhere in Publican's build process?

Comment 3 Lee Newson 2014-05-26 03:55:57 UTC
Yes, if you have an example of a configuration file, e.g.:

<programlisting language="Java"><![CDATA[public class Test {
  /*
   * This method use the BUILD_ID entity. ie &BUILD_ID;
   */
  public void addBuildDataEntity(final String value) {
     ...
  }
}]]></programlisting>

Then the book won't build, because the POT entry will be:

msgid "public class Test {\n"
"  /*\n"
"   * This method use the BUILD_ID entity. ie &BUILD_ID;\n"
"   */\n"
"  public void addBuildDataEntity(final String value) {\n"
"     ...\n"
"  }\n"
"}"
msgstr ""

This means that when a translation is built, Publican tries to resolve the &BUILD_ID; entity example, and because that entity doesn't exist, the build fails.
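The failure can be modelled in Python, assuming a strict XML parser such as the standard library's ElementTree stands in for the translation build (Publican's real toolchain is Perl): the same line parses cleanly inside CDATA, but once the msgid is written back without the wrapper, the parser tries to resolve `&BUILD_ID;`, which is not declared anywhere.

```python
import xml.etree.ElementTree as ET

line = "   * This method use the BUILD_ID entity. ie &BUILD_ID;\n"

# Source: inside CDATA the entity example is kept as literal text.
source = ET.fromstring("<programlisting><![CDATA[" + line + "]]></programlisting>")
print(repr(source.text))

# Translation: the msgid is written back without CDATA, so the parser tries to
# resolve &BUILD_ID; and rejects it because no such entity is declared.
try:
    ET.fromstring("<programlisting>" + line + "</programlisting>")
    parse_failed = False
except ET.ParseError as err:
    parse_failed = True
    print("parse error:", err)
```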

Additionally, as mentioned above, if you use something like:

<programlisting><![CDATA[& is represented as &amp;]]></programlisting>

then a build from the source shows:

& is represented as &amp;

but a build from an untranslated or even a translated string (unless a translator manually fixed it) shows:

& is represented as &

Secondly, CDATA is part of the XML specification, so it shouldn't be ignored. See http://www.w3.org/TR/REC-xml/#sec-cdata-sect
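The lossy round trip in the ampersand example can be reproduced with Python's ElementTree, assuming (as a simplification) that the translation build re-parses the msgid as ordinary element content: the `&amp;` that CDATA protected in the source gets resolved a second time.

```python
import xml.etree.ElementTree as ET

# Source: the CDATA section keeps "&amp;" as literal characters.
source = ET.fromstring(
    "<programlisting><![CDATA[& is represented as &amp;]]></programlisting>")

# Translation: the extracted msgid is written back without CDATA and parsed
# again, so "&amp;" is resolved a second time and collapses to "&".
msgid = "&amp; is represented as &amp;"
translated = ET.fromstring("<programlisting>" + msgid + "</programlisting>")

print(source.text)      # & is represented as &amp;
print(translated.text)  # & is represented as &
```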

Comment 4 Lee Newson 2014-05-26 03:56:49 UTC
Oops, sorry: the example I gave isn't a configuration file. I was initially going to use one, then switched to the Java example.

Comment 5 Jeff Fearn 🐞 2014-05-26 04:06:53 UTC
(In reply to Lee Newson from comment #3)
> This means that when it builds a translation it'll try and resolve the
> &BUILD_ID; entity example, however since it doesn't exist it causes the
> build to fail.

So this is the bug: it should never be resolving entities in CDATA.

> Secondly, CDATA is part of the XML specification, so it shouldn't be getting
> ignored. See http://www.w3.org/TR/REC-xml/#sec-cdata-sect

The whole point of CDATA is that its content is neither escaped nor resolved.

Comment 6 Lee Newson 2014-05-26 04:09:56 UTC
(In reply to Jeff Fearn from comment #5)
> (In reply to Lee Newson from comment #3)
> 
> So this is the bug, it should never be resolving entities in CDATA.

Yeah, that is what I was trying to get at. Sorry, Jeff.

> 
> 
> The whole point is that it contains content that is not escaped or resolved.

+1

Comment 7 Jeff Fearn 🐞 2014-05-26 06:41:46 UTC
This requires a patch to XML::TreeBuilder.

Comment 8 Jeff Fearn 🐞 2014-05-27 03:42:15 UTC
Fix CDATA handling so that it remains untouched through all code paths.

To ssh://git.fedorahosted.org/git/publican.git
   6652bcf..facd12a  devel -> devel
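The actual fix is a Perl patch involving XML::TreeBuilder and is not reproduced here. As a rough Python analogue of "CDATA remains untouched" (an assumption about the approach, not Publican's code), an extractor has to be told where CDATA sections start and end; Python's SAX API reports this through the lexical-handler property. `MsgidCollector` and `extract_msgid` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler
from xml.sax.saxutils import escape


class MsgidCollector(ContentHandler):
    """Hypothetical extractor: collects text, keeping CDATA sections verbatim.

    Child element tags are ignored to keep the sketch short.
    """

    def __init__(self):
        super().__init__()
        self.parts = []
        self.in_cdata = False

    def characters(self, content):
        # Raw text inside CDATA; re-escaped text everywhere else.
        self.parts.append(content if self.in_cdata else escape(content))

    # Lexical-handler callbacks: expat reports CDATA boundaries through these.
    def startCDATA(self):
        self.in_cdata = True
        self.parts.append("<![CDATA[")

    def endCDATA(self):
        self.in_cdata = False
        self.parts.append("]]>")

    # The SAX driver also wires these lexical callbacks, so they must exist.
    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def extract_msgid(xml_text):
    handler = MsgidCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Without this property the parser merges CDATA into ordinary text.
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.StringIO(xml_text))
    return "".join(handler.parts)


print(extract_msgid(
    "<programlisting><![CDATA[& is represented as &amp;]]></programlisting>"))
# <![CDATA[& is represented as &amp;]]>
```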

Comment 9 Jeff Fearn 🐞 2014-07-21 05:49:46 UTC
*** Bug 1117561 has been marked as a duplicate of this bug. ***

Comment 10 Bruce Reeler 2014-08-25 06:06:59 UTC
Looks fixed, but lnewson will re-check as well, since the PO/POT files need to be updated.

The old Chapter.pot file (in this bug's attachment) had:

msgid "&amp; is represented as &amp;"
msgstr ""

After running publican update_pot, the new Chapter.pot has:

msgid "<![CDATA[& is represented as &amp;]]>"
msgstr ""

After running: 
publican build --langs ja-JP --formats html

Test Chapter contains:
第1章 Test Chapter

& is represented as &amp;

< is represented as &lt;

&blah; is an example entity

Comment 11 Lee Newson 2014-08-25 06:27:39 UTC
Verified that the translated CDATA content is also displayed when the book is built. Moving this to VERIFIED, as per Bruce's request.
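A short Python check of why the fixed POT entry from comment 10 behaves correctly, assuming the build writes the msgid (or an equivalent msgstr) straight back into the element: the CDATA wrapper travels with the string, so the parser sees the same markup as in the source and nothing is resolved twice.

```python
import xml.etree.ElementTree as ET

# msgid as written by the fixed update_pot: the CDATA section is kept verbatim.
msgid = "<![CDATA[& is represented as &amp;]]>"

# Assumption: the build inserts the string back into the element unchanged.
rebuilt = ET.fromstring("<programlisting>" + msgid + "</programlisting>")
print(rebuilt.text)  # & is represented as &amp;
```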

Comment 12 Jeff Fearn 🐞 2014-09-01 03:40:28 UTC
A fix for this shipped in Publican 4.2.0.