Created attachment 899135
Example Book

Description of problem:
Entity examples are not being escaped properly when getting the strings for translation. This means that the translated content will be different to the source content.

Version-Release number of selected component (if applicable):
4.1.3

How reproducible:
Always

Steps to Reproduce:
1. Download the attached book
2. Run publican update_pot
3. Notice that the single ampersand is "&amp;" (correct), however the entity "&amp;" is the same when it should be "&amp;amp;".

Actual results:
"<![CDATA[& is represented as &amp;]]>" is pulled out as "&amp; is represented as &amp;"

Expected results:
"<![CDATA[& is represented as &amp;]]>" is pulled out as "&amp; is represented as &amp;amp;"

Additional information:
Note: For this bug I've only used ampersand as an example, however it's reproducible with any entity.

This is similar to BZ#958343
CDATA is not XML and is specifically ignored by the XML parser. Is this causing an issue anywhere in Publican's build process?
Yes, if you have an example of a configuration file, i.e.

<programlisting language="Java"><![CDATA[public class Test {
    /*
     * This method use the BUILD_ID entity. ie &BUILD_ID;
     */
    public void addBuildDataEntity(final String value) {
        ...
    }
}]]></programlisting>

Then this won't build, as the pot entry will be:

msgid "public class Test {\n"
"    /*\n"
"     * This method use the BUILD_ID entity. ie &BUILD_ID;\n"
"     */\n"
"    public void addBuildDataEntity(final String value) {\n"
"        ...\n"
"    }\n"
"}"
msgstr ""

This means that when it builds a translation it will try to resolve the &BUILD_ID; entity example; however, since that entity doesn't exist, the build fails.

Additionally, as mentioned above, if you used something like:

<programlisting><![CDATA[& is represented as &amp;]]></programlisting>

then a build from the source will show:

& is represented as &amp;

however, a build from an untranslated or even a translated string (assuming a translator didn't manually fix it) will come out as:

& is represented as &

Secondly, CDATA is part of the XML specification, so it shouldn't be getting ignored. See http://www.w3.org/TR/REC-xml/#sec-cdata-sect

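The failure mode described in this comment can be reproduced outside Publican. The following is a minimal Python sketch, not Publican's actual (Perl) code path: the helper names `extract_text`, `rebuild_naive`, `rebuild_escaped` and `round_trips` are hypothetical, and the sample string is illustrative. It shows that a parser hands back CDATA content as literal text, and that pasting that text back into markup without escaping either fails to parse or changes the content.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative source: a literal "&", a literal "&amp;" and an entity example,
# all protected by a CDATA section.
SOURCE = (
    "<programlisting><![CDATA[& is represented as &amp; "
    "and &BUILD_ID;]]></programlisting>"
)


def extract_text(xml_source: str) -> str:
    """Return the element text; CDATA content comes back as literal characters."""
    return ET.fromstring(xml_source).text


def rebuild_naive(text: str) -> str:
    """Re-embed extracted text as markup without escaping (the buggy behaviour)."""
    return f"<programlisting>{text}</programlisting>"


def rebuild_escaped(text: str) -> str:
    """Re-embed extracted text with &, < and > escaped first."""
    return f"<programlisting>{escape(text)}</programlisting>"


def round_trips(rebuilt: str, expected: str) -> bool:
    """True if the rebuilt markup parses and yields the original literal text."""
    try:
        return ET.fromstring(rebuilt).text == expected
    except ET.ParseError:
        # Bare "&" or an undefined entity such as &BUILD_ID; ends up here.
        return False


literal = extract_text(SOURCE)
print(repr(literal))
print("naive round trip:  ", round_trips(rebuild_naive(literal), literal))
print("escaped round trip:", round_trips(rebuild_escaped(literal), literal))
```

In this sketch the naive rebuild fails because the parser tries to interpret `&` and `&BUILD_ID;` as markup, which mirrors the build failure reported here; escaping (or keeping the CDATA wrapper) preserves the text.
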
Oops, the example I gave wasn't a config file, sorry. I was initially going to do that and then I switched to the example I gave.
(In reply to Lee Newson from comment #3)
> Yes, if you have an example of a configuration file. ie
>
> <programlisting language="Java"><![CDATA[public class Test {
>     /*
>      * This method use the BUILD_ID entity. ie &BUILD_ID;
>      */
>     public void addBuildDataEntity(final String value) {
>         ...
>     }
> }]]></programlisting>
>
> Then this won't build as the pot entry will be:
>
> msgid "public class Test {\n"
> "    /*\n"
> "     * This method use the BUILD_ID entity. ie &BUILD_ID;\n"
> "     */\n"
> "    public void addBuildDataEntity(final String value) {\n"
> "        ...\n"
> "    }\n"
> "}"
> msgstr ""
>
> This means that when it builds a translation it'll try and resolve the
> &BUILD_ID; entity example, however since it doesn't exist it causes the
> build to fail.

So this is the bug: it should never be resolving entities in CDATA.

> Additionally as mentioned above if you used something like:
>
> <programlisting><![CDATA[& is represented as &amp;]]></programlisting>
>
> when built using the source will show:
>
> & is represented as &amp;
>
> however when built using an untranslated or even a translated string
> (assuming a translator didn't manually fix it) then it will come out as:
>
> & is represented as &
>
> Secondly, CDATA is part of the XML specification, so it shouldn't be getting
> ignored. See http://www.w3.org/TR/REC-xml/#sec-cdata-sect

The whole point of CDATA is that it contains content that is not escaped or resolved.

(In reply to Jeff Fearn from comment #5)
> (In reply to Lee Newson from comment #3)
>
> So this is the bug, it should never be resolving entities in CDATA.

Yeah, that is what I was trying to get at, sorry Jeff.

> The whole point is that it contains content that is not escaped or resolved.

+1

This requires a patch to XML::TreeBuilder.
Fix CDATA handling so that it remains untouched through all code paths.

To ssh://git.fedorahosted.org/git/publican.git
   6652bcf..facd12a  devel -> devel

*** Bug 1117561 has been marked as a duplicate of this bug. ***
Looks fixed, but lnewson will re-check as well, due to the po/pot file updates required.

Old Chapter.pot file (in this bug's attachment) had:

msgid "&amp; is represented as &amp;"
msgstr """"

After running publican update_pot, the new Chapter.pot has:

msgid "<![CDATA[& is represented as &amp;]]>"
msgstr ""

After running:

publican build --langs ja-JP --formats html

Test Chapter contains:

第1章 Test Chapter
& is represented as &amp;
< is represented as &lt;
&blah; is an example entity

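As a rough illustration of why keeping the CDATA wrapper in the msgid is enough, here is a small Python sketch. It is an assumption-laden stand-in for the build step, not Publican's implementation: the `render` helper and the `<para>` wrapper element are hypothetical, and the strings are sample data.

```python
import xml.etree.ElementTree as ET


def render(message: str) -> str:
    """Parse a msgid/msgstr as an XML fragment and return the resulting text.

    Hypothetical stand-in for inserting a translated string back into the book.
    """
    return ET.fromstring(f"<para>{message}</para>").text


# With the CDATA wrapper kept in the string, nothing inside it is resolved.
samples = [
    "<![CDATA[& is represented as &amp;]]>",
    "<![CDATA[< is represented as &lt;]]>",
    "<![CDATA[&blah; is an example entity]]>",
]

for msgid in samples:
    print(render(msgid))
```

Each rendered line is the literal content of the CDATA section, so an entity example such as `&blah;` survives even though no such entity is defined.
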
Verified that the translated CDATA content is also displayed when the book is built. Moving this onto VERIFIED as per Bruce's request.
A fix for this shipped in Publican 4.2.0.