Bug 744277

Summary: Python client should not import .po files with stray quotes
Product: [Retired] Zanata Reporter: Bryan Kearney <bkearney>
Component: Component-PythonClient Assignee: James Ni <jni>
Status: CLOSED NEXTRELEASE QA Contact: Ding-Yi Chen <dchen>
Severity: high Docs Contact:
Priority: urgent    
Version: unspecified CC: jni, pcormier, runab, sflaniga, sshedmak, yshao, zanata-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 1.3.5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-04-26 08:15:38 UTC Type: ---
Bug Depends On:    
Bug Blocks: 786670    
Attachments:
Description Flags
Test project containing illegal stray quote in as.po none

Description Bryan Kearney 2011-10-07 18:00:42 UTC
Strings which contain newlines cause issues with the client. Using the following command:

zanata po pull --srcdir ..

I get the following diff on my translations:

-msgid "\nUsage: %s [options] MODULENAME --help\n"
-msgstr "\n"ব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n"
+msgid ""
+"\n"
+"Usage: %s [options] MODULENAME --help\n"
+msgstr ""
+"\n"
+"\"ব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n"


This splitting of the file by line causes gettext to not find the string.

Comment 2 Sean Flanigan 2011-10-13 03:29:56 UTC
I assume this string comes from as.po in subscriptionmanager?

BTW, there seems to be a stray quote char on the second line of that diff
(after \n).

Splitting by line should be fine; if you run the original PO through Gettext's
msgcat it will do the same thing.  

What is the tool/library which consumes the PO file?  msgfmt?

Comment 3 Bryan Kearney 2011-10-13 11:57:20 UTC
We are using this command in the make file:

compile-po:
        for lang in $(basename $(notdir $(wildcard po/*.po))) ; do \
                echo $$lang ; \
                mkdir -p po/build/$$lang/LC_MESSAGES/ ; \
                msgfmt -c --statistics -o po/build/$$lang/LC_MESSAGES/rhsm.mo po/$$lang.po ; \
        done


We have fixed the bug in as.po

-- bk

Comment 4 Sean Flanigan 2011-10-14 00:48:05 UTC
I can't get msgfmt to complain about as.po, either with the version from Zanata as of yesterday, or today.

What exactly was the error from msgfmt?  Do you have a copy of the file from Zanata which triggered the error?

Comment 5 Bryan Kearney 2011-10-14 01:21:13 UTC
The issue we saw was that the string split across multiple lines
msgid ""
"\n"
"Usage: %s [options] MODULENAME --help\n"
msgstr ""
"\n"
"\"ব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n"

was not picked up. We had to collapse the strings:

msgid "\nUsage: %s [options] MODULENAME --help\n"
msgstr "\n"ব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n"

The strings are written as "\n"ব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n" in the source file. It is the import/export from zanata which breaks it onto many lines.

Comment 6 Sean Flanigan 2011-10-14 02:08:27 UTC
Okay, I've run the original as.po and as.po as pulled from zanata through msgfmt then msgunfmt, and got this diff:

$ msgfmt -c --statistics as.po -o as.mo
420 translated messages.                                     
$ msgfmt -c --statistics as-pulled.po -o as-pulled.mo
420 translated messages.  
$ msgunfmt as.mo > as.mo.po
$ msgunfmt as-pulled.mo > as-pulled.mo.po
$ diff -U5 as.mo.po as-pulled.mo.po
--- as.mo.po    2011-10-14 11:34:14.411349895 +1000
+++ as-pulled.mo.po     2011-10-14 11:34:22.510224846 +1000
@@ -104,11 +104,11 @@
 msgid ""
 "\n"
 "Usage: %s [options] MODULENAME --help\n"
 msgstr ""
 "\n"
-"ব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n"
+"\"ব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n"

 msgid "    Entitled Repositories in %s"
 msgstr "%s -ত অনজঞা থকা ভৰালসমহ"

 msgid "    Installed Product Status"


We're investigating where the extra quote came from (a translator, or Zanata's import process).

Comment 7 Sean Flanigan 2011-10-14 02:36:43 UTC
Going by http://git.fedorahosted.org/git/?p=subscription-manager.git;a=blobdiff;f=po/as.po;h=7bb789bbe43a43a78f66bfbcde22cf9a8b9957b8;hp=34a810df2139bae091190483c18d0e906dd3d10d;hb=37a269921d46b74db0f2b32bc739647aa53263d1;hpb=c900ff31d099801fdb832801e85730c3a49a88c8

and by the database history, I think as.po was imported with the stray (unescaped) quote.  The python client *should* have rejected the entire file as illegal, but instead it treated the stray quote as a literal quote.  

I've created a test PO file so that we can fix the import bug, but in the meantime I've removed the extra quote from that string in Zanata.  You should be able to pull the document now.

NB: The fact that your runtime PO library treated it as an untranslated string may be a bug in that library, because it still had an entry for that msgid, even though it was a bad one.

Comment 8 Sean Flanigan 2011-10-14 02:45:23 UTC
Created attachment 528142 [details]
Test project containing illegal stray quote in as.po

The file as.po should be rejected as illegal, not imported.
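To make "stray quote" concrete, here is a minimal sketch of the kind of check an importer could run on each quoted line. It is an illustration only, not the polib patch or the python client's code, and the function name find_stray_quote is hypothetical.

```python
def find_stray_quote(line):
    """Return the 0-based index (within line.strip()) of the first quote
    problem inside a PO string line, or None if none is found.

    Only the quoted part of a single line is examined; comments and
    other PO syntax are out of scope for this sketch.
    """
    text = line.strip()
    start = text.find('"')
    if start == -1:
        return None  # no quoted string on this line
    quoted = text[start:]
    if len(quoted) < 2 or not quoted.endswith('"'):
        return None  # not a complete quoted string; not handled here
    body = quoted[1:-1]
    i = 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 == len(body):
                # The closing quote is itself escaped, so the string
                # is effectively unterminated.
                return len(text) - 1
            i += 2  # skip the backslash and the character it escapes
            continue
        if body[i] == '"':
            return start + 1 + i  # unescaped quote in the middle
        i += 1
    return None


# The illegal msgstr from this bug (shortened) and its legal, escaped form.
bad = r'msgstr "\n"Usage: %s\n"'
good = r'"\"Usage: %s\n"'
print(find_stray_quote(bad), find_stray_quote(good))
```

A real importer would reject the whole file on the first non-None result rather than storing the quote as literal text.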

Comment 11 James Ni 2011-10-20 01:43:58 UTC
Hi

I have created a patch for polib and submitted it upstream; please see the link below:
https://bitbucket.org/izi/polib/issue/27/polib-doesnt-check-unescaped-quote

I also suggest running "msgfmt --check as.po" before pushing to the Zanata server with python-client, since python-client doesn't do any syntax validation or escape checking, and python-polib doesn't check escapes either. I have tested 'msgfmt --check' with Sean's illegal po file; it shows the following message:

$ msgfmt --check as.po
as.po:25:13: syntax error
as.po:25: keyword "s" unknown
as.po:26: end-of-line within string
msgfmt: found 3 fatal errors
   
So msgfmt can detect the syntax error; I think this could be a workaround for this issue.
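As a rough sketch of how that workaround could be scripted before a push: the helper name po_passes_check and its checker parameter are hypothetical, and the only msgfmt options used are --check and -o. Writing the output to the null device is an assumption made here so that no messages.mo is left behind.

```python
import os
import subprocess


def po_passes_check(po_path, checker=("msgfmt", "--check", "-o", os.devnull)):
    """Run the checker command on po_path.

    Returns (ok, stderr_text): ok is True when the command exits with 0,
    and stderr_text carries whatever diagnostics the command printed.
    """
    result = subprocess.run(
        list(checker) + [po_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stderr
```

A caller would skip the push and show stderr_text when ok is False. If msgfmt is not installed, subprocess.run raises FileNotFoundError, which the caller has to handle separately.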

Comment 14 Sean Flanigan 2011-10-28 06:53:51 UTC
Regarding the newlines:

From http://www.gnu.org/software/gettext/manual/html_node/PO-Files.html#PO-Files

"One should carefully distinguish between end of lines marked as ‘\n’ inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no incidence on the represented string."

Thus if you run Gettext msgfmt on a PO file like this:
  msgid "\nUsage: %s [options] MODULENAME --help\n"
  msgstr "\nব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n"

or on a PO file like this:
  msgid ""
  "\n"
  "Usage: %s [options] MODULENAME --help\n"
  msgstr ""
  "\n"
  "ব্যৱহাৰ: %s [বিকল্পসমূহ] MODULENAME --help\n"

then the MO file produced by msgfmt is identical.  Zanata only adds insignificant newlines outside the quotes (just like Gettext msgcat).  
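The equivalence can be illustrated with a small sketch that joins the quoted segments of one entry into the string they represent. This is not how msgfmt or Zanata parse PO files. It assumes the common PO escapes (\n, \", \\) behave like Python's, and po_string is a hypothetical helper name.

```python
import ast


def po_string(lines):
    """Join the quoted segments of one msgid/msgstr into a single string."""
    parts = []
    for line in lines:
        line = line.strip()
        # Drop a leading keyword such as msgid or msgstr.
        if not line.startswith('"'):
            line = line.split(None, 1)[1]
        # Decode one quoted segment; line breaks between segments in the
        # PO file add nothing to the result.
        parts.append(ast.literal_eval(line))
    return "".join(parts)


single = [r'msgid "\nUsage: %s [options] MODULENAME --help\n"']
multi = [
    'msgid ""',
    r'"\n"',
    r'"Usage: %s [options] MODULENAME --help\n"',
]
assert po_string(single) == po_string(multi)
```

Both layouts yield the same msgid, so a lookup at runtime sees no difference between them.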


Naturally, we still need to fix the Zanata import (or polib) so that it rejects illegal PO files.  Running "msgfmt --check" _before_ zanata import should help as a workaround.

Comment 16 James Ni 2011-10-31 10:13:31 UTC
Hi

The author of polib has modified the source code to fix the unescaped quotes issue I reported; please look at commit 8ee09b305e8e at https://bitbucket.org/izi/polib/changesets. He also accepted my pull request adding a check for quotes at the beginning of strings; please look at commit dbafdc621bf4 at https://bitbucket.org/izi/polib/changesets

We will try to package polib version 0.7.0 for RHEL and Fedora, and include the above commits as a patch.

Comment 17 Ding-Yi Chen 2011-11-01 01:03:43 UTC
Should I also include the commits between 0.7.0 and 8ee09b, or just commits 8ee09b305e8e and dbafdc621bf4?

Comment 18 Sean Flanigan 2011-11-01 01:17:55 UTC
All the intervening commits look good, and there's another bugfix which might be good to have.  I'd say take them all.

Comment 19 Ding-Yi Chen 2012-02-10 05:11:41 UTC
python-polib-0.7.0-2 has been pushed to stable for f16, f15, el6 and el5.

Comment 20 Ding-Yi Chen 2012-02-21 07:51:16 UTC
James, I am not quite sure whether python-polib provides the error information.
That is, whether it returns "Error: Stray quotes detected in <filename>" or "Error: invalid format: <filename>"

As of version 1.3.3-25-g7cff, the client only reports:
"error: Cannot process the po file" 
even if the .pot file contains stray quotes.

Using the error messages I suggested would be more informative.

Comment 21 James Ni 2012-02-27 03:57:13 UTC
Hi Ding,

Thanks, I have modified the source code to output the error message from python-polib if an unescaped quote is detected. It should give users more informative output. Please check the latest code on GitHub to verify.
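The idea of passing the parser's own message through can be sketched as below. This is not the client's actual code: po_error_message is a hypothetical helper, the parser is passed in as a callable, and the message format follows the wording suggested in comment 20.

```python
def po_error_message(path, parse):
    """Return an informative error string if parse(path) fails, else None.

    parse is any callable that loads a PO file and raises IOError or
    ValueError on a malformed file (an assumption of this sketch).
    """
    try:
        parse(path)
    except (IOError, ValueError) as exc:
        # Keep the parser's own text instead of a generic
        # "Cannot process the po file".
        return "Error: invalid format: %s (%s)" % (path, exc)
    return None
```

With polib installed, po_error_message("as.po", polib.pofile) would be the natural call. That assumes the installed polib reports syntax errors as IOError, which should be confirmed against the packaged version.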

Comment 22 Ding-Yi Chen 2012-02-28 01:31:11 UTC
VERIFIED with 1.3.4-3-gfc46