Bug 678186 - [l10n] update_po produces inconsistent results
Summary: [l10n] update_po produces inconsistent results
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Publican
Classification: Community
Component: publican
Version: 2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 3.0
Assignee: Jeff Fearn 🐞
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-02-17 03:08 UTC by Ruediger Landmann
Modified: 2012-10-31 03:11 UTC (History)

Fixed In Version: 3.0.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-31 03:11:12 UTC
Embargoed:


Attachments
example XML file (79.88 KB, text/xml)
2011-02-17 03:08 UTC, Ruediger Landmann
PO file with the extra lines (77.91 KB, text/x-gettext-translation)
2011-02-17 03:09 UTC, Ruediger Landmann
PO file without the extra lines -- update_po doesn't add them in. (77.94 KB, text/x-gettext-translation)
2011-02-17 03:10 UTC, Ruediger Landmann

Description Ruediger Landmann 2011-02-17 03:08:47 UTC
Created attachment 479256
example XML file

Description of problem:

Publican sometimes includes lines in msgid entries in PO files that contain nothing but two sets of quote marks. This happens when the corresponding XML file includes a <screen> element with a carriage return before the closing tag. For example:

<screen>
rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/
</screen>

is represented in the PO file as:

#. Tag: screen
#, no-c-format
msgid "\n"
"rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/\n"
""
msgstr ""

If Publican updates a PO file with an msgid that matches except for the extra line, Publican doesn't add this line. For example, update_po does not change this entry:

#. Tag: screen
#, no-c-format
msgid "\n"
"rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/\n"
msgstr ""

We have examples of books where PO files lack these lines for reasons that are unclear.

These lines aren't a problem in themselves, but Publican also counts each of these lines as a word when you run publican lang_stats -- which means that the word count for different languages might be different. 
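
The effect can be illustrated with a minimal Python sketch. It is not Publican's code: the helper names are hypothetical, the command line is shortened from the example above, and `raw_token_count` is only an assumed stand-in for a counter that looks at raw PO lines. The sketch shows that the trailing `""` segment adds nothing to the decoded msgid, yet changes a count taken over the raw lines.

```python
import ast

# msgid segments as they appear in the PO file (shortened, hypothetical sample).
WITH_EXTRA = [
    r'"\n"',
    r'"rhnpush --server=http://localhost/APP -c rhel-5.3-beta\n"',
    r'""',
]
WITHOUT_EXTRA = WITH_EXTRA[:-1]


def po_string(lines):
    """Concatenate quoted PO segments into the string they represent.

    Assumption: Python literal syntax is close enough to PO escaping
    for the simple escapes used in this sample.
    """
    return "".join(ast.literal_eval(line) for line in lines)


def raw_token_count(lines):
    """Count whitespace-separated tokens on the raw lines (a bare "" counts as one)."""
    return sum(len(line.split()) for line in lines)


def decoded_word_count(lines):
    """Count words in the decoded msgid, so empty segments contribute nothing."""
    return len(po_string(lines).split())


if __name__ == "__main__":
    print(po_string(WITH_EXTRA) == po_string(WITHOUT_EXTRA))
    print(raw_token_count(WITH_EXTRA), raw_token_count(WITHOUT_EXTRA))
    print(decoded_word_count(WITH_EXTRA), decoded_word_count(WITHOUT_EXTRA))
```

Counting over the decoded string rather than the raw lines is one way to make the result independent of how the msgid is wrapped.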

This doesn't impact on translation directly, but can make life interesting for anyone managing a translation project; the word counts that lang_stats produces serve as a crude but handy checksum to make sure that all members of a translation team are working on up-to-date PO files. When different languages report different word counts, it's not immediately obvious that everyone's translating the same thing :)

Version-Release number of selected component (if applicable):
2.5-1

How reproducible:
100%

Steps to Reproduce:
1. generate a PO file from an XML file that includes a <screen> element with its closing tag on a new line
2. run lang_stats on the target language and note the result
3. edit the PO file to remove any lines in msgid entries that consist only of '""' 
4. run lang_stats on the target language and note the result
5. run publican update_po to refresh the PO file
6. open the PO file to note that the '""' lines are not restored
7. run lang_stats on the PO file yet again and note the result
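
Step 3 can be done by hand; for larger files, a small script along the following lines may help. This is a sketch only: `strip_empty_msgid_lines` is a hypothetical helper, it assumes continuation lines start with a double quote in column one, and it deliberately ignores msgid_plural and msgstr entries.

```python
def strip_empty_msgid_lines(po_text):
    """Drop continuation lines that are exactly "" inside msgid entries."""
    out = []
    in_msgid = False
    for line in po_text.splitlines():
        if line.startswith("msgid "):
            # The msgid line itself is always kept, even if it is 'msgid ""'.
            in_msgid = True
        elif line.startswith("msgstr") or not line.startswith('"'):
            # Anything that is not a quoted continuation ends the msgid.
            in_msgid = False
        elif in_msgid and line.strip() == '""':
            continue  # skip the empty continuation line
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical, shortened sample entry.
SAMPLE = "\n".join([
    "#. Tag: screen",
    "#, no-c-format",
    r'msgid "\n"',
    r'"rhnpush -c rhel-5.3-beta\n"',
    '""',
    'msgstr ""',
]) + "\n"

if __name__ == "__main__":
    print(strip_empty_msgid_lines(SAMPLE), end="")
```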
  
Actual results:
results in steps 4 and 7 are the same, but differ from result in step 2

Expected results:
same results in steps 2, 4, and 7

Additional info:

Comment 1 Ruediger Landmann 2011-02-17 03:09:34 UTC
Created attachment 479257
PO file with the extra lines

Comment 2 Ruediger Landmann 2011-02-17 03:10:31 UTC
Created attachment 479258
PO file without the extra lines -- update_po doesn't add them in.

Comment 3 Jeff Fearn 🐞 2011-02-26 08:24:22 UTC
Merging POT and PO files is currently done by msgmerge from gettext. It might be worth testing various options to msgmerge to see if the behaviour can be changed.

Current options would look like:

msgmerge --no-wrap --quiet --backup=none --update foo.po

Try it without --update. Trying the new code path in https://bugzilla.redhat.com/show_bug.cgi?id=661569 would also be worth a shot.
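
For anyone testing both variants, the two invocations could be assembled roughly as follows. This is a sketch, not how Publican calls msgmerge: `msgmerge_argv`, the file name foo.pot, and the output file name are hypothetical, and the argument order (PO file first, then POT file) follows msgmerge's documented usage.

```python
def msgmerge_argv(po_path, pot_path, update=True, output_path=None):
    """Build a msgmerge command line with or without --update."""
    argv = ["msgmerge", "--no-wrap", "--quiet"]
    if update:
        # --backup only has an effect together with --update.
        argv += ["--backup=none", "--update", po_path, pot_path]
    else:
        argv += [po_path, pot_path]
        if output_path is not None:
            # Without --update the merged result goes to stdout or this file.
            argv.append("--output-file=" + output_path)
    return argv


if __name__ == "__main__":
    # foo.pot and foo.merged.po are hypothetical file names.
    print(" ".join(msgmerge_argv("foo.po", "foo.pot")))
    print(" ".join(msgmerge_argv("foo.po", "foo.pot", update=False,
                                 output_path="foo.merged.po")))
```

The resulting list can be passed to subprocess.run on a system where gettext is installed.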

Comment 4 Jeff Fearn 🐞 2011-04-18 06:10:35 UTC
This should have been fixed by #661569; it requires testing.

Comment 5 Michael Hideo 2012-06-08 01:22:26 UTC
Create an XML file that contains a <screen> element with a carriage return before the closing tag, then create the POT file and check for empty strings.

Comment 6 Michael Hideo 2012-06-08 01:29:17 UTC
Create an XML file that contains a <screen> element with a carriage return before the closing tag, then create the POT file and check for empty strings.

Comment 7 Hedda Peters 2012-06-14 02:03:34 UTC
Verified.

I followed Rudi's steps 1-7 to reproduce using the attached example XML file.
Publican still produces those lines with only two sets of quote marks in the PO file. After removing them manually and updating the PO file, it does not add them in again, as described by Rudi.

However, running publican lang_stats now produces consistent statistics: with or without those lines, the results for steps 2, 4 and 7 are the same.

This matches the expected result in the OP, hence verified.

