Created attachment 479256 [details]
example XML file

Description of problem:
Publican sometimes includes lines in msgid entries in PO files that contain nothing but two sets of quote marks. This happens when the corresponding XML file includes a <screen> element with a carriage return before the closing tag.

For example:

<screen>
rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/
</screen>

is represented in the PO file as:

#. Tag: screen
#, no-c-format
msgid "\n"
"rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/\n"
""
msgstr ""

If Publican updates a PO file with an msgid that matches except for the extra line, Publican doesn't add this line. For example, update_po does not change this entry:

#. Tag: screen
#, no-c-format
msgid "\n"
"rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/\n"
msgstr ""

We have examples of books where PO files lack these lines for reasons that are unclear.

These lines aren't a problem in themselves, but Publican also counts each of these lines as a word when you run publican lang_stats -- which means that the word count for different languages might be different. This doesn't impact on translation directly, but can make life interesting for anyone managing a translation project; the word counts that lang_stats produces serve as a crude but handy checksum to make sure that all members of a translation team are working on up-to-date PO files. When different languages report different word counts, it's not immediately obvious that everyone's translating the same thing :)

Version-Release number of selected component (if applicable):
2.5-1

How reproducible:
100%

Steps to Reproduce:
1. generate a PO file from an XML file that includes a <screen> element with its closing tag on a new line
2. run lang_stats on the target language and note the result
3. edit the PO file to remove any lines in msgid entries that consist only of '""'
4. run lang_stats on the target language and note the result
5. run publican update_po to refresh the PO file
6. open the PO file to note that the '""' lines are not restored
7. run lang_stats on the PO file yet again and note the result

Actual results:
results in steps 4 and 7 are the same, but differ from result in step 2

Expected results:
same results in steps 2, 4, and 7

Additional info:
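To illustrate how a bare "" line could shift a word count, here is a minimal Python sketch. It is not Publican's code (Publican's counting logic is not shown in this report). Both counters are hypothetical: count_words_per_line assumes a counter that treats every quoted line of a msgid as at least one word, and count_words_joined assumes a counter that joins the msgid into one string first.

```python
import re

# Matches one quoted PO string line and captures what is between the quotes.
QUOTED = re.compile(r'^"(.*)"$')

# The two entries from the report: with and without the bare "" line.
WITH_EXTRA = r'''#. Tag: screen
#, no-c-format
msgid "\n"
"rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/\n"
""
msgstr ""
'''

WITHOUT_EXTRA = r'''#. Tag: screen
#, no-c-format
msgid "\n"
"rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/\n"
msgstr ""
'''


def msgid_fragments(entry):
    """Return the quoted string fragments that make up the msgid of one entry."""
    fragments = []
    in_msgid = False
    for line in entry.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            in_msgid = True
            line = line[len("msgid "):]
        elif line.startswith("msgstr"):
            in_msgid = False
        if in_msgid:
            match = QUOTED.match(line)
            if match:
                fragments.append(match.group(1))
    return fragments


def count_words_joined(entry):
    """Hypothetical counter: join all fragments, then split on whitespace."""
    text = "".join(msgid_fragments(entry)).replace("\\n", " ")
    return len(text.split())


def count_words_per_line(entry):
    """Hypothetical counter: every quoted line counts as at least one word."""
    total = 0
    for fragment in msgid_fragments(entry):
        words = fragment.replace("\\n", " ").split()
        total += max(1, len(words))
    return total


if __name__ == "__main__":
    for name, entry in (("with", WITH_EXTRA), ("without", WITHOUT_EXTRA)):
        print(name, count_words_joined(entry), count_words_per_line(entry))
```

Under these assumptions the joined count is the same for both entries, while the per-line count differs by one, which is the kind of discrepancy described in the steps to reproduce.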
Created attachment 479257 [details] PO file with the extra lines
Created attachment 479258 [details] PO file without the extra lines -- update_po doesn't add them in.
Merging POT and PO files is currently done by msgmerge from gettext; it might be worth testing various options to msgmerge to see if the behaviour can be changed. Current options would look like:

msgmerge --no-wrap --quiet --backup=none --update foo.po

Try it without --update. Trying the new code path in https://bugzilla.redhat.com/show_bug.cgi?id=661569 would also be worth a shot.
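For anyone wanting to compare the two variants, a rough Python sketch of the command lines follows. It only builds argument lists and does not show how Publican itself invokes msgmerge. The POT file name (foo.pot), the ".merged" output name, and the helper msgmerge_command are assumptions made for illustration; the command quoted above names only foo.po.

```python
def msgmerge_command(po_path, pot_path, update=True):
    """Build a msgmerge argument list, with or without --update.

    With update=True the PO file is rewritten in place, as in the command
    quoted in this comment. With update=False the merged result is written
    to a separate file (po_path + ".merged", a name chosen for this sketch)
    so it can be diffed against the original.
    """
    common = ["msgmerge", "--no-wrap", "--quiet"]
    if update:
        # --backup only applies together with --update.
        return common + ["--backup=none", "--update", po_path, pot_path]
    return common + ["-o", po_path + ".merged", po_path, pot_path]


if __name__ == "__main__":
    # Print both variants; pass a list to subprocess.run(cmd, check=True)
    # to actually execute it.
    print(" ".join(msgmerge_command("foo.po", "foo.pot", update=True)))
    print(" ".join(msgmerge_command("foo.po", "foo.pot", update=False)))
```

Diffing foo.po against foo.po.merged would show whether the bare "" lines come back when --update is dropped.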
This should have been fixed by #661569; requires testing.
Create an XML file that contains a <screen> element with a carriage return before the closing tag, then create the POT file and check it for empty strings.
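The check for empty strings could be scripted along these lines. This is a minimal sketch, not part of Publican; empty_continuation_lines is a hypothetical helper that reports bare "" lines continuing a msgid or msgstr. A line such as msgid "" or msgstr "" is deliberately not reported, because the empty string there follows the keyword.

```python
SAMPLE = r'''#. Tag: screen
#, no-c-format
msgid "\n"
"rhnpush --server=http://localhost/APP -c 'rhel-5.3-beta' -d /var/satellite/custom-distro/rhel-i386-server-5.3-beta/Server/\n"
""
msgstr ""
'''


def empty_continuation_lines(po_text):
    """Return 1-based numbers of lines that consist only of "" and that
    continue a msgid or msgstr started on an earlier line."""
    hits = []
    in_string = False
    for number, line in enumerate(po_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(("msgid", "msgstr")):
            # Keyword line: starts a (possibly multi-line) string.
            in_string = True
        elif stripped == '""' and in_string:
            hits.append(number)
        elif not stripped.startswith('"'):
            # Comment or blank line: the string has ended.
            in_string = False
    return hits


if __name__ == "__main__":
    print(empty_continuation_lines(SAMPLE))
```

To use it on a real file, read the POT or PO file as text and pass the contents to the function; an empty list means no such lines were found.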
Verified. I followed Rudi's steps 1-7 to reproduce, using the attached example XML file. Publican still produces those lines with only two sets of quote marks in the PO file. After removing them manually and updating the PO file, it does not add them in again, as described by Rudi. However, running publican lang_stats now produces consistent statistics: with or without those lines, the results for steps 2, 4 and 7 are the same. This matches the expected result in the OP, hence verified.