Red Hat Bugzilla – Bug 740122
Make \n in source text visible in the translation editor
Last modified: 2013-03-03 21:22:06 EST
When importing translations from Zanata, msgfmt reports the following errors.
for lang in as bn de en es fr gu hi it ja kn ko ml mr or pa pt pt_BR ru ta te zh_CN zh_TW ; do \
  echo $lang ; \
  mkdir -p po/build/$lang/LC_MESSAGES/ ; \
  msgfmt -c --statistics -o po/build/$lang/LC_MESSAGES/rhsm.mo po/$lang.po ; \
done
po/as.po:54: `msgid' and `msgstr' entries do not both end with '\n'
po/as.po:724: `msgid' and `msgstr' entries do not both end with '\n'
po/as.po:758: `msgid' and `msgstr' entries do not both end with '\n'
Per a discussion on IRC, it seems that the \n is not preserved and shown to the translator. This causes inadvertent mis-translations.
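For reference, the condition msgfmt is complaining about can be approximated in a few lines of Python. This is only a sketch, not msgfmt's actual implementation: the function name newline_mismatches is made up for illustration, the strings are assumed to be already decoded (real newline characters rather than backslash-n), and skipping untranslated entries is an assumption.

```python
def newline_mismatches(msgid, msgstr):
    """Return which boundary newlines differ between msgid and msgstr.

    Hypothetical helper approximating the begin/end newline check
    reported by `msgfmt -c`; it works on already-decoded strings.
    """
    problems = []
    if not msgstr:
        # Assumption: an untranslated entry is not compared.
        return problems
    if msgid.startswith("\n") != msgstr.startswith("\n"):
        problems.append("begin")
    if msgid.endswith("\n") != msgstr.endswith("\n"):
        problems.append("end")
    return problems
```

A translation that drops the final newline, as in the po/as.po errors above, would be reported as an "end" mismatch.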
IRC transcript is below:
(09:22:32 PM) Hedda: sflanigan: bryan_kearney1: The problem might be that Zanata doesn't show the translators those \n in the original message - so we don't know that we actually need to add one of those to our translation, too. Validation checks will then complain if not both messages end in \n.
(09:22:33 PM) bryan_kearney1: http://git.fedorahosted.org/git/?p=subscription-manager.git
(09:22:43 PM) bryan_kearney1: sflanigan: you can see the commit and revert
(09:23:03 PM) bryan_kearney1: Hedda: as a nOOb, that seems like a critical issue
(09:23:19 PM) Hedda: it is indeed - one I only just found out about
(09:23:25 PM) sflanigan: Would something like that break msgfmt though?
(09:23:53 PM) bryan_kearney1: sflanigan: that is what those errors are
(09:24:05 PM) sflanigan: Oh, I see. right
(09:24:52 PM) Hedda: same issue just broke my build of a documentation... the translation had \n inserted into tags, which I didn't see on Zanata.
(09:24:59 PM) Hedda: only publican complained later
(09:26:59 PM) Hedda: so yeah, sflanigan, this is actually a very important new issue :-(
(09:27:26 PM) bryan_kearney1: how does this get raised.. I am waiting on translations to fix bugs.. but can not use exports from zanata.
(09:27:27 PM) sflanigan: Yes
(09:32:23 PM) bryan_kearney1: sflanigan: do I raise a bug? Cry wolf? something else?
(09:33:11 PM) Hedda: bryan_kearney1: I think mospina is already working on it... that's what he just told me
(09:33:15 PM) sflanigan: Raise a zanata bug, but it's going to take a while to do anything from that side of things. We'll have to find a way of fixing the .po files and pushing them back to zanata
"msgfmt -c" checks a number of things which Zanata doesn't, including format strings and the gettext header.
In this case, most of the invalid messages were just missing newlines at the end of the string, but I think there was one Python string formatting error too. ('%' instead of '%s')
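To illustrate the format string problem ('%' instead of '%s'), here is a rough Python sketch that compares the printf-style conversion specifiers in a msgid and its msgstr. It is not what msgfmt or Zanata does internally; the names format_specs and same_format_specs and the sample strings are invented for the example, and the regular expression only covers the common Python '%' conversions.

```python
import re

# Matches a Python printf-style conversion such as %s, %(name)d or %5.2f.
_SPEC = re.compile(
    r"%(?:\((?P<key>[^)]*)\))?[#0\- +]*(?:\*|\d+)?(?:\.(?:\*|\d+))?"
    r"[hlL]?(?P<conv>[diouxXeEfFgGcrsa%])"
)


def format_specs(text):
    """List (key, conversion) pairs found in text; key is None if positional."""
    specs = []
    for match in _SPEC.finditer(text):
        if match.group("conv") == "%":
            continue  # "%%" is a literal percent sign, not a placeholder
        specs.append((match.group("key"), match.group("conv")))
    return specs


def same_format_specs(msgid, msgstr):
    """True if both strings use compatible placeholders (hypothetical check)."""
    def split(text):
        specs = format_specs(text)
        # Positional placeholders must keep their order; named ones may move.
        positional = [conv for key, conv in specs if key is None]
        named = sorted(set((key, conv) for key, conv in specs if key is not None))
        return positional, named
    return split(msgid) == split(msgstr)
```

A msgstr that ends in a bare '%' where the msgid has '%s' yields no specifier at all, so the comparison fails.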
Making the source-language newlines visible to translators would help a lot, so we should definitely do that within the scope of this bug.
We also need to do something about format checking generally.
The table will now show newlines with the ¶ character (not in the textareas, only for source language cells or target cells which have been saved).
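The display rule amounts to something like the following Python sketch. This is not Zanata's actual code; mark_newlines is a hypothetical name, and the stored text is assumed to contain real newline characters.

```python
PILCROW = "\u00b6"  # the ¶ character


def mark_newlines(text):
    """Show a pilcrow at each newline, for read-only display only.

    Hypothetical helper: the stored string is left untouched, and the
    line break itself is kept so the text still wraps where it did.
    """
    return text.replace("\n", PILCROW + "\n")
```

Because the marker is added only when rendering, nothing extra is saved in the database or written to exported files.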
Runa, let me know if you want an option to turn this feature off. But it might be safer to leave it on at all times, to reduce the likelihood of missing/extra newlines.
This change is in the 1.4 branch.
I suggest it be kept on, but we need to find a way to specifically indicate that it represents a new line escape character and not anything else.
Runa, can you suggest a good way of indicating that? I think most people will work out what's going on once they've inserted a newline and seen it turn into a ¶.
Perhaps we can go with what we have now for Zanata 1.4.2, and create a new bug for any enhancements.
I agree that we should check that leading/trailing newlines match too (bug 746140), but it will take longer, because we haven't created the validation framework yet.
Runa, do you think it is sufficient to show ¶ after Save, or when you are editing?
I really should have asked this before: was there any specific demand earlier for the escape characters to be converted? This is a major deviation from standard gettext conventions, and I am wondering why this was implemented.
No specific prior demand from the UI side, it's just that any PO reading library has to handle escape characters properly, and that usually involves converting \n to newline when loading. To convert it back to \n for the editor would actually be extra work for us, and we haven't had any demand for that before now.
BTW, this is not a deviation from gettext. The two characters "\n" in a PO file encode a newline. If you read it with standard gettext (e.g. the C function "gettext()"), you get a newline, not a backslash and an n.
In other words, by the time we see the string in Zanata, it's already a newline. We can represent it in the editor however you like, but the more complex we get, the longer it will take, so I suggest we start with something simple.
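As a rough illustration of what a PO reader does on load, here is a Python sketch. It is not the library Zanata uses; unescape_po and decode_po_string are hypothetical names, and only a handful of escapes are handled (no octal or hex sequences).

```python
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "\\": "\\"}


def unescape_po(fragment):
    """Decode backslash escapes inside one quoted PO fragment (subset only)."""
    out = []
    chars = iter(fragment)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes are kept as written rather than guessed at.
            out.append(_ESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def decode_po_string(lines):
    """Join the quoted lines of one msgid/msgstr into a single decoded string."""
    parts = []
    for line in lines:
        line = line.strip()
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            parts.append(unescape_po(line[1:-1]))
    return "".join(parts)
```

After this step the backslash and the n are gone; all the application holds is a string containing a newline character.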
As mentioned by a couple of translators here:
not every translator wants to worry about Gettext escaping rules.
Not every document in Zanata is a PO document anyway. Escaping is different in gettext, in Properties and in XLIFF.
We could certainly look at implementing a feature to escape newlines/tabs/etc. when opening a message for editing, and back again when saving, but this will need more testing to make sure the escaping is always correct. And it would probably be best to make it optional because of the points above.
To describe a potential corner case:
For a message like the following where the msgid has new line characters both within and outside the string, how would the ¶ be displayed and eventually saved?
" SELinux denied access requested by $SOURCE. The current boolean \n"
" settings do not allow this access. If you have not setup $SOURCE to\n"
" require this access this may signal an intrusion attempt. If you do "
" this access you need to change the booleans on this system to allow \n"
" the access.\n"
This specific example is from setroubleshoot-framework.
I'm not sure what you mean by within and outside the string, but it would look like this (minus the quotes) in the table:
SELinux denied access requested by $SOURCE. The current boolean ¶
settings do not allow this access. If you have not setup $SOURCE to¶
require this access this may signal an intrusion attempt. If you do
this access you need to change the booleans on this system to allow ¶
the access.¶
As before, those characters will still be saved as newlines in the database, and converted to \n when generating PO files.
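The export direction could look roughly like this in Python. Again this is a sketch under assumptions, not Zanata's PO writer: escape_po and encode_po_string are hypothetical names, the output starts a new quoted line after each newline (a common PO layout), and wrapping of long lines at a column width is not attempted.

```python
def escape_po(text):
    """Escape a decoded string for use inside PO double quotes (subset only)."""
    return (text.replace("\\", "\\\\")   # backslashes first, to avoid double escaping
                .replace('"', '\\"')
                .replace("\t", "\\t")
                .replace("\n", "\\n"))


def encode_po_string(keyword, text):
    """Return the PO lines for one entry, e.g. keyword='msgstr'."""
    chunks = text.split("\n")
    # Every chunk except the last was followed by a newline in the original.
    pieces = [chunk + "\n" for chunk in chunks[:-1]]
    if chunks[-1]:
        pieces.append(chunks[-1])
    if len(pieces) <= 1:
        return ['%s "%s"' % (keyword, escape_po(text))]
    lines = ['%s ""' % keyword]
    lines.extend('"%s"' % escape_po(piece) for piece in pieces)
    return lines
```

The ¶ marker never appears here, since it exists only in the table display; the exporter sees plain newline characters and writes them back as \n.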
I checked the source for this string, which is here:
It looks like those line breaks are wraps applied to long strings when the .PO files are created.
Sean, Dchen: please allow the ¶ to be displayed at all times. Thanks.
VERIFIED with Zanata version 1.4.2-SNAPSHOT (20111017-1623)