740122 – Make \n in source text visible in the translation editor

Summary: Make \n in source text visible in the translation editor

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Zanata
Classification:	Retired
Component:	Usability
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	1.4.2
Assignee:	Runa Bhattacharjee
QA Contact:	Ding-Yi Chen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	Zanata-1.4.2

Reported:	2011-09-21 01:50 UTC by Bryan Kearney
Modified:	2013-03-04 02:22 UTC
CC List:	4 users
Fixed In Version:	1.4.2-SNAPSHOT (20111017-1623)
Story Points:	---
Clone Of:
Environment:
Last Closed:	2011-10-28 07:02:47 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	744277	0	urgent	CLOSED	Python client should not import .po files with stray quotes	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	746140	0	urgent	CLOSED	Check that leading/trailing newlines match between source and target	2021-02-22 00:41:40 UTC

Internal Links: 744277 746140

Description Bryan Kearney 2011-09-21 01:50:14 UTC

When importing translations from zanata, the following msgfmt errors are encountered.

for lang in as bn de en es fr gu hi it ja kn ko ml mr or pa pt pt_BR ru ta te zh_CN zh_TW ; do \
		echo $lang ; \
		mkdir -p po/build/$lang/LC_MESSAGES/ ; \
		msgfmt -c --statistics -o po/build/$lang/LC_MESSAGES/rhsm.mo po/$lang.po ; \
	done
as
po/as.po:54: `msgid' and `msgstr' entries do not both end with '\n'
po/as.po:724: `msgid' and `msgstr' entries do not both end with '\n'
po/as.po:758: `msgid' and `msgstr' entries do not both end with '\n'

Per a discussion on IRC, it seems that the \n is not preserved and shown to the translator. This causes inadvertent mis-translations.

IRC transcript is below:

(09:22:32 PM) Hedda: sflanigan: bryan_kearney1: The problem might be that Zanata doesn't show the translators those \n in the original message - so we don't know that we actually need to add one of those to our translation, too.  Validation checks will then complain if not both messages end in \n.
(09:22:33 PM) bryan_kearney1: http://git.fedorahosted.org/git/?p=subscription-manager.git
(09:22:43 PM) bryan_kearney1: sflanigan: you can see the commit and revert
(09:23:03 PM) bryan_kearney1: Hedda: as a nOOb, that seems like a critical issue
(09:23:19 PM) Hedda: it is indeed - one I only just found out about
(09:23:25 PM) sflanigan: Would something like that break msgfmt though?
(09:23:53 PM) bryan_kearney1: sflanigan: that is what those errors are
(09:24:05 PM) sflanigan: Oh, I see. right
(09:24:52 PM) Hedda: same issue just broke my build of a documentation... the translation had \n inserted into tags, which I didn't see on Zanata.
(09:24:59 PM) Hedda: only publican complained later
(09:26:59 PM) Hedda: so yeah, sflanigan, this is actually a very important new issue :-(
(09:27:02 PM) CIA-82 left the room (quit: Ping timeout: 260 seconds).
(09:27:26 PM) bryan_kearney1: how does this get raised.. I am waiting on translations to fix bugs.. but can not use exports from zanata.
(09:27:27 PM) sflanigan: Yes
(09:27:44 PM) CIA-82 [~CIA.org] entered the room.
(09:29:31 PM) jni [~jni.244.88] entered the room.
(09:32:23 PM) bryan_kearney1: sflanigan: do I raise a bug? Cry wolf? something else?
(09:33:11 PM) Hedda: bryan_kearney1: I think mospina is already working on it... that's what he just told me
(09:33:15 PM) sflanigan: Raise a zanata bug, but it's going to take a while to do anything from that side of things.  We'll have to find a way of fixing the .po files and pushing them back to zanata

Comment 1 Sean Flanigan 2011-09-21 02:00:31 UTC

"msgfmt -c" checks a number of things which Zanata doesn't, including format strings and the gettext header.

In this case, most of the invalid messages were just missing newlines at the end of the string, but I think there was one Python string formatting error too. ('%' instead of '%s')

Making the source-language newlines visible to translators would help a lot, so we should definitely do that within the scope of this bug. 

We also need to do something about format checking generally.
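The newline rule behind the msgfmt errors quoted in the description can be sketched in a few lines of Python. This is only an illustration of the rule, not msgfmt's or Zanata's implementation: the function name `newline_mismatches` is hypothetical, and skipping untranslated (empty) entries is an assumption about how such a check would normally behave.

```python
from typing import List


def newline_mismatches(msgid: str, msgstr: str) -> List[str]:
    """Describe leading/trailing newline mismatches between source and target."""
    problems: List[str] = []
    if not msgstr:
        # Assumption: an untranslated entry has nothing to compare yet.
        return problems
    if msgid.startswith("\n") != msgstr.startswith("\n"):
        problems.append("entries do not both begin with '\\n'")
    if msgid.endswith("\n") != msgstr.endswith("\n"):
        problems.append("entries do not both end with '\\n'")
    return problems
```

A translation that drops the final newline, for example `newline_mismatches("Hello\n", "Hallo")`, yields one problem, while `newline_mismatches("Hello\n", "Hallo\n")` yields none.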

Comment 3 Sean Flanigan 2011-10-13 04:37:11 UTC

The table will now show newlines with the ¶ character (not in the textareas, only for source language cells or target cells which have been saved).

Runa, let me know if you want an option to turn this feature off.  But it might be safer to leave it on at all times, to reduce the likelihood of missing/extra newlines.

This change is in the 1.4 branch.
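The display rule described in this comment can be sketched as follows. This is a hedged Python illustration rather than the actual Zanata code; `show_newlines` and `PILCROW` are hypothetical names. The marker is added for display only, and the stored string is left untouched.

```python
PILCROW = "\u00b6"  # the ¶ character


def show_newlines(text: str) -> str:
    """Mark each newline with a pilcrow while keeping the line break itself."""
    return text.replace("\n", PILCROW + "\n")
```

Keeping the real line break after each ¶ means multi-line messages still wrap where the source wraps, and a missing final ¶ stands out when source and target are shown side by side.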

Comment 5 Runa Bhattacharjee 2011-10-14 02:24:18 UTC

I suggest it be kept on, but we need to find a way to specifically indicate that it represents a new line escape character and not anything else.

Comment 6 Sean Flanigan 2011-10-14 03:15:27 UTC

Runa, can you suggest a good way of indicating that?  I think most people will work out what's going on once they've inserted a newline and seen it turn into a ¶.

Perhaps we can go with what we have now for Zanata 1.4.2, and create a new bug for any enhancements.

I agree that we should check that leading/trailing newlines match too (bug 746140), but it will take longer, because we haven't created the validation framework yet.

Comment 7 Ding-Yi Chen 2011-10-14 04:48:31 UTC

Runa, do you think it is sufficient to show ¶ after Save, or when you are editing?

Comment 8 Runa Bhattacharjee 2011-10-14 06:21:02 UTC

I really should have asked this before - was there any specific demand earlier
for the escape characters to be converted? This is a major deviation from
standard gettext conventions and I am wondering why this was implemented in the
first place.

Comment 9 Sean Flanigan 2011-10-14 08:01:06 UTC

No specific prior demand from the UI side, it's just that any PO reading library has to handle escape characters properly, and that usually involves converting \n to newline when loading.  To convert it back to \n for the editor would actually be extra work for us, and we haven't had any demand for that before now.

BTW, this is not a deviation from gettext.  The two characters "\n" in a PO file encode a newline.  If you read it with standard gettext (eg the C function "gettext()"), you get a newline, not a backslash and an n.

In other words, by the time we see the string in Zanata, it's already a newline.   We can represent it in the editor however you like, but the more complex we get, the longer it will take, so I suggest we start with something simple.
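The point that the two characters \n in a PO file already stand for a newline once the file has been read can be illustrated with a small decoder. This is a minimal sketch under stated assumptions: `po_unescape` is a hypothetical name, only four escapes are handled (gettext defines more), and unknown escapes are passed through unchanged.

```python
# Assumption: only these four escapes are covered in this sketch.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def po_unescape(raw: str) -> str:
    """Decode backslash escapes found inside a quoted PO string."""
    out = []
    chars = iter(raw)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes are kept as written.
            out.append(_ESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)
```

Given the six characters `Hello\n` as they appear between the quotes in a PO file, the result is `Hello` followed by a single newline character, which is what the editor then has to represent somehow.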


As mentioned by a couple of translators here:
  http://blog.transifex.net/2010/04/agile-project-development/
not every translator wants to worry about Gettext escaping rules.

Not every document in Zanata is a PO document anyway.  Escaping is different in gettext, in Properties and in XLIFF.


We could certainly look at implementing a feature to escape newlines/tabs/etc when opening a message for editing, and back again when saving, but this will need more testing to make sure the escaping is always correct.  And it would probably be best to make it optional because of the points above.

Comment 10 Runa Bhattacharjee 2011-10-14 09:47:57 UTC

To describe a potential corner case:

For a message like the following where the msgid has new line characters both within and outside the string, how would the ¶ be displayed and eventually saved?

msgid ""
"\n"
"\n"
"    SELinux denied access requested by $SOURCE. The current boolean \n"
"    settings do not allow this access.  If you have not setup $SOURCE to\n"
"    require this access this may signal an intrusion attempt. If you do "
"intend \n"
"    this access you need to change the booleans on this system to allow \n"
"    the access.\n"
"    "
msgstr ""



This specific example is from setroubleshoot-framework.

Comment 11 Sean Flanigan 2011-10-17 00:56:34 UTC

I'm not sure what you mean by within and outside the string, but it would look like this (minus the quotes) in the table:
-----------------------------------------
¶
¶
    SELinux denied access requested by $SOURCE. The current boolean ¶
    settings do not allow this access.  If you have not setup $SOURCE to¶
    require this access this may signal an intrusion attempt. If you do 
intend ¶
    this access you need to change the booleans on this system to allow ¶
    the access.¶
    
-----------------------------------------

As before, those characters will still be saved as newlines in the database, and converted to \n when generating PO files.
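The reverse step, turning stored newlines back into \n when a PO file is generated, might look roughly like the sketch below. It is an assumption-laden illustration, not Zanata's writer: `po_quote` and `format_entry` are hypothetical names, only a few characters are escaped, and lines are split at newlines only, without the width-based wrapping that real gettext tools also apply.

```python
import re


def po_quote(text: str) -> str:
    """Escape one chunk of text and wrap it in double quotes."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def format_entry(keyword: str, text: str) -> str:
    """Format a msgid/msgstr entry, one quoted line per stored line."""
    # Split after each newline, keeping the newline with its chunk.
    chunks = re.findall(r"[^\n]*\n|[^\n]+", text)
    if len(chunks) <= 1:
        return f"{keyword} {po_quote(text)}"
    lines = [f'{keyword} ""'] + [po_quote(chunk) for chunk in chunks]
    return "\n".join(lines)
```

A multi-line message comes out in the same shape as the setroubleshoot example in Comment 10: an empty first string, then one quoted line per stored line, each ending in \n where the stored text has a newline.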

Comment 12 Runa Bhattacharjee 2011-10-17 08:22:03 UTC

I checked the source for this string, which is here:

https://fedorahosted.org/setroubleshoot/browser/plugins/src/catchall_boolean.py#L35

Looks like those line breaks are wraps applied to long strings when the .PO files are created.

Sean, Dchen: please allow the ¶ to be displayed at all times. Thanks.

Comment 13 Ding-Yi Chen 2011-10-18 03:04:00 UTC

VERIFIED with Zanata version 1.4.2-SNAPSHOT (20111017-1623)
