Bug 872039 - Escaping with single-quote (a.k.a. Apostrophes ') character in MessageFormat strings can cause confusing validation warnings
Summary: Escaping with single-quote (a.k.a. Apostrophes ') character in MessageFormat ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Zanata
Classification: Retired
Component: Component-Logic
Version: development
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 2.1
Assignee: David Mason
QA Contact: Ding-Yi Chen
URL:
Whiteboard:
Duplicates: 856019 1113439
Depends On:
Blocks:
 
Reported: 2012-11-01 04:40 UTC by Alex Eng
Modified: 2014-07-03 00:47 UTC (History)

Fixed In Version: 2.1-SNAPSHOT (20121217-0033)
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-26 04:06:27 UTC
Embargoed:



Description Alex Eng 2012-11-01 04:40:03 UTC
Description of problem:


The following string is near the end of
java/code/src/com/redhat/rhn/frontend/strings/java/StringResource in
satellite 5.5


"""The following is a list of errors gathered while Spacewalk attempts to
synchronize kickstart distributions from Spacewalk to Cobbler.  These
errors must be corrected for the distributions to be available for
kickstarting systems:

{0}

Please check:

/var/log/rhn/rhn_taskomatic_daemon.log and
/var/log/tomcat5/catalina.out
/var/log/cobbler/cobbler.log

for more detailed errors.  If you don't resolve the errors the kickstart
tree will not be usable for kickstarting.
        """

It has the translation:


"""Ce qui suit est une liste d'erreurs à la suite des tentatives de
synchronisation
des distributions kickstart de Spacewalk à Cobbler. Ces
erreurs doivent être corrigées pour que les distributions puissent être
disponibles
 pour les kickstarting systèmes :

{0}

Veuillez cocher:
/var/log/rhn/rhn_taskomatic_daemon.log et
/var/log/tomcat5/catalina.out
/var/log/cobbler/cobbler.log

 pour des erreurs plus détaillées.  Si vous ne pouvez pas résoudre les
erreurs de l'arborescence de kickstart
ne sera pas utilisable pour kickstarting."""

But it is shown as having a missing variable {0}, even though it is there.

Comment 1 David Mason 2012-11-01 22:40:37 UTC
This is caused by an odd number of apostrophes (') before the variable, which makes the validator treat the variable as escaped. In the example above, the apostrophes in "d'erreurs" and "l'arborescence" are taken as the beginning and end of a quoted string, which includes the {0}.

In a Java format string, in the normal case where a pair of apostrophes surrounds other characters, the surrounded characters are 'escaped' and will not be interpreted as variables. The apostrophes themselves are removed from the output. If a single apostrophe has no matching apostrophe to close it, the remainder of the string after it is considered quoted, and the apostrophe is removed.

Generally, to insert a single apostrophe such as in "d'erreurs", the apostrophe should be doubled like so: "d''erreurs". This allows Java to distinguish it from the beginning or end of an escaped stretch of characters.
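
A minimal sketch of the behaviour described above, using the standard java.text.MessageFormat class. The class name ApostropheDemo and the shortened sample strings are made up for illustration and are not taken from the affected project.

```java
import java.text.MessageFormat;

// Illustrative demo only: ApostropheDemo and the sample patterns are hypothetical.
class ApostropheDemo {

    // Formats a pattern with the given arguments via MessageFormat.
    static String render(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        // The two single apostrophes open and close a quoted stretch,
        // so {0} is treated as literal text and both apostrophes vanish.
        System.out.println(render("Liste d'erreurs : {0} de l'arborescence", "X"));

        // Each doubled apostrophe yields one literal apostrophe,
        // so {0} stays outside any quoted stretch and is substituted.
        System.out.println(render("Liste d''erreurs : {0} de l''arborescence", "X"));

        // An apostrophe with no closing partner quotes the rest of the pattern.
        System.out.println(render("I'm {0}", "X"));
    }
}
```

With the single apostrophes the placeholder comes out as the literal text {0}; only the doubled form substitutes the argument.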

There are a few more complexities to how apostrophes are handled by Java MessageFormat, which are described under "Patterns and Their Interpretation" at:

http://docs.oracle.com/javase/1.4.2/docs/api/java/text/MessageFormat.html

where they give the warning:

"The rules for using quotes within message format patterns unfortunately have shown to be somewhat confusing. In particular, it isn't always obvious to localizers whether single quotes need to be doubled or not..."


In addition, the above only applies to strings that will be used with Java MessageFormat. Unfortunately there is no reliable way to detect whether a given string is used that way, so it is recommended that source comments be included to mark the strings that are.

Comment 2 Sean Flanigan 2012-11-05 23:39:39 UTC
We could generate a warning when a string contains both {0} and an odd number of apostrophes (').

We could also generate a warning if the target contains single quotes around some other characters (eg 'one or more words', or '{0}'), when the source does not have quotes around anything, or when the source only uses double apostrophes to represent literal apostrophes.

Comment 3 Runa Bhattacharjee 2012-12-06 10:25:45 UTC
Is there any way this could be marked as a different kind of warning in the editor, instead of a 'validation warning for missing variables'?

Comment 4 Sean Flanigan 2012-12-06 23:43:03 UTC
I think the warnings I mentioned should cover it. In fact, I think we just need to add these two warnings (only active if the source contains {0}):

1. "translation contains an odd number of apostrophes; this may cause other warnings"
2. "translation uses single quotes around something, whereas source does not; this may cause other warnings" [whenever the regex "'[^']+'" is found, ie one or more characters inside single quotes]

I think these warnings would be in addition to the existing warnings, but they should be listed first.
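
The two proposed warnings could be sketched roughly as follows. This is only an illustration of the idea in this comment, not Zanata's validator: the class ApostropheWarnings, the method check, and the simplified variable pattern (plain {N} placeholders only) are assumptions. The "source does not" half of warning 2 is approximated by running the same regex on the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the two proposed warnings; not the actual Zanata code.
class ApostropheWarnings {

    // Assumption: only simple placeholders such as {0} or {12} are recognised.
    private static final Pattern VARIABLE = Pattern.compile("\\{\\d+\\}");

    // The regex quoted in the comment: one or more characters inside single quotes.
    private static final Pattern QUOTED_TEXT = Pattern.compile("'[^']+'");

    static List<String> check(String source, String target) {
        List<String> warnings = new ArrayList<>();

        // Only active if the source contains a variable such as {0}.
        if (!VARIABLE.matcher(source).find()) {
            return warnings;
        }

        // Warning 1: odd number of apostrophes in the translation.
        long apostrophes = target.chars().filter(c -> c == '\'').count();
        if (apostrophes % 2 == 1) {
            warnings.add("translation contains an odd number of apostrophes; "
                    + "this may cause other warnings");
        }

        // Warning 2: translation quotes something, whereas the source does not.
        if (QUOTED_TEXT.matcher(target).find() && !QUOTED_TEXT.matcher(source).find()) {
            warnings.add("translation uses single quotes around something, "
                    + "whereas source does not; this may cause other warnings");
        }
        return warnings;
    }
}
```

Note that a translation with two undoubled apostrophes, such as one containing both "d'erreurs" and "l'arborescence", has an even count, so in this sketch it would only trigger the second warning.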

Comment 5 David Mason 2012-12-06 23:44:35 UTC
Runa, do you mean detecting when variables are 'missing' specifically because they are between quotes in the translation, rather than because they are not in the translation at all? If so that should be possible. Does the following example describe what you mean?

Source: "{0} {1} {2}"
Target: "{0} '{1}'"

Warnings:
 - missing variable {2}
 - unexpected quoting of variable {1}

Comment 6 Sean Flanigan 2012-12-07 00:05:22 UTC
That would produce more meaningful error messages than my warning #2, but I think we still need to add my warning #1 about odd numbers of apostrophes (which probably indicates an apostrophe they forgot to double).  

We would need to make sure we can handle other text being inside the quotes with the accidentally quoted variable:

"Sorry Dave, I'm {0} I can't do that."

And we should probably only warn about the quoted {0} if we can't also find {0} outside the quotes.  So this should not generate a warning:

Source: "{0} '{1}' {1} {2}"
Target: "{0} '{1}' {1} {2}"


Oh, and the quoting warnings should also suggest that any apostrophes added by the translator will need to be doubled, eg like this:

"Sorry Dave, I''m {0} I can''t do that."

Of course, all of this only applies if the source string contains a variable like {0}, which is our best indication that MessageFormat will be used for that string.
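
One way to sketch the "quoted variable" check discussed here is to walk the string while tracking quote state, and report a variable only if it never appears outside quotes. The names QuotedVariableScanner, quotedOnly and unexpectedlyQuoted are hypothetical, and the variable pattern is a simplification that recognises plain {N} placeholders only.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the validator that was eventually committed.
class QuotedVariableScanner {

    // Assumption: only simple placeholders such as {0} are recognised.
    private static final Pattern VARIABLE = Pattern.compile("\\{\\d+\\}");

    // Returns the variables that occur only inside quoted stretches of the text.
    static Set<String> quotedOnly(String text) {
        Set<String> quoted = new LinkedHashSet<>();
        Set<String> unquoted = new LinkedHashSet<>();
        Matcher m = VARIABLE.matcher(text);
        boolean inQuote = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\'') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\'') {
                    i += 2; // doubled apostrophe: literal, quote state unchanged
                    continue;
                }
                inQuote = !inQuote; // single apostrophe opens or closes a quoted stretch
                i++;
            } else if (c == '{' && m.region(i, text.length()).lookingAt()) {
                (inQuote ? quoted : unquoted).add(m.group());
                i = m.end();
            } else {
                i++;
            }
        }
        // A variable that also appears outside quotes should not be reported.
        quoted.removeAll(unquoted);
        return quoted;
    }

    // Variables quoted in the target that are not quoted the same way in the source.
    static Set<String> unexpectedlyQuoted(String source, String target) {
        Set<String> result = quotedOnly(target);
        result.removeAll(quotedOnly(source));
        return result;
    }
}
```

In this sketch the "Sorry Dave" string with single apostrophes reports {0} as quoted, while the version with doubled apostrophes reports nothing.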

Comment 7 Runa Bhattacharjee 2012-12-07 10:26:53 UTC
(In reply to comment #5)
> Runa, do you mean detecting when variables are 'missing' specifically
> because they are between quotes in the translation, rather than because they
> are not in the translation at all? If so that should be possible. Does the
> following example describe what you mean?
> 
> Source: "{0} {1} {2}"
> Target: "{0} '{1}'"
> 
> Warnings:
>  - missing variable {2}
>  - unexpected quoting of variable {1}

Well, at times it is not even a valid scenario for a warning. For instance, if we use Sean's example:

"Sorry Dave, I'm {0} I can't do that."

and the apostrophe is completely removed during translation for a different script which does not use it. 

Source: "Sorry Dave, I'm {0} I can't do that."
Target: "<translated text in Indic text> {0} <translated text in Indic text> "

This currently shows a 'missing variable' error.

Comment 8 Sean Flanigan 2012-12-10 07:40:09 UTC
Actually, the string "Sorry Dave, I'm {0} I can't do that." would be invalid for English too.  If {0} is meant to be treated as a MessageFormat variable, the correct string is "Sorry Dave, I''m {0} I can''t do that."

Comment 9 David Mason 2012-12-11 23:51:43 UTC
Added warnings in 2.1-SNAPSHOT
 - when number of non-doubled apostrophes does not match between source and translation
 - when there are any characters quoted in translation if none are quoted in source
 - when variables are quoted in source but not in translation
 - when variables are quoted in translation but not in source

See: https://github.com/zanata/zanata/commit/bc52cbc56f2ec415b9c06511ce1127fadc7bc139
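
For readers who do not want to open the commit, the first warning in the list above (comparing the number of non-doubled apostrophes) can be illustrated with the sketch below. It is written from the description in this comment only and is not the code from the linked commit; the class and method names are hypothetical.

```java
// Hypothetical illustration of the first warning; not taken from the linked commit.
class ApostropheCountCheck {

    // Counts apostrophes that are not part of a doubled pair ('').
    static int countNonDoubled(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '\'') {
                continue;
            }
            if (i + 1 < text.length() && text.charAt(i + 1) == '\'') {
                i++; // skip the second half of a doubled apostrophe
            } else {
                count++;
            }
        }
        return count;
    }

    // True when source and translation disagree on the number of non-doubled apostrophes.
    static boolean countsDiffer(String source, String target) {
        return countNonDoubled(source) != countNonDoubled(target);
    }
}
```

For the strings in the description, the source contains one non-doubled apostrophe ("don't") and the translation contains two ("d'erreurs" and "l'arborescence"), so a check of this kind would flag the pair.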

Comment 10 Ding-Yi Chen 2012-12-17 04:42:01 UTC
VERIFIED with Zanata version 2.1-SNAPSHOT (20121217-0033)

Comment 11 Ding-Yi Chen 2014-07-03 00:40:51 UTC
*** Bug 856019 has been marked as a duplicate of this bug. ***

Comment 12 Ding-Yi Chen 2014-07-03 00:47:39 UTC
*** Bug 1113439 has been marked as a duplicate of this bug. ***

