Description of problem:

The following string is near the end of java/code/src/com/redhat/rhn/frontend/strings/java/StringResource in Satellite 5.5:

"""The following is a list of errors gathered while Spacewalk attempts to synchronize kickstart distributions from Spacewalk to Cobbler. These errors must be corrected for the distributions to be available for kickstarting systems: {0} Please check: /var/log/rhn/rhn_taskomatic_daemon.log and /var/log/tomcat5/catalina.out /var/log/cobbler/cobbler.log for more detailed errors. If you don't resolve the errors the kickstart tree will not be usable for kickstarting. """

It has the translation:

"""Ce qui suit est une liste d'erreurs à la suite des tentatives de synchronisation des distributions kickstart de Spacewalk à Cobbler. Ces erreurs doivent être corrigées pour que les distributions puissent être disponibles pour les kickstarting systèmes : {0} Veuillez cocher: /var/log/rhn/rhn_taskomatic_daemon.log et /var/log/tomcat5/catalina.out /var/log/cobbler/cobbler.log pour des erreurs plus détaillées. Si vous ne pouvez pas résoudre les erreurs de l'arborescence de kickstart ne sera pas utilisable pour kickstarting."""

But it is shown as having a missing variable {0}, even though it is there.
This is caused by having an odd number of apostrophes (') before the variable, which makes the validator consider the variable escaped. In the example above, the apostrophes in "d'erreurs" and "l'arborescence" are treated as the beginning and end of a quoted string, which includes the first {0}.

In a Java format string, in the normal case where a pair of apostrophes surrounds other characters, those characters are 'escaped' and will not be interpreted as variables. The apostrophes themselves are removed. If a single apostrophe has no matching apostrophe to close it, the remainder of the string after it is considered quoted, and the apostrophe is again removed.

Generally, to insert a single apostrophe such as in "d'erreurs", the apostrophe should be doubled like so: "d''erreurs". This allows Java to distinguish it from the beginning or end of an escaped stretch of characters.

There are a few more complexities to how apostrophes are handled by Java MessageFormat, which are described under "Patterns and Their Interpretation" at:
http://docs.oracle.com/javase/1.4.2/docs/api/java/text/MessageFormat.html
where they give the warning: "The rules for using quotes within message format patterns unfortunately have shown to be somewhat confusing. In particular, it isn't always obvious to localizers whether single quotes need to be doubled or not..."

In addition, the above only applies to strings that will be used with Java MessageFormat. Unfortunately there is no reliable way to detect whether a given string is, so it is recommended that source comments be included to indicate which strings are MessageFormat patterns.
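For anyone who wants to reproduce the quoting behaviour outside the editor, here is a minimal sketch using java.text.MessageFormat directly. The class name ApostropheDemo, the helper render, and the shortened French strings are made up for illustration; only MessageFormat.format is the real API.

```java
import java.text.MessageFormat;

class ApostropheDemo {
    // Formats a pattern with one argument using java.text.MessageFormat.
    static String render(String pattern, Object arg) {
        return MessageFormat.format(pattern, arg);
    }

    public static void main(String[] args) {
        // Single apostrophes: everything between them is quoted, so {0} stays
        // literal and the apostrophes themselves are dropped.
        System.out.println(render("liste d'erreurs : {0} de l'arborescence", "E1"));
        // prints: liste derreurs : {0} de larborescence

        // Doubled apostrophes: each '' becomes one literal ', and {0} is substituted.
        System.out.println(render("liste d''erreurs : {0} de l''arborescence", "E1"));
        // prints: liste d'erreurs : E1 de l'arborescence

        // Unclosed apostrophe: the rest of the string is treated as quoted.
        System.out.println(render("I'm {0}", "E1"));
        // prints: Im {0}
    }
}
```

The first call shows the same effect the validator is reacting to: the variable is present in the text but is never substituted.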
We could generate a warning when a string contains both {0} and an odd number of apostrophes ('). We could also generate a warning if the target contains single quotes around some other characters (eg 'one or more words', or '{0}'), when the source does not have quotes around anything, or when the source only uses double apostrophes to represent literal apostrophes.
Is there any way this could be marked as a different kind of warning in the editor, instead of a 'validation warning for missing variables'?
I think the warnings I mentioned should cover it, in fact I think we just need to add these two warnings (only active if source contains {0}):

1. "translation contains an odd number of apostrophes; this may cause other warnings"
2. "translation uses single quotes around something, whereas source does not; this may cause other warnings" [whenever the regex "'[^']+'" is found, ie one or more characters inside single quotes]

I think these warnings would be in addition to the existing warnings, but they should be listed first.
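A rough sketch of what these two checks might look like, assuming plain java.util.regex and not reflecting any existing Zanata code. The class QuoteWarnings and method check are hypothetical names. One deliberate deviation from the proposal, also an assumption: doubled apostrophes are stripped before applying the regex, since otherwise the text between two correctly doubled apostrophes (as in "d''erreurs ... l''arborescence") would itself match "'[^']+'".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class QuoteWarnings {
    // A MessageFormat-style numbered variable such as {0} or {12}.
    private static final Pattern VARIABLE = Pattern.compile("\\{\\d+\\}");
    // The regex from the proposal: one or more characters inside single quotes.
    private static final Pattern QUOTED = Pattern.compile("'[^']+'");

    static List<String> check(String source, String target) {
        List<String> warnings = new ArrayList<>();
        // Only active when the source looks like a MessageFormat pattern.
        if (!VARIABLE.matcher(source).find()) {
            return warnings;
        }
        // Assumption: drop doubled apostrophes first (parity is unchanged).
        String src = source.replace("''", "");
        String tgt = target.replace("''", "");

        long apostrophes = tgt.chars().filter(c -> c == '\'').count();
        if (apostrophes % 2 == 1) {
            warnings.add("translation contains an odd number of apostrophes; "
                    + "this may cause other warnings");
        }
        if (QUOTED.matcher(tgt).find() && !QUOTED.matcher(src).find()) {
            warnings.add("translation uses single quotes around something, "
                    + "whereas source does not; this may cause other warnings");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // One stray apostrophe in the translation triggers warning 1.
        System.out.println(check("Errors: {0}", "Liste d'erreurs : {0}"));
        // Two stray apostrophes around {0} trigger warning 2.
        System.out.println(check("Errors: {0}", "d'erreurs {0} de l'arborescence"));
    }
}
```

Because the warnings are returned in a fixed order, a caller could simply prepend them to the existing validation messages so they are listed first.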
Runa, do you mean detecting when variables are 'missing' specifically because they are between quotes in the translation, rather than because they are not in the translation at all? If so that should be possible. Does the following example describe what you mean?

Source: "{0} {1} {2}"
Target: "{0} '{1}'"

Warnings:
- missing variable {2}
- unexpected quoting of variable {1}
That would produce more meaningful error messages than my warning #2, but I think we still need to add my warning #1 about odd numbers of apostrophes (which probably indicates an apostrophe they forgot to double).

We would need to make sure we can handle other text being inside the quotes with the accidentally quoted variable:

"Sorry Dave, I'm {0} I can't do that."

And we should probably only warn about the quoted {0} if we can't also find {0} outside the quotes. So this should not generate a warning:

Source: "{0} '{1}' {1} {2}"
Target: "{0} '{1}' {1} {2}"

Oh, and the quoting warnings should also suggest that any apostrophes added by the translator will need to be doubled, eg like this:

"Sorry Dave, I''m {0} I can''t do that."

Of course, all of this only applies if the source string contains a variable like {0}, which is our best indication that MessageFormat will be used for that string.
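To make the rule above concrete, here is a sketch of a check that separates quoted from unquoted text and only reports a quoted variable when it does not also appear outside the quotes. All names (QuotedVariableCheck, splitByQuoting, variablesIn, check) are hypothetical, and the scanner is a simplification: it handles only the basic rules ('' is a literal apostrophe, a lone ' toggles quoting, an unclosed quote runs to the end) and ignores the extra rules MessageFormat applies inside format elements.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedVariableCheck {
    private static final Pattern VARIABLE = Pattern.compile("\\{\\d+\\}");

    // Returns { unquoted text, quoted text } for a pattern.
    static String[] splitByQuoting(String pattern) {
        StringBuilder unquoted = new StringBuilder();
        StringBuilder quoted = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < pattern.length(); i++) {
            char ch = pattern.charAt(i);
            if (ch != '\'') {
                (inQuote ? quoted : unquoted).append(ch);
            } else if (i + 1 < pattern.length() && pattern.charAt(i + 1) == '\'') {
                i++; // doubled apostrophe: a literal, does not toggle quoting
            } else {
                inQuote = !inQuote;
                // Separator so fragments on either side cannot merge into a fake {n}.
                unquoted.append(' ');
                quoted.append(' ');
            }
        }
        return new String[] { unquoted.toString(), quoted.toString() };
    }

    static Set<String> variablesIn(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = VARIABLE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    static List<String> check(String source, String target) {
        List<String> warnings = new ArrayList<>();
        String[] targetParts = splitByQuoting(target);
        Set<String> active = variablesIn(targetParts[0]);
        Set<String> quoted = variablesIn(targetParts[1]);
        // Only variables that are active (unquoted) in the source are expected.
        for (String v : variablesIn(splitByQuoting(source)[0])) {
            if (active.contains(v)) {
                continue; // found outside quotes, so no warning even if also quoted
            }
            warnings.add(quoted.contains(v)
                    ? "unexpected quoting of variable " + v
                            + " (apostrophes in the translation need to be doubled)"
                    : "missing variable " + v);
        }
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println(check("{0} {1} {2}", "{0} '{1}'"));
        System.out.println(check("{0} '{1}' {1} {2}", "{0} '{1}' {1} {2}"));
    }
}
```

With this approach, other text sharing the quotes with the variable (as in the "Sorry Dave" string) needs no special handling, because the variable is searched for anywhere inside the quoted text.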
(In reply to comment #5)
> Runa, do you mean detecting when variables are 'missing' specifically
> because they are between quotes in the translation, rather than because they
> are not in the translation at all? If so that should be possible. Does the
> following example describe what you mean?
>
> Source: "{0} {1} {2}"
> Target: "{0} '{1}'"
>
> Warnings:
> - missing variable {2}
> - unexpected quoting of variable {1}

Well at times, it is not even a valid scenario for a warning. For instance, if we use Sean's example: "Sorry Dave, I'm {0} I can't do that." and the apostrophe is completely removed during translation for a different script which does not use it.

Source: "Sorry Dave, I'm {0} I can't do that."
Target: "<translated text in Indic text> {0} <translated text in Indic text> "

This shows an error presently for 'missing variable'.
Actually, the string "Sorry Dave, I'm {0} I can't do that." would be invalid for English too. If {0} is meant to be treated as a MessageFormat variable, the correct string is "Sorry Dave, I''m {0} I can''t do that."
Added warnings in 2.1-SNAPSHOT:
- when number of non-doubled apostrophes does not match between source and translation
- when there are any characters quoted in translation if none are quoted in source
- when variables are quoted in source but not in translation
- when variables are quoted in translation but not in source

See: https://github.com/zanata/zanata/commit/bc52cbc56f2ec415b9c06511ce1127fadc7bc139
VERIFIED with Zanata version 2.1-SNAPSHOT (20121217-0033)
*** Bug 856019 has been marked as a duplicate of this bug. ***
*** Bug 1113439 has been marked as a duplicate of this bug. ***