Bug 814599

Summary: RFE: %x validation should support positional printf strings (eg %1$s)
Product: [Retired] Zanata Reporter: Ding-Yi Chen <dchen>
Component: Component-UIAssignee: Patrick Huang <pahuang>
Status: CLOSED CURRENTRELEASE QA Contact: Ding-Yi Chen <dchen>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 1.6-SNAPSHOTCC: damason, pahuang, sflaniga, zanata-bugs
Target Milestone: ---   
Target Release: 1.7   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 1.7-SNAPSHOT (20120710-0026) Doc Type: Bug Fix
Doc Text:
Cause User wants to use positional printf variables in translation. Consequence Positional printf variables will be marked as invalid by current normal printf validation rule. Change added a new validation rule to validate against positional printf. This rule and the normal printf rule is mutually exclusive so that users won't get confusing results. Result User will be able to use positional printf variables in their translation and still being able to validate.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-09-11 05:11:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ding-Yi Chen 2012-04-20 08:08:32 UTC
Description of problem:
According to man 3 printf,
one can specify explicitly which argument is taken, at each place where an argument is required in format string.

That is, format string like "%1$s %3$lu %2$s" is valid.

However, Zanata still show validation warnings like:

Missing variables: %s, %s, and %lu
Unexpected variables: %1$s, %3$lu, and %2$s

Version-Release number of selected component (if applicable):
Zanata version 1.6.0-alpha-2-SNAPSHOT (20120420-0001)

How reproducible:
Always

Steps to Reproduce:
1. On project Tar, version 1.26, locale zh-Hans (or zh-CN), document "tar"
2. Search "read error"
3. See source entry: %s: Read error at byte %s, while reading %lu byte
   While target is: %1$s:读入 %3$lu 字节时,在 %2$s 字节处发生读错误
  
Actual results:
Validation Warnings:
Missing variables: %s, %s, and %lu
Unexpected variables: %1$s, %3$lu, and %2$s

Expected results:
No warning, as they are valid and proper translated format string.

Additional info:

Comment 1 Sean Flanigan 2012-06-01 04:55:17 UTC
I'm not sure which versions of printf actually support the extension for positional parameters, but there's a bit of info here:

http://lua-users.org/lists/lua-l/2009-03/msg00450.html

and here:

http://pubs.opengroup.org/onlinepubs/009695399/functions/printf.html

Comment 2 Ding-Yi Chen 2012-07-02 06:37:52 UTC
It's in the POSIX.1-2001 (with two technical corrigenda), published in 2004. Same as the 2nd link provided in comment 1.

In other words,  it's already around for quite some time.

Comment 3 Patrick Huang 2012-07-02 07:25:14 UTC
Taking from http://pubs.opengroup.org/onlinepubs/009695399/functions/printf.html
============================================================================
[XSI]  Conversions can be applied to the nth argument after the format in the argument list, rather than to the next unused argument. In this case, the conversion specifier character % (see below) is replaced by the sequence "%n$", where n is a decimal integer in the range [1,{NL_ARGMAX}], giving the position of the argument in the argument list. This feature provides for the definition of format strings that select arguments in an order appropriate to specific languages (see the EXAMPLES section).

The format can contain either numbered argument conversion specifications (that is, "%n$" and "*m$"), or unnumbered argument conversion specifications (that is, % and * ), but not both. The only exception to this is that %% can be mixed with the "%n$" form. The results of mixing numbered and unnumbered argument specifications in a format string are undefined. When numbered argument specifications are used, specifying the Nth argument requires that all the leading arguments, from the first to the (N-1)th, are specified in the format string.

============================================================================

I suggest we check for numbered variables in target string. If it contains valid numbering, which means:
1. no mixed unnumbered variables appeared.
2. number position is valid from 1 to n.
Then we strip the number out and validate them as per normal.
If the numbering is invalid, it will not strip number and will be validate against the raw string. Error message won't be as precise as to indicate which offending variable it is.
 
This way we can prevent false positive like in above example without having to do too much extra checking.

Comment 4 Patrick Huang 2012-07-05 03:05:52 UTC
New requirement:

Add specific warning for different failing reasons. i.e.
1. mixed unnumbered and numbered variables
2. number position invalid

Comment 5 Ding-Yi Chen 2012-07-05 03:14:26 UTC
As tested with 1.7-SNAPSHOT (20120705-0026)

The functionality is there, however, the warning can be more informative by following way:

1. Show "numbered arguments cannot mixed with unumbered arguments"
   if detected.

2. For numbered arguments, Shows
   "xxx is out of range", 
   "xxx is duplicated"
   "xxx is missing"

For example, if 

Source:  invalid argument %s for %s
Target: %2$s 的参数 %3$s 无效

Actual:
Missing variables: %s and %s
Unexpected variables: %2$s and %3$s

Expected:
"%3$s" is out of range
"%1$s" is missing

Comment 6 Patrick Huang 2012-07-06 01:53:33 UTC
committed into master:
https://github.com/zanata/zanata/commit/6c3ffbde82e025c63d36a8653a885d2182d4ae9a

Comment 7 Sean Flanigan 2012-07-06 06:36:18 UTC
(In reply to comment #2)
> It's in the POSIX.1-2001 (with two technical corrigenda), published in 2004.
> Same as the 2nd link provided in comment 1.
> 
> In other words,  it's already around for quite some time.

Thanks.  Note that the "%n$" bits of that document [1] are marked as an XSI extension. They are optional extensions to POSIX printf, which are required for XSI conformance.

So POSIX printf is not guaranteed to support positional argument, let alone non-POSIX printf.  Translators can't use XSI printf strings in every project.  

So we really need to have two validators - one for ordinary printf, and one for XSI (positional) printf.  


[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/printf.html

Comment 8 Patrick Huang 2012-07-09 06:06:15 UTC
Split into two separate validators. One for normal use one for positional variables. These two validators are mutually exclusive which means user can at most turn on one at a time.

committed into master:
https://github.com/zanata/zanata/commit/a83d909e4213fd4dd3674df80d62985b984c3363

Comment 9 Ding-Yi Chen 2012-07-10 01:50:06 UTC
VERIFIED with Zanata version 1.7-SNAPSHOT (20120710-0026)