Bug 1215274 - Should be able to specify minimum percentage completion on pull
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Zanata
Classification: Retired
Component: Component-zanata-client
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: client-3.7
Assignee: Patrick Huang
QA Contact: Ding-Yi Chen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-04-24 20:55 UTC by stephane
Modified: 2015-07-30 01:57 UTC (History)

Fixed In Version: Commit a0cd16978d6590b6509ec8fab5f8adff2fb8c521
Doc Type: Bug Fix
Doc Text:
Story Points: 2
Clone Of:
Environment:
Last Closed: 2015-07-30 01:57:52 UTC
Embargoed:



Description stephane 2015-04-24 20:55:52 UTC
We currently use Transifex and are looking to switch over to Zanata. However, in trying to replicate our workflow, we found that we can't specify a minimum percentage of accepted translations when we do a pull using zanata-cli. We can specify this when we use the Transifex client to pull. This functionality is important to us as we don't want to pull down translations which aren't very complete.

Comment 1 Michelle Kim 2015-05-06 01:38:28 UTC
Hi Stephane,

We would like to clarify one point while discussing the implementation details:

Do you expect this threshold to be applied at the project level or at the document level? The answer changes how we implement it.

For example, suppose you set the minimum percentage to 80% and there are two documents in French, one 100% translated and the other 78% translated. Would you still want to download the whole set of documents, since the overall percentage for French is above 80%?
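
The two interpretations in this example can be sketched as follows. This is only an illustration: the file names and function names are hypothetical, and the "overall" figure is assumed to be a plain average of the two documents (as if both were the same size), which is not necessarily how Zanata computes it.

```python
# Hypothetical per-document completion for French; file names are made up,
# the percentages are the ones from the example above.
french_docs = {"a.po": 100.0, "b.po": 78.0}
threshold = 80.0


def pull_project_level(docs, minimum):
    """Pull every document if the overall average meets the minimum."""
    overall = sum(docs.values()) / len(docs)  # assumption: equal-sized documents
    return sorted(docs) if overall >= minimum else []


def pull_document_level(docs, minimum):
    """Pull only the documents that individually meet the minimum."""
    return sorted(name for name, pct in docs.items() if pct >= minimum)


project_result = pull_project_level(french_docs, threshold)
document_result = pull_document_level(french_docs, threshold)
print(project_result)   # both documents, because the average is 89%
print(document_result)  # only the fully translated document
```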

Also, if the folder already contains old files from an earlier pull of everything and you run another pull with 80% as the minimum threshold, do you expect the old files below 80% to be cleared?

Thanks
Michelle

Comment 2 Andreas Jaeger 2015-05-07 19:08:54 UTC
Let me answer instead of Stephane here.

In OpenStack we use these two scenarios:
1) For most projects, we use 75 % for all translated files
2) For one project, we use 75 % for most files and a different value for two other files

The goal is to import new translations for the first time only if they are sufficiently translated (75 %), but then keep updating them even if their translation rate drops.


What we do right now is the following:

1)
# Download new files that are at least 75 % translated.
# Also downloads updates for existing files that are at least 75 %
# translated.
tx pull -a -f --minimum-perc=75

# Pull upstream translations of all downloaded files but do not
# download new files.
tx pull -f

The effect of this is that we update existing files and get new files that are at least 75 % translated.

2)
Set up the config file with percentages per file

# Download new files.
# Also downloads updates for existing files that are
# translated to a certain amount as configured in the config file
tx pull -a -f

# Pull upstream translations of all downloaded files but do not
# download new files.
# Use lower percentage here to update the existing files.
tx pull -f --minimum-perc=50


A download with a minimum percentage only updates files that meet that minimum; it does not delete any already downloaded files.

The behaviour that transifex has - as documented above - serves our needs well. 


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We also have a postprocessing script (in bash) that runs afterwards to delete downloaded files with less than 20 % translation. So, our full use case is:

1)
* Download new files with more than 75 %
* Update all downloaded files
* Delete files with less than 20 % - done via bash script
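
The three steps above can be sketched as a single decision function. This is an illustrative model only, not the actual OpenStack script or any Zanata/Transifex API; `plan_pull` and the file names are hypothetical, and "more than 75 %" is treated as "at least 75 %" to match the tx comments earlier in this comment.

```python
def plan_pull(remote, local, new_min=75.0, delete_below=20.0):
    """Return (download, update, delete) lists of file names.

    remote: dict mapping file name -> percent translated on the server
    local:  set of file names that were already downloaded earlier
    """
    # Step 1: new files are only fetched when sufficiently translated.
    download = sorted(
        name for name, pct in remote.items()
        if name not in local and pct >= new_min
    )
    # Step 2: files that already exist locally are always updated.
    update = sorted(name for name in remote if name in local)
    # Step 3: after the pull, files that fell below the floor are deleted.
    present = set(download) | set(update)
    delete = sorted(name for name in present if remote[name] < delete_below)
    return download, update, delete


remote = {"de.po": 90.0, "fr.po": 60.0, "ja.po": 10.0, "es.po": 40.0}
local = {"fr.po", "ja.po"}
download, update, delete = plan_pull(remote, local)
print(download, update, delete)
```

Here es.po is neither downloaded nor deleted: it is below 75 % and was never present locally.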

Comment 3 Carlos Munoz 2015-05-08 06:32:55 UTC
Hi Andreas, thanks for the explanation. I think the design we have in mind will cover *most* of your needs. The only thing it won't cover is the per-file assignment of minimum percentages within a project (i.e. the minimum percentage is a command-line option only, and it applies to all files being downloaded).

So, having said that, the answers to Michelle's questions are:
1. The minimum translation percentage has to be evaluated on a document basis (not overall).
2. Already written files should not be cleared, even if they don't satisfy the minimum percentage. As I understood, your bash postprocessing script will take care of some of that.

Let me know if you have any questions.

Comment 4 Andreas Jaeger 2015-05-08 07:38:20 UTC
Hi Carlos,

This sounds good and will cover most projects - and we can handle the odd case differently.

> 2. Already written files should not be cleared, even if they don't satisfy the minimum percentage. As I understood, your bash postprocessing script will take care of some of that.

Correct, my bash script takes care of that.

Comment 5 Patrick Huang 2015-05-19 01:44:11 UTC
https://github.com/zanata/zanata-client/pull/63

Comment 6 Ding-Yi Chen 2015-05-21 05:29:03 UTC
Note that with --min-doc-percent 100, only fully translated documents are downloaded.

However, for other values rounding is applied, e.g. a document at 94.98% is still downloaded with --min-doc-percent 95.
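
One rule that is consistent with both observations is sketched below. It is a guess for illustration, not the actual zanata-client code: `meets_minimum` is a hypothetical name, and rounding half-up to the nearest integer is an assumption.

```python
import math


def meets_minimum(percent, min_doc_percent):
    """Decide whether a document passes the --min-doc-percent threshold.

    Assumed rule: a threshold of 100 requires a fully translated document;
    any other threshold is compared against the rounded percentage.
    """
    if min_doc_percent >= 100:
        return percent >= 100.0
    rounded = math.floor(percent + 0.5)  # assumption: round half up
    return rounded >= min_doc_percent


print(meets_minimum(94.98, 95))   # passes: 94.98 rounds to 95
print(meets_minimum(94.40, 95))   # fails: 94.40 rounds to 94
print(meets_minimum(99.60, 100))  # fails: not fully translated
print(meets_minimum(100.0, 100))  # passes
```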

Comment 7 Ding-Yi Chen 2015-05-26 06:18:02 UTC
Hi Andreas,

Should the percentage be word-based or message-based?

For example,

You have 100 messages in a document: 99 of them are one-word messages and 1 has 99 words.

The 99-word message is translated; the others are not.

Do you prefer to see the document as
1% translated (message-based),
or 50% translated (word-based)?
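
The arithmetic behind the two figures can be checked with a short sketch. It assumes the single 99-word message is the translated one, which is what the 1% and 50% figures imply; the function names are illustrative only.

```python
# Each message is (word_count, is_translated):
# 99 untranslated one-word messages and one translated 99-word message.
messages = [(1, False)] * 99 + [(99, True)]


def message_percent(msgs):
    """Share of translated messages, ignoring their length."""
    translated = sum(1 for _, done in msgs if done)
    return 100.0 * translated / len(msgs)


def word_percent(msgs):
    """Share of translated words across all messages."""
    total_words = sum(words for words, _ in msgs)
    translated_words = sum(words for words, done in msgs if done)
    return 100.0 * translated_words / total_words


by_message = message_percent(messages)  # 1 of 100 messages
by_word = word_percent(messages)        # 99 of 198 words
print(by_message, by_word)
```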

Comment 8 Ding-Yi Chen 2015-05-27 08:13:40 UTC
After team discussion, we picked message-based statistics because this option is mostly for project maintainers, and other translation services such as GNOME use message-based statistics by default.

https://l10n.gnome.org/languages/cs/gnome-gimp/ui-part/

Other maintainers have also asked to see the message-based statistics.

To view the message-based statistics:

1. From the project-version page, go to LANG_YOU_INTEREST -> ANY_DOCUMENT.
2. Click on the breadcrumb -> LANG_YOU_INTEREST.
3. In the "Stats by" radio box, choose Message.

Then you can see the message-based statistics.

Comment 9 Ding-Yi Chen 2015-05-27 08:14:53 UTC
VERIFIED with zanata-client-3.7.0-SNAPSHOT
Commit a0cd16978d6590b6509ec8fab5f8adff2fb8c521

Comment 10 Ding-Yi Chen 2015-05-28 00:33:33 UTC
Merge commit d13462f1dd31e37438a634743fe5a2861bbadf4e

Comment 11 Andreas Jaeger 2015-05-29 19:56:47 UTC
I agree with the message-based statistics! Thanks for implementing it!

