1076995 – Zanata does not copy the most recent translation

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Zanata
Classification:	Retired
Component:	Component-Logic, Component-CopyTrans
Sub Component:
Version:	3.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Damian Jansen
QA Contact:	Zanata-QA Mailing List
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-03-17 00:50 UTC by Yuko Katabami
Modified:	2015-07-29 01:59 UTC
CC List:	7 users
Fixed In Version:
Story Points:	---
Clone Of:
Environment:
Last Closed:	2015-07-29 01:59:46 UTC
Embargoed:
Flags:	ykatabam: needinfo-

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1077439	0	unspecified	CLOSED	RFE: Use lucene indexes to do Copy Trans.	2021-02-22 00:41:40 UTC

Internal Links: 1077439

Description Yuko Katabami 2014-03-17 00:50:02 UTC

Description of problem:
Zanata does not copy the most recent translation when an update is pushed from PressGang.

Version-Release number of selected component (if applicable): 3.3


How reproducible: I am not sure.


Steps to Reproduce:
1. Push an update of a topic containing translation strings that have more than one 100% match translation available in the TM
2. Check which translation memory match is copied


Actual results:
An older translation is copied as fuzzy

Expected results:
It should pick the most recent translation

Additional info:
This is similar to the resolved bug:
https://bugzilla.redhat.com/show_bug.cgi?id=896332
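
For illustration only, the expected behaviour could be sketched as follows. This is a minimal sketch, not Zanata's actual implementation: the `TmMatch` record, its field names, and `pick_most_recent_exact` are all hypothetical, and it assumes each match carries a last-modified timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TmMatch:
    # Hypothetical record; field names are illustrative, not Zanata's schema.
    target: str              # translated text
    similarity: int          # percent match against the source string
    last_changed: datetime   # when the translation was last modified


def pick_most_recent_exact(matches):
    """Return the newest 100% match, or None if there is no exact match."""
    exact = [m for m in matches if m.similarity == 100]
    return max(exact, key=lambda m: m.last_changed, default=None)


matches = [
    TmMatch("older translation", 100, datetime(2012, 7, 1)),
    TmMatch("newer translation", 100, datetime(2014, 1, 15)),
    TmMatch("partial match", 80, datetime(2014, 3, 1)),
]
best = pick_most_recent_exact(matches)
print(best.target)
```

With two 100% matches in the list, the later one is selected and the more recent partial match is ignored.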

Comment 7 Isaac Rooskov 2014-03-18 01:50:12 UTC

@lnewson @carlos

Wearing my Localization Supervisor hat: 
The above examples Yuko has provided mean that all books for RHEV 3.3.1 will need to be proofread again even though we undertook this for the GA of RHEV 3.3. This is a lot of extra work and something seems to have gone wrong, since incorrect strings were copied as "translated". 

Wearing my Zanata Product Manager hat:
@lnewson: Has there been any change between January 2014 and now as to how PressGang formulates the hash? Considering that we are working on a z-stream update, copytrans shouldn't have had any issues copying over the correct strings, however it seems to have copied across very old ones again :S

Thanks for the assist guys! 

Isaac

Comment 8 Carlos Munoz 2014-03-18 02:07:16 UTC

I think we need to change the perception of Copy Trans a bit (among other technical aspects of it). There is no way that Zanata will get the exact desired translation every single time, especially if there are multiple translations for the same string in the system. Zanata cannot determine which one is desired; it can only choose according to the options it is given.

In this particular case, changes in PressGang have thrown Zanata's matching algorithm off. Knowing this, we should change the default copy trans settings for this project so that Zanata doesn't mark copied strings as 'Translated', or so that it simply doesn't look for strings outside the project. In general, maybe anything that is copied and that has more than one possible copy candidate should be forced to fuzzy.

If matches from other projects are still desired, then maybe a two-step copy trans could be done (as described in comment #2) to take advantage of Translation memory.

I will look at the newly reported cases and let you know my findings.
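
The "more than one candidate means fuzzy" idea above might look roughly like this. It is a sketch of the proposal only, not existing Zanata behaviour: `state_for_copy` is a hypothetical helper, the state names are plain strings chosen for illustration, and it assumes "more than one candidate" means more than one distinct translation text.

```python
def state_for_copy(candidate_targets, requested_state="Translated"):
    """Decide the state of a copied string under the proposed rule.

    candidate_targets: translation texts that could be copied for one
    source string (hypothetical input, not a Zanata API).
    """
    distinct = set(candidate_targets)
    if not distinct:
        return None          # nothing to copy
    if len(distinct) > 1:
        return "Fuzzy"       # ambiguous: force review by a translator
    return requested_state   # a single candidate keeps the configured state


print(state_for_copy(["translation A"]))
print(state_for_copy(["translation A", "translation B"]))
```

Identical texts coming from several documents count as one candidate here, so only genuinely conflicting translations are downgraded.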

Comment 9 Lee Newson 2014-03-18 02:25:50 UTC

(In reply to Isaac Rooskov from comment #7)
> Wearing my Zanata Product Manager hat:
> @lnewson: Has there been any change between January 2014 and now as to how
> PressGang formulates the hash? Considering that we are working on a z-stream
> update, copytrans shouldn't have had any issues copying over the correct
> strings, however it seems to have copied across very old ones again :S 

Hey Isaac, no this was way before then. It would have been around about July 2012 if I had to guess.

As for the strings being copied as translated, that does seem weird as that would mean it's in the same project and has the same resId (since anything from another project should be marked fuzzy). I'm wondering if the translation has been copied in from another book, in which case we'll probably need to implement BZ#1066765 sooner rather than later. Anyways I'll take a look from our side of things today as well and see what I can find.
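
To make that expectation concrete, here is a small sketch of the rule as described in this comment. It is an assumption-laden illustration, not the real copy trans code: `copied_state` and its parameters are hypothetical, and treating "same project but different resId" as fuzzy is my assumption, since the comment does not cover that case.

```python
def copied_state(source_project, source_res_id, target_project, target_res_id):
    """State a copied translation is expected to receive.

    'Translated' only when the match comes from the same project and has
    the same resId; anything else, including every match from another
    project, is marked 'Fuzzy'.
    """
    same_project = source_project == target_project
    same_res_id = source_res_id == target_res_id
    if same_project and same_res_id:
        return "Translated"
    return "Fuzzy"


# Project slugs and resIds below are made-up examples.
print(copied_state("rhev-docs", "topic-1-hash", "rhev-docs", "topic-1-hash"))
print(copied_state("jboss-docs", "topic-1-hash", "rhev-docs", "topic-1-hash"))
```

Under this rule a string copied from a JBoss project could not end up as 'Translated', which is why the reported results look surprising.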

Comment 10 Yuko Katabami 2014-03-18 02:46:43 UTC

(In reply to Lee Newson from comment #9)
> (In reply to Isaac Rooskov from comment #7)
> > Wearing my Zanata Product Manager hat:
> > @lnewson: Has there been any change between January 2014 and now as to how
> > PressGang formulates the hash? Considering that we are working on a z-stream
> > update, copytrans shouldn't have had any issues copying over the correct
> > strings, however it seems to have copied across very old ones again :S 
> 
> Hey Isaac, no this was way before then. It would have been around about July
> 2012 if I had to guess.
> 
> As for the strings being copied as translated, that does seem weird as that
> would mean it's in the same project and has the same resId (since anything
> from another project should be marked fuzzy). I'm wondering if the
> translation has been copied in from another book, in which case we'll
> probably need to implement BZ#1066765 sooner rather than later. Anyways I'll
> take a look from our side of things today as well and see what I can find.

Some came from non-skynet projects, such as the old V2V Guide.
Many came from JBoss.

I am just wondering if there is a way to restore our translations to their state as of 3.3 GA.

Comment 16 Carlos Munoz 2014-03-18 06:10:30 UTC

When copy trans copies a translation, it gives the credit to the original author of the translation. So when you see the jboss translator as the author, it could be because the copied translation was originally done by him.

Another thing to take into account is that copy trans will reuse translations from deleted documents (not deleted projects or versions). This might be why finding the original source is proving difficult without looking directly in the database.
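
The two points above can be illustrated with a short sketch. Everything in it is hypothetical (the `Candidate` model, its flags, and both helper functions); it only mirrors the behaviour described in this comment and is not taken from the Zanata code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical model; the flags are illustrative, not Zanata's schema.
    target: str
    author: str
    document_deleted: bool = False
    version_deleted: bool = False
    project_deleted: bool = False


def is_reusable(c: Candidate) -> bool:
    """Deleted documents still count; deleted versions or projects do not."""
    return not (c.version_deleted or c.project_deleted)


def copy_translation(c: Candidate) -> dict:
    """Copy the text and keep the original translator as the author."""
    return {"target": c.target, "author": c.author, "copied": True}


pool = [
    Candidate("from a deleted document", "jboss-translator", document_deleted=True),
    Candidate("from a deleted version", "someone-else", version_deleted=True),
]
copies = [copy_translation(c) for c in pool if is_reusable(c)]
print(copies)
```

Only the translation from the deleted document survives the filter, and it still carries its original author, which matches what shows up in the editor.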

Comment 19 Carlos Munoz 2014-03-18 23:36:17 UTC

I've requested a DB backup to look at this even more closely.

Comment 20 Carlos Munoz 2014-03-19 02:52:22 UTC

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1077439

That bug will be scheduled for the next sprint and will completely overhaul the operation of copy trans to leverage the translation memory.

Comment 22 Carlos Munoz 2014-03-21 00:13:54 UTC

Yuko, I have a db backup, so in order not to block your work, please feel free to work on the affected documents with the assumption that Copy Trans did not do its job.

Comment 30 Zanata Migrator 2015-07-29 01:59:46 UTC

Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-121
