Bug 873489

Summary: XLIFF/Properties/PO upload should check that translations correspond to the current text flow contents
Product: [Retired] Zanata
Reporter: Sean Flanigan <sflaniga>
Component: Component-Logic
Assignee: Sean Flanigan <sflaniga>
Status: CLOSED UPSTREAM
QA Contact: Zanata-QA Mailing List <zanata-qa>
Severity: medium
Priority: medium
Version: unspecified
CC: damason, dchen, mkim, pahuang, zanata-bugs
Target Milestone: ---
Keywords: screened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-07-29 03:35:34 UTC
Type: Bug

Description Sean Flanigan 2012-11-06 01:13:16 UTC
Description of problem:

When pushing XLIFF content, it is possible for the translation files to have older source strings than the source files, but with the same trans-unit ids.  For instance:

source file:
...
<trans-unit id="greeting">
  <source>Hello World</source>
</trans-unit>
...

trans file:
...
<trans-unit id="greeting">
  <source>Hello</source>
  <target>Hallo</target>
</trans-unit>
...

NB: even though the ids match, "Hallo" is not a valid translation of the current source "Hello World".


Version-Release number of selected component (if applicable):
2.0.0

How reproducible:
100%

Steps to Reproduce:
1. Push both files above
2. Pull the translation
  
Actual results:
greeting has the incorrect translation "Hallo"

Expected results:
greeting should be untranslated

Additional info:

We may need to change the way we generate resIds for XLIFF files, to be similar to the Gettext approach, so that the server will reject the outdated translation.  If so, we will need a migration path for existing content in the database.

Alternatively the client may be able to load source and target files and compare the source element.  We will need to weigh both approaches.
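
A minimal sketch of the client-side alternative, in Python for brevity (it is not the Zanata client code). It assumes un-namespaced trans-unit elements like the snippets above, whereas real XLIFF files declare a namespace that the lookups would need to include. The function names and the "farewell" unit are hypothetical, added only so that one translation survives the check.

```python
import xml.etree.ElementTree as ET


def source_by_id(xliff_text):
    """Map each trans-unit id to the text of its <source> element."""
    root = ET.fromstring(xliff_text)
    return {
        unit.get("id"): (unit.findtext("source") or "")
        for unit in root.iter("trans-unit")
    }


def current_translations(source_xliff, trans_xliff):
    """Keep only targets whose <source> still matches the source file."""
    current = source_by_id(source_xliff)
    kept = {}
    for unit in ET.fromstring(trans_xliff).iter("trans-unit"):
        target = unit.findtext("target")
        if target is None:
            continue  # nothing to push for this unit
        translated_from = unit.findtext("source") or ""
        if current.get(unit.get("id")) == translated_from:
            kept[unit.get("id")] = target
    return kept


# Hypothetical sample data; "greeting" mirrors the example in this report.
SOURCE = """<body>
<trans-unit id="greeting"><source>Hello World</source></trans-unit>
<trans-unit id="farewell"><source>Goodbye</source></trans-unit>
</body>"""

TRANS = """<body>
<trans-unit id="greeting"><source>Hello</source><target>Hallo</target></trans-unit>
<trans-unit id="farewell"><source>Goodbye</source><target>Auf Wiedersehen</target></trans-unit>
</body>"""

print(current_translations(SOURCE, TRANS))
```

With this data the outdated "greeting" translation is dropped and only "farewell" would be pushed. This approach only works for formats whose translation files carry a copy of the source text.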

We have a similar problem for Properties files.

Comment 1 Sean Flanigan 2012-11-07 02:19:59 UTC
See bug 873500 for Properties files.

Comment 2 Sean Flanigan 2012-11-20 06:06:08 UTC
We should add sourceContentHash to TextFlowTarget, and have the XLIFF and Properties clients pass a hash of the source content.  Then the server could verify that the sourceContentHash matches the current version of the HTextFlow, else reject the translation.
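
A rough illustration of the client half of this idea, in Python rather than the client's own code. The choice of MD5, the dictionary layout, and the function names are assumptions made for the sketch. The trans-unit is un-namespaced, as in the description.

```python
import hashlib
import xml.etree.ElementTree as ET


def content_hash(text):
    """Hash source content. MD5 is an assumption; the comment names no algorithm."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def targets_with_hash(trans_xliff):
    """Build one target per translated unit, tagged with the hash of the
    <source> text that the translation file says it was translated from."""
    targets = []
    for unit in ET.fromstring(trans_xliff).iter("trans-unit"):
        target = unit.findtext("target")
        if target is None:
            continue  # untranslated unit, nothing to send
        source = unit.findtext("source") or ""
        targets.append({
            "resId": unit.get("id"),
            "content": target,
            "sourceContentHash": content_hash(source),
        })
    return targets


TRANS = """<body>
<trans-unit id="greeting"><source>Hello</source><target>Hallo</target></trans-unit>
</body>"""

payload = targets_with_hash(TRANS)
# The hash describes the old source "Hello", so it cannot match a server
# whose current text flow content is "Hello World".
print(payload[0]["sourceContentHash"] == content_hash("Hello World"))
```

The hash is computed from the source text stored in the translation file, not from the current source file, so that the server can detect the mismatch.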

Comment 3 Sean Flanigan 2012-11-20 06:06:17 UTC
*** Bug 873500 has been marked as a duplicate of this bug. ***

Comment 4 Sean Flanigan 2013-12-12 04:00:41 UTC
*** Bug 873930 has been marked as a duplicate of this bug. ***

Comment 5 Sean Flanigan 2013-12-12 04:03:24 UTC
As of Zanata 3.2.0, TextFlowTarget includes a sourceHash property.  If sourceHash is provided, TranslatedDocResourceService on the server will check that the sourceHash matches the HTextFlow before persisting the HTextFlowTarget.
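
The check described here can be sketched roughly as follows. This is Python for illustration only, not the TranslatedDocResourceService code. The classes are simplified stand-ins for HTextFlow and TextFlowTarget, MD5 is an assumed hash algorithm, and skipping the check when no hash is sent is one reading of "if sourceHash is provided".

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def content_hash(text):
    """Hash source content. MD5 is an assumption made for this sketch."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


@dataclass
class TextFlow:
    """Simplified stand-in for the server's HTextFlow."""
    res_id: str
    content: str


@dataclass
class TextFlowTarget:
    """Simplified stand-in for an incoming TextFlowTarget."""
    res_id: str
    content: str
    source_hash: Optional[str] = None


def should_persist(target, text_flow):
    """Return True if the target may be saved against the current text flow."""
    if target.source_hash is None:
        return True  # no hash provided, so there is nothing to verify
    return target.source_hash == content_hash(text_flow.content)


flow = TextFlow("greeting", "Hello World")
stale = TextFlowTarget("greeting", "Hallo", content_hash("Hello"))
fresh = TextFlowTarget("greeting", "Hallo Welt", content_hash("Hello World"))
unhashed = TextFlowTarget("greeting", "Hallo")

print(should_persist(stale, flow), should_persist(fresh, flow), should_persist(unhashed, flow))
```

Under these assumptions, a client that sends no hash gets the old behaviour, which is why formats that do not yet send sourceHash can still push outdated translations.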

The gettext formats (including offlinepo) now provide sourceHash when pushing to the server, but this still needs to be implemented for xliff and properties files.

Comment 6 Sean Flanigan 2014-03-21 05:07:24 UTC
Pull request (WIP): https://github.com/zanata/zanata-common/pull/1

Comment 7 Sean Flanigan 2014-03-21 05:13:59 UTC
Also https://github.com/zanata/zanata-client/pull/14

Comment 8 Sean Flanigan 2014-03-26 03:12:11 UTC
Both pull requests have been updated.

Comment 9 Sean Flanigan 2014-03-26 04:05:11 UTC
Note that the required changes for PO files were in server commit 0a3971e, common commit 24c7951 and client commit 880c69a157.  Source content checking for PO files requires client >= 3.2.0, and only works with Zanata server >= 3.2.0.  (Note that there was no bz for that change.)

Comment 10 Michelle Kim 2015-02-25 05:40:35 UTC
We have a similar bug, bug 1194543, which is implementing the solution for formats that use positional identifiers.

Comment 11 David Mason 2015-02-26 01:56:30 UTC
(In reply to Michelle Kim from comment #10)
> We have a similar bug, bug 1194543, which is implementing the solution for
> formats that use positional identifiers.

That bug is about source text that changes which identifier it uses (e.g. due to position change), but this bug is about translations being pushed that were translated from an old version of the source. I do not think they are related since the cause and the fixes for them are different.

Comment 12 Zanata Migrator 2015-07-29 03:35:34 UTC
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-336