Bug 1070002 - OpenOffice/LibreOffice doc with tracked changes requires additional filter
Summary: OpenOffice/LibreOffice doc with tracked changes requires additional filter
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Zanata
Classification: Retired
Component: Component-Logic
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Sean Flanigan
QA Contact: Zanata-QA Mailing List
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-02-26 04:40 UTC by Michelle Kim
Modified: 2015-07-31 01:49 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-31 01:49:49 UTC
Embargoed:



Description Michelle Kim 2014-02-26 04:40:53 UTC
Created attachment 867745
original odt file

Description of problem:

SKO documents are in ODT format and were pushed to Zanata for CJK translation this week. However, the first 4 pages of the translation editor show irrelevant content that does not require translation, so translators are now working on the actual file instead of using Zanata.

Carlos investigated and found that this content is actually the history of changes recorded in the XML (e.g. the <text:tracked-changes> tag).
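
A minimal sketch of how such a document could be detected, assuming the tracked changes live in the `content.xml` entry of the ODT archive under the standard ODF text namespace. The function name `has_tracked_changes` is hypothetical and is not part of Zanata or Okapi.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace for the "text:" prefix.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def has_tracked_changes(odt_path):
    """Return True if content.xml contains a <text:tracked-changes> element.

    Hypothetical helper for illustration only.
    """
    # An ODT file is a zip archive; the document body is in content.xml.
    with zipfile.ZipFile(odt_path) as odt:
        root = ET.fromstring(odt.read("content.xml"))
    # Search the whole tree for the namespaced element.
    return root.find(f".//{{{TEXT_NS}}}tracked-changes") is not None
```

A check like this could run before the file is pushed, for example `has_tracked_changes("ohcwhiteboard_backgrounder_04.odt")`.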


Version-Release number of selected component (if applicable):


How reproducible:

https://translate.engineering.redhat.com/webtrans/translate?project=sko_documents&iteration=ohcwhiteboard_backgrounder&localeId=ko&locale=en#view:doc;doc:ohcwhiteboard_backgrounder_04.odt

Steps to Reproduce:
1.
2.
3.

Actual results:

Comments/Annotations are displayed in the editor for translation.

Expected results:

Comments/Annotations should be removed from the translation editor.
- Might a new script need to be applied in the Okapi filter to keep unnecessary fields out of the translation editor?

Additional info:

Comment 1 Sean Flanigan 2014-02-27 00:41:20 UTC
The obvious workaround is for the original author to remove the tracked changes from the document before distribution of the original document or translation work.  Such metadata should be redacted before public distribution anyway.

Here's some information about removing hidden metadata from ODT files: http://bumgarnerlaw.com/opensource/index.php?option=com_content&task=view&id=31&Itemid=28


If we simply skip translation for the tracked changes, the resulting document would end up with its character-by-character revision history in English, and the main document in, say, French, but without a revision for the (massive) change where it was translated.  I'm not sure if this would violate the ODF standard, but it is clearly nonsensical.

It would make more sense to strip out the tracked changes entirely when translating, but this would still mean wasting the Zanata server's limited storage space to hold unwanted tracked changes for the source document.
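
A rough sketch of what stripping could look like, assuming the document uses the conventional `text:` prefix and that dropping the `<text:tracked-changes>` block together with the `<text:change>`, `<text:change-start>` and `<text:change-end>` markers is acceptable. The function name `strip_tracked_changes` is hypothetical, and the regular expressions are a shortcut rather than a full XML rewrite.

```python
import re
import zipfile

# The revision history block (assumes the usual "text:" prefix).
TRACKED_BLOCK = re.compile(
    rb"<text:tracked-changes(?=[\s>]).*?</text:tracked-changes>", re.DOTALL
)
# Empty marker elements in the body that point at entries in that block.
CHANGE_MARKER = re.compile(rb"<text:change(?:-start|-end)?(?=[\s/>])[^>]*/>")


def strip_tracked_changes(src_path, dst_path):
    """Copy an ODT file, removing tracked changes from content.xml.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # Keep entry order and per-entry compression, so that "mimetype"
        # stays first and uncompressed as in the source archive.
        for info in src.infolist():
            data = src.read(info)
            if info.filename == "content.xml":
                data = TRACKED_BLOCK.sub(b"", data)
                data = CHANGE_MARKER.sub(b"", data)
            dst.writestr(info, data)
```

Removing the block and its markers leaves inserted text in place and discards deleted text, which is roughly the effect of accepting all changes. A parser-based rewrite would be more robust against unusual prefixes or attribute content.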

If Zanata is going to do anything about this, then given the confidentiality implications and wasted disk space, I think it should just reject OpenDocument files which include any tracked changes, and direct the user to a page which explains how to remove them.

Comment 4 Ding-Yi Chen 2014-03-20 01:42:18 UTC
David,

Is Okapi (the backend that handles OpenOffice documents) capable of knowing which strings come from change tracking?

Comment 6 David Mason 2014-06-10 00:54:09 UTC
I'm not sure if the Okapi filter can be configured to ignore strings that are just history tracking, but if it can, that would be the most straightforward fix for this. If there is no such configuration option, we should submit a patch to Okapi to add one. We could also look at adding an option to exclude history data from generated translation files while we're working on it.

I do not agree with rejecting documents that have tracking data in them - it is a zipped format so the extra disk space would be trivial*, and having people prepare their documents in a special and unusual way before we accept them is a usability nightmare. Much better that we just make Zanata behave sensibly.

* even if it did use several times the disk space, that should not take priority over Zanata being easy to use with our supported formats.

Comment 7 Michelle Kim 2014-06-11 00:50:26 UTC
Thanks for the comment, David.

Isaac, Manuel

How urgent or important do you think this bug is, given that we will get more marketing materials for translation in odt and various other formats?

Comment 8 Zanata Migrator 2015-07-31 01:49:49 UTC
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-564

