Created attachment 867745
original odt file

Description of problem:
SKO documents are in odt format and were pushed to Zanata for CJK translation this week. However, the first 4 pages of the translation editor show irrelevant content that does not require translation, so translators are now working on the actual file instead of using Zanata. Carlos investigated and found that this content is the history of changes recorded in the XML (e.g. the <text:tracked-changes> tag).

Version-Release number of selected component (if applicable):

How reproducible:
https://translate.engineering.redhat.com/webtrans/translate?project=sko_documents&iteration=ohcwhiteboard_backgrounder&localeId=ko&locale=en#view:doc;doc:ohcwhiteboard_backgrounder_04.odt

Steps to Reproduce:
1.
2.
3.

Actual results:
Comments/Annotations are displayed in the editor for translation.

Expected results:
Comments/Annotations should be removed from the translation editor.
- Do we need to apply a new script in the Okapi filter to eliminate the unnecessary fields showing up in the translation editor?

Additional info:
The obvious workaround is for the original author to remove the tracked changes from the document before distribution of the original document or translation work. Such metadata should be redacted before public distribution anyway. Here's some information about removing hidden metadata from ODT files: http://bumgarnerlaw.com/opensource/index.php?option=com_content&task=view&id=31&Itemid=28

If we simply skip translation for the tracked changes, the resulting document would end up with its character-by-character revision history in English, and the main document in, say, French, but without a revision for the (massive) change where it was translated. I'm not sure if this would violate the ODF standard, but it is clearly nonsensical.

It would make more sense to strip out the tracked changes entirely when translating, but this would still mean wasting the Zanata server's limited storage space to hold unwanted tracked changes for the source document.

If Zanata is going to do anything about this, then given the confidentiality implications and wasted disk space, I think it should just reject OpenDocument files which include any tracked changes, and direct the user to a page which explains how to remove them.
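For illustration, here is a minimal sketch of how an upload check could detect tracked changes in an ODT file. It is not Zanata or Okapi code: the function name `count_tracked_changes` is hypothetical, and it assumes the revision history lives in `text:changed-region` elements under `text:tracked-changes` in the archive's `content.xml`, which is where the ODF text namespace places it.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF text namespace, as declared in content.xml of an OpenDocument file.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TRACKED_CHANGES_TAG = f"{{{TEXT_NS}}}tracked-changes"
CHANGED_REGION_TAG = f"{{{TEXT_NS}}}changed-region"


def count_tracked_changes(odt_file):
    """Return the number of recorded changes in an ODT file.

    `odt_file` is a path or a binary file-like object. This is a
    hypothetical helper for illustration, not part of Zanata or Okapi.
    """
    with zipfile.ZipFile(odt_file) as archive:
        with archive.open("content.xml") as content:
            root = ET.parse(content).getroot()

    total = 0
    # Each text:tracked-changes block holds one text:changed-region per change.
    for tracked in root.iter(TRACKED_CHANGES_TAG):
        total += len(tracked.findall(CHANGED_REGION_TAG))
    return total
```

A server could reject the upload, or just warn the user, whenever this count is greater than zero. The check only reads the file, so it does not alter the source document.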
David, is Okapi (the backend that handles OpenOffice documents) capable of knowing which strings come from change tracking?
I'm not sure if the Okapi filter can be configured to ignore any strings that are just history tracking, but if it can, that would be the most straightforward fix for this. If there is no such configuration option, we should be submitting a patch for it to Okapi so that it can. We could also look at adding an option to exclude history data from generated translation files while we're working on it.

I do not agree with rejecting documents that have tracking data in them - it is a zipped format so the extra disk space would be trivial*, and having people prepare their documents in a special and unusual way before we accept them is a usability nightmare. Much better that we just make Zanata behave sensibly.

* even if it did use several times the disk space, that should not take priority over Zanata being easy to use with our supported formats.
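To make the "ignore history strings" idea concrete, here is a rough sketch of the behaviour such a filter option might have. It does not use Okapi's API and says nothing about how Okapi is actually configured; `translatable_strings` and `SKIPPED_TAGS` are hypothetical names. It assumes the translatable text sits in `text:p` and `text:h` elements of `content.xml`, and that anything inside `text:tracked-changes` or `office:annotation` should be left out.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Subtrees whose text should never reach the translation editor (assumption).
SKIPPED_TAGS = {
    f"{{{TEXT_NS}}}tracked-changes",
    f"{{{OFFICE_NS}}}annotation",
}
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def _visible_text(element):
    """Concatenate the text of an element, leaving out skipped subtrees."""
    parts = [element.text or ""]
    for child in element:
        if child.tag not in SKIPPED_TAGS:
            parts.append(_visible_text(child))
        # Tail text follows the child but belongs to the parent paragraph.
        parts.append(child.tail or "")
    return "".join(parts)


def translatable_strings(content_xml):
    """Return paragraph and heading strings, ignoring history and comments."""
    strings = []

    def walk(element):
        if element.tag in SKIPPED_TAGS:
            return
        if element.tag in PARAGRAPH_TAGS:
            text = _visible_text(element).strip()
            if text:
                strings.append(text)
            return
        for child in element:
            walk(child)

    walk(ET.fromstring(content_xml))
    return strings
```

With this approach the source file is stored unchanged and only the extraction step skips the history, so authors would not need to prepare documents specially before upload.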
Thanks for the comment, David. Isaac, Manuel: how urgent or important do you think this bug is, given that we will get more marketing materials for translation in odt and various other formats?
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-564