999729 – RFE: Support TM Export as TXT file

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Zanata
Classification:	Retired
Component:	Usability
Sub Component:
Version:	3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Isaac Rooskov
QA Contact:	Zanata-QA Mailing List
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-08-22 00:16 UTC by Isaac Rooskov
Modified:	2015-08-06 05:55 UTC
CC List:	5 users
Fixed In Version:
Story Points:	---
Clone Of:
Environment:
Last Closed:	2014-06-12 04:33:00 UTC
Embargoed:

Attachments
memory sample (24.81 KB, application/octet-stream) 2013-09-16 02:37 UTC, Aiko

Description Isaac Rooskov 2013-08-22 00:16:42 UTC

Description of problem:

Add ability to export TMs as standard TXT file.

Requested by Aiko (asasaki):

"I have often used memory in .txt format for asset management and global search purposes.
A txt file is handy because it is lightweight, and memory created with Trados or other external tools can be exported and merged into text format. This makes it easier for individuals and teams to accumulate and manage memory assets in one format, and allows quick global search across multiple memory files."

Comment 1 Sean Flanigan 2013-08-23 00:51:26 UTC

Hi Aiko, 

What is the use case you have in mind?  What would you do with the txt file?  If it's just for reading in a text editor, would it help if the TMX were formatted more nicely?

Do you have a sample of the expected layout?  Wordfast and Trados are two incompatible TXT formats for translation memories that I know of, and I'm sure there are more.  Or would any simple txt layout be okay?  (TMX is a form of text... :-)

Also, have you looked for a separate tool that can convert the translation memory from TMX to your preferred format?

Comment 3 Sean Flanigan 2013-09-12 04:02:47 UTC

So the main thing is to have source and target in a single line?  Like CSV?
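To make the "single line" idea concrete, here is a minimal sketch that flattens a TMX document into tab-separated source/target lines. It is not a Zanata feature; the function names, the simplified sample TMX, and the placeholder target text are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_rows(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs for every <tu> that has both languages."""
    root = ET.fromstring(tmx_text)
    rows = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside <seg>.
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            rows.append((segs[src_lang], segs[tgt_lang]))
    return rows


def rows_to_tsv(rows):
    """Serialise the pairs as tab-separated text, one pair per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


# Simplified, hypothetical TMX; a real export carries a fuller header.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu srclang="en-US" tuid="sample:1">
      <tuv xml:lang="en-US"><seg>Disk Usage Analyzer</seg></tuv>
      <tuv xml:lang="ja"><seg>(Japanese text)</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    print(rows_to_tsv(tmx_to_rows(SAMPLE_TMX, "en-US", "ja")), end="")
```

With one pair per line, ordinary tools such as grep can match a source string and show its translation on the same line.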

Comment 5 Aiko 2013-09-16 02:37:33 UTC

Created attachment 798089: memory sample

Comment 6 Sean Flanigan 2013-09-17 07:59:34 UTC

Thanks.

Here's an excerpt:

<Segment>0000013719
<Control>
00011800000001122533351English(U.S.)JAPANESEXXXXXX_.000XXX_.dita
</Control>
<Source>Click <uicontrol outputclass="XXXguicontrol">OK</uicontrol>.</Source>
<Target><uicontrol outputclass="XXXguicontrol">?OK?</uicontrol>?????????</Target>
</Segment>
...

It looks a bit like XML, except without the top-level element, and with random nested tags (like "uicontrol").  I think its main virtue is that each Source and Target segment is on a line by itself, which should help with grep.

We could look at making sure our TMX is exported in a neatly formatted way, with only one string per line.  It would look something like this:

<tu srclang="en-US" tuid="myproject:1.0:myproject:edbc3dc4ac083b40418f0dee7f552177">
  <tuv xml:lang="en-US">
    <seg>Disk Usage Analyzer</seg>
  </tuv>
  <tuv xml:lang="ja">
    <seg>??????????</seg>
  </tuv>
</tu>
...

In the meantime, you could run Zanata's exported TMX through an XML pretty printer.  If you have XMLStarlet installed, you can format TMX like this:

$ xmlstarlet fo zanata-myproject-1.0-allLocales.tmx

As root, you can install xmlstarlet from Fedora or EPEL with:
# yum install xmlstarlet

But I'm sure there are other XML formatters too.
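For instance, if Python is available, its standard library can do a similar job. This is only a rough sketch: `pretty_print_xml` is a hypothetical helper name, and it loads the whole document into memory, so it may not suit very large exports.

```python
import xml.dom.minidom


def pretty_print_xml(xml_text, indent="  "):
    """Re-indent XML so each element starts on its own line."""
    dom = xml.dom.minidom.parseString(xml_text)
    pretty = dom.toprettyxml(indent=indent)
    # Drop whitespace-only lines that appear when the input was already indented.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


if __name__ == "__main__":
    compact = (
        '<tu><tuv xml:lang="en-US">'
        "<seg>Disk Usage Analyzer</seg>"
        "</tuv></tu>"
    )
    print(pretty_print_xml(compact))
```

Each `<seg>` then sits on a line of its own, which is the property that matters for grep.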

Would something like that work?

Comment 7 Ding-Yi Chen 2014-03-19 05:52:10 UTC

Aiko,

Does Sean's method work for you?

Comment 8 Aiko 2014-06-10 05:41:48 UTC

Ding-Yi,

So sorry not to respond to your confirmation email.

Can you help me with how to install XMLStarlet?

After running # yum install xmlstarlet, the message "No package xmlstarlet available." appears.

Thank you for your help.

Aiko

Comment 9 Ding-Yi Chen 2014-06-10 07:32:46 UTC

For Fedora, 

yum -y install xmlstarlet

should work.

For RHEL/CentOS, you need to have EPEL installed first:

 + For RHEL/CentOS 7, run the following:
   yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/beta/7/x86_64/epel-release-7-0.1.noarch.rpm

 + For RHEL/CentOS 6, run the following:
   yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

 + For RHEL/CentOS 5, run the following:
   yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm


After you install EPEL, you can install XMLStarlet with:
 yum -y install xmlstarlet

Comment 10 Ding-Yi Chen 2014-06-11 06:28:39 UTC

Correction: xmlstarlet is not in EPEL 6 or EPEL 5.

However, my epel6-collection has it.

To get the epel6-collection:
wget http://repos.fedorapeople.org/repos/dchen/epel6-collection/epel-epel6-collection.repo

sudo mv epel-epel6-collection.repo /etc/yum.repos.d/

Then you can:
yum -y install xmlstarlet

Comment 11 Ding-Yi Chen 2014-06-11 08:46:01 UTC

Aiko,

If I understand correctly, what you would like to do is:
1. Search existing translations for your locale.
2. Share translations among translators.

In this case, you don't actually need to export the TM. You can either:
1. Use TM in Zanata. At the bottom of the translation editor, you can search the TM.
2. Use Glossary in Zanata. The Japanese translation team can upload its standard
   terms as a glossary, so every Japanese translation team member can
   search, read, and use it.


TMX, on the other hand, is meant to be used by system admins who need to copy TM from one Zanata server to another Zanata server.

Do I understand your need correctly?

Comment 12 Aiko 2014-06-11 23:44:41 UTC

Hi Ding-Yi

I initially asked about global search or grep function for memory in text format.

So part of my question was whether we can export TM or not.


If possible, please let me know how we can do it.


Thanks

Aiko

Comment 13 Ding-Yi Chen 2014-06-12 01:01:16 UTC

Yes, Zanata can export TM, and the output file can be read by any plain text editor. You can also use grep on the .tmx file.
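If plain grep is not convenient because source and target sit on different lines, a small script can search several exported files at once and return whole translation units. The following is a sketch only: `search_tmx_files` is a hypothetical name, and it assumes the exports use the usual `tu`/`tuv`/`seg` elements.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def search_tmx_files(paths, term):
    """Yield (file name, {language: text}) for units containing term."""
    needle = term.casefold()
    for path in paths:
        # iterparse streams the file, which helps with large exports.
        for _event, elem in ET.iterparse(str(path), events=("end",)):
            if elem.tag != "tu":
                continue
            segs = {}
            for tuv in elem.findall("tuv"):
                seg = tuv.find("seg")
                if seg is not None:
                    segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
            # Case-insensitive match against every language in the unit.
            if any(needle in text.casefold() for text in segs.values()):
                yield Path(path).name, segs
            elem.clear()  # release the processed translation unit


if __name__ == "__main__":
    # Searches every .tmx file in the current directory.
    for name, segs in search_tmx_files(sorted(Path(".").glob("*.tmx")), "disk usage"):
        print(name, segs)
```

Returning the whole unit means a match on the source text also shows its translations, which a line-based grep on unformatted TMX does not guarantee.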

However, global TMX export (all projects, all locales) is currently available only to administrators, because this action consumes a lot of system resources and takes a long time.

Perhaps there should be another RFE for exporting a single locale for all projects, but that would probably only be offered to language consolidators.

With Zanata 3.3.x, individual translators can export all locales for a given project or project version.

With Zanata 3.4.x, translators can also export one locale in a given project or project version.

Comment 14 Ding-Yi Chen 2014-06-12 04:33:00 UTC

After talking with Aiko, I found that her requirement can be addressed in Bug 1108444.

Thus, I hereby close this bug, as you can already use any plain text utility, including grep, on TMX files.
