Bug 999729 - RFE: Support TM Export as TXT file
Status: CLOSED NOTABUG
Product: Zanata
Classification: Community
Component: Usability
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Assigned To: Isaac Rooskov
QA Contact: Zanata-QA Mailling List
Reported: 2013-08-21 20:16 EDT by Isaac Rooskov
Modified: 2015-08-06 01:55 EDT
Doc Type: Bug Fix
Last Closed: 2014-06-12 00:33:00 EDT
Type: Bug
Attachments:
memory sample (24.81 KB, application/octet-stream)
2013-09-15 22:37 EDT, Aiko

Description Isaac Rooskov 2013-08-21 20:16:42 EDT
Description of problem:

Add ability to export TMs as standard TXT file.

Requested by Aiko (asasaki@redhat.com):

"I have often used memory in .txt format for asset management and global search purposes.
A txt file is handy because it is lightweight, and memory created with Trados or other external tools can be exported and merged into text format. This makes it easier for individuals and teams to accumulate and manage memory assets in one format, and allows quick global search across multiple memory files."
Comment 1 Sean Flanigan 2013-08-22 20:51:26 EDT
Hi Aiko, 

What is the use case you have in mind?  What would you do with the txt file?  If it's just for reading in a text editor, would it help if the TMX were formatted more nicely?

Do you have a sample of the expected layout?  Wordfast and Trados are two incompatible TXT formats for translation memories that I know of, and I'm sure there are more.  Or would any simple txt layout be okay?  (TMX is a form of text... :-)

Also, have you looked for a separate tool that can convert the translation memory from TMX to your preferred format?
Comment 3 Sean Flanigan 2013-09-12 00:02:47 EDT
So the main thing is to have source and target in a single line?  Like CSV?
Comment 5 Aiko 2013-09-15 22:37:33 EDT
Created attachment 798089: memory sample
Comment 6 Sean Flanigan 2013-09-17 03:59:34 EDT
Thanks.

Here's an excerpt:

<Segment>0000013719
<Control>
00011800000001122533351English(U.S.)JAPANESEXXXXXX_.000XXX_.dita
</Control>
<Source>Click <uicontrol outputclass="XXXguicontrol">OK</uicontrol>.</Source>
<Target><uicontrol outputclass="XXXguicontrol">?OK?</uicontrol>?????????</Target>
</Segment>
...

It looks a bit like XML, except without the top-level element, and with random nested tags (like "uicontrol").  I think its main virtue is that each Source and Target segment is on a line by itself, which should help with grep.

We could look at making sure our TMX is exported in a neatly formatted way, with only one string per line.  It would look something like this:

<tu srclang="en-US" tuid="myproject:1.0:myproject:edbc3dc4ac083b40418f0dee7f552177">
  <tuv xml:lang="en-US">
    <seg>Disk Usage Analyzer</seg>
  </tuv>
  <tuv xml:lang="ja">
    <seg>??????????</seg>
  </tuv>
</tu>
...

In the meantime, you could run Zanata's exported TMX through an XML pretty printer.  If you have XMLStarlet installed, you can format TMX like this:

$ xmlstarlet fo zanata-myproject-1.0-allLocales.tmx

As root, you can install xmlstarlet from Fedora or EPEL with:

# yum install xmlstarlet

But I'm sure there are other XML formatters too.

Would something like that work?
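
Where xmlstarlet is not available, a similar one-element-per-line layout can probably be produced with the Python standard library alone. The following is a rough sketch, assuming Python 3.9 or later (for ET.indent); the function name pretty_tmx and the sample string are hypothetical. Note that ElementTree drops the XML declaration, any DOCTYPE line, and comments, so the result is for reading and grepping, not a byte-for-byte replacement of the export.

```python
import xml.etree.ElementTree as ET

# Made-up, single-line sample; a real export would be much larger.
COMPACT = (
    '<tmx version="1.4"><body><tu tuid="demo:1">'
    '<tuv xml:lang="en-US"><seg>Disk Usage Analyzer</seg></tuv>'
    "</tu></body></tmx>"
)


def pretty_tmx(tmx_text):
    """Return tmx_text re-indented with two spaces per nesting level."""
    root = ET.fromstring(tmx_text)
    # ET.indent only rewrites whitespace-only text between elements,
    # so a <seg> that contains plain text stays on a single line.
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(pretty_tmx(COMPACT))
```

A segment that ends with an inline tag may gain trailing whitespace from the indentation step, so xmlstarlet or another dedicated formatter remains the safer choice when exact segment text matters.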
Comment 7 Ding-Yi Chen 2014-03-19 01:52:10 EDT
Aiko,

Does Sean's method work for you?
Comment 8 Aiko 2014-06-10 01:41:48 EDT
Ding-Yi,

So sorry not to respond to your confirmation email.

Can you help me with how to install XMLStarlet?

After running # yum install xmlstarlet, the message "No package xmlstarlet available." appears.

Thank you for your help.

Aiko
Comment 9 Ding-Yi Chen 2014-06-10 03:32:46 EDT
For Fedora, 

yum -y install xmlstarlet

should work.

For RHEL/CentOS, you need to have EPEL installed first:

 + For RHEL/CentOS 7, run the following:
   yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/beta/7/x86_64/epel-release-7-0.1.noarch.rpm

 + For RHEL/CentOS 6, run the following:
   yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

 + For RHEL/CentOS 5, run the following:
   yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm


After you install EPEL, you can install XMLStarlet with:
 yum -y install xmlstarlet
Comment 10 Ding-Yi Chen 2014-06-11 02:28:39 EDT
Correction: xmlstarlet is not in EPEL6 or EPEL5.

However, my epel6-collection has it.

To get the epel6-collection:
wget http://repos.fedorapeople.org/repos/dchen/epel6-collection/epel-epel6-collection.repo

sudo mv epel-epel6-collection.repo /etc/yum.repos.d/

Then you can:
yum -y install xmlstarlet
Comment 11 Ding-Yi Chen 2014-06-11 04:46:01 EDT
Aiko,

If I understand correctly, what you would like to do is:
1. Search existing translations for your locale.
2. Share translations among translators.

In this case, you don't actually need to export the TM. You can either:
1. Use TM in Zanata. You can search the TM at the bottom of the translation editor.
2. Use Glossary in Zanata. The Japanese translation team can upload its standard
   terms as a Glossary, so every member of the team can search, read, and use
   it.


TMX, on the other hand, is meant for system admins who need to copy a TM from one Zanata server to another.

Do I understand your need correctly?
Comment 12 Aiko 2014-06-11 19:44:41 EDT
Hi Ding-Yi

I initially asked about global search or grep function for memory in text format.

So part of my question was whether we can export TM or not.


If possible, please let me know how we can do it.


Thanks

Aiko
Comment 13 Ding-Yi Chen 2014-06-11 21:01:16 EDT
Yes, Zanata can export TM, and the output file can be read in any plain text editor. You can also use grep on the .tmx file.

However, global TMX export (all projects, all locales) is currently available only to administrators, because this action consumes a lot of system resources and time.

Perhaps there should be another RFE for exporting a single locale across all projects, but that would probably be offered only to language consolidators.

With Zanata 3.3.x, individual translators can export all locales for a given project or project version.

With Zanata 3.4.x, translators can also export a single locale for a given project or project version.
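
Plain grep matches one line at a time, so it shows a matching segment without its counterpart in the other language. As a rough illustration of a search that keeps each translation unit together across several exported files, here is a sketch using only the Python standard library. The function name search_tmx and the sample data are hypothetical, and the code assumes TMX 1.4-style <tu>/<tuv>/<seg> nesting.

```python
import io
import re
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample data, not taken from a real Zanata export.
SAMPLE = """<tmx version="1.4">
  <body>
    <tu tuid="demo:1">
      <tuv xml:lang="en-US"><seg>Disk Usage Analyzer</seg></tuv>
      <tuv xml:lang="ja"><seg>ディスク使用量アナライザー</seg></tuv>
    </tu>
    <tu tuid="demo:2">
      <tuv xml:lang="en-US"><seg>Click OK.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def search_tmx(sources, pattern, flags=re.IGNORECASE):
    """Yield (source, tuid, {lang: text}) for each <tu> with a matching segment.

    sources is an iterable of file paths or binary file objects.
    """
    regex = re.compile(pattern, flags)
    for source in sources:
        # iterparse streams the file, which helps with large exports.
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag != "tu":
                continue
            segs = {}
            for tuv in elem.iter("tuv"):
                seg = tuv.find("seg")
                if seg is not None:
                    segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
            if any(regex.search(text) for text in segs.values()):
                yield source, elem.get("tuid"), segs
            elem.clear()  # release the unit's children once it is processed


if __name__ == "__main__":
    sample_file = io.BytesIO(SAMPLE.encode("utf-8"))
    for _source, tuid, segs in search_tmx([sample_file], "disk usage"):
        print(tuid, segs)
```

Passing a list of exported .tmx paths as sources would search them in turn; the pattern is a regular expression, so literal text containing characters such as "." or "(" should be escaped with re.escape first.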
Comment 14 Ding-Yi Chen 2014-06-12 00:33:00 EDT
After talking with Aiko, I found that her requirement can be addressed in Bug 1108444.

Thus, I am closing this bug, as you can already use any plain text utility, including grep, on TMX files.
