Description of problem: Add the ability to export TMs as a standard TXT file. Requested by Aiko (asasaki): "I have often used memory in .txt format for asset management and global search purposes. A txt file is handy because it is not heavy, and memory created through other tools such as Trados or other external tools can be exported and merged into text format. It makes it easier for individuals and teams to accumulate and manage memory asset in one format, and allows quick global search across multiple memory files."
Hi Aiko, what is the use case you have in mind? What would you do with the txt file? If it's just for reading in a text editor, would it help if the TMX were formatted more nicely? Do you have a sample of the expected layout? Wordfast and Trados use two incompatible TXT formats for translation memories that I know of, and I'm sure there are more. Or would any simple txt layout be okay? (TMX is a form of text... :-) Also, have you looked for a separate tool that can convert the translation memory from TMX to your preferred format?
So the main thing is to have source and target in a single line? Like CSV?
Created attachment 798089: memory sample
Thanks. Here's an excerpt:

    <Segment>0000013719
    <Control> 00011800000001122533351English(U.S.)JAPANESEXXXXXX_.000XXX_.dita </Control>
    <Source>Click <uicontrol outputclass="XXXguicontrol">OK</uicontrol>.</Source>
    <Target><uicontrol outputclass="XXXguicontrol">?OK?</uicontrol>?????????</Target>
    </Segment>
    ...

It looks a bit like XML, except without the top-level element, and with random nested tags (like "uicontrol"). I think its main virtue is that each Source and Target segment is on a line by itself, which should help with grep.

We could look at making sure our TMX is exported in a neatly formatted way, with only one string per line. It would look something like this:

    <tu srclang="en-US" tuid="myproject:1.0:myproject:edbc3dc4ac083b40418f0dee7f552177">
      <tuv xml:lang="en-US">
        <seg>Disk Usage Analyzer</seg>
      </tuv>
      <tuv xml:lang="ja">
        <seg>??????????</seg>
      </tuv>
    </tu>
    ...

In the meantime, you could run Zanata's exported TMX through an XML pretty printer. If you have XMLStarlet installed, you can format TMX like this:

    $ xmlstarlet fo zanata-myproject-1.0-allLocales.tmx

As root, you can install xmlstarlet from Fedora or EPEL with:

    # yum install xmlstarlet

But I'm sure there are other XML formatters too. Would something like that work?
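If the goal is really one source/target pair per line (the "Like CSV?" question above), a small script could flatten the TMX instead of pretty-printing it. The following is only a sketch using Python's standard library, not something Zanata provides: the function name `tmx_to_rows`, the cut-down sample document, and the placeholder target string are all made up for illustration, and a real export will carry a fuller header.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, cut-down TMX document; the target text is a placeholder.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu srclang="en-US" tuid="example:1">
      <tuv xml:lang="en-US"><seg>Disk Usage Analyzer</seg></tuv>
      <tuv xml:lang="ja"><seg>JA-TRANSLATION-1</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_rows(tmx_text, target_lang):
    """Return (source, target) pairs for every <tu> that has target_lang."""
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    default_src = header.get("srclang") if header is not None else None
    rows = []
    for tu in root.iter("tu"):
        # Fall back to the header's srclang when the <tu> has none.
        src_lang = tu.get("srclang", default_src)
        segs = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() drops inline markup; split/join collapses
                # whitespace so each segment stays on one line.
                text = " ".join("".join(seg.itertext()).split())
                segs[tuv.get(XML_LANG)] = text
        if src_lang in segs and target_lang in segs:
            rows.append((segs[src_lang], segs[target_lang]))
    return rows


# One tab-separated source/target pair per line, which suits grep.
for source, target in tmx_to_rows(SAMPLE_TMX, "ja"):
    print(f"{source}\t{target}")
```

Note that this drops inline tags inside segments, so it is suitable for searching but not for re-importing.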
Aiko, Does Sean's method work for you?
Ding-Yi, So sorry not to respond to your confirmation email. Can you help me with how to install XMLStarlet? After running # yum install xmlstarlet, the message "No package xmlstarlet available." appears. Thank you for your help. Aiko
For Fedora, this should work:

    yum -y install xmlstarlet

For RHEL/CentOS you need to have EPEL installed:
+ For RHEL/CentOS 7, run the following:
    yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/beta/7/x86_64/epel-release-7-0.1.noarch.rpm
+ For RHEL/CentOS 6, run the following:
    yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
+ For RHEL/CentOS 5, run the following:
    yum -y localinstall http://mirror.aarnet.edu.au/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm

After you install EPEL, you can install XMLStarlet with:

    yum -y install xmlstarlet
Correction: xmlstarlet is not in EPEL6 or EPEL5. However, my epel6-collection has it. To get the epel6-collection:

    wget http://repos.fedorapeople.org/repos/dchen/epel6-collection/epel-epel6-collection.repo
    sudo mv epel-epel6-collection.repo /etc/yum.repos.d/

Then you can:

    yum -y install xmlstarlet
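Where xmlstarlet cannot be installed, a similar one-element-per-line layout can probably be had from Python's standard library alone. This is a rough sketch, not a tested replacement for `xmlstarlet fo`: the function name `pretty_tmx` is made up, it assumes the exported TMX has no indentation of its own, and segments containing inline tags may pick up extra whitespace.

```python
from xml.dom import minidom


def pretty_tmx(tmx_text):
    """Re-indent a compact TMX string so text-only elements sit on one line."""
    dom = minidom.parseString(tmx_text)
    # toprettyxml keeps an element whose only child is text on a single
    # line, so each plain <seg> ends up on a line by itself.
    return dom.toprettyxml(indent="  ")


# Hypothetical compact input, similar to an unformatted export.
compact = (
    '<tmx version="1.4"><body><tu srclang="en-US">'
    '<tuv xml:lang="en-US"><seg>Disk Usage Analyzer</seg></tuv>'
    "</tu></body></tmx>"
)
print(pretty_tmx(compact))
```

To format a real file, read it into a string first and write the result to a new file rather than overwriting the export.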
Aiko, if I understand correctly, what you would like to do is:
1. Search existing translations for your locale.
2. Share translations among translators.
In this case, you don't actually need to export TM. You can either:
1. Use TM in Zanata. At the bottom of the Translation editor, you can search TM.
2. Use Glossary in Zanata. The Japanese translation team can upload its standard terms as a Glossary, so every Japanese translation team member can search, read, and use the glossary.
TMX, on the other hand, is meant to be used by system admins who need to copy TM from one Zanata server to another Zanata server. Do I understand your need correctly?
Hi Ding-Yi, I initially asked about a global search or grep function for memory in text format, so part of my question was whether we can export TM or not. If it is possible, please let me know how we can do it. Thanks, Aiko
Yes, Zanata can export TM, and the output file can be read by any plain text editor. You can also use grep on the .tmx file. However, global (all projects, all locales) TMX export is currently only available to administrators, because this action consumes a lot of system resources and takes a long time. Perhaps there should be another RFE for exporting a single locale for all projects, but that would probably only be offered to language consolidators. Individual translators can export all locales for a given project or project version with Zanata 3.3.x. With Zanata 3.4.x, translators can also export one locale in a given project or project version.
After talking with Aiko, I found that her requirement can be addressed in Bug 1108444. I am therefore closing this bug, as you can already use any plain text utility, including grep, on TMX files.
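For the "quick global search across multiple memory files" use case, plain grep on the exported files may already be enough. If matching on segment text only (ignoring tags and attributes) is preferred, something along these lines might work. It is a minimal sketch with a made-up function name, `search_tmx`, and it assumes all exported `.tmx` files sit in one directory and are well-formed XML.

```python
import glob
import os
import xml.etree.ElementTree as ET


def search_tmx(directory, needle):
    """Yield (file name, segment text) for each <seg> containing needle."""
    for path in sorted(glob.glob(os.path.join(directory, "*.tmx"))):
        for seg in ET.parse(path).getroot().iter("seg"):
            # itertext() ignores inline markup inside the segment.
            text = "".join(seg.itertext())
            # Case-insensitive substring match, similar to grep -i.
            if needle.lower() in text.lower():
                yield os.path.basename(path), text


# Example use (hypothetical directory name):
# for name, text in search_tmx("exported-tm", "Disk Usage"):
#     print(f"{name}: {text}")
```

This parses each file completely, so for very large exports grep will likely be faster.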