Red Hat Bugzilla – Bug 995907
Exported Translation Memory / Project TMX poorly formatted
Last modified: 2014-11-19 00:30:29 EST
Description of problem:
The XML produced by TMX / TM exports is not very readable: the tu, tuv and seg elements are all on the same line. If anyone wants to pore over it to hack translations for re-importing, it's not pleasant.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Precondition: a TM has been installed via Administration->Translation Memory
1. Go to Administration->Translation Memory
2. Press Export on the target TM
3. Save the file
4. Open the file in a text editor
Actual results:
<tu><tuv $attributes><seg>text</seg></tuv><tuv $attributes><seg>text</seg></tuv>...
I tried to find a way to format the tu elements nicely, but I couldn't convince XOM to indent the output without also treating all whitespace the same way. (Whitespace inside seg elements is significant.) If you enable indenting in XOM's Serializer, it stops respecting whitespace, and the only way to change that is to set xml:space="preserve", which the TMX 1.4 DTD does not allow.
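XOM is a Java library, so as a stand-in this sketch uses Python's `xml.etree.ElementTree.indent` (Python 3.9+) to illustrate the same failure mode rather than XOM itself: a generic indenter cannot tell layout whitespace from significant whitespace inside a seg. The sample unit below is made up; the single space between the two inline `ph` tags is part of the segment text.

```python
import xml.etree.ElementTree as ET  # ET.indent requires Python 3.9+

# A made-up translation unit whose <seg> mixes text and inline <ph> tags.
# The space between the two <ph> elements is translatable content.
TU = '<tu><tuv xml:lang="en"><seg>Press<ph x="1"/> <ph x="2"/>now</seg></tuv></tu>'

tu = ET.fromstring(TU)
first_ph = tu.find(".//seg/ph")
before = first_ph.tail  # the significant space

# A generic indenter treats every whitespace-only text node as layout,
# so it overwrites that space with a newline plus indentation.
ET.indent(tu)
after = first_ph.tail

print(repr(before), "->", repr(after))
```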
We might be able to do something with an output filter, or by using a different method of generating XML. I'm not sure it's worth it, though.
Another workaround might be a client-side script to reformat the XML file.
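A minimal sketch of such a client-side script, assuming Python 3 and a plain TMX 1.4 export without XML namespaces. The names `indent_tmx`, `reformat_tmx` and `reformat_tmx.py` are hypothetical. It indents only elements that have child elements (tmx, header, body, tu, tuv and so on), on the assumption, as in the TMX 1.4 DTD, that these hold only child elements and no text, and it never descends into seg, so each segment's text and inline tags stay as parsed. ElementTree does not write back a DOCTYPE declaration or comments, so re-add those if an importer expects them.

```python
import sys
import xml.etree.ElementTree as ET


def indent_tmx(elem, level=0, space="  "):
    """Indent structural TMX elements in place without entering <seg>.

    Assumption: every element that has child elements and is not <seg>
    (tmx, header, body, tu, tuv, ...) has element-only content, so its
    whitespace is pure layout. <seg> subtrees, including inline tags such
    as <ph>, <bpt> and <ept>, are never modified.
    """
    if elem.tag == "seg" or len(elem) == 0:
        return  # segment content, or a leaf such as <prop>/<note>: leave as-is
    child_indent = "\n" + space * (level + 1)
    # Only replace whitespace-only text, never real content.
    if not elem.text or not elem.text.strip():
        elem.text = child_indent
    for child in elem:
        indent_tmx(child, level + 1, space)
        if not child.tail or not child.tail.strip():
            child.tail = child_indent
    # The last child's tail precedes the parent's closing tag: dedent it.
    last = elem[-1]
    if not last.tail.strip():
        last.tail = "\n" + space * level


def reformat_tmx(src_path, dst_path):
    """Read a TMX file, indent its structure, and write the result."""
    tree = ET.parse(src_path)
    indent_tmx(tree.getroot())
    tree.write(dst_path, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Usage: python reformat_tmx.py in.tmx out.tmx
    reformat_tmx(sys.argv[1], sys.argv[2])
```

For example: `python reformat_tmx.py zanata-myproject-master-allLocales.tmx formatted.tmx`. Skipping seg entirely is the design choice that sidesteps the xml:space problem: the script never has to decide which whitespace inside a segment is significant.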
What about this as a workaround?
xmllint --format zanata-myproject-master-allLocales.tmx | less
Fixing this in the exporter is too much effort when xmllint already does the job.