Bug 995907

Summary: Exported Translation Memory / Project TMX poorly formatted
Product: [Retired] Zanata
Reporter: Damian Jansen <djansen>
Component: Component-Logic
Assignee: Sean Flanigan <sflaniga>
Status: CLOSED WONTFIX
QA Contact: Zanata-QA Mailling List <zanata-qa>
Severity: low
Priority: unspecified
Version: development
CC: djansen, sflaniga, zanata-bugs
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-11-19 05:30:29 UTC

Description Damian Jansen 2013-08-12 00:11:24 UTC
Description of problem:
The XML produced by the TMX / TM export is not very readable: the tu, tuv and seg elements are all on the same line. Anyone who wants to pore over the file to hack translations before re-importing it will find that unpleasant.

Version-Release number of selected component (if applicable):
Dev

How reproducible:
Always

Steps to Reproduce:
Pre: a TM installed via Administration->Translation Memory
1. Go to Administration->Translation Memory
2. Press Export on the target TM
3. Save the file
4. Open the file in a text editor

Actual results:
<xml...>
...
<tu><tuv $attributes><seg>text</seg></tuv><tuv $attributes><seg>text</seg></tuv>...

Expected results:
<tu>
  <tuv $attributes>
    <seg>...</seg>
  </tuv>
</tu>

Additional info:

Comment 1 Sean Flanigan 2013-08-12 03:55:34 UTC
I tried to find a way to format tu elements nicely, but I couldn't convince XOM to indent them without also treating all whitespace the same way. (Whitespace inside seg elements is significant.) If you enable indenting in XOM's Serializer, it stops respecting whitespace, and the only way to change that is to set xml:space="preserve", which the TMX 1.4 DTD does not allow.

We might be able to do something with an output filter, or by using a different method of generating XML.  I'm not sure it's worth it though.
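For illustration, one possible "different method of generating XML" is to write the indentation by hand, so that whitespace is only ever added between structural elements and never inside seg. This is a hedged sketch using the JDK's StAX `XMLStreamWriter` rather than XOM. The `IndentedTmxWriter` class and its `Unit` record are hypothetical and are not Zanata's exporter. The mandatory `header` attributes are omitted.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/**
 * Hypothetical sketch (not Zanata's exporter): writes TMX with indentation
 * between structural elements only. Nothing is ever added inside <seg>, so
 * segment text is emitted verbatim apart from normal XML escaping.
 */
public class IndentedTmxWriter {

    /** Hypothetical minimal model: a tuid plus locale -> segment text. */
    public record Unit(String tuid, Map<String, String> segsByLocale) {}

    /** Emits a newline plus two spaces per level as an ordinary text node. */
    private static void indent(XMLStreamWriter w, int depth) throws XMLStreamException {
        w.writeCharacters("\n" + "  ".repeat(depth));
    }

    public static String write(List<Unit> units) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument("1.0");
        indent(w, 0);
        w.writeStartElement("tmx");
        w.writeAttribute("version", "1.4");
        indent(w, 1);
        w.writeEmptyElement("header"); // required TMX header attributes omitted
        indent(w, 1);
        w.writeStartElement("body");
        for (Unit u : units) {
            indent(w, 2);
            w.writeStartElement("tu");
            w.writeAttribute("tuid", u.tuid());
            for (Map.Entry<String, String> e : u.segsByLocale().entrySet()) {
                indent(w, 3);
                w.writeStartElement("tuv");
                w.writeAttribute(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI,
                        "lang", e.getKey());
                indent(w, 4);
                w.writeStartElement("seg");
                w.writeCharacters(e.getValue()); // escaped, but no whitespace added
                w.writeEndElement(); // </seg>
                indent(w, 3);
                w.writeEndElement(); // </tuv>
            }
            indent(w, 2);
            w.writeEndElement(); // </tu>
        }
        indent(w, 1);
        w.writeEndElement(); // </body>
        indent(w, 0);
        w.writeEndElement(); // </tmx>
        w.writeEndDocument();
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        Map<String, String> segs = new LinkedHashMap<>(); // keeps locale order stable
        segs.put("en-US", "Hello,  world ");
        segs.put("de", "Hallo,  Welt ");
        System.out.println(write(List.of(new Unit("greeting", segs))));
    }
}
```

Under the TMX 1.4 DTD, body, tu and tuv have element-only content, so the added newlines and spaces are insignificant there. Meanwhile, seg text such as `Hello,  world ` keeps its double space and trailing space.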

Another workaround might be a client-side script to reformat the XML file.
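A client-side reformatter could follow the same rule: indent only elements whose content is element-only, and leave seg (and any element containing text) exactly as parsed. The sketch below assumes Java 16+ and relies on the JDK's built-in DOM parser and identity `Transformer`. `TmxPrettyPrinter` is a hypothetical name. The `load-external-dtd` feature string is specific to the Xerces-based parser bundled with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical client-side re-indenter for exported TMX files. */
public class TmxPrettyPrinter {

    private static boolean isXmlWhitespace(String s) {
        return s.chars().allMatch(c -> c == ' ' || c == '\t' || c == '\n' || c == '\r');
    }

    /** Re-indents element-only content; elements with text (e.g. seg) are left alone. */
    private static void indent(Element el, int depth) {
        boolean hasElementChild = false;
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE)
                    && !isXmlWhitespace(n.getNodeValue())) {
                return; // text or mixed content: whitespace may be significant
            }
            hasElementChild |= type == Node.ELEMENT_NODE;
        }
        if (!hasElementChild || el.getTagName().equals("seg")) {
            return; // never touch seg, even if it holds only inline elements
        }
        // Only whitespace-only text remains among the children: drop it.
        for (Node n = el.getFirstChild(); n != null; ) {
            Node next = n.getNextSibling();
            if (n.getNodeType() == Node.TEXT_NODE) {
                el.removeChild(n);
            }
            n = next;
        }
        Document doc = el.getOwnerDocument();
        String pad = "\n" + "  ".repeat(depth + 1);
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            el.insertBefore(doc.createTextNode(pad), n);
            if (n instanceof Element child) {
                indent(child, depth + 1);
            }
        }
        el.appendChild(doc.createTextNode("\n" + "  ".repeat(depth)));
    }

    public static String reformat(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Don't fetch an external DTD (Xerces feature name; assumes the JDK's bundled parser).
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        indent(doc.getDocumentElement(), 0);

        Transformer t = TransformerFactory.newInstance().newTransformer(); // no INDENT
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); // written by hand below
        DocumentType dt = doc.getDoctype();
        if (dt != null && dt.getSystemId() != null) { // keep a DOCTYPE if there was one
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dt.getSystemId());
            if (dt.getPublicId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, dt.getPublicId());
            }
        }
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + out + "\n";
    }

    // Usage: java TmxPrettyPrinter.java input.tmx output.tmx
    public static void main(String[] args) throws Exception {
        String xml = Files.readString(Path.of(args[0])); // reads UTF-8
        Files.writeString(Path.of(args[1]), reformat(xml)); // writes UTF-8
    }
}
```

Invoked as, for example, `java TmxPrettyPrinter.java zanata-myproject-master-allLocales.tmx pretty.tmx` (single-file source launch), it writes a re-indented UTF-8 copy and leaves the original file unchanged.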

Comment 2 Sean Flanigan 2014-03-06 04:35:56 UTC
What about this as a workaround?

    xmllint --format zanata-myproject-master-allLocales.tmx | less

Comment 3 Damian Jansen 2014-11-19 05:30:29 UTC
Too much effort when xmllint does the job.