Bug 987199

Summary:	RFE: pass a "sortas" attribute to Publican from PO files
Product:	[Community] Publican	Reporter:	Ruediger Landmann <rlandman>
Component:	publican	Assignee:	Jeff Fearn 🐞 <jfearn>
Status:	CLOSED CURRENTRELEASE	QA Contact:	tools-bugs <tools-bugs>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	3.2	CC:	aigao, jfearn, rlandman
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	4.0.0	Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-12-19 02:46:34 UTC	Type:	Bug

Description Ruediger Landmann 2013-07-22 23:50:57 UTC

DocBook supports a "sortas" attribute for strings that will be sorted and collated automatically in glossaries and indexes. However, translators cannot take advantage of this attribute without altering the XML. 

Without the manual override provided by the sortas attribute, indexes and glossaries in languages that are not machine-sortable (Japanese in particular) are useless.

Supporting a magic word in PO files could expose the "sortas" attribute to translators. For example, the word "SORTAS" and anything following it would always be removed from a msgstr like:

msgid "foo"
msgstr "bar SORTAS baz"

but when applied to any of the four DocBook elements that support the "sortas" attribute (<primary>, <secondary>, <tertiary>, and <glossentry>), it would add a "sortas" attribute:

<primary sortas="baz">bar</primary>

The downside is that anyone using other tools to apply these PO files to XML strings is going to get rubbish:

<primary>bar SORTAS baz</primary>

However, the glossaries and indexes of anyone doing that are already broken anyway, and an identifiable string like this would be easily removed before or even after transformation by anyone who needed to do so.
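
As a rough illustration of the proposal above (not Publican code), the following Python sketch splits a msgstr on the magic word and only emits the attribute for the four elements listed. The function names and the exact marker syntax (whitespace on both sides of SORTAS) are assumptions made for this example.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Elements named in the proposal as supporting "sortas".
SORTAS_ELEMENTS = {"primary", "secondary", "tertiary", "glossentry"}

# Assumed marker syntax: translated text, whitespace, SORTAS, whitespace, sort key.
_MARKER = re.compile(r"^(?P<text>.*?)\s+SORTAS\s+(?P<key>.+)$", re.DOTALL)


def split_sortas(msgstr):
    """Return (text, sort_key); sort_key is None when no marker is present."""
    match = _MARKER.match(msgstr)
    if match is None:
        return msgstr, None
    return match.group("text"), match.group("key").strip()


def render(tag, msgstr):
    """Always drop the marker; add sortas only on elements that support it."""
    text, key = split_sortas(msgstr)
    if key is not None and tag in SORTAS_ELEMENTS:
        return f"<{tag} sortas={quoteattr(key)}>{escape(text)}</{tag}>"
    return f"<{tag}>{escape(text)}</{tag}>"


print(render("primary", "bar SORTAS baz"))
print(render("para", "bar SORTAS baz"))
```

A tool that does not know about the marker would skip split_sortas entirely, which is how the "rubbish" output described above would arise.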

Comment 1 Jeff Fearn 🐞 2013-07-23 06:16:15 UTC

(In reply to Ruediger Landmann from comment #0)

To be specific this is about the sortas attribute in the glossentry tag.

I'd like to do it in a way that doesn't stuff up other tools, or at least is as harmless as possible, so I think we could use remark tags and put them after the translated content.

e.g.
#. Tag: glossentry
msgid "foo"
msgstr "bar<remark>SORTAS baz</remark>"

That way most systems will hide the remark and the order will revert to the upstream order.

The sortas would take everything after the space, so <remark>SORTAS baz bar foo</remark> would end up as <primary sortas="baz bar foo">bar</primary>, which I'm assuming might be significant in one language or another :)
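
A minimal Python sketch of the remark idea, purely to make the intended behaviour concrete. It assumes the remark is the last thing in the msgstr and that its text starts with "SORTAS " followed by the key; the helper names are made up for this example and are not part of Publican.

```python
import re
from xml.sax.saxutils import quoteattr

# Assumed marker shape: a trailing <remark> whose text starts with "SORTAS ".
_REMARK = re.compile(r"<remark>SORTAS (?P<key>.*?)</remark>\s*$", re.DOTALL)


def extract_sortas(msgstr):
    """Return (content, key); everything after "SORTAS " becomes the key."""
    match = _REMARK.search(msgstr)
    if match is None:
        return msgstr, None
    return msgstr[:match.start()], match.group("key")


def apply_sortas(tag, msgstr):
    """Move the key from the trailing remark into a sortas attribute on tag."""
    content, key = extract_sortas(msgstr)
    if key is None:
        return f"<{tag}>{content}</{tag}>"
    return f"<{tag} sortas={quoteattr(key)}>{content}</{tag}>"


print(apply_sortas("primary", "bar<remark>SORTAS baz bar foo</remark>"))
```

Tools that ignore the convention would leave the remark in place, which most DocBook toolchains do not show in final output.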

Thoughts?

Comment 2 Ruediger Landmann 2013-07-25 23:55:31 UTC

(In reply to Jeff Fearn from comment #1)

> To be specific this is about the sortas attribute in the glossentry tag.

If we implement it for <glossentry>, I think we really should have it for <primary>, <secondary>, and <tertiary> as well (the children of <indexterm>) -- indexes are affected by the same sorting limitation as glossaries are, and at present, are in nonsensical order in some languages.

> I'd like to do it in a way that doesn't stuff up other tools, or at least is
> as harmless as possible, so I think we could use remark tags and put them
> after the translated content.

Agreed that remark tags are a much better solution, one that shouldn't break anything for anyone. Thanks!

> The sortas would take everything after the space so <remark>SORTAS baz bar
> foo</remark> would end up <primary sortas="baz bar foo">bar</primary> which
> I'm assuming might be significant in one language or another :)
> 
> Thoughts?

Yes -- perfect :)

Comment 3 Jeff Fearn 🐞 2013-07-26 00:03:16 UTC

(In reply to Ruediger Landmann from comment #2)
> (In reply to Jeff Fearn from comment #1)
> 
> > To be specific this is about the sortas attribute in the glossentry tag.
> 
> I think we really should have it for <primary>, <secondary>, and <tertiary> as well

Sure.

Comment 4 HSS Product Manager 2013-09-18 05:28:17 UTC

HSS-QE has reviewed and declined this request. QE for this bug will be handled by IED.

Comment 5 Jeff Fearn 🐞 2013-09-19 01:10:59 UTC

It appears we don't actually need to make any code changes at all!

We'd only need to handle attributes indirectly if they were on the root node being translated; however, no node that serves as a root node for translation has this attribute.

e.g.

<indexterm>
<primary>Introduction 1</primary>
</indexterm>

In this XML, indexterm is the node Publican will base a translation block on. Translators can simply add the sortas attribute to the translation string.

msgid "<primary>Introduction 1</primary>"
msgstr "<primary sortas=\"banana\">Introduzione 1</primary>"


The tags containing this attribute are glossentry, secondary, primary, tertiary.
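
For anyone wanting to check translated strings, here is a small Python sketch that lists the sortas values found in a msgstr. It is only an illustration: the function name is invented, and it assumes the msgstr has already been unescaped from PO syntax and is well-formed XML with no undeclared entities.

```python
import xml.etree.ElementTree as ET

# The tags listed above as carrying the sortas attribute.
SORTAS_TAGS = ("glossentry", "primary", "secondary", "tertiary")


def sortas_keys(msgstr):
    """Return (tag, sortas) pairs for index/glossary elements in a fragment."""
    # Wrap in a dummy root because a msgstr may hold several sibling elements.
    root = ET.fromstring(f"<wrapper>{msgstr}</wrapper>")
    return [(el.tag, el.get("sortas")) for el in root.iter() if el.tag in SORTAS_TAGS]


msgstr = '<primary sortas="banana">Introduzione 1</primary>'
print(sortas_keys(msgstr))
```

An element with no sortas attribute shows up with None as its key, which makes untranslated sort keys easy to spot.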

Moving to PUG so that this procedure can be documented.

Comment 7 Ruediger Landmann 2013-11-18 06:31:09 UTC

The approach in comment 5 works for <indexterm>s because we keep all the child tags together, but not for <glossentry>s, because we split them up. For example:

<glossary>
  <glossentry>
    <glossterm>Standard Generalized Markup Language</glossterm>
    <glossdef>
      <para>Some reasonable definition here.</para>
    </glossdef>
  </glossentry>
</glossary>

becomes:

#. Tag: glossterm
#, no-c-format
msgid "Standard Generalized Markup Language"
msgstr ""

#. Tag: para
#, no-c-format
msgid "Some reasonable definition here."
msgstr ""

in the PO(T) file.

Moving this back to Publican itself.

Comment 8 Jeff Fearn 🐞 2013-11-20 06:23:21 UTC

Please add a glosslist to the PUG so I can test on it.

Comment 9 Jeff Fearn 🐞 2013-11-24 23:35:41 UTC

Doh, forgot to update this :(

Fix committed to git repo, please add a glossary/glosslist to pub and test.

Comment 10 Ruediger Landmann 2013-11-25 05:18:44 UTC

Verified in publican-3.9.9-0.fc19.t27.noarch: test in comment 7 becomes:

#. Tag: glossentry
#, no-c-format
msgid "<glossterm>Standard Generalized Markup Language</glossterm>"
msgstr ""

#. Tag: para
#, no-c-format
msgid "Some reasonable definition here."
msgstr ""