Bug 475684 - Find solution for using Glossaries with publican
Summary: Find solution for using Glossaries with publican
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Publican
Classification: Community
Component: publican
Version: future
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Fearn 🐞
QA Contact: Ruediger Landmann
URL:
Whiteboard:
Duplicates: 485949
Depends On:
Blocks:
 
Reported: 2008-12-10 01:02 UTC by David O'Brien
Modified: 2011-08-22 23:53 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-22 23:53:15 UTC
Embargoed:



Description David O'Brien 2008-12-10 01:02:49 UTC
Description of problem:

The latest version of publican (0.39) has banned the use of glosslist tags. This makes it impossible (or at least very difficult) to use glossaries, and causes books that use them to fail to build.

Reasoning given: glossaries are considered unprofessional, are difficult to translate, and have sorting issues.

Arguments against: Being considered unprofessional is an opinion. Glossaries are a useful resource in technical doc. If necessary they can remain untranslated.

Version-Release number of selected component (if applicable):
0.39

How reproducible:
always

Steps to Reproduce:
1.
2.
3.
  
Actual results:
*ERROR: BUILD FAILED! Banned tag found*
glosslist:      This tag set imposes English-language order on glossaries, making them useless when translated.
Remove all glosslist tags before attempting to build.
make: *** [xml-en-US] Error 4

Expected results:


Additional info:

Comment 1 Jeff Fearn 🐞 2008-12-10 01:58:19 UTC
I think glosslist should be allowed and glossdiv should be banned instead, because a glosslist is, IMHO, an unordered list.

I've added Manuel as a CC for his input from a translator's perspective.

Cheers, Jeff.

Comment 2 Jeff Fearn 🐞 2009-01-05 05:28:19 UTC
Manuel, would a UTS #10 collation routine conforming to http://www.unicode.org/reports/tr10/ be sufficient to sort a flat glossary?

i.e. a glosslist or glossary without glossdivs?

e.g. of the form:

Term1: definition1
Term2: definition2
Term3: definition3

Where TermX may or may not be translated, but definitionX is always translated.

Cheers, Jeff.

Comment 3 Jeff Fearn 🐞 2009-01-20 04:37:16 UTC
Oops, I buggered up the flags; still need info from Manuel.

Comment 4 Manuel Ospina 2009-01-20 06:59:56 UTC
It would be sufficient for a phonetic alphabet. I'm not sure how it would work for other writing systems. How does this algorithm sort non-alphabetical writing systems?
I have added Chester for his opinion.

(In reply to comment #2)

Comment 5 Chester Cheng 2009-01-20 07:11:08 UTC
Non-alphabetical characters are grouped under "Symbols". I guess they are in ASCII order.

To me, the glossary is quite important.

(In reply to comment #4)

Comment 6 Jeff Fearn 🐞 2009-01-21 00:38:39 UTC
I had a chat to Asgeir and he believes that TR10 collation should be sufficient to sort mixed-language content in the correct order.

I suggest:


1: Package Unicode::Collate

2: Use Unicode::Collate in cleanXml to sort the glosslist on glossterm after the translated XML has been cleaned.

3: Remove the ban on glosslist

4: Consider banning glossdiv as it can have mixed language content at multiple levels. This breaks l10n layout.
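Step 2 could look roughly like the following. This is a minimal Python sketch, not publican code (publican is Perl, and the proposal names Unicode::Collate). It is also *not* a UTS #10 implementation: `simple_collation_key` is a hypothetical stand-in that decomposes to NFKD, drops combining marks and case-folds, which is enough for accented Latin terms but handles none of the CJK cases raised later in this bug. `glossterm_text` and `sort_glosslist` are likewise hypothetical helpers.

```python
import unicodedata
import xml.etree.ElementTree as ET


def simple_collation_key(term):
    """Hypothetical stand-in for a UTS #10 sort key (NOT the full algorithm).

    Primary comparison ignores accents and case; the original string is
    used as a tie-breaker so the resulting order is deterministic.
    """
    decomposed = unicodedata.normalize("NFKD", term)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), term)


def glossterm_text(entry):
    """Return the full text of an entry's <glossterm>, including nested markup."""
    term = entry.find("glossterm")
    return "".join(term.itertext()) if term is not None else ""


def sort_glosslist(root):
    """Sort the <glossentry> children of every <glosslist> on their glossterm.

    Entries are re-inserted as one contiguous run starting where the first
    entry was, so a <title> (or anything else) before them keeps its place.
    """
    for glosslist in list(root.iter("glosslist")):
        entries = glosslist.findall("glossentry")
        if not entries:
            continue
        start = list(glosslist).index(entries[0])
        for entry in entries:
            glosslist.remove(entry)
        entries.sort(key=lambda e: simple_collation_key(glossterm_text(e)))
        for offset, entry in enumerate(entries):
            glosslist.insert(start + offset, entry)


doc = ET.fromstring(
    "<article><glosslist>"
    "<glossentry><glossterm>zeta</glossterm></glossentry>"
    "<glossentry><glossterm>Éclair</glossterm></glossentry>"
    "<glossentry><glossterm>apple</glossterm></glossentry>"
    "</glosslist></article>"
)
sort_glosslist(doc)
print([glossterm_text(e) for e in doc.iter("glossentry")])  # ['apple', 'Éclair', 'zeta']
```

A plain code-point sort would instead give apple, zeta, Éclair, because É (U+00C9) comes after z (U+007A); that gap is what a real collation routine is meant to close.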

There is no time frame for this ATM due to work on RHTS.

Comment 7 David O'Brien 2009-01-21 00:59:12 UTC
Can we remove the ban sooner rather than later?

I ask because, afaik, the only (RH) books that use glossaries are not translated yet. Is it possible to apply these bans on a per-brand basis? This way I can still update publican to take advantage of fixes and enhancements without removing my glossary.

There are glossaries in the IDM doc, none of which is translated. IPA doc is scheduled for translation in the next release, but that is not for some time. The rest of the IDM doc (Directory Server, Cert. System, etc.) is not scheduled for translation.

I *think* the oVirt doc uses glossaries, but I don't know what translation plans exist.

cheers
David

Comment 8 Jeff Fearn 🐞 2009-01-21 05:10:38 UTC
(In reply to comment #7)
> Can we remove the ban sooner rather than later?

lol no. The mere possibility that there _may_ be a fix at some unknown time in the future is not a sane reason to change anything now.
 
> I ask because, afaik, the only (RH) books that use glossaries are not
> translated yet. Is it possible to apply these bans on a brand basis?

Brands already control this by setting STRICT. Red Hat brands set STRICT; the common, fedora, etc. brands do not. Non-STRICT brands get a warning instead of an error about these things.
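The warning-versus-error behaviour can be sketched as below. This is a hypothetical Python illustration, not publican's actual Perl implementation: `BANNED_TAGS`, `BannedTagError` and `check_banned_tags` are invented names, and only the glosslist message text is taken from the build output in the original report.

```python
import warnings
import xml.etree.ElementTree as ET

# Hypothetical ban table; the message is copied from publican 0.39's build output.
BANNED_TAGS = {
    "glosslist": "This tag set imposes English-language order on glossaries, "
                 "making them useless when translated.",
}


class BannedTagError(Exception):
    """Raised when a STRICT brand encounters a banned tag."""


def check_banned_tags(root, strict):
    """Return the sorted list of banned tags found in the document.

    STRICT brands fail the build on the first banned tag;
    non-STRICT brands get a warning for each one instead.
    """
    found = sorted({el.tag for el in root.iter() if el.tag in BANNED_TAGS})
    for tag in found:
        message = f"Banned tag found: {tag}: {BANNED_TAGS[tag]}"
        if strict:
            raise BannedTagError(message)
        warnings.warn(message)
    return found
```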

Comment 9 David O'Brien 2009-01-21 06:21:52 UTC
This is not something that is getting fixed. This is something that is getting banned due to opinions. The mere possibility that someone might add a glossary to a book that is going to get translated is not a sane reason to ban the necessary tags in the first place.

I thought STRICT settings were involved in how this was treated but wasn't sure, thanks.

Comment 10 Jeff Fearn 🐞 2009-01-21 22:27:40 UTC
(In reply to comment #9)
> This is not something that is getting fixed.

It breaks our ability to translate content, so from any perspective that doesn't ignore translation it is broken and needs to be fixed. 

> This is something that is getting
> banned due to opinions.

I think it's highly insulting that you insinuate we have not done due diligence on this functionality. It takes into account all aspects of the Documentation workflow: our customers' expectations, and the real history and decisions of the past that have positively and negatively affected the Docs team and Red Hat.

> The mere possibility that someone might add a glossary
> to a book that is going to get translated is not a sane reason to ban the
> necessary tags in the first place.

It is not acceptable to break translation work flow regardless of the current translation status of a particular work.

It is a sane policy given the volume of content, the size of the team we work in, and the fact that ignoring translation work flow has bitten us in the ass previously and cost us significantly to rectify that shortsightedness.

I suppose if you don't have to care about the other people in the team, and you choose to ignore that this exact same attitude has occurred before and cost us dearly, then sure, maybe we are just being silly.

Comment 11 Jeff Fearn 🐞 2009-02-17 21:44:34 UTC
*** Bug 485949 has been marked as a duplicate of this bug. ***

Comment 12 Deon Ballard 2009-02-17 22:06:56 UTC
I don't particularly care about other people! So, can we please allow glossaries? Is there an ETA for that, even for the glosslist compromise?

Glossaries are very useful, whether it's for new products like IPA or RH Virtual Directory or long-standing and intricate products like RHEL itself. They're a great reference for every level of user. I personally use them all the time. That is my opinion. Your opinion is that they aren't worth the effort because of the amount of time they take. Great. We have two opinions. 

Is it not possible simply to not translate the glossaries or to leave them out of translated docs? It seems there can be a procedural resolution rather than flat out prohibiting glossaries.

Comment 13 Jeff Fearn 🐞 2009-02-17 22:46:10 UTC
(In reply to comment #12)
> I don't particularly care about other people!

Welcome to the public mailing list.

> So, can we please allow
> glossaries?

They break translation. Until there is a solution that doesn't breach the stated ECS policy (that breaking translation is _never_ acceptable), they will remain disabled for STRICT brands.

> Is there an ETA for that, even for the glosslist compromise?

No.

> Glossaries are very useful, whether it's for new products like IPA or RH
> Virtual Directory or long-standing and intricate products like RHEL itself.
> They're a great reference for every level of user.

No one is arguing they aren't useful or desirable.

> I personally use them all
> the time. That is my opinion. Your opinion is that they aren't worth the effort
> because of the amount of time they take. Great. We have two opinions. 

No, we have a dozen opinions, and one policy that breaking translation is never acceptable.

> Is it not possible simply to not translate the glossaries or to leave them out
> of translated docs? It seems there can be a procedural resolution rather than
> flat out prohibiting glossaries.

I have been informed by management that treating translated content as of secondary importance or excluding content from translations is not acceptable. Your manager is aware of the effects of these policies and you should take up the prioritisation of these issues with them directly.

Comment 14 Deon Ballard 2009-02-17 23:19:41 UTC
Sigh. I thought the facetiousness was implied in "I don't care about people, so can I have my glossary now." Next time, I'll use a /sarc tag. (Unless those are banned, too...)

Comment 15 Michael Hideo 2009-06-17 01:49:25 UTC
Deon,

Still awaiting disposition from blocker.

- Mike

Comment 16 Deon Ballard 2009-06-17 17:31:28 UTC
That's cool. The promise of a resolution being in the works is good for now. Thanks for keeping the bug updated.

Comment 18 Jeff Fearn 🐞 2009-11-27 11:11:39 UTC
I removed the blocker because:

1: newer XSL is available in the docs brew root and yum repo
2: publican 1.3 (due next week) will have glossary.sort enabled for all formats

Still requires testing on a translated glossary.

Comment 19 Ruediger Landmann 2010-05-06 06:54:40 UTC
I had time to experiment a bit with this a few weeks ago; here's what I found, with help from translators:

<glossentry>s inside a <glossary> get sorted correctly (at least superficially[0]) for languages that use the Latin and Cyrillic alphabets. Languages with different writing systems present different problems:

Chinese:
<glossentry>s appear in no discernible pattern. They're probably being sorted according to Unicode codepoint.

Japanese:
A glossary in a Japanese technical publication could include up to four different writing systems: Latin, Katakana, Hiragana, and Kanji. Terms presented in Latin script should be separated from those presented in the three Japanese writing systems (this part already works correctly), but terms in Katakana, Hiragana, and Kanji should be interspersed according to their pronunciation. At present, we're getting all the Katakana first, then all the Hiragana, then all the Kanji. Katakana and Hiragana are syllabic scripts that represent the same 50 syllables; sorting them shouldn't be difficult and can probably be achieved easily in an update to the DocBook locale. The problem is that a single Kanji character can represent one, two, or more syllables, and its pronunciation (and therefore sort order) can change when combined with other Kanji.
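The Katakana/Hiragana part of this can be sketched as a sort key, under several assumptions. Katakana U+30A1..U+30F6 sits exactly 0x60 code points above the matching Hiragana, so folding one onto the other lets both scripts interleave by syllable; since the Hiragana block is only roughly in gojūon order, code-point order is an approximation (it does not handle the long-vowel mark ー, for example). Kanji cannot be ordered from the characters alone, so this sketch assumes each Kanji term comes with a translator-supplied kana reading (DocBook's `sortas` attribute on `glossentry` is one possible carrier, but nothing in this bug tested that). `kana_fold` and `japanese_glossary_key` are hypothetical names, and putting Latin-script terms first is an assumption: the comment above only says the groups should be separated.

```python
def kana_fold(text):
    """Map Katakana (U+30A1..U+30F6) onto Hiragana (U+3041..U+3096)."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch for ch in text
    )


def japanese_glossary_key(term, reading=None):
    """Hypothetical sort key for a Japanese glossary entry.

    `reading` is an assumed kana reading, needed for Kanji terms.
    "Latin script" is approximated as "starts with an ASCII character";
    that group is placed first (an assumption).
    """
    is_latin = term[:1].isascii()
    basis = reading if reading else term
    return (0 if is_latin else 1, kana_fold(basis).casefold())


entries = [("漢字", "かんじ"), ("カメラ", None), ("API", None), ("いぬ", None), ("あさ", None)]
ordered = sorted(entries, key=lambda e: japanese_glossary_key(*e))
print([term for term, _ in ordered])  # ['API', 'あさ', 'いぬ', 'カメラ', '漢字']
```

Without the fold, カメラ would land after every Hiragana term regardless of pronunciation, which mirrors the script-by-script grouping reported above.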

Still untested:
Korean
all Indic languages

Korean and the various Indic languages that we support use syllabic scripts; if they aren't already working correctly, I think that should be easily fixed in the locale. 

I note that these sorting issues affect not only glossaries, but any books that have indexes as well. 

[0] not all languages sort the Latin alphabet the same way, particularly when it comes to handling accented characters or characters outside the "basic Latin" group. I didn't explore what happens at these edges.

Comment 20 Jeff Fearn 🐞 2010-05-10 08:40:11 UTC
Hi Rudi, can we get access to this glossary? Also can we get it in an ordered list in the correct sorting order?

Comment 21 RHEL Program Management 2010-08-09 19:34:50 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

