Bug 471028

Summary: locale name should be hi instead of hi_IN
Product: [Fedora] Fedora Documentation
Reporter: Rajesh Ranjan <rranjan>
Component: release-notes
Assignee: Release Notes Tracker <relnotes>
Status: CLOSED CURRENTRELEASE
QA Contact: Karsten Wade <kwade>
Severity: urgent
Priority: urgent
Version: devel
CC: aalam, ankit, piotrdrag, rranjan, stickster
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-11-20 22:57:05 EST
Bug Depends On:    
Bug Blocks: 151189    

Description Rajesh Ranjan 2008-11-11 08:34:12 EST
For the Hindi language, the locale name shown on the preview page is 'hi_IN'. It should be just 'hi', like the other locales (gu, pa, mr, etc.). Please see:

http://docs.fedoraproject.org/release-notes/f10preview/

There, hi_IN should be just hi!
Comment 1 Piotr Drąg 2008-11-11 10:20:58 EST
There are both hi and hi_IN in the git repo:

http://git.fedorahosted.org/git/?p=docs/release-notes.git;a=tree;f=po;hb=HEAD
Comment 2 Karsten Wade 2008-11-11 12:16:34 EST
Adding Jeff Fearn who can help us with the answer to ...

Do we need country codes in the language for Publican?

Maybe Paul remembers, but I forget why we use these. In what circumstances does it help?

I'd like us to make a clear policy and stick to it, merging and deleting whichever ones we do not use.
Comment 3 Jeff Fearn 2008-11-11 16:25:47 EST
(In reply to comment #2)
> Adding Jeff Fearn who can help us with the answer to ...
> 
> Do we need country codes in the language for Publican?

Publican should handle 2 and 5 character codes.

> Maybe Paul remembers, but I forget why we use these?  What circumstances does
> it help?

It makes language code format consistent across all languages since at the minimum the two Chinese languages use 5 character codes.

> I'd like us to make a clear policy and stick to it, merging and deleting
> whichever ones we do not use.

IMHO xx-YY for all languages is clear.

xx[-YY] is less clear.

What happens if you have a xx language and later on someone adds a xx-YY variant? I recently heard this happened to one project when es-MX was added and es already existed.

My point being that xx[-YY] is more prone to confusion than always having xx-YY.

Cheers, Jeff.
Comment 4 Piotr Drąg 2008-11-11 17:00:55 EST
(In reply to comment #3)
> It makes language code format consistent across all languages since at the
> minimum the two Chinese languages use 5 character codes.
> 

But not every language has to use 5-character codes. Most of them don't have any variants and will always be just one language. 2-character codes have been used for years, and that has never been problematic in any way.

> IMHO xx-YY for all languages is clear.
> xx[-YY] is less clear.
> 

But changing the de facto standard (xx[_YY]) to something not used anywhere except Publican does not give you any benefit. It makes things even worse, because the whole FLOSS world uses 2-character codes (with exceptions like zh_CN or bn_IN, where it's really necessary), including the glibc language tables!

> What happens if you have a xx language and later on someone adds a xx-YY
> variant? I recently heard this happened to one project when es-MX was added and
> es already existed.
> 

Is this really a problem? In which way?

> My point being that xx[-YY] is more prone to confusion than always having
> xx-YY.
> 

From an l10n point of view, it's just complicating things that have always been simple.
Comment 5 Jeff Fearn 2008-11-11 18:01:37 EST
(In reply to comment #4)
> (In reply to comment #3)
> > It makes language code format consistent across all languages since at the
> > minimum the two Chinese languages use 5 character codes.
> > 
> 
> But not every language has to use 5 character codes. Most of them don't have
> any variants and always will be just one language. 2 character codes are used
> for years, and it has never been problematic in any way.

He asked for a clear policy. "Always use 5-letter codes" is clear; "use 2- or 5-letter codes" is less clear.

> > IMHO xx-YY for all languages is clear.
> > xx[-YY] is less clear.
> > 
> 
> But changing de facto standard (xx[_YY]) to something not used in any place
> except publican does not give you any benefits.

This has nothing to do with Publican; it's Fedora docs policy, and it would make sense to clarify this policy regardless of which tool chain is being used.

> It makes things even worse,
> because whole FLOSS world use 2 character codes (with exceptions like zh_CN or
> bn_IN, where it's really necessary), including glibc language tables!

This is the difference between producing translations for software and translations for documentation.

The underscore should not be used for documentation translations since it is not a valid language delimiter in XML or XHTML. The HYPHEN-MINUS is the only valid delimiter in XML and XHTML. Using the underscore produces invalid XML and XHTML.

See: http://www.w3.org/TR/REC-xml/#sec-lang-tag

The shell uses a different ISO standard for language delineation than is used in XML and XHTML. Publican handles this difference when producing desktop documentation; I'm not sure what other tool chains do.
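
As a rough illustration of that mapping (not Publican's actual code), the sketch below converts between the shell-style form and the XML-style form. The function names `to_xml_lang` and `to_posix_locale` are hypothetical, and the pattern only covers plain xx and xx_YY / xx-YY codes, not encodings or modifiers such as `.UTF-8` or `@latin`.

```python
import re

# Matches a 2-letter language code with an optional 2-letter country code,
# separated by either "_" (shell style) or "-" (XML style).
_CODE = re.compile(r"^([A-Za-z]{2})(?:[_-]([A-Za-z]{2}))?$")


def _split(code):
    """Return (language, country or None), normalising letter case."""
    match = _CODE.match(code)
    if match is None:
        raise ValueError(f"unsupported language code: {code!r}")
    lang, country = match.groups()
    return lang.lower(), (country.upper() if country else None)


def to_xml_lang(code):
    """Shell-style 'hi_IN' -> 'hi-IN', suitable for an xml:lang attribute."""
    lang, country = _split(code)
    return f"{lang}-{country}" if country else lang


def to_posix_locale(code):
    """XML-style 'hi-IN' -> 'hi_IN', as used in locale and .po file names."""
    lang, country = _split(code)
    return f"{lang}_{country}" if country else lang


print(to_xml_lang("hi_IN"))      # hi-IN
print(to_posix_locale("zh-CN"))  # zh_CN
print(to_xml_lang("hi"))         # hi
```

A 2-letter code passes through unchanged in both directions, so only the delimiter differs between the two contexts.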

> > What happens if you have a xx language and later on someone adds a xx-YY
> > variant? I recently heard this happened to one project when es-MX was added and
> > es already existed.
> > 
> 
> Is this really a problem? In which way?

He asked for a clear policy; having mixed 2- and 5-letter codes for the same language does not aid clarity.

> > My point being that xx[-YY] is more prone to confusion than always having
> > xx-YY.
> > 
> 
> From l10n view, it's just complicating things that always have been simple.

I don't think using hi-IN instead of hi is complex in any way.

This ticket wouldn't exist if there wasn't some existing confusion.

Cheers, Jeff.
Comment 6 A S Alam 2008-11-12 01:00:21 EST
The reason this ticket exists is that the build is not available for 'hi' (need to add it to LINGUAS).

hi_IN or hi-IN does not exist in our system (e.g., http://l10n.fedoraproject.org/languages/hi_IN/).

In our current system, a committer (or translator) can give any name to
any file (by mistake) through the "or, type the name for a new one:" prompt,
such as 'xx_ZZ.po' or 'xx_CC.po'.

We can restrict the Transifex system so that not just anybody can commit to an arbitrary file name (for translations, like 'po/hi.po' or 'po/hi_IN.po'). That is what caused this issue.

Just FYI: the hi_IN.po file is not currently maintained by any translator
(cvs log for it):
------
revision 1.1
date: 2007/11/21 16:44:49;  author: ruturajv;  state: Exp;
complete with security
and first page
--------
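
The situation described above, where both 'po/hi.po' and 'po/hi_IN.po' exist, could be caught with a small check. This is only a sketch, assuming a flat list of .po file names; the function name `find_variant_clashes` is hypothetical and is not part of Transifex or any Fedora tool.

```python
from collections import defaultdict


def find_variant_clashes(po_files):
    """Group .po file names by base language and return the groups
    that contain more than one file (e.g. hi.po next to hi_IN.po)."""
    by_language = defaultdict(list)
    for name in po_files:
        if not name.endswith(".po"):
            continue
        code = name[: -len(".po")]
        # The base language is whatever precedes the first "_" or "-".
        base = code.replace("-", "_").split("_", 1)[0].lower()
        by_language[base].append(name)
    return {
        base: sorted(names)
        for base, names in by_language.items()
        if len(names) > 1
    }


files = ["gu.po", "hi.po", "hi_IN.po", "mr.po", "pa.po", "zh_CN.po", "zh_TW.po"]
print(find_variant_clashes(files))
```

With this sample list, zh is reported alongside hi. zh_CN and zh_TW are legitimately distinct, so the result is a list for a human to review, not a list of errors.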
Comment 7 Paul W. Frields 2008-11-20 22:57:05 EST
Fixed.

I would suggest that someone open another bug to eliminate all the xx_YY in Fedora Docs and replace with xx-YY so these codes can be used equally as needed in our XML or in the shell without silly mangling.  The existing toolchain has no problem using either AFAIK, and even if we're not going to be on it much longer it behooves us to follow standards.