871464 – [sr@latin] script not parsed as 'Latn'

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	babel
Sub Component:
Version:	17
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Jeffrey C. Ollie
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-10-30 13:33 UTC by Steve Tyler
Modified:	2012-11-08 08:30 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-11-07 19:40:38 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Steve Tyler 2012-10-30 13:33:35 UTC

Description of problem:

Babel does not provide a mapping from 'sr@latin' to 'sr_Latn'.

Translation files for 'Serbian (Latin)' are in the directory:
/usr/share/locale/sr@latin/

'sr@latin' is the name used at transifex:
https://fedora.transifex.com/projects/p/fedora/language/sr@latin/

>>> babel.Locale.parse('sr@latin').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': None}
>>> babel.Locale.parse('sr_Latn').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': 'Latn'}

Version-Release number of selected component (if applicable):
python-babel-0.9.6-3.fc17.noarch
filesystem-3-2.fc17.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. babel.Locale.parse('sr@latin').__dict__
  
Actual results:
'script': None

Expected results:
'script': 'Latn'

Additional info:

Bug 866730 - invalid locales configured for some languages

$ ls -d /usr/share/locale/*@*
$ rpm -ql filesystem | grep locale | grep '@'

Comment 1 Felix Schwarz 2012-10-30 23:26:32 UTC

As far as I know, Babel is not supposed to parse such locale strings the way you suggested ('@...' will be ignored at best). Therefore, it is definitely something to discuss upstream.

I have to admit that I'm a bit reluctant to change the notion of locale identifiers in Babel, but it would be best if you brought this topic up on the Babel mailing list.

We can leave this bug open until there is a final decision from Babel.

Comment 2 Steve Tyler 2012-10-31 02:50:56 UTC

Yes, Babel explicitly ignores modifiers of all types. There is no indication in the code as to why useful information that could be stored in the Locale object is simply thrown away. According to Babel, there is no distinction between 'sr' and 'sr@latin' -- the Locale objects are identical:

>>> babel.Locale.parse('sr').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': None}
>>> babel.Locale.parse('sr@latin').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': None}


$ less -N /usr/lib/python2.7/site-packages/babel/core.py
...
    715 def parse_locale(identifier, sep='_'):
    716     """Parse a locale identifier into a tuple of the form::
    717     
    718       ``(language, territory, script, variant)``
...
    739     Encoding information and locale modifiers are removed from the identifier:
...
    761     if '@' in identifier:
    762         # this is a locale modifier such as @euro, which we don't care about
    763         # either
    764         identifier = identifier.split('@', 1)[0]
...
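
The effect of the quoted lines can be reproduced outside Babel. What follows is a minimal sketch, not Babel's actual code: the helper name strip_encoding_and_modifier is hypothetical, and the handling of '.' is an assumption based only on the docstring line about encoding information being removed.

```python
def strip_encoding_and_modifier(identifier):
    """Hypothetical helper (not part of Babel): drop '.codeset' and '@modifier'."""
    if '.' in identifier:
        # Assumed from the docstring: encoding information is removed.
        identifier = identifier.split('.', 1)[0]
    if '@' in identifier:
        # Mirrors the quoted split('@', 1)[0]: everything after '@' is discarded.
        identifier = identifier.split('@', 1)[0]
    return identifier


print(strip_encoding_and_modifier('sr@latin'))           # sr
print(strip_encoding_and_modifier('sr_RS.UTF-8@latin'))  # sr_RS
```

Once the modifier has been cut off, nothing downstream can tell 'sr@latin' apart from 'sr', which is why the two Locale objects above are identical.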

Comment 3 Steve Tyler 2012-10-31 04:02:14 UTC

At Transifex, 'sr@latin' seems to be unique in having '@' in the language code:

$ curl -s https://fedora.transifex.com/ | grep '@'
      <a href="/projects/p/fedora/language/sr@latin/"  class="tipsy_enable" title="language code: sr@latin" >Serbian (Latin)</a>

Comment 4 Vratislav Podzimek 2012-11-06 12:28:13 UTC

(In reply to comment #3)
> At Transifex, 'sr@latin' seems to be unique in having '@' in the language
> code:
> 
> $ curl -s https://fedora.transifex.com/ | grep '@'
>       <a href="/projects/p/fedora/language/sr@latin/"  class="tipsy_enable"
> title="language code: sr@latin" >Serbian (Latin)</a>
This is because there are no other translations for languages using the '@SCRIPT' specification.
But I believe there is no reason for Babel to use its own format and not accept the commonly used language[_territory][.codeset][@modifier] format.
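
For illustration, the language[_territory][.codeset][@modifier] format can be split with a regular expression. This is only a sketch under assumptions: the names POSIX_LOCALE_RE and split_posix_locale are hypothetical, and the field lengths (2-3 letters for the language, 2 letters or 3 digits for the territory) are assumed rather than taken from any specification quoted in this bug.

```python
import re

# language[_territory][.codeset][@modifier]
# Field lengths are assumptions made for this sketch.
POSIX_LOCALE_RE = re.compile(
    r'^(?P<language>[A-Za-z]{2,3})'
    r'(?:_(?P<territory>[A-Za-z]{2}|[0-9]{3}))?'
    r'(?:\.(?P<codeset>[^@]+))?'
    r'(?:@(?P<modifier>.+))?$'
)


def split_posix_locale(name):
    """Hypothetical helper: return the four fields as a dict (missing ones are None)."""
    match = POSIX_LOCALE_RE.match(name)
    if match is None:
        raise ValueError('not a POSIX-style locale name: %r' % name)
    return match.groupdict()


print(split_posix_locale('sr@latin')['modifier'])          # latin
print(split_posix_locale('sr_RS.UTF-8@latin')['codeset'])  # UTF-8
```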

Comment 5 Steve Tyler 2012-11-06 13:02:57 UTC

Babel uses the data from the Unicode Common Locale Data Repository (CLDR):
http://cldr.unicode.org/

The string 'sr_Latn_RS' is a Unicode locale identifier. The identifier conforms to a comprehensive specification:

Unicode Technical Standard #35
Unicode Locale Data Markup Language (LDML)
http://www.unicode.org/reports/tr35/

The EBNF and ABNF syntax specifications are here:

3. Unicode Language and Locale Identifiers
http://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers

The CLDR locale data summary chart can be found here:

Locale Data Summary for root [Root]
http://www.unicode.org/cldr/charts/summary/root.html

'sr_Latn_RS' can be found here:

Locale Data Summary for sr_Latn [Serbian (Latin)]
http://www.unicode.org/cldr/charts/summary/sr_Latn.html

Comment 6 Steve Tyler 2012-11-06 13:17:59 UTC

The CLDR is actively maintained:

CLDR 22.1 Release Note
Date: 2012-10-26
http://cldr.unicode.org/index/downloads/cldr-22-1

"CLDR 22.1 contains data for 215 languages and 227 territories—654 locales in all. Version 22.1 is an update release, with several important fixes to CLDR 22.0."

core.zip has the data in XML format:
http://unicode.org/Public/cldr/22.1/

$ ls -1 cldr-22.1/common/main/sr*
cldr-22.1/common/main/sr_Cyrl_BA.xml
cldr-22.1/common/main/sr_Cyrl_ME.xml
cldr-22.1/common/main/sr_Cyrl_RS.xml
cldr-22.1/common/main/sr_Cyrl.xml
cldr-22.1/common/main/sr_Latn_BA.xml
cldr-22.1/common/main/sr_Latn_ME.xml
cldr-22.1/common/main/sr_Latn_RS.xml
cldr-22.1/common/main/sr_Latn.xml
cldr-22.1/common/main/sr.xml

Comment 7 Steve Tyler 2012-11-06 13:30:01 UTC

The original point of this bug was:

Babel does not provide a mapping from 'sr@latin' to 'sr_Latn'.

Now, I have changed my mind: :-)

'sr@latin' is not a valid Unicode locale identifier, yet Babel parses it without complaint. Instead, Babel should reject 'sr@latin' by raising an UnknownLocaleError exception.

>>> import babel
>>> babel.Locale.parse('sr@latin')
<Locale "sr">
>>> babel.Locale.parse('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 212, in parse
    return cls(*parse_locale(identifier, sep=sep))
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 137, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'foo'
>>> babel.Locale.parse('foo@bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 212, in parse
    return cls(*parse_locale(identifier, sep=sep))
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 137, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'foo'

Comment 8 Steve Tyler 2012-11-06 13:51:39 UTC

All of these should raise an exception:

>>> babel.Locale.parse('sr_RS.UTF-8')
<Locale "sr_RS">
>>> babel.Locale.parse('sr_RS.RandomJunk')
<Locale "sr_RS">
>>> babel.Locale.parse('sr_RS@MoreRandomJunk')
<Locale "sr_RS">
>>> babel.Locale.parse('sr_RS.I_can_put_anything_I_want_in_here.parse().accepts.it')
<Locale "sr_RS">

Here, Babel raises an exception, but the exception is for an empty string, not the input string:

>>> babel.Locale.parse('@foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 212, in parse
    return cls(*parse_locale(identifier, sep=sep))
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 769, in parse_locale
    raise ValueError('expected only letters, got %r' % lang)
ValueError: expected only letters, got ''
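
One way to get the stricter behaviour asked for here is a pre-check that runs before the identifier is parsed. This is a rough sketch only: reject_posix_suffixes is a hypothetical name, it is not part of Babel, and it looks only for '.' and '@' without validating the identifier against CLDR.

```python
def reject_posix_suffixes(identifier):
    """Hypothetical pre-check: refuse identifiers carrying '.codeset' or '@modifier'."""
    for marker in ('.', '@'):
        if marker in identifier:
            # Report the full input string, not the truncated remainder.
            raise ValueError(
                'not a Unicode locale identifier: %r (contains %r)'
                % (identifier, marker))
    return identifier


print(reject_posix_suffixes('sr_Latn_RS'))  # sr_Latn_RS

try:
    reject_posix_suffixes('@foo')
except ValueError as exc:
    print(exc)  # the message names '@foo', not ''
```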

Comment 9 Vratislav Podzimek 2012-11-07 09:06:13 UTC

I believe Babel should really understand the notation used in the system (e.g. sr_RS.UTF-8@latin). Otherwise this package is unusable without a mapping from common locales to the CLDR values. Having this mapping in every application or library that uses Babel is nonsense; the only rational solution is to have the mapping directly in Babel.
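
A minimal sketch of such a mapping, as a caller would have to write it today, might look like the following. The names MODIFIER_TO_SCRIPT and posix_to_cldr are hypothetical; only the 'latin' to 'Latn' pair comes from this report, and the 'cyrillic' entry is an assumption added by analogy.

```python
# Assumed modifier-to-script table; only 'latin' -> 'Latn' comes from this report.
MODIFIER_TO_SCRIPT = {
    'latin': 'Latn',
    'cyrillic': 'Cyrl',  # assumption, added by analogy
}


def posix_to_cldr(name):
    """Hypothetical mapping from a POSIX locale name to a CLDR-style identifier."""
    base, _, modifier = name.partition('@')
    base = base.partition('.')[0]          # drop the codeset, e.g. '.UTF-8'
    modifier = modifier.partition('.')[0]  # tolerate the 'sr@latin.UTF-8' ordering
    language, _, territory = base.partition('_')
    parts = [language]
    script = MODIFIER_TO_SCRIPT.get(modifier.lower())
    if script:
        parts.append(script)               # script goes between language and territory
    if territory:
        parts.append(territory)
    return '_'.join(parts)


print(posix_to_cldr('sr@latin'))           # sr_Latn
print(posix_to_cldr('sr_RS.UTF-8@latin'))  # sr_Latn_RS
```

Modifiers that are not in the table are silently dropped in this sketch, so the result still has to be checked by whatever consumes it.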

Comment 10 Jeffrey C. Ollie 2012-11-07 19:40:38 UTC

This discussion really seems like it should be taken upstream to the Babel developers as there are larger design issues involved in how to properly solve the problem.  As such, I would not be willing to add a patch to the Fedora package as that would diverge too much from upstream.

Comment 11 Steve Tyler 2012-11-07 20:08:40 UTC

(In reply to comment #10)
> This discussion really seems like it should be taken upstream to the Babel
> developers as there are larger design issues involved in how to properly
> solve the problem.  As such, I would not be willing to add a patch to the
> Fedora package as that would diverge too much from upstream.

Thanks for your comments. Fedora certainly should not be patching Babel for this issue, but blowing off Babel clients with a "take it upstream close-out" indicates a package maintainer who does not understand that it is his job to interface with upstream.

Comment 12 Felix Schwarz 2012-11-07 20:47:42 UTC

I don't know what Jeffrey's line of thought was but I have to say I'd prefer this discussion on the babel mailing list as well. I'd say that "interfacing with upstream" should not be placed on the maintainer's shoulders. 

This is a new feature for Babel, so it requires some background information, justification, and thought to get it implemented correctly. It would be best for you to make a good case yourself on the Babel mailing list instead of having Jeffrey or someone else relay it.

Having this bug in Bugzilla does not feel useful to me either: there is nothing to be done within the Fedora scope *and* it's unlikely that other Fedora users will look for information here.

Comment 13 Jeffrey C. Ollie 2012-11-07 21:40:54 UTC

(In reply to comment #11)
> blowing off Babel clients with a "take it upstream
> close-out" indicates a package maintainer who does not understand that it is
> his job to interface with upstream.

Requiring a package maintainer to interface with upstream on every bug, especially ones like this that are going to require a significant development effort, is the way to alienate lots of package maintainers.

I'm a competent programmer (which shouldn't be a requirement to be a package maintainer) but I have no familiarity with the internals of the code and little knowledge/interest in the problem domain (I only speak two languages, English and bad English) so I'm not the best person to be advocating for the sorts of changes you are looking for.

When it comes to packaging issues I'm willing to work with upstream to solve them (see http://babel.edgewall.org/ticket/34 for proof).

My interest in Babel was that I needed to package Babel before I could package something else that I was more interested in. If there's someone that's active upstream and wants to maintain the Babel package I'm more than willing to cede ownership.

Comment 14 Steve Tyler 2012-11-07 22:20:19 UTC

Vratislav came up with a solution that works fine for 'sr@latin', so I'm removing this bug as a blocker for Bug 866730:

Bug 872786 - [sr@latin] 'Serbian (Latin)' not listed in languages menu; 'Serbian (Serbia)' listed twice

Comment 15 Felix Schwarz 2012-11-08 07:24:53 UTC

I just want to mention that I am effectively upstream, but I would still like new-feature discussions to happen on the Babel mailing list!

Comment 16 Steve Tyler 2012-11-08 08:30:24 UTC

Babel Mailing Lists
http://babel.edgewall.org/wiki/MailingList

browse the archive
http://groups.google.com/group/python-babel
