Description of problem:

Babel does not provide a mapping from 'sr@latin' to 'sr_Latn'.

Translation files for 'Serbian (Latin)' are in the directory:
/usr/share/locale/sr@latin/

'sr@latin' is the name used at transifex:
https://fedora.transifex.com/projects/p/fedora/language/sr@latin/

>>> babel.Locale.parse('sr@latin').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': None}
>>> babel.Locale.parse('sr_Latn').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': 'Latn'}

Version-Release number of selected component (if applicable):
python-babel-0.9.6-3.fc17.noarch
filesystem-3-2.fc17.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. babel.Locale.parse('sr@latin').__dict__

Actual results:
'script': None

Expected results:
'script': 'Latn'

Additional info:
Bug 866730 - invalid locales configured for some languages

$ ls -d /usr/share/locale/*@*
$ rpm -ql filesystem | grep locale | grep '@'
As far as I know, Babel is not supposed to parse such locale strings the way you suggested ('@...' will be ignored at best). Therefore it is definitely something to discuss upstream. I have to admit I'm a bit reluctant to change the notion of locale identifiers in Babel, but it would be best if you brought this topic up on the Babel mailing list. We can leave this bug open until there is a final decision from Babel.
Yes, Babel explicitly ignores modifiers of all types. There is no indication in the code as to why useful information that could be stored in the Locale object is simply thrown away.

According to Babel, there is no distinction between 'sr' and 'sr@latin' -- the Locale objects are identical:

>>> babel.Locale.parse('sr').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': None}
>>> babel.Locale.parse('sr@latin').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': None}

$ less -N /usr/lib/python2.7/site-packages/babel/core.py
...
    715 def parse_locale(identifier, sep='_'):
    716     """Parse a locale identifier into a tuple of the form::
    717
    718        ``(language, territory, script, variant)``
...
    739     Encoding information and locale modifiers are removed from the identifier:
...
    761     if '@' in identifier:
    762         # this is a locale modifier such as @euro, which we don't care about
    763         # either
    764         identifier = identifier.split('@', 1)[0]
...
At Transifex, 'sr@latin' seems to be unique in having '@' in the language code:

$ curl -s https://fedora.transifex.com/ | grep '@'
<a href="/projects/p/fedora/language/sr@latin/" class="tipsy_enable" title="language code: sr@latin" >Serbian (Latin)</a>
(In reply to comment #3)
> At Transifex, 'sr@latin' seems to be unique in having '@' in the language
> code:
>
> $ curl -s https://fedora.transifex.com/ | grep '@'
> <a href="/projects/p/fedora/language/sr@latin/" class="tipsy_enable"
> title="language code: sr@latin" >Serbian (Latin)</a>

This is because there are no other translations for languages using the '@SCRIPT' specification. But I believe there is no reason for Babel to use its own format and not accept the commonly used language[_territory][.codeset][@modifier] format.
Babel uses the data from the Unicode Common Locale Data Repository (CLDR):
http://cldr.unicode.org/

The string 'sr_Latn_RS' is a Unicode locale identifier. The identifier conforms to a comprehensive specification:

Unicode Technical Standard #35
Unicode Locale Data Markup Language (LDML)
http://www.unicode.org/reports/tr35/

The EBNF and ABNF syntax specifications are here:

3. Unicode Language and Locale Identifiers
http://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers

The CLDR locale data summary chart can be found here:

Locale Data Summary for root [Root]
http://www.unicode.org/cldr/charts/summary/root.html

'sr_Latn_RS' can be found here:

Locale Data Summary for sr_Latn [Serbian (Latin)]
http://www.unicode.org/cldr/charts/summary/sr_Latn.html
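
To make the identifier structure concrete, here is a rough sketch of how a string such as 'sr_Latn_RS' splits into its subtags. It is only an approximation of the unicode_language_id production in UTS #35 (language, optional script, optional region, optional variants) and ignores extensions entirely. The function name split_language_id is made up for this illustration and is not part of Babel. It only checks syntax, so a well-formed but nonexistent language such as 'foo' would still pass.

```python
import re

# Simplified form of unicode_language_id from UTS #35:
#   language[_script][_region][_variant...]
# Extensions (-u-, -t-, private use) are deliberately not handled.
UNICODE_LANGUAGE_ID_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3}|[A-Za-z]{5,8})"
    r"(?:[_-](?P<script>[A-Za-z]{4}))?"
    r"(?:[_-](?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:[_-](?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def split_language_id(identifier):
    """Split an identifier into (language, script, region, variants).

    Hypothetical helper for illustration; raises ValueError when the
    string does not fit the simplified grammar above.
    """
    match = UNICODE_LANGUAGE_ID_RE.match(identifier)
    if match is None:
        raise ValueError("not a Unicode language identifier: %r" % identifier)
    # The variants group looks like "_var1_var2"; drop the empty pieces.
    variants = [v for v in re.split(r"[_-]", match.group("variants")) if v]
    return (
        match.group("language"),
        match.group("script"),
        match.group("region"),
        variants,
    )
```

Under this grammar 'sr_Latn_RS' yields language 'sr', script 'Latn' and region 'RS', while 'sr@latin' is rejected because '@' is not a valid separator.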
The CLDR is actively maintained:

CLDR 22.1 Release Note
Date: 2012-10-26
http://cldr.unicode.org/index/downloads/cldr-22-1

"CLDR 22.1 contains data for 215 languages and 227 territories—654 locales in all. Version 22.1 is an update release, with several important fixes to CLDR 22.0."

core.zip has the data in XML format:
http://unicode.org/Public/cldr/22.1/

$ ls -1 cldr-22.1/common/main/sr*
cldr-22.1/common/main/sr_Cyrl_BA.xml
cldr-22.1/common/main/sr_Cyrl_ME.xml
cldr-22.1/common/main/sr_Cyrl_RS.xml
cldr-22.1/common/main/sr_Cyrl.xml
cldr-22.1/common/main/sr_Latn_BA.xml
cldr-22.1/common/main/sr_Latn_ME.xml
cldr-22.1/common/main/sr_Latn_RS.xml
cldr-22.1/common/main/sr_Latn.xml
cldr-22.1/common/main/sr.xml
The original point of this bug was:

Babel does not provide a mapping from 'sr@latin' to 'sr_Latn'.

Now, I have changed my mind: :-)

'sr@latin' is not a valid Unicode locale identifier, yet Babel parses it without complaint. Instead, Babel should reject 'sr@latin' by raising an UnknownLocaleError exception.

>>> import babel
>>> babel.Locale.parse('sr@latin')
<Locale "sr">
>>> babel.Locale.parse('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 212, in parse
    return cls(*parse_locale(identifier, sep=sep))
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 137, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'foo'
>>> babel.Locale.parse('foo@bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 212, in parse
    return cls(*parse_locale(identifier, sep=sep))
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 137, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'foo'
All of these should raise an exception:

>>> babel.Locale.parse('sr_RS.UTF-8')
<Locale "sr_RS">
>>> babel.Locale.parse('sr_RS.RandomJunk')
<Locale "sr_RS">
>>> babel.Locale.parse('sr_RS@MoreRandomJunk')
<Locale "sr_RS">
>>> babel.Locale.parse('sr_RS.I_can_put_anything_I_want_in_here.parse().accepts.it')
<Locale "sr_RS">

Here, Babel raises an exception, but the exception is for an empty string, not the input string:

>>> babel.Locale.parse('@foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 212, in parse
    return cls(*parse_locale(identifier, sep=sep))
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 769, in parse_locale
    raise ValueError('expected only letters, got %r' % lang)
ValueError: expected only letters, got ''
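
As a sketch of the stricter behaviour argued for here, a caller could run a small check before handing a string to Babel. The helper below is hypothetical (the name require_plain_identifier is invented for this comment and nothing like it exists in Babel). It rejects anything carrying a POSIX codeset or modifier and reports the complete input string in the error, rather than a fragment of it.

```python
def require_plain_identifier(identifier):
    """Return identifier unchanged, or raise ValueError naming the full input.

    Hypothetical pre-check, not part of Babel: it refuses strings that
    carry a POSIX codeset ('.') or modifier ('@') instead of silently
    discarding those parts.
    """
    for marker, what in ((".", "codeset"), ("@", "modifier")):
        if marker in identifier:
            raise ValueError(
                "locale identifier %r contains a POSIX %s; "
                "expected a plain Unicode locale identifier" % (identifier, what)
            )
    return identifier
```

It could be used as babel.Locale.parse(require_plain_identifier(name)), so that 'sr_RS.RandomJunk' and '@foo' both fail with a message that quotes the original string.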
I believe Babel should really understand the notation used in the system (e.g. sr_RS.UTF-8@latin). Otherwise this package is unusable without a mapping from common locales to the CLDR values. Having this mapping in every application or library that uses Babel is nonsense; the only rational solution is to have the mapping directly in Babel itself.
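
For illustration, such a mapping could look roughly like the sketch below. It is not Babel code: the function name posix_to_cldr and the small MODIFIER_TO_SCRIPT table are assumptions made up for this example, and the table only covers the two script modifiers needed to show the idea. The codeset is dropped, and modifiers that do not name a script (such as @euro) are ignored.

```python
import re

# Hypothetical table mapping glibc-style script modifiers to ISO 15924
# script codes. Only illustrative entries; not taken from Babel.
MODIFIER_TO_SCRIPT = {
    "latin": "Latn",
    "cyrillic": "Cyrl",
}

# language[_territory][.codeset][@modifier]
POSIX_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def posix_to_cldr(identifier):
    """Convert a POSIX locale name into a CLDR-style identifier.

    Hypothetical helper: raises ValueError for malformed input, drops
    the codeset, and drops modifiers that are not known script names.
    """
    match = POSIX_LOCALE_RE.match(identifier)
    if match is None:
        raise ValueError("not a POSIX locale name: %r" % identifier)
    parts = [match.group("language").lower()]
    modifier = match.group("modifier")
    script = MODIFIER_TO_SCRIPT.get(modifier.lower()) if modifier else None
    if script:
        # CLDR places the script between language and territory.
        parts.append(script)
    territory = match.group("territory")
    if territory:
        parts.append(territory.upper())
    return "_".join(parts)
```

With this sketch, 'sr@latin' becomes 'sr_Latn' and 'sr_RS.UTF-8@latin' becomes 'sr_Latn_RS', which could then be passed to babel.Locale.parse().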
This discussion really seems like it should be taken upstream to the Babel developers as there are larger design issues involved in how to properly solve the problem. As such, I would not be willing to add a patch to the Fedora package as that would diverge too much from upstream.
(In reply to comment #10)
> This discussion really seems like it should be taken upstream to the Babel
> developers as there are larger design issues involved in how to properly
> solve the problem. As such, I would not be willing to add a patch to the
> Fedora package as that would diverge too much from upstream.

Thanks for your comments. Fedora certainly should not be patching Babel for this issue, but blowing off Babel clients with a "take it upstream close-out" indicates a package maintainer who does not understand that it is his job to interface with upstream.
I don't know what Jeffrey's line of thought was, but I have to say I'd prefer to have this discussion on the Babel mailing list as well. I'd say that "interfacing with upstream" should not be placed on the maintainer's shoulders. This is a new feature for Babel, so it requires some background information, justification and thought to get it implemented correctly. It would be best if you made a good case yourself on the Babel mailing list instead of having Jeffrey or someone else relay it. Having this bug in Bugzilla does not feel useful to me either - there is nothing to be done within the Fedora scope *and* it's unlikely other Fedora users will look for information here.
(In reply to comment #11)
> blowing off Babel clients with a "take it upstream
> close-out" indicates a package maintainer who does not understand that it is
> his job to interface with upstream.

Requiring a package maintainer to interface with upstream on every bug, especially ones like this that are going to require a significant development effort, is the way to alienate lots of package maintainers. I'm a competent programmer (which shouldn't be a requirement to be a package maintainer), but I have no familiarity with the internals of the code and little knowledge of or interest in the problem domain (I only speak two languages, English and bad English), so I'm not the best person to be advocating for the sorts of changes you are looking for. When it comes to packaging issues I'm willing to work with upstream to solve them (see http://babel.edgewall.org/ticket/34 for proof).

My interest in Babel was that I needed to package Babel before I could package something else that I was more interested in. If there's someone who is active upstream and wants to maintain the Babel package, I'm more than willing to cede ownership.
Vratislav came up with a solution that works fine for 'sr@latin', so I'm removing this bug as a blocker for Bug 866730:

Bug 872786 - [sr@latin] 'Serbian (Latin)' not listed in languages menu; 'Serbian (Serbia)' listed twice
I just want to mention that I am effectively upstream - but anyway, I would like new feature discussions to happen on the Babel mailing list!
Babel Mailing Lists
http://babel.edgewall.org/wiki/MailingList

browse the archive
http://groups.google.com/group/python-babel