Bug 866730 - invalid locales configured for some languages
Summary: invalid locales configured for some languages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Vratislav Podzimek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedNTH RejectedBlocker
Depends On:
Blocks: F18Beta-accepted, F18BetaFreezeExcept
 
Reported: 2012-10-16 02:10 UTC by Steve Tyler
Modified: 2012-11-08 09:31 UTC (History)

Fixed In Version: anaconda-18.24-1
Clone Of:
Environment:
Last Closed: 2012-11-08 09:31:35 UTC
Type: Bug
Embargoed:


Attachments
anaconda-18.18-more-locales-fix.patch (3.24 KB, patch)
2012-10-19 09:05 UTC, Jens Petersen
mangle-test-2.py mangleMap locale validation test (3.06 KB, text/plain)
2012-10-19 22:02 UTC, Steve Tyler
mangle-test-report-1.txt report from the previously attached mangle-test-2.py (4.79 KB, text/plain)
2012-10-19 22:10 UTC, Steve Tyler
localization-test-report-1.txt showing locales returned by locale.normalize() (7.06 KB, text/plain)
2012-10-23 21:43 UTC, Steve Tyler
localization-test-2.py locale validation test (2.60 KB, text/plain)
2012-10-23 21:52 UTC, Steve Tyler
localization-test-3.py locale validation test for pyanaconda/localization.py (3.06 KB, text/plain)
2012-10-30 02:37 UTC, Steve Tyler
localization-test-report-2.txt showing results of latest commit with mangleMap patch (7.50 KB, text/plain)
2012-10-30 05:05 UTC, Steve Tyler
localization-manglemap-1.patch (3.10 KB, patch)
2012-10-30 05:23 UTC, Steve Tyler
localization-manglemap-sr-latin-1.patch (3.65 KB, patch)
2012-10-31 22:20 UTC, Steve Tyler
Patch for LocaleInfo.__repr__ (3.98 KB, patch)
2012-11-02 13:53 UTC, Vratislav Podzimek
PatchV2 for the LocaleInfo.__repr__ (4.15 KB, patch)
2012-11-02 19:22 UTC, Vratislav Podzimek
localization-test-report-18_23_1_EXP_2.txt (7.47 KB, text/plain)
2012-11-02 20:11 UTC, Steve Tyler
localization-test-report-18_24_1.txt showing 75 out of 75 valid locales (7.46 KB, text/plain)
2012-11-03 05:11 UTC, Steve Tyler

Description Steve Tyler 2012-10-16 02:10:14 UTC
Description of problem:

Selecting these languages results in an invalid locale:

Belarusian      be
Bosnian         bs
Basque          eu
Armenian        hy
Georgian        ka
Serbian         sr

These alternative languages from the language menu result in a valid locale:
Basque (Spain)      eu_ES
Serbian (Serbia)    sr_RS

There are no alternatives for Belarusian, Bosnian, Armenian, Georgian.

There are 74 languages in the language menu. I did not test every one ...

Tested by repeatedly running clean, minimal installs from the DVD.

Command-line:
$ qemu-kvm -m 2048 -hda f18-test-1.img -cdrom ~/xfr/fedora/F18/F18-Beta/TC4/Fedora-18-Beta-TC4-x86_64-DVD.iso -usb -vga qxl -boot menu=on -usbdevice mouse

Version-Release number of selected component (if applicable):
anaconda 18.16
Fedora-18-Beta-TC4-x86_64-DVD.iso

How reproducible:
Always.

Steps to Reproduce:
1. Do a clean, minimal install with one of the languages listed above.
2. After rebooting, login as root.
3. # locale
   # cat /etc/sysconfig/i18n
   # cat /etc/locale.conf

Actual results:
During login, "cannot change locale" messages are displayed.
Attachment 620140 [details] is a screenshot with examples.
The 'locale' command returns "No such file or directory" errors.
The files /etc/sysconfig/i18n and /etc/locale.conf have invalid locales.

Expected results:
No messages are displayed during login.
No errors are returned by the 'locale' command.
The files /etc/sysconfig/i18n and /etc/locale.conf have valid locales.

Additional info:
Valid locales are the names of text files in /usr/share/i18n/locales/.

The GNU documentation says this:

2.3.1 Locale Names
"A locale name usually has the form ‘ll_CC’. Here ‘ll’ is an ISO 639 two-letter language code, and ‘CC’ is an ISO 3166 two-letter country code."
http://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html#Locale-Names
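
The two checks above (the ll_CC shape and the presence of a file in /usr/share/i18n/locales/) can be combined into a rough validity test. This is only a sketch: the helper names are made up for illustration, and it assumes a configured value such as 'be_BY.UTF-8' should be compared against the file name with the encoding stripped.

```python
import os
import re

# Shape described in the gettext manual: ll_CC, optionally followed by an
# encoding and/or a modifier, e.g. sr_RS.UTF-8@latin.
LOCALE_RE = re.compile(
    r"^[a-z]{2,3}_[A-Z]{2}(\.[A-Za-z0-9_-]+)?(@[A-Za-z0-9]+)?$")


def strip_encoding(name):
    """Drop the '.ENCODING' part: 'sr_RS.UTF-8@latin' -> 'sr_RS@latin'."""
    return re.sub(r"\.[^@]*", "", name)


def is_valid_locale(name, available):
    """True if name has the ll_CC shape and a matching locale file name."""
    return bool(LOCALE_RE.match(name)) and strip_encoding(name) in available


def installed_locales(directory="/usr/share/i18n/locales"):
    """File names in the locale directory, or an empty set if it is absent."""
    try:
        return set(os.listdir(directory))
    except OSError:
        return set()
```

For example, is_valid_locale('be', installed_locales()) is False because 'be' has no country code, which is the situation reported here.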

See also:
Bug 858591 - anaconda setting invalid system locale xx.UTF-8 not xx_YY.UTF-8

Comment 1 Adam Williamson 2012-10-16 02:25:52 UTC
Proposing as a Beta blocker to make sure we consider this. However I'd actually vote for NTH status rather than blocker, on the basis that the set of affected languages is pretty small in terms of absolute size and number of affected users.

Comment 2 Steve Tyler 2012-10-16 02:48:29 UTC
Specifically, the affected languages are Belarusian (be), Bosnian (bs), Armenian (hy), Georgian (ka). There appear to be alternatives for Basque (eu) and Serbian (sr) in the menu, namely 'Basque (Spain)' (eu_ES) and 'Serbian (Serbia)' (sr_RS), respectively.

For reference, the last column of this table shows locales from /usr/share/i18n/locales/:

Belarusian      be      be_BY, be_BY@latin
Bosnian         bs      bs_BA
Basque          eu      eu_ES, eu_ES@euro
Armenian        hy      hy_AM
Georgian        ka      ka_GE
Serbian         sr      sr_ME, sr_RS, sr_RS@latin

Hand-edited from this:

$ ls -1 `cat invalid-locales-18.16-1.txt | sed -e 's@^@/usr/share/i18n/locales/@' -e 's/$/_*/'`
/usr/share/i18n/locales/be_BY
/usr/share/i18n/locales/be_BY@latin
/usr/share/i18n/locales/bs_BA
/usr/share/i18n/locales/eu_ES
/usr/share/i18n/locales/eu_ES@euro
/usr/share/i18n/locales/hy_AM
/usr/share/i18n/locales/ka_GE
/usr/share/i18n/locales/sr_ME
/usr/share/i18n/locales/sr_RS
/usr/share/i18n/locales/sr_RS@latin

Comment 3 Steve Tyler 2012-10-16 02:58:02 UTC
Chris: Would it be possible to dump a table of the languages and the corresponding locales to a file? There are 74 languages by my count, and it would be a lot easier to inspect a text file than to run through 74 test installs just to see what gets configured.

Also, I experimented with raising an exception when an invalid locale was configured, but it became apparent that recovery (e.g. cleaning up filesystems) was going to require experts ...

Comment 4 Steve Tyler 2012-10-16 04:49:45 UTC
Assuming I didn't make any mistakes ...

Anaconda Master translation coverage at Transifex:
be           5%
bs          12%
eu_ES        0%
eu          13%
hy           3%
ka          28%
sr@latin    28%
sr          28%

anaconda translations in F17:
$ find `cat invalid-locales-18.16-1.txt | sed -e 's@^@/usr/share/locale/@' -e 's/$/*/'` -name 'anaconda.mo' | xargs ls -1 --size
 16 /usr/share/locale/be/LC_MESSAGES/anaconda.mo
 40 /usr/share/locale/bs/LC_MESSAGES/anaconda.mo
  4 /usr/share/locale/eu_ES/LC_MESSAGES/anaconda.mo
 20 /usr/share/locale/eu/LC_MESSAGES/anaconda.mo
  4 /usr/share/locale/hy/LC_MESSAGES/anaconda.mo
 12 /usr/share/locale/ka/LC_MESSAGES/anaconda.mo
100 /usr/share/locale/sr@latin/LC_MESSAGES/anaconda.mo
128 /usr/share/locale/sr/LC_MESSAGES/anaconda.mo

anaconda translation links at Transifex:
https://fedora.transifex.com/projects/p/fedora/language/be/?project=2059
https://fedora.transifex.com/projects/p/fedora/language/bs/?project=2059
https://fedora.transifex.com/projects/p/fedora/language/eu_ES/?project=2059
https://fedora.transifex.com/projects/p/fedora/language/eu/?project=2059
https://fedora.transifex.com/projects/p/fedora/language/hy/?project=2059
https://fedora.transifex.com/projects/p/fedora/language/ka/?project=2059
https://fedora.transifex.com/projects/p/fedora/language/sr@latin/?project=2059
https://fedora.transifex.com/projects/p/fedora/language/sr/?project=2059

Comment 5 Mike FABIAN 2012-10-16 09:05:13 UTC
(In reply to comment #3)
> Chris: Would it be possible to dump a table of the languages and the
> corresponding locales to a file? There are 74 languages by my count, and it
> would be a lot easier to inspect a text file than to run through 74 test
> installs just to see what gets configured.

The languages Steve mentions which result in an invalid locale are
all missing in “mangleMap”:

mfabian@ari:~/rpmsources/fedora/anaconda/anaconda-18.16 (f18)
$ grep 'mangleMap =' ./pyanaconda/localization.py -A17
    mangleMap = {"af":  "af_ZA",  "am":  "am_ET",  "ar":  "ar_SA",  "as":  "as_IN",
                 "ast": "ast_ES", "bg":  "bg_BG",  "bn":  "bn_BD",  "ca":  "ca_ES",
                 "cs":  "cs_CZ",  "cy":  "cy_GB",  "da":  "da_DK",  "de":  "de_DE",
                 "el":  "el_GR",  "en":  "en_US",  "es":  "es_ES",  "et":  "et_EE",
                 "fa":  "fa_IR",  "fi":  "fi_FI",  "fr":  "fr_FR",  "gl":  "gl_ES",
                 "gu":  "gu_IN",  "he":  "he_IL",  "hi":  "hi_IN",  "hr":  "hr_HR",
                 "hu":  "hu_HU",  "id":  "id_ID",  "ilo": "ilo_PH", "is":  "is_IS",
                 "it":  "it_IT",  "ja":  "ja_JP",  "kk":  "kk_KZ",  "kn":  "kn_IN",
                 "ko":  "ko_KR",  "lt":  "lt_LT",  "lv":  "lv_LV",  "mai": "mai_IN",
                 "mk":  "mk_MK",  "ml":  "ml_IN",  "mr":  "mr_IN",  "ms":  "ms_MY",
                 "nb":  "nb_NO",  "nds": "nds_DE", "ne":  "ne_NP",  "nl":  "nl_NL",
                 "nn":  "nn_NO",  "nso": "nso_ZA", "or":  "or_IN",  "pa":  "pa_IN",
                 "pl":  "pl_PL",  "pt":  "pt_PT",  "ro":  "ro_RO",  "ru":  "ru_RU",
                 "si":  "si_LK",  "sk":  "sk_SK",  "sl":  "sl_SI",  "sq":  "sq_AL",
                 "sr":  "sr_RS",  "sv":  "sv_SE",  "ta":  "ta_IN",  "te":  "te_IN",
                 "tg":  "tg_TJ",  "th":  "th_TH",  "tr":  "tr_TR",  "uk":  "uk_UA",
                 "ur":  "ur_PK",  "vi":  "vi_VN",  "zu":  "zu_ZA"}

mfabian@ari:~/rpmsources/fedora/anaconda/anaconda-18.16 (f18)
$ 


> Also, I experimented with raising an exception when an invalid locale was
> configured, but it became apparent that recovery (e.g. cleaning up
> filesystems) was going to require experts ...

Wouldn't it be better to set a valid fallback locale like en_US.UTF-8 if
a language cannot be found in mangleMap instead of setting an invalid locale?

Comment 6 Adam Williamson 2012-10-17 17:08:44 UTC
Discussed at 2012-10-17 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-10-17/f18beta-blocker-review-4.2012-10-17-16.00.log.txt . Agreed that this is a conditional breakage of the criteria, but the likely impact of this bug (the total number of users of the affected languages) is too small for it to qualify as a blocker - especially since the affected translations are apparently highly incomplete anyway (thanks Steve) so in practice you couldn't use them without knowing English. It is rejected as a blocker, accepted as NTH.

Comment 7 Jens Petersen 2012-10-19 08:32:47 UTC
(In reply to comment #5)
> Wouldn't it be better to set a valid fallback locale like en_US.UTF-8 if
> a language cannot be found in mangleMap instead of setting an invalid locale?

Yes +1 for this.

Comment 8 Jens Petersen 2012-10-19 09:05:54 UTC
Created attachment 629895 [details]
anaconda-18.18-more-locales-fix.patch

Thanks, Steve, for the careful analysis.

This patch adds the 5 missing locales from comment 2 to
the locale mangleMap and makes it fall back to en_US instead of
an invalid locale if the language code is not listed in the map.

(It might not do the right thing for sr@latin yet but I think
this is good enough for now anyway.)

Comment 9 Steve Tyler 2012-10-19 11:22:15 UTC
Thanks for your patch.

-    return mangleMap.get(inLocale, inLocale)
+    return mangleMap.get(inLocale, "en_US")

1. We might want to know if a locale cannot be found, so writing something to a log file might be a good idea.
2. How would users be notified that the "en_US" fallback locale has been configured instead of a locale that corresponds to the language they selected from the menu?
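
A minimal sketch of point 1, assuming the fallback is kept: the lookup logs a warning whenever it has to fall back, and also tells the caller so that the UI could notify the user (point 2). The names MANGLE_MAP, FALLBACK_LOCALE, mangle_locale and the logger name are illustrative only and are not taken from pyanaconda; the table is a short excerpt.

```python
import logging

log = logging.getLogger("localization-sketch")  # hypothetical logger name

# Short excerpt for illustration; the real mangleMap is much longer.
MANGLE_MAP = {"de": "de_DE", "sr": "sr_RS", "be": "be_BY", "ka": "ka_GE"}
FALLBACK_LOCALE = "en_US"


def mangle_locale(in_locale):
    """Return (locale, used_fallback) for a language code."""
    try:
        return MANGLE_MAP[in_locale], False
    except KeyError:
        # Leave a trace in the log so a missing mapping can be diagnosed.
        log.warning("no locale known for %r, falling back to %s",
                    in_locale, FALLBACK_LOCALE)
        return FALLBACK_LOCALE, True
```

Returning the flag alongside the locale keeps the decision about how to notify the user out of the lookup function.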

Comment 10 Steve Tyler 2012-10-19 17:38:26 UTC
The patch wouldn't apply against 18.18-1 until I reversed the patch from Attachment 621937 [details]:

$ patch -b -R < anaconda-fix-kk-tg-locales.patch
$ patch -b < anaconda-18.18-more-locales-fix-1.patch

http://git.fedorahosted.org/git/anaconda.git

Comment 11 Steve Tyler 2012-10-19 18:36:27 UTC
-    return mangleMap.get(inLocale, inLocale)
+    return mangleMap.get(inLocale, "en_US")

I don't think this will work. The list 'languages' can have 'langcode's not in mangleMap, because mangleMap is not a complete list. All of those will get mapped to "en_US".

$ less -N anaconda-18.18-1/pyanaconda/localization.py

    165     for langcode in languages:
    166         try:
    167             localedata = babel.Locale.parse(mangleLocale(langcode))
    168         except babel.core.UnknownLocaleError:
    169             continue
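
To illustrate the concern, here is a sketch with a two-entry excerpt of the map. It assumes that langcodes which already carry a territory (for example 'eu_ES') are passed through the same function; the function names are hypothetical.

```python
MANGLE_MAP = {"de": "de_DE", "sr": "sr_RS"}  # excerpt only


def mangle_locale_naive(in_locale):
    """Fallback for every miss: also rewrites codes that were already fine."""
    return MANGLE_MAP.get(in_locale, "en_US")


def mangle_locale_guarded(in_locale):
    """Fall back only for bare language codes that are missing from the map."""
    if "_" in in_locale:
        # Already has a territory, e.g. 'eu_ES'; leave it alone.
        return in_locale
    return MANGLE_MAP.get(in_locale, "en_US")
```

With the naive version 'eu_ES' becomes 'en_US'; the guarded version returns it unchanged and only maps unknown bare codes to the fallback.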

Comment 12 Steve Tyler 2012-10-19 22:02:03 UTC
Created attachment 630267 [details]
mangle-test-2.py mangleMap locale validation test

The attached mangle-test-2.py validates the locales in mangleMap against the locale file names in /usr/share/i18n/locales/.

The executive summary is that they are all valid except for ilo_PH and a fake test locale.

Usage: ./mangle-test-2.py
Output is a report to stdout.

The copy of mangleMap used has the patch from Jens applied.

Python programmers working with locales might be interested in the locale.normalize() function. It would work in place of mangleMap, if the locale_alias table it uses were complete.

locale.normalize(localename)

    Returns a normalized locale code for the given locale name. The returned locale code is formatted for use with setlocale(). If normalization fails, the original name is returned unchanged.

    If the given encoding is not known, the function defaults to the default encoding for the locale code just like setlocale().

http://docs.python.org/library/locale.html

The normalize() function and the locale_alias table are here:
$ rpm -qf /usr/lib64/python2.7/locale.py
python-libs-2.7.3-7.2.fc17.x86_64
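
A small sketch of how locale.normalize() might be used in place of a hand-written table. Because the function returns its input unchanged when it has no alias, the wrapper below (a hypothetical helper, not anaconda code) treats a result without a territory as "unknown". The results depend on the locale_alias table shipped with the Python version in use.

```python
import locale


def normalized_or_none(langcode, encoding="UTF-8"):
    """Return the normalized locale, or None if no territory was found."""
    result = locale.normalize("%s.%s" % (langcode, encoding))
    # Look only at the part before the encoding, e.g. 'de_DE' in 'de_DE.UTF-8'.
    name = result.split(".", 1)[0]
    if "_" not in name:
        return None
    return result
```

A caller could then decide between a fallback locale and hiding the language whenever None comes back.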

Comment 13 Steve Tyler 2012-10-19 22:10:47 UTC
Created attachment 630273 [details]
mangle-test-report-1.txt report from the previously attached mangle-test-2.py

This report shows the results of running mangle-test-2.py on an updated F17 system with:

$ rpm -q python-libs glibc-common
python-libs-2.7.3-7.2.fc17.x86_64
glibc-common-2.15-57.fc17.x86_64

The rightmost column allows you to compare mangleMap with locale.normalize().

Comment 14 Steve Tyler 2012-10-20 22:12:55 UTC
I ran seven test installs with Jens' pyanaconda/localization.py patch (without the "en_US" change) from the F18-Beta-TC6 Live CD.

The results are an improvement. Now, Belarusian, Bosnian, Armenian, Georgian are listed with countries in the language menu, and a valid locale is configured for them. Basque is no longer listed in the menu, although 'Basque (Spain)' is still listed.

The one remaining problem is that Serbian still appears in the menu, and 'sr' is configured for the locale.

Menu Item                           Locale
Belarusian (Belarus)                be_BY
Bosnian (Bosnia and Herzegovina)    bs_BA
Basque                              [not in menu]
Basque (Spain)                      eu_ES
Armenian (Armenia)                  hy_AM
Georgian (Georgia)                  ka_GE
Serbian                             sr
Serbian (Serbia)                    sr_RS

Command-line:
$ qemu-kvm -m 2048 -hda f18-test-2.img -cdrom ~/xfr/fedora/F18/F18-Beta/TC6/Fedora-18-Beta-TC6-x86_64-Live-Desktop.iso -usb -vga qxl -boot menu=on -usbdevice mouse

Comment 15 Jens Petersen 2012-10-22 05:47:38 UTC
Thank you, Steve, nice work.

If you have time, perhaps it is worth seeing how much of
the mangleMap could be replaced by something like:

  locale.normalize(inLocale + '.utf8')

It sounds like it could simplify things a bit.

Seems to do the right thing though for me:

>>> locale.normalize('sr@latin')
'sr_RS.UTF-8@latin'
>>> locale.normalize('de.UTF-8')
'de_DE.UTF-8'

Otherwise not really sure what to do about sr@latin at this stage:
it only seems to be 27% translated though for anaconda...

(In reply to comment #9)
> -    return mangleMap.get(inLocale, inLocale)
> +    return mangleMap.get(inLocale, "en_US")
> 
> 1. We might want to know if a locale cannot be found, so writing something
> to a log file might be a good idea.

Good point - I think the lang/locale setting should always be logged anyway.

> 2. How would users be notified that the "en_US" fallback locale has been
> configured instead of a locale that corresponds to the language they
> selected from the menu?

Right - there could be a popup dialog saying that anaconda doesn't
know the locale for the lang, but actually I would really prefer that
anaconda just didn't list langs without a valid associated locale.
Otherwise I think it is better to fall back to a valid locale (en_US.utf8)
than to set an incorrect system one.

Comment 16 Steve Tyler 2012-10-23 04:29:33 UTC
Part of the problem is that 'sr@latin' is getting changed to 'sr':

$ python
Python 2.7.3 (default, Jul 24 2012, 10:05:38) 
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import babel
>>> babel.parse_locale('sr')
('sr', None, None, None)
>>> babel.parse_locale('sr@latin')
('sr', None, None, None)
>>> babel.parse_locale('sr_RS')
('sr', 'RS', None, None)
>>> 

The result is that 'sr' is returned *twice* by get_available_translations(), because babel.Locale.parse() calls babel.parse_locale().
See Line 147 below and /usr/lib/python2.7/site-packages/babel/core.py.

$ less -N anaconda-18.12-1/pyanaconda/localization.py
...
    133 def get_available_translations(domain=None, localedir=None):
    134     domain = domain or gettext._current_domain
    135     localedir = localedir or gettext._default_localedir
    136 
    137     langdict = babel.Locale('en', 'US').languages
    138     messagefiles = gettext.find(domain, localedir, langdict.keys(), all=True)
    139     languages = [path.split(os.path.sep)[-3] for path in messagefiles]
    140 
    141     # usually there are no message files for en
    142     if 'en' not in languages:
    143         languages.append('en')
    144 
    145     for langcode in languages:
    146         try:
    147             localedata = babel.Locale.parse(langcode)
    148         except babel.core.UnknownLocaleError:
    149             continue
    150 
    151         yield LocaleInfo(localedata)
...
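
For comparison, a sketch of a parser that keeps the modifier instead of dropping it, so that 'sr' and 'sr@latin' remain distinct. It uses only the standard library; split_locale is a made-up name and this is not how Babel or anaconda does it.

```python
import re

_LOCALE_PARTS = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<encoding>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def split_locale(name):
    """Split a locale name into (language, territory, encoding, modifier)."""
    match = _LOCALE_PARTS.match(name)
    if match is None:
        raise ValueError("not a locale name: %r" % (name,))
    return (match.group("language"), match.group("territory"),
            match.group("encoding"), match.group("modifier"))
```

Using the full tuple as a key would keep the two Serbian translations apart when building the language list.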

Comment 17 Steve Tyler 2012-10-23 05:37:43 UTC
Thanks, Jens. The locale 'sr_RS.UTF-8@latin' seems to be valid:

$ LANG='sr_RS.UTF-8@latin' locale
LANG=sr_RS.UTF-8@latin
...

There is a locale file /usr/share/i18n/locales/sr_RS@latin.

$ grep title /usr/share/i18n/locales/sr_RS@latin
title      "Serbian Latin locale for Serbia"

What would the language menu show? 'Serbian Latin (Serbia)'?

Comment 18 Steve Tyler 2012-10-23 06:00:43 UTC
(In reply to comment #17)
...
> What would the language menu show? 'Serbian Latin (Serbia)'?

At Transifex, it is called 'Serbian (Latin)':
https://fedora.transifex.com/projects/p/fedora/language/sr@latin/

Comment 19 Steve Tyler 2012-10-23 21:43:06 UTC
Created attachment 632405 [details]
localization-test-report-1.txt showing locales returned by locale.normalize()

This report shows locales for which there are anaconda translations for F17. The 'short_name' column has the strings returned by get_available_translations(). The 'locale.normalize()' column has the strings returned by locale.normalize(short_name).

There are 73 translations. Of these, only 5 have a valid short_name, while 68 have a valid normalized name. (These numbers are at the end of the report.)

There are three anomalies:
1. 'sr' is returned twice by get_available_translations().
2. The babel module raises this exception: "unknown locale 'ar_AA'".
3. The actual number of anaconda.mo files is 81:
   $ find /usr/share/locale -name 'anaconda.mo' | wc -l
   81

Testing was done with pyanaconda/localization.py from anaconda-18.12-1. That version does not have the mangleMap.

anaconda-17.29-1.fc17.x86_64
glibc-common-2.15-57.fc17.x86_64
python-babel-0.9.6-3.fc17.noarch
python-libs-2.7.3-7.2.fc17.x86_64

Comment 20 Steve Tyler 2012-10-23 21:52:05 UTC
Created attachment 632419 [details]
localization-test-2.py locale validation test

Comment 21 Steve Tyler 2012-10-24 04:00:05 UTC
(In reply to comment #19)
> 3. The actual number of anaconda.mo files is 81:
>    $ find /usr/share/locale -name 'anaconda.mo' | wc -l
>    81

We appear to be losing some translations. These are returned by locale.normalize():

'bn_IN.UTF-8'
'mai_IN.UTF-8'
'sr_RS.UTF-8@latin'
'zh_TW.big5'

'<' anaconda.mo translation files in /usr/share/locale
'>' Languages returned by get_available_translations()

$ diff lang-trans-1.txt lang-anac-1.txt
5,6d4
< ast
< bal
10d7
< bn_IN
19c16
< en@boldquot
---
> en
21d17
< en@quot
37d32
< ilo
46d40
< mai
52d45
< nds
69c62
< sr@latin
---
> sr
80d72
< zh_TW

==
The input files were generated with:

$ find /usr/share/locale -name 'anaconda.mo' | cut -d '/' -f 5 | sort > lang-trans-1.txt

$ egrep '^ *[0-9]*:' localization-test-report-1.txt | grep -v 'babel:' | sed -e 's/  */:/g' -e 's/\.UTF-8//' | cut -d ':' -f 4 > lang-anac-1.txt

(localization-test-report-1.txt is machine readable, but just barely ... :-))
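
The same comparison can be sketched in Python with sets, which also makes the duplicate 'sr' from comment 19 easy to spot. The function names are illustrative; the inputs would be the two lists generated above.

```python
from collections import Counter


def lost_translations(translation_dirs, reported):
    """Translation directories that never appear among the reported codes."""
    return sorted(set(translation_dirs) - set(reported))


def duplicated(reported):
    """Codes that are reported more than once."""
    counts = Counter(reported)
    return sorted(code for code, count in counts.items() if count > 1)
```

For example, lost_translations(['ast', 'sr', 'sr@latin'], ['sr', 'sr']) returns ['ast', 'sr@latin'], and duplicated(['sr', 'sr']) returns ['sr'].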

Comment 22 Adam Williamson 2012-10-24 04:24:56 UTC
As a wise old man once said, every time you use a non-unified diff, a puppy dies.

Comment 23 Steve Tyler 2012-10-24 04:28:49 UTC
locale.normalize() doesn't normalize some languages that have a corresponding locale in /usr/share/i18n/locales/.

$ ls -1 `echo 'ast:bal:bn:ilo:mai:nds:sr:zh' | sed -e 's@:@\n@g' | sed -e 's@^@/usr/share/i18n/locales/@' -e 's@$@*@'`
ls: cannot access /usr/share/i18n/locales/bal*: No such file or directory
ls: cannot access /usr/share/i18n/locales/ilo*: No such file or directory
/usr/share/i18n/locales/ast_ES
/usr/share/i18n/locales/bn_BD
/usr/share/i18n/locales/bn_IN
/usr/share/i18n/locales/mai_IN
/usr/share/i18n/locales/nds_DE
/usr/share/i18n/locales/nds_NL
/usr/share/i18n/locales/sr_ME
/usr/share/i18n/locales/sr_RS
/usr/share/i18n/locales/sr_RS@latin
/usr/share/i18n/locales/zh_CN
/usr/share/i18n/locales/zh_HK
/usr/share/i18n/locales/zh_SG
/usr/share/i18n/locales/zh_TW

Comment 24 Steve Tyler 2012-10-24 04:39:22 UTC
(In reply to comment #22)
> As a wise old man once said, every time you use a non-unified diff, a puppy
> dies.

The "diff -u" was longer and unreadable ... so I liberated myself from habit. :-)

Comment 25 Jens Petersen 2012-10-24 05:57:30 UTC
(In reply to comment #23)
> locale.normalize() doesn't normalize some languages that have a
> corresponding locale in /usr/share/i18n/locales/.
> 
> $ ls -1 `echo 'ast:bal:bn:ilo:mai:nds:sr:zh' | sed -e 's@:@\n@g' | sed -e
> 's@^@/usr/share/i18n/locales/@' -e 's@$@*@'`

mai, sr, and zh work for me at least on F17.  eg:

$ python
>>> import locale
>>> locale.normalize('sr.utf8')
'sr_RS.UTF-8'
>>> locale.normalize('zh.utf8')
'zh_CN.UTF-8'


Without getting too philosophical here,
there is also "Perfect is the enemy of good". :)
Anyway I think we have to be a bit pragmatic here - I am not
sure if there is time to integrate and test a perfect
solution for F18 and quite honestly the number of people doing
installs in some of these languages is going to be rather small.
But of course I am all for improving things to get a better F18 installer. :)

Do you want to post an improved patch so that it can get
reviewed and hopefully integrated into anaconda for testing?

Comment 26 Steve Tyler 2012-10-24 11:42:34 UTC
IMO, pyanaconda/localization.py should not be touched at all at this point. However, we can analyze and document the problems.

Some of the translations that are missing cause babel to raise an UnknownLocaleError exception:

translation     UnknownLocaleError
ast             1
bal             1
bn_IN           0
ilo             1
mai             1
nds             1
sr@latin        0
zh_TW           0

Test with:

>>> import babel
>>> l = babel.Locale.parse('ast')

# if there is no exception:
>>> l = babel.Locale.parse('bn_IN')
>>> l.__dict__
{'_Locale__data': None, 'territory': 'IN', 'variant': None, 'language': 'bn', 'script': None}

Comment 27 Steve Tyler 2012-10-30 02:37:09 UTC
Created attachment 635283 [details]
localization-test-3.py locale validation test for pyanaconda/localization.py

Updated locale validation test for pyanaconda/localization.py:

Usage: ./localization-test-3.py [localization_module[.py]]

This version adds two features:

1. The name of the file under test can be given as an argument:

$ ./localization-test-3.py localization_18_21_1.py

If no argument is given, the file name defaults to "localization.py".

2. Records are tagged for machine-processing:

D: Documentation
L: Locale
S: Summary
E: Error
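
A sketch of how such tagged records might be consumed, assuming each record is a single line of the form 'TAG: payload'. The group_records helper is hypothetical and is not part of the attached script.

```python
from collections import defaultdict

TAGS = {"D": "documentation", "L": "locale", "S": "summary", "E": "error"}


def group_records(lines):
    """Group report lines by tag; untagged lines are ignored."""
    records = defaultdict(list)
    for line in lines:
        tag, sep, payload = line.partition(":")
        if sep and tag in TAGS:
            records[TAGS[tag]].append(payload.strip())
    return dict(records)
```

Lines without a known tag, such as blank lines or a Python traceback, are simply skipped.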

Comment 28 Steve Tyler 2012-10-30 05:05:15 UTC
Created attachment 635315 [details]
localization-test-report-2.txt showing results of latest commit with mangleMap patch

Here is a locale validation test report for the latest commit[1] with mangleMap patched to include the additional mappings from Jens' patch. There are 75 locales[2] with only one invalid locale: 'sr'.

There are five missing languages for which there is an anaconda translation; three of them also have a locale file:

translation in          locale file in
/usr/share/locale       /usr/share/i18n/locales/
ast                     ast_ES
bal                     none
ilo                     none
mai                     mai_IN
nds                     nds_DE, nds_NL

Report generated with:
$ ./localization-test-3.py localization_a5ca147a7738ffc7257bd857ea99eac9ea7642a6_2.py > localization-test-report-2.txt

[1] Use a slightly different method to get supported languages (#858801, tagoh).
http://git.fedorahosted.org/cgit/anaconda.git/commit/pyanaconda/localization.py?id=a5ca147a7738ffc7257bd857ea99eac9ea7642a6

[2] Tested with F17.

Comment 29 Steve Tyler 2012-10-30 05:23:32 UTC
Created attachment 635331 [details]
localization-manglemap-1.patch

Apply this patch, which is based on the one from Jens, to commit a5ca147a7738ffc7257bd857ea99eac9ea7642a6 to obtain localization.py generating 74 valid locales out of 75 total.

Use a slightly different method to get supported languages (#858801, tagoh).
http://git.fedorahosted.org/cgit/anaconda.git/commit/pyanaconda/localization.py?id=a5ca147a7738ffc7257bd857ea99eac9ea7642a6

Comment 30 Steve Tyler 2012-10-30 18:52:58 UTC
CLDR 22.1, which was released 2012-10-26, has support for 'ast', but not the other four languages for which there are anaconda translations.[1] There is an open babel ticket for an upgrade to CLDR 21 that is dated 08/19/12.[2] So, I would say that with the addition of the modified patch from Jens and a possible resolution of Bug 871464, this bug will be fixed as well as it can be.

translation in          locale file in              CLDR 22.1
/usr/share/locale       /usr/share/i18n/locales/
ast                     ast_ES                      yes
bal                     none                        no
ilo                     none                        no
mai                     mai_IN                      no
nds                     nds_DE, nds_NL              no

[1] CLDR Releases/Downloads
http://cldr.unicode.org/index/downloads

[2] Ticket #312 (new enhancement)
Upgrade to CLDR 21
http://babel.edgewall.org/ticket/312

Comment 31 Steve Tyler 2012-10-31 22:20:11 UTC
Created attachment 636385 [details]
localization-manglemap-sr-latin-1.patch

This proposed patch for pyanaconda/localization.py:
1. Adds 'Serbian (Latin)' to the languages menu.
2. Writes 'sr_RS.UTF-8@latin' as the locale.
3. Includes the mangleMap changes from Jens.

While working on this, I found that LocaleInfo.__repr__() may return invalid locales. The function appends the encoding, but that format is rejected by the locale command:

$ LANG=sr_RS locale # error messages
$ LANG=sr_RS.UTF-8@latin locale # no error messages

The latter is also returned by locale.normalize(): Comment 15 (Thanks, Jens).
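
The accepted form follows the glibc order language[_territory][.codeset][@modifier], with the encoding placed before the modifier. A minimal sketch of a formatter that builds names in that order (format_locale is a hypothetical helper, not the patched __repr__):

```python
def format_locale(language, territory=None, encoding=None, modifier=None):
    """Build language[_TERRITORY][.ENCODING][@modifier] in that order."""
    name = language
    if territory:
        name += "_" + territory
    if encoding:
        name += "." + encoding
    if modifier:
        # The modifier always goes last, after the encoding.
        name += "@" + modifier
    return name
```

format_locale('sr', 'RS', 'UTF-8', 'latin') gives 'sr_RS.UTF-8@latin', the string that the locale command accepted above.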

Patch is against:
http://git.fedorahosted.org/cgit/anaconda.git/commit/pyanaconda/localization.py?id=a5ca147a7738ffc7257bd857ea99eac9ea7642a6

Tested by installing with:
1. anaconda-0:18.21-1.fc18.x86_64
2. patched pyanaconda/localization.py
3. Fedora-18-Beta-TC6-x86_64-Live-Desktop.iso

Command-line:
$ qemu-kvm -m 4096 -hda f18-test-2.img -cdrom ~/xfr/fedora/F18/F18-Beta/TC6/Fedora-18-Beta-TC6-x86_64-Live-Desktop.iso -usb -vga qxl -boot menu=on -usbdevice mouse

Comment 32 Vratislav Podzimek 2012-11-02 13:53:24 UTC
Created attachment 637082 [details]
Patch for LocaleInfo.__repr__

What about using this patch? If I understand it correctly we would only need to add mangleMap changes and put "sr_RS@latin" instead of "sr_Latn" to the mangleMap on top of it. No mangleReprMap needed in such case.

In [8]: loc = babel.Locale.parse("sr_RS.UTF-8@latin")

In [9]: loc.language
Out[9]: 'sr'

In [10]: loc.territory
Out[10]: 'RS'

In [11]: loc.english_name
Out[11]: u'Serbian (Serbia)'

In [12]: print loc.display_name
Српски (Србија)

In [13]: print loc.script
None
^^^^^^^ bug in babel?

Comment 33 Steve Tyler 2012-11-02 14:15:35 UTC
Babel explicitly ignores modifiers of all types:
Bug 871464 - [sr@latin] script not parsed as 'Latn'

IMO, Babel should support both parsing and formatting of locale strings, but it cannot, because it throws away modifiers, so 'sr@latin' and 'sr' produce the same Locale object. Only with 'sr_Latn' does the 'script' attribute have a value:

>>> import babel
>>> babel.Locale.parse('sr@latin').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': None}
>>> babel.Locale.parse('sr').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': None}
>>> babel.Locale.parse('sr_Latn').__dict__
{'_Locale__data': None, 'territory': None, 'variant': None, 'language': 'sr', 'script': 'Latn'}
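
One possible workaround, sketched under the assumption that the modifier can be translated to a script code before the string is handed to Babel. The table and function name are hypothetical, only the '@latin' case comes from this bug, and nothing here guarantees that Babel has data for every identifier the function produces.

```python
# Hypothetical table; only '@latin' is discussed in this bug.
MODIFIER_TO_SCRIPT = {"latin": "Latn"}


def to_script_identifier(name):
    """Rewrite 'sr@latin' as 'sr_Latn'; drop the encoding and other modifiers."""
    base, _, modifier = name.partition("@")
    base = base.split(".", 1)[0]  # drop '.UTF-8' and the like
    script = MODIFIER_TO_SCRIPT.get(modifier.lower())
    if script is None:
        return base
    language, _, territory = base.partition("_")
    parts = [language, script]
    if territory:
        parts.append(territory)
    return "_".join(parts)
```

to_script_identifier('sr@latin') returns 'sr_Latn', which is the only form shown above to produce a Locale object with a script.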

Comment 34 Steve Tyler 2012-11-02 14:26:38 UTC
(In reply to comment #33)
> Babel explicitly ignores modifiers of all types:
> Bug 871464 - [sr@latin] script not parsed as 'Latn'
...

'sr_RS.UTF-8@latin' and 'sr_RS.UTF-8' produce the same Locale object. Babel also throws away the encoding.

>>> babel.Locale.parse('sr_RS.UTF-8@latin').__dict__
{'_Locale__data': None, 'territory': 'RS', 'variant': None, 'language': 'sr', 'script': None}
>>> babel.Locale.parse('sr_RS.UTF-8').__dict__
{'_Locale__data': None, 'territory': 'RS', 'variant': None, 'language': 'sr', 'script': None}

Comment 35 Steve Tyler 2012-11-02 17:37:47 UTC
I applied your patch to localization_18_23_1.py:

$ ./localization-test-3.py localization_18_23_1_EXP_1.py
D: summary:     pyanaconda/localization.py locale validation test

D: module:      localization_18_23_1_EXP_1
D: timestamp:   2012-11-02 17:29:38 UTC

D: n:                   count
D: short_name:          from LocaleInfo object
D: locale.normalize():  value of normalize(short_name) from locale module
D: v1:                  short_name is valid if 1, invalid if 0
D: v2:                  normalize(short_name) is valid if 1, invalid if 0
D: english_name:        from LocaleInfo object

D: note: A valid locale is defined as having a language and a territory.

L:   n: short_name           locale.normalize()    v1  v2    english_name                  
Traceback (most recent call last):
  File "./localization-test-3.py", line 65, in <module>
    for loc_1 in ll.get_available_translations(domain='anaconda', localedir='/usr/share/locale'):
  File "/home/stephent/src/exp/anaconda-languages/localization_18_23_1_EXP_1.py", line 190, in get_available_translations
    localedata.script(script)
TypeError: 'NoneType' object is not callable

==
$ less -N localization_18_23_1_EXP_1.py
...
    187         # BUG: babel.Locale.parse does not parse @script
    188         script = _get_locale_script(langcode)
    189         if script:
    190             localedata.script(script)
    191 
    192         yield localedata
...
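
The traceback is consistent with 'script' being a plain attribute that is still None, so line 190 ends up calling None. A minimal sketch of the failure and of the presumably intended assignment follows; FakeLocale and set_script are stand-ins invented for illustration, and that the patch meant to assign rather than call is an assumption:

```python
class FakeLocale(object):
    """Stand-in for the Locale object: 'script' is an attribute, initially None."""

    def __init__(self, language, territory=None):
        self.language = language
        self.territory = territory
        self.script = None


def set_script(localedata, script):
    """Assign the script attribute instead of calling it (assumed intent)."""
    if script:
        localedata.script = script
    return localedata


loc = FakeLocale("sr", "RS")

try:
    loc.script("Latn")  # same shape as line 190: calls None
except TypeError as exc:
    print("call fails:", exc)

set_script(loc, "Latn")
print("script is now:", loc.script)
```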

Comment 36 Vratislav Podzimek 2012-11-02 18:51:10 UTC
Yeah, I've also encountered that traceback during testing.

Comment 37 Vratislav Podzimek 2012-11-02 19:22:10 UTC
Created attachment 637222 [details]
PatchV2 for the LocaleInfo.__repr__

Could you please try it with this one? There were more problems in the previous version of the patch.

Comment 38 Steve Tyler 2012-11-02 20:11:07 UTC
Created attachment 637237 [details]
localization-test-report-18_23_1_EXP_2.txt

The locale is 'sr.UTF-8@latin', not 'sr_RS.UTF-8@latin'
The english_name is 'Serbian', not 'Serbian (Latin)'.

==
D: summary:     pyanaconda/localization.py locale validation test

D: module:      localization_18_23_1_EXP_2
D: timestamp:   2012-11-02 20:01:18 UTC
...
L:  61: sq_AL.UTF-8          sq_AL.UTF-8            1   1    'Albanian (Albania)'          
L:  62: sr.UTF-8@latin       sr_RS.utf_8_latin      0   1    'Serbian'                     
L:  63: sr_RS.UTF-8          sr_RS.UTF-8            1   1    'Serbian (Serbia)'            
L:  64: sv_SE.UTF-8          sv_SE.UTF-8            1   1    'Swedish (Sweden)'            
...
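
The v1 and v2 columns can be reproduced with a check along these lines. This is only a sketch of the "language and territory" definition stated in the report header, not the code from localization-test-3.py; is_valid_locale and its regular expression are assumptions:

```python
import re

# Assumption: "valid" means a lowercase language code followed by an
# uppercase two-letter territory; encoding and modifier are not inspected.
_VALID_RE = re.compile(r"^[a-z]{2,3}_[A-Z]{2}(?:[.@].*)?$")


def is_valid_locale(name):
    """Return 1 if name has both a language and a territory, else 0."""
    return 1 if _VALID_RE.match(name) else 0


for name in ("sq_AL.UTF-8", "sr.UTF-8@latin", "sr_RS.utf_8_latin", "sr_RS.UTF-8"):
    print("%-20s %d" % (name, is_valid_locale(name)))
```

Under this definition 'sr.UTF-8@latin' fails because it has no territory, while the normalized 'sr_RS.utf_8_latin' passes, which matches row 62.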

==
$ LANG='sr.UTF-8@latin' locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=sr.UTF-8@latin
LC_CTYPE="sr.UTF-8@latin"
LC_NUMERIC="sr.UTF-8@latin"
LC_TIME="sr.UTF-8@latin"
LC_COLLATE="sr.UTF-8@latin"
LC_MONETARY="sr.UTF-8@latin"
LC_MESSAGES="sr.UTF-8@latin"
LC_PAPER="sr.UTF-8@latin"
LC_NAME="sr.UTF-8@latin"
LC_ADDRESS="sr.UTF-8@latin"
LC_TELEPHONE="sr.UTF-8@latin"
LC_MEASUREMENT="sr.UTF-8@latin"
LC_IDENTIFICATION="sr.UTF-8@latin"
LC_ALL=

Comment 39 Vratislav Podzimek 2012-11-02 22:15:58 UTC
(In reply to comment #38)
> Created attachment 637237 [details]
> localization-test-report-18_23_1_EXP_2.txt
> 
> The locale is 'sr.UTF-8@latin', not 'sr_RS.UTF-8@latin'
The mangleMap update is needed. And the same goes for:
> $ LANG='sr.UTF-8@latin' locale

> The english_name is 'Serbian', not 'Serbian (Latin)'.
This doesn't have a reasonable solution without an additional mapping, and it really should be resolved in Babel rather than in Anaconda. Also, "Serbian (Latin)" doesn't fit the "language (territory)" pattern used by the rest of the list.

I'm posting your patch to anaconda-patches. If it gets approved for F18, I will push it. Otherwise I'd like to push my patch (with the mangleMap update) for F18 and hope that it fulfills the requirements of the NTH flag. Having "Serbian (Serbia)" listed twice is not such a big deal.
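
To illustrate what a mangleMap update amounts to, here is a rough sketch of such a lookup. The entries and the mangle_locale helper are hypothetical; the real mangleMap in pyanaconda/localization.py may use different keys and values:

```python
# Hypothetical entries for illustration only; the real mangleMap in
# pyanaconda/localization.py may look different.
MANGLE_MAP = {
    "sr@latin": "sr_RS.UTF-8@latin",
    "sr": "sr_RS.UTF-8",
}


def mangle_locale(langcode, mangle_map=MANGLE_MAP):
    """Return the full locale for langcode, or langcode itself if unmapped."""
    return mangle_map.get(langcode, langcode)


print(mangle_locale("sr@latin"))
print(mangle_locale("sv_SE.UTF-8"))
```

Without an entry for the Latin variant, the territory is never filled in, which would explain a result like 'sr.UTF-8@latin'.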

One thing that I don't understand: I thought one could tell which entry is the Latin one by selecting each in turn, but both appear to have the same translations (both use Cyrillic, not Latin).

Comment 40 Fedora Update System 2012-11-03 01:03:02 UTC
anaconda-18.24-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.24-1.fc18

Comment 41 Steve Tyler 2012-11-03 04:43:30 UTC
Bug 872786 - [sr@latin] 'Serbian (Latin)' not listed in languages menu; 'Serbian (Serbia)' listed twice

Since valid locales are configured by both 'Serbian (Serbia)' menu entries, I opened a new bug.

Thanks for getting the patches in.

Comment 42 Steve Tyler 2012-11-03 05:11:52 UTC
Created attachment 637385 [details]
localization-test-report-18_24_1.txt showing 75 out of 75 valid locales

This locale validation test report for anaconda-18.24-1 shows 75 out of 75 valid locales. There are two remaining issues that do not directly pertain to this bug:

1. 'Serbian (Serbia)' is listed twice with different locales (Bug 872786).
2. 'Basque (Spain)' is listed twice with the same locale.
   (The installer languages menu lists 'Basque (Spain)' once, however.)
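
Duplicates like these two can be spotted mechanically. A small sketch follows, assuming the report rows are available as (short_name, english_name) pairs; find_duplicate_names is a hypothetical helper and the sample rows are made up for illustration, not copied from the attached report:

```python
from collections import defaultdict


def find_duplicate_names(entries):
    """Map each english_name seen more than once to its list of locales."""
    by_name = defaultdict(list)
    for short_name, english_name in entries:
        by_name[english_name].append(short_name)
    return {name: locs for name, locs in by_name.items() if len(locs) > 1}


# Sample rows made up for illustration; not taken from the attached report.
sample = [
    ("sr_RS.UTF-8@latin", "Serbian (Serbia)"),
    ("sr_RS.UTF-8", "Serbian (Serbia)"),
    ("eu_ES.UTF-8", "Basque (Spain)"),
    ("eu_ES.UTF-8", "Basque (Spain)"),
    ("sv_SE.UTF-8", "Swedish (Sweden)"),
]

for name, locs in sorted(find_duplicate_names(sample).items()):
    kind = "same locale" if len(set(locs)) == 1 else "different locales"
    print("%s: %s (%s)" % (name, locs, kind))
```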

Comment 43 Fedora Update System 2012-11-03 19:27:38 UTC
Package anaconda-18.24-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.24-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-17543/anaconda-18.24-1.fc18
then log in and leave karma (feedback).

Comment 44 Fedora Update System 2012-11-06 01:38:15 UTC
anaconda-18.25-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.25-1.fc18

Comment 45 Fedora Update System 2012-11-07 02:10:16 UTC
anaconda-18.26-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.26-1.fc18

Comment 46 Fedora Update System 2012-11-08 03:24:30 UTC
anaconda-18.27-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.27-1.fc18

Comment 47 Adam Williamson 2012-11-08 09:31:35 UTC
18.26 went stable. Closing. (Bodhi's automatic closing of bugs when updates go stable is currently broken.)

