Bug 858801

Summary: [zh_TW] Please add Chinese (Taiwan) language to language selection menu in anaconda
Product: [Fedora] Fedora
Reporter: Cheng-Chia Tseng <pswo10680>
Component: anaconda
Assignee: Chris Lumens <clumens>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 18
CC: abelcheung, awilliam, ccheng, damage3025, fschwarz, g.kaviyarasu, guojunyu, i18n-bugs, jeff, jonathan, nphilipp, piotrdrag, robatino, stephent98, tagoh, vanmeeuwen+fedora
Target Milestone: ---
Keywords: i18n, Regression
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: AcceptedNTH
Fixed In Version: anaconda-18.22-1
Doc Type: Bug Fix
Last Closed: 2012-11-08 04:20:10 EST
Type: Bug
Bug Depends On: 872791    
Bug Blocks: 752661, 752664    
Attachments (description, flags):
proposed patch, none
test case, none
python session comparing language list alternatives, none
patch using os.listdir() to get a list of all translations in /usr/share/locale/, none

Description Cheng-Chia Tseng 2012-09-19 13:31:41 EDT
Description of problem:
I am testing F18 Alpha. After logging into the system, I tried to install F18 and found that the Chinese (Taiwan) language, which is actively translated on Transifex.net, is missing from the language selection menu.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Boot F18 Alpha
2. Log in
3. Try to install F18
  
Actual results:
The language menu does not show Chinese (Taiwan).

Expected results:
You should see a language menu showing "中文 (臺灣)" and "Chinese (Taiwan)".

Additional info:
Comment 1 Chris Lumens 2012-09-19 16:25:33 EDT
"zh_TW" does not appear to be in the output of:

import babel
langdict = babel.Locale('en', 'US').languages
sorted(langdict.keys())
Comment 2 Felix Schwarz 2012-09-19 17:51:02 EDT
'zh_TW' does not show up in '.languages' because CLDR does not provide information about that locale. More specifically 'main/en.xml' from CLDR 1.6 does not include it in the list of languages.

I (also as upstream maintainer) don't know any short-term solution for that short of some hackery to fake the Babel locale.

There is a related feature ticket in Babel's issue tracker (http://babel.edgewall.org/ticket/258) but it's not on the top of my to-do list.

Please keep me in the loop if you plan on hacking Babel. For any upstream-related work please open a ticket at babel.edgewall.org, I'll respond there.
Comment 3 Cheng-Chia Tseng 2012-09-20 12:00:30 EDT
CLDR seems to use a different naming scheme for locales. I just checked CLDR: it has zh, zh_Hans, zh_Hant, zh_Hans_CN, zh_Hant_TW, etc.

zh_TW should be either zh_Hant_TW or zh_Hant while zh_CN will be zh_Hans, zh_Hans_CN or zh.

Could that be why zh_TW does not appear in the output?
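
A minimal sketch of the naming difference described above, assuming the only difference that matters here is the optional script part (Hans, Hant) in CLDR-style identifiers. The helper names split_locale_id and without_script are hypothetical and are not part of Babel or anaconda; the sketch is written for Python 3.

```python
def split_locale_id(identifier):
    """Split an identifier such as 'zh_Hant_TW' into (language, script, territory).

    Heuristic (an assumption of this sketch): a four-letter alphabetic part is
    a script code; a two-letter alphabetic or three-digit part is a territory.
    """
    parts = identifier.split('_')
    language, script, territory = parts[0], None, None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            territory = part
    return language, script, territory


def without_script(identifier):
    """Drop the script part: 'zh_Hant_TW' -> 'zh_TW', 'zh_Hant' -> 'zh'."""
    language, _script, territory = split_locale_id(identifier)
    return language if territory is None else language + '_' + territory


for name in ('zh_Hant_TW', 'zh_Hans_CN', 'zh_Hant', 'zh_TW'):
    print(name, '->', without_script(name))
```

Dropping the script loses information (zh_Hant and zh_Hans both collapse to zh), so this only illustrates how the two spellings relate; it is not a safe general conversion.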
Comment 4 Felix Schwarz 2012-09-20 15:05:23 EDT
(In reply to comment #3)
> zh_TW should be either zh_Hant_TW or zh_Hant while zh_CN will be zh_Hans,
> zh_Hans_CN or zh.
> 
> Could that be why zh_TW does not appear in the output?

Honestly, I have to read up on the source code myself; so far I have not worked much on Babel's CLDR component.

Babel basically knows about 'zh_TW' (I have to retract my statement in comment 2 in that respect).
  >>> from babel import Locale, localedata
  >>> localedata.list()
  … 'zh', 'zh_CN', 'zh_HK', 'zh_Hans', 'zh_Hans_CN', 'zh_Hans_HK', 'zh_Hans_MO', 'zh_Hans_SG', 'zh_Hant', 'zh_Hant_HK', 'zh_Hant_MO', 'zh_Hant_TW', 'zh_MO', 'zh_SG', 'zh_TW', 'zu', 'zu_ZA']
  >>>
  >>> l = Locale('zh', 'TW')
  >>> l.english_name
  u'Chinese (Taiwan)'

So if you need to list all of Babel's available locales, you should rely on localedata.list(). I think all known locales also have a good English name, though CLDR is not always complete, so I can't guarantee that we have a display name for a locale in all other languages.
Comment 5 Chris Lumens 2012-09-27 10:11:07 EDT
*** Bug 860904 has been marked as a duplicate of this bug. ***
Comment 6 Akira TAGOH 2012-10-24 05:38:03 EDT
Why was this bug assigned to babel? According to comment #4, babel has the locale data for Traditional Chinese, so the reason it doesn't appear in anaconda is apparently anaconda's fault. As pointed out there, replacing the code mentioned in comment #1 looks like the better fix to me.

Please consider fixing this in anaconda.
Comment 7 Akira TAGOH 2012-10-24 05:38:48 EDT
Created attachment 632670 [details]
proposed patch
Comment 8 Akira TAGOH 2012-10-24 05:40:58 EDT
Created attachment 632671 [details]
test case

After the patch is applied, two new languages appear in the list:

--- foo 2012-10-24 18:39:30.094685391 +0900
+++ bar 2012-10-24 18:39:30.135683545 +0900
@@ -5,6 +5,7 @@
 be.UTF-8, Belarusian
 bg_BG.UTF-8, Bulgarian (Bulgaria)
 bn_BD.UTF-8, Bengali (Bangladesh)
+bn_IN.UTF-8, Bengali (India)
 bs.UTF-8, Bosnian
 ca_ES.UTF-8, Catalan (Spain)
 cs_CZ.UTF-8, Czech (Czech Republic)
@@ -71,4 +72,5 @@
 ur_PK.UTF-8, Urdu (Pakistan)
 vi_VN.UTF-8, Vietnamese (Vietnam)
 zh_CN.UTF-8, Chinese (China)
+zh_TW.UTF-8, Chinese (Taiwan)
 zu_ZA.UTF-8, Zulu (South Africa)
Comment 9 Cheng-Chia Tseng 2012-10-24 10:28:38 EDT
Thank you for investigating this problem, Akira TAGOH!

We Chinese (Traditional) users all appreciate this! :)
Comment 10 Chris Lumens 2012-10-24 11:14:19 EDT
Thanks for the patch, posting today.
Comment 11 Adam Williamson 2012-10-24 12:54:38 EDT
Discussed at the 2012-10-24 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-10-24/f18beta-blocker-review-5.2012-10-24-16.01.log.txt . This doesn't violate any criteria directly, it's more a conditional violation of 'able to install' for people who only understand Traditional Chinese, as we understand it. We agreed this is probably not significant enough to block Beta, but it is enough to block Final. The bug is accepted as a Final blocker and Beta NTH, rejected as a Beta blocker.

As a side note, would we expect there to be a Chinese (Hong Kong) option? I was kind of expecting one and was surprised that there isn't one. Should there be a new bug for that?
Comment 12 Adam Williamson 2012-10-24 12:55:27 EDT
Whoops, the above is incorrect, we agreed it was not a Beta blocker but was Beta NTH, but left the Final blocker question for later discussion. Adjusting.
Comment 13 Akira TAGOH 2012-10-24 21:44:04 EDT
(In reply to comment #11)
> As a side note, would we expect there to be a Chinese (Hong Kong) option? I
> was kind of expecting one and was surprised that there isn't one. Should
> there be a new bug for that?

Chris can give you the technical details, but AFAIK the current code is capable of that; we are just missing a translation for zh_HK, i.e. it is a translation issue. So it would be up to the zh_HK community in Fedora, and it would be good to file a separate bug if necessary.
Comment 14 Steve Tyler 2012-10-24 21:55:05 EDT
There is 0% zh_HK translation coverage for anaconda at Transifex:
https://fedora.transifex.com/projects/p/fedora/language/zh_HK/?project=2059
Comment 15 Adam Williamson 2012-10-24 21:55:16 EDT
I'm no Chinese expert, but AIUI, the zh_HK translation would basically be identical to the zh_TW one, as HK and Taiwan both use the traditional form of written Chinese (while mainland China uses the simplified form). However, the zh_HK _locale_ may have different properties from the zh_TW one, in terms of the other locale-specific stuff like how numbers are written and so on, given HK's recent English heritage. I might be wrong, but it seems like something we should check with Someone Who Knows.
Comment 16 Steve Tyler 2012-10-24 22:12:59 EDT
Locale files are text files, so you can look at them with "less". Here is one difference:

$ LANG=zh_HK date
三 10月 24 19:11:24 PDT 2012

$ LANG=zh_CN date
2012年 10月 24日 星期三 19:11:27 PDT

(You may need some additional fonts -- I'm seeing a lot of "?" marks.)

$ ls -1 /usr/share/i18n/locales/zh*
/usr/share/i18n/locales/zh_CN
/usr/share/i18n/locales/zh_HK
/usr/share/i18n/locales/zh_SG
/usr/share/i18n/locales/zh_TW
Comment 17 Steve Tyler 2012-10-28 08:57:01 EDT
Created attachment 634517 [details]
python session comparing language list alternatives

This attachment shows that babel.localedata.list() returns a list containing elements that are not the names of translations in /usr/share/locale/, e.g. 'az_Latn', 'en_Dsrt_US'. So, calling gettext.find() with this list amounts to calling it with the wrong data type. The fundamental problem, though, is that gettext.find() does not support a way to return a list of _all_ translations for 'anaconda'. If it accepted regular expressions or globs, we could say:

    messagefiles = gettext.find(domain, localedir, '*', all=True)

>>> l1 = sorted(babel.Locale('en', 'US').languages.keys())
>>> len(l1)
506
>>> l2 = sorted(babel.localedata.list())
>>> len(l2)
451
>>> sorted(l1)
...
>>> sorted(l2)
...


https://lists.fedorahosted.org/pipermail/anaconda-patches/2012-October/001766.html
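
Since gettext.find() has no wildcard mode, the same result could be approximated with the standard glob module. The following is a minimal sketch, assuming the usual <localedir>/<locale>/LC_MESSAGES/<domain>.mo layout; find_all_translations is a hypothetical helper, not an anaconda or gettext function, and the sketch is written for Python 3.

```python
import glob
import os


def find_all_translations(domain, localedir):
    """Return {locale_dir_name: path} for every <domain>.mo under localedir."""
    pattern = os.path.join(localedir, '*', 'LC_MESSAGES', domain + '.mo')
    found = {}
    for path in sorted(glob.glob(pattern)):
        # .../<locale>/LC_MESSAGES/<domain>.mo -> <locale>
        locale_name = os.path.basename(os.path.dirname(os.path.dirname(path)))
        found[locale_name] = path
    return found


if __name__ == '__main__':
    for name, path in find_all_translations('anaconda', '/usr/share/locale').items():
        print(name, path)
```

This reports directory names exactly as they appear on disk, so names with modifiers such as 'sr@latin' come back unchanged and still have to be mapped to whatever the caller uses as a language code.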
Comment 18 Steve Tyler 2012-10-28 09:35:45 EDT
Created attachment 634531 [details]
patch using os.listdir() to get a list of all translations in /usr/share/locale/

This patch calls os.listdir() to get a list of all translation directories in /usr/share/locale/. When that list is passed to gettext.find(), it returns a complete list of anaconda translations. This patch is not a complete solution, however, because the list contains ['en@boldquot', 'en@quot'], which results in 'en' being returned twice by get_available_translations().

$ find /usr/share/locale/ -name anaconda.mo | wc -l
81

$ find /usr/share/locale/ -name anaconda.mo | grep 'en'
/usr/share/locale/en@quot/LC_MESSAGES/anaconda.mo
/usr/share/locale/en@boldquot/LC_MESSAGES/anaconda.mo
/usr/share/locale/en_GB/LC_MESSAGES/anaconda.mo
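
One way the duplicate 'en' could be avoided is to strip the '@modifier' suffix and keep each resulting code only once. This is a sketch under that assumption, not the fix that went into anaconda; unique_languages is a hypothetical helper, written for Python 3.

```python
def unique_languages(dirnames):
    """Strip any '@modifier' suffix and return each language code once, in order."""
    seen = set()
    result = []
    for name in dirnames:
        code = name.split('@', 1)[0]
        if code not in seen:
            seen.add(code)
            result.append(code)
    return result


print(unique_languages(['en@boldquot', 'en@quot', 'en_GB']))
# prints ['en', 'en_GB']
```

Stripping the modifier would also fold a directory such as 'sr@latin' into 'sr', so a real fix would need to decide which modifiers denote a distinct translation.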
Comment 19 Steve Tyler 2012-10-31 00:44:21 EDT
(In reply to comment #17)
...
> This attachment shows that babel.localedata.list() returns a list containing
> elements that are not the names of translations in /usr/share/locale/, e.g.
> 'az_Latn', 'en_Dsrt_US'. So, calling gettext.find() with this list amounts
> to calling it with the wrong data type.
...

babel.localedata.list() does not contain 'sr@latin'.

>>> 'sr@latin' in babel.localedata.list()
False
>>> 'sr' in babel.localedata.list()
True

However, there is an 'sr@latin' anaconda translation:

$ find /usr/share/locale -name anaconda.mo | grep '/sr'
/usr/share/locale/sr/LC_MESSAGES/anaconda.mo
/usr/share/locale/sr@latin/LC_MESSAGES/anaconda.mo
Comment 20 Steve Tyler 2012-10-31 07:58:43 EDT
(In reply to comment #19)
...
> babel.localedata.list() does not contain 'sr@latin'.
> 
> >>> 'sr@latin' in babel.localedata.list()
> False
> >>> 'sr' in babel.localedata.list()
> True
...

Yet, we still get 'sr@latin'. Here's why:

>>> 'sr_YU' in babel.localedata.list()
True
>>> gettext.find('anaconda', '/usr/share/locale', ['sr_YU'], all=True)
['/usr/share/locale/sr@latin/LC_MESSAGES/anaconda.mo', '/usr/share/locale/sr/LC_MESSAGES/anaconda.mo']
>>> gettext._expand_lang('sr_YU')
['sr_RS.UTF-8@latin', 'sr_RS@latin', 'sr.UTF-8@latin', 'sr@latin', 'sr_RS.UTF-8', 'sr_RS', 'sr.UTF-8', 'sr']
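
The fallback list above can be reproduced with a small sketch. It assumes the alias step has already happened (the output above shows 'sr_YU' being expanded as 'sr_RS.UTF-8@latin') and only enumerates the combinations of territory, codeset and modifier. expand_locale is a hypothetical re-implementation for illustration, not the gettext source, written for Python 3.

```python
def expand_locale(normalized):
    """List fallback names for a locale such as 'sr_RS.UTF-8@latin',
    from most specific to least specific."""
    rest, _, modifier = normalized.partition('@')
    rest, _, codeset = rest.partition('.')
    language, _, territory = rest.partition('_')
    names = []
    for mod in (modifier, ''):
        for terr in (territory, ''):
            for cs in (codeset, ''):
                name = language
                if terr:
                    name += '_' + terr
                if cs:
                    name += '.' + cs
                if mod:
                    name += '@' + mod
                # Empty components produce repeats; keep the first occurrence.
                if name not in names:
                    names.append(name)
    return names


print(expand_locale('sr_RS.UTF-8@latin'))
```

Because 'sr@latin' and 'sr' are both in the expanded list, a lookup for 'sr_YU' can match both translation directories, which is what the gettext.find() call above shows.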
Comment 21 guojunyu 2012-10-31 22:49:50 EDT
Don't mislead the foreigners; they are all the same:
zh_HK = zh_SG = zh_TW
Comment 22 Fedora Update System 2012-10-31 22:52:44 EDT
anaconda-18.22-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.22-1.fc18
Comment 23 Adam Williamson 2012-11-01 00:10:33 EDT
Confirming the Taiwanese option appears in 18.22. Still not sure whether we ought to add options for Hong Kong / Singapore.
Comment 24 Cheng-Chia Tseng 2012-11-01 08:28:35 EDT
guojunyu, you are mistaken. They are not all the same.

Most Hong Kong users use zh_TW because the zh_HK locale is not well maintained. Singapore users use zh_CN because of the same problem: there is no one actively maintaining zh_SG.

The GNOME translation for zh_HK uses a script to modify some technical and widely used terms that differ from those used in Taiwan. The translators from GNOME could provide the script to help zh_HK. We can do the same thing for zh_HK.

In my opinion, Hong Kong could be added. On the other hand, there are almost no users of zh_SG nowadays, so we can just drop Singapore.
Comment 25 Cheng-Chia Tseng 2012-11-01 08:34:12 EDT
I have consulted some Hong Kong users. They think that adding Hong Kong is good and they would be happy with it, but there are few people who will help maintain those translations.

If no Hong Kong user picks up those translations soon, I will use the script from the GNOME Chinese (Traditional) translators to help with the conversion from zh_TW to zh_HK.
Comment 26 Fedora Update System 2012-11-01 14:28:44 EDT
Package anaconda-18.22-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.22-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-17432/anaconda-18.22-1.fc18
then log in and leave karma (feedback).
Comment 27 Adam Williamson 2012-11-01 17:30:20 EDT
Filed https://bugzilla.redhat.com/show_bug.cgi?id=872282 for zh_HK, I cc'ed you Cheng-Chia.
Comment 28 guojunyu 2012-11-01 21:27:32 EDT
(In reply to comment #24)
> guojunyu, you are mistaken. They are all not the same.
> 
> Most Hong Kong users use zh_TW because zh_HK locale is not maintained well.
> Singapore users use zh_CN because the same problem: there is no one actively
> maintaining zh_SG.
> 
> The translation for zh_HK of GNOME is using a script to modify some
> technical and widely spoken terms which is different from Taiwan. The
> translators from GNOME could provide the script to help zh_HK. We can do the
> same thing for zh_HK.
> 
> In my opinion, Hong Kong could be added. On the other hand, there is almost
> no user for zh_SG nowadays, we can just drop Singapore.

They are 99.99% the same; anyone can google this.
By your reasoning: I'm in Changsha City, and my Changsha dialect differs somewhat from zh_CN.
Should I add a zh_ChangSha?
Comment 29 Adam Williamson 2012-11-01 21:31:30 EDT
The zh_HK locale already exists. Your example doesn't. They are different cases.

Doing an F17 test install, I noticed that we used to just list 'Chinese (Traditional)' and 'Chinese (Simplified)' (and the Chinese translations of each), without referring specifically to a country for either. I'm assuming these resulted in zh_TW and zh_CN respectively. Somewhere between F17 and F18 we decided to list the associated country name for them. So that makes it more obvious that SG and HK are missing.
Comment 30 Cheng-Chia Tseng 2012-11-01 22:40:10 EDT
(In reply to comment #28)
 
> they are  99.99% same,about this problem ,everyone can google it
> as you say, I'm in ChangSha City, my ChangSha language has some different
> with zh_CN, 
> Should I add a zh_ChangSha

It is not that high, especially in the IT field. Many widely used terms and technical terms used in computer science in Hong Kong are different from those used in Taiwan.
Plus, zh_HK has already existed for a long time.

By the way, this is not a forum for discussing the differences in Chinese translations (or other locale matters) between HK, TW and so on; it is the bug for adding Chinese (Taiwan) to anaconda. I think we had better not continue this discussion here. It is off topic.
Comment 31 Steve Tyler 2012-11-01 23:21:05 EDT
(In reply to comment #29)
> the zh_HK locale already exists. your example doesn't. they are different
> cases.
> 
> Doing an F17 test install, I noticed that we used to just list 'Chinese
> (Traditional)' and 'Chinese (Simplified)' (and the Chinese translations of
> each), without referring specifically to a country for either. I'm assuming
> these resulted in zh_TW and zh_CN respectively. Somewhere between F17 and
> F18 we decided to list the associated country name for them. So that makes
> it more obvious that SG and HK are missing.

anaconda-17.29-1 uses a language table that has these entries:[1]
Chinese(Simplified)     zh_CN   False   zh_CN.UTF-8 us  Asia/Shanghai
Chinese(Traditional)    zh_TW   False   zh_TW.UTF-8 us  Asia/Taipei

With F18, the language table was removed, and that functionality is now provided by python-babel[2], in part. python-babel provides an interface to the Unicode CLDR locale data.[3] With python-babel, applications can retrieve various display names for a language:

>>> import babel
>>> babel.Locale('zh', 'CN').english_name
u'Chinese (China)'
>>> babel.Locale('zh', 'TW').english_name
u'Chinese (Taiwan)'
>>> babel.Locale('zh', 'HK').english_name
u'Chinese (Hong Kong SAR China)'
>>> babel.Locale('zh', 'SG').english_name
u'Chinese (Singapore)'

[1] http://git.fedorahosted.org/cgit/anaconda.git/tree/data/lang-table?id=anaconda-17.29-1
[2] http://babel.edgewall.org/
[3] http://cldr.unicode.org/
    $ rpm -ql python-babel | grep localedata
Comment 32 Fedora Update System 2012-11-02 00:07:00 EDT
anaconda-18.23-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.23-1.fc18
Comment 33 guojunyu 2012-11-02 01:13:27 EDT
(In reply to comment #30)
> (In reply to comment #28)
>  
> > they are  99.99% same,about this problem ,everyone can google it
> > as you say, I'm in ChangSha City, my ChangSha language has some different
> > with zh_CN, 
> > Should I add a zh_ChangSha
> 
> It is not that high, especially in IT field. Many widely spoken terms and
> technical terms used in Computer Science by Hong Kong are different from
> Taiwan.
> Plus, zh_HK has been already existed for a long time.
> 
> By the way, here is not a forum to discuss the differences of Chinese
> translations (or other locale things) between HK ,TW and blah blah, but for
> the bug to add Chinese (Taiwan) in anaconda. I think that we'd better not to
> talk about this later. It is out of the subject.

Sorry, I forgot the point. I just think too many choices confuse people,
just like when I see English, English (USA), English (Canada), English (Ireland)...
How much do they differ? 10%, 20%, 30%? I don't know.
Maybe the anaconda language choice should have a standard threshold, such as more than 50% difference,
rather than depending on whether there is a contributor who uses the language.
Comment 34 Steve Tyler 2012-11-02 01:43:01 EDT
For F18, the available install languages are the ones for which there is an anaconda translation[1], support in python-babel[2], and a locale file[3].

[1] $ find /usr/share/locale/ -name anaconda.mo | sort

[2] $ python
...
>>> import babel
>>> sorted(babel.localedata.list())

[3] $ ls /usr/share/i18n/locales/

This table has a fairly accurate summary: Attachment 635315 [details]
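
The three conditions above amount to a set intersection. A minimal sketch, assuming exact name matches between the three sources (the sr_YU case earlier in this bug shows that the real matching is looser than that); installable_languages is a hypothetical helper and the sample lists are illustrative, not real system output. Written for Python 3.

```python
def installable_languages(translations, babel_locales, locale_files):
    """Names present in all three sources, sorted."""
    return sorted(set(translations) & set(babel_locales) & set(locale_files))


# Illustrative sample data only.
print(installable_languages(
    ['zh_CN', 'zh_TW', 'sr@latin'],            # dirs containing anaconda.mo
    ['zh_CN', 'zh_TW', 'zh_HK', 'sr'],          # babel.localedata.list()
    ['zh_CN', 'zh_HK', 'zh_SG', 'zh_TW'],       # /usr/share/i18n/locales/
))
# prints ['zh_CN', 'zh_TW']
```

With this sample data, zh_HK drops out because it has no anaconda translation, even though Babel and glibc both know the locale.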
Comment 35 Adam Williamson 2012-11-02 02:20:54 EDT
guo: the locales don't relate only to translations. They also affect things like how dates are shown (is today 2012-11-02? 02/11/12? 11/02/12?), how numbers are rendered (in French, they use . and , the other way around from English - so one million is 1.000.000 in French, and zero point two is 0,2), units used for measurements (England and the U.S. like old-fashioned 'Imperial' measurements, like the mile for distance and the pound for weight; Canada uses the km for distance), and a few other things like that.
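
The formatting differences mentioned above can be illustrated without touching the system locale. This is a minimal Python 3 sketch using fixed format strings; swap_separators is a hypothetical helper that mimics the swapped '.' and ',' convention described above (one used in several continental European locales) rather than querying real locale data.

```python
import datetime

day = datetime.date(2012, 11, 2)
# The same date in the three orders mentioned above.
for pattern in ('%Y-%m-%d', '%d/%m/%y', '%m/%d/%y'):
    print(day.strftime(pattern))


def swap_separators(text):
    """Exchange ',' and '.' in an already formatted number."""
    return text.translate(str.maketrans(',.', '.,'))


print(format(1000000, ','))                    # 1,000,000
print(swap_separators(format(1000000, ',')))   # 1.000.000
print(swap_separators(format(0.2, '.1f')))     # 0,2
```

In real code these conventions come from the locale definition itself (for example the files under /usr/share/i18n/locales/), which is why two locales can share a translation and still behave differently.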
Comment 36 guojunyu 2012-11-02 03:31:53 EDT
To Adam Williamson :
Sorry, I didn't realize that. But one could say such conventions are part of the difference; maybe set a percentage threshold?
Comment 37 Steve Tyler 2012-11-02 11:42:51 EDT
(In reply to comment #31)
...
> anaconda-17.29-1 uses a language table that has these entries:[1]
> Chinese(Simplified)     zh_CN   False   zh_CN.UTF-8 us  Asia/Shanghai
> Chinese(Traditional)    zh_TW   False   zh_TW.UTF-8 us  Asia/Taipei
...

Babel supports something similar:

>>> print babel.Locale('zh', 'CN', script='Hans').english_name
Chinese (Simplified Han, China)
>>> print babel.Locale('zh', 'TW', script='Hant').english_name
Chinese (Traditional Han, Taiwan)
>>> 
>>> print babel.Locale('zh', 'CN', script='Hans').display_name
中文 (简体中文, 中国)
>>> print babel.Locale('zh', 'TW', script='Hant').display_name
中文 (繁體中文, 臺灣)
Comment 38 Fedora Update System 2012-11-02 21:06:34 EDT
anaconda-18.24-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.24-1.fc18
Comment 39 Abel Cheung 2012-11-04 16:03:58 EST
Let me provide some background info here; I hope it helps in deciding on zh_* support for each region.

When speaking about translation, the best analogy to the zh_* situation is en_*: think of the relation between en_US, en_GB and en_CA (and en_AU?). en_US and en_GB are two extremes, while en_CA falls in between with a little twist of its own.

For Chinese, zh_CN and zh_TW are two extremes with zh_HK falling in between; but Hong Kong people feel much more comfortable reading traditional Chinese, so zh_TW is a much better fallback -- historically, software used by Hong Kong people was translated by Taiwanese people. I expect the situation of fallback preference to slowly change in the following 10-15 years though.

zh_SG also falls somewhere between zh_CN and zh_TW; but since people there mostly use simplified Chinese, zh_CN is a more sensible fallback for zh_SG. Something is different for zh_SG, though -- personally I have never seen any OSS project with a zh_SG translation. They probably exist but should be very scarce. My suspicion is that, since most computer-literate users there are also proficient in English, they are actually using en_SG.
Comment 40 Fedora Update System 2012-11-05 20:41:39 EST
anaconda-18.25-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.25-1.fc18
Comment 41 Fedora Update System 2012-11-06 21:13:47 EST
anaconda-18.26-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.26-1.fc18
Comment 42 Adam Williamson 2012-11-08 04:20:10 EST
18.26 went stable. Closing. (Bodhi closing of bugs when updates go stable is currently broken).