951321 – No country names for Asia and Europe

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	tzdata
Sub Component:
Version:	19
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Patsy Griffin
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	F19L10nTestday
Depends On:	958376
Blocks:	1042876 1042896

Reported:	2013-04-12 04:59 UTC by Noriko Mizumoto
Modified:	2013-12-13 16:49 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Clones:	1042876 1042896
Environment:
Last Closed:	2013-12-13 14:35:07 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
cities localized in Kanji are appeared at the bottom of the list (893.64 KB, image/png) 2013-06-04 05:24 UTC, Noriko Mizumoto
icu-japanese-sorting.png (162.31 KB, image/png) 2013-06-04 22:12 UTC, Mike FABIAN

Description Noriko Mizumoto 2013-04-12 04:59:15 UTC

Description of problem:
During the F19 L10n QA Test Day event, I found that in the Timezone tab many cities are placed directly under a region with no country name, even though America and Australia are listed. It does not look good to have the country names omitted.

Version-Release number of selected component (if applicable):
system-config-date-1.10.5-2.fc19.noarch

How reproducible:
always

Steps to Reproduce:
1. Run system-config-date.
2. Open the Timezone tab.

Actual results:
Beijing, Tokyo, Seoul, etc. are placed under 'Asia' with no country name.
Monaco, Paris, Madrid, London, etc. are placed under 'Europe' with no country name.

Expected results:
Country names should appear, such as Japan, China, Korea, France, Germany, Spain, etc.

Additional info:

Comment 1 Nils Philippsen 2013-04-12 09:28:16 UTC

System-config-date orders cities exactly as listed in /usr/share/zoneinfo/zone.tab and "America" and "Australia" are used as continent names here. This file comes from the tzdata package and any changes would need to be acceptable upstream... I suspect that a proposed change to introduce a "country level" wouldn't be accepted because there are disputes about e.g. to which country some regions belong and so forth, but I'll change the component to tzdata regardless, perhaps its maintainer can provide some more details.
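
As a rough illustration of the layout described above, the following Python sketch parses a few zone.tab-style lines and groups the zone identifiers by their first path component. The sample lines are abbreviated for illustration and are not copied from a particular tzdata release, and the helper name group_by_region is hypothetical; this is not how system-config-date is actually implemented.

```python
from collections import OrderedDict

# Abbreviated zone.tab-style sample (tab-separated: ISO 3166 country code,
# coordinates, zone identifier, optional comment). Illustrative data only.
SAMPLE_ZONE_TAB = """\
# country-code\tcoordinates\tTZ\tcomments
FR\t+4852+00220\tEurope/Paris
JP\t+353916+1394441\tAsia/Tokyo
KR\t+3733+12658\tAsia/Seoul
US\t+394606-0860929\tAmerica/Indiana/Indianapolis\tEastern - IN (most areas)
"""


def group_by_region(zone_tab_text):
    """Group zone identifiers by their first path component, in file order."""
    regions = OrderedDict()
    for line in zone_tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        country_code, _coords, zone_id = line.split("\t")[:3]
        region, _, rest = zone_id.partition("/")
        regions.setdefault(region, []).append((rest, country_code))
    return regions


for region, zones in group_by_region(SAMPLE_ZONE_TAB).items():
    print(region, zones)
```

The country code is available in the first column of each line, but the identifier itself carries only the region and the city (or a nested name such as Indiana/Indianapolis).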

Comment 2 Petr Machata 2013-04-12 11:01:18 UTC

The logic behind this is that cities are more stable than country borders.  The border disputes that Nils mentions are also a good reason.  Where there are individual countries mentioned, they are often for backward compatibility with ancient systems.  Sometimes there's grouping where that's practical, e.g. we have a bunch of zones under America/Indiana/ and America/Argentina/.  On the other hand we don't have America/Mexico/ even though there's a bunch of Mexican zones.  It's really quite arbitrary.

The reason this can be arbitrary is that individual users shouldn't need to care about the zone naming at all.  Those are just internal identifiers, much like names of ABI symbols in a library.  We might just as well name those things with UUID strings and it would be perfectly OK.  What users should care about is what's in zone.tab, which is what system-config-date should present to the user.

Comment 3 Nils Philippsen 2013-05-16 08:41:41 UTC

(Sorry for the delay, was on vacation.)

(In reply to comment #2)
> what users should care about is what's in zone.tab, which is what
> system-config-date should present to the user.

Which is what it does, the arbitrary identifiers are in zone.tab (as well) ;-).

Comment 4 Noriko Mizumoto 2013-06-04 05:24:26 UTC

Created attachment 756617 [details]
cities localized in Kanji are appeared at the bottom of the list

OK, I can understand the border disputes for some regions.
Let me report just one thing I noticed as a Japanese locale user.
Using the 'Japanese' locale, cities localized in Kanji always come at the bottom of the list, while most cities localized in Katakana appear in 50-on order as expected. This may be a Japanese-specific issue, though.

The following cities are localized in Kanji and appear at the bottom of each continent:
Asia - Hong Kong, Chungking, Shanghai, Taipei, Tokyo and Pyongyang (see attached screenshot)
America - Southern Prince'
Antarctica - Showa Station and South Pole

Comment 5 Petr Machata 2013-06-04 10:44:12 UTC

I'm sorry, but I don't seem to follow.  Are you saying that the ordering of Katakana words should be different with respect to Kanji words (i.e. they should appear at the top of the list, or should be intermixed according to pronunciation, etc.)?  Or that those locations have an existing Kanji rendering, and shouldn't be transcribed using Katakana?

Comment 6 Nils Philippsen 2013-06-04 12:31:08 UTC

(In reply to Noriko Mizumoto from comment #4)
> Created attachment 756617 [details]
> cities localized in Kanji are appeared at the bottom of the list
> 
> OK, I can understand the border disputes for some regions.
> Let me report just one thing I noticed as a Japanese locale user.
> Using the 'Japanese' locale, cities localized in Kanji always come at the
> bottom of the list, while most cities localized in Katakana appear in
> 50-on order as expected. This may be a Japanese-specific issue, though.

The sorting (within geographic time zones) is done by the translated name of the time zone and depends on the collation rules of the current locale. Apparently the rules for Japanese sort Kanji after Katakana.

Comment 7 Mike FABIAN 2013-06-04 21:38:11 UTC

How would you sort words written in Kanji?  If you want to sort them
phonetically, it would be necessary to know the pronunciation of the
city names written in Kanji.

But Japanese pronunciation is not very regular. For example, to sort a
phonebook with person names phonetically, it is necessary to add the
readings as well.

Is 河野 pronounced かわの or こうの or こおの? How should the sort algorithm
know that if the correct reading is not given as well?

On my Android phone, Japanese names are only sorted correctly if
I enter not only the Kanji but also the readings in Hiragana. Without
the readings it is not really possible to sort Japanese well, I think.
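
To make the point about readings concrete, here is a minimal Python sketch that sorts a mixed list by an explicitly supplied reading. The READINGS table is hypothetical and filled in by hand for illustration (other readings exist for some of these names); neither glibc nor tzdata provides such data. Comparing the hiragana readings by code point is only an approximation of 50-on order.

```python
# Hypothetical readings table: the hiragana readings below are supplied by
# hand for illustration; nothing in glibc or tzdata provides them.
READINGS = {
    "上海": "しゃんはい",
    "香港": "ほんこん",
    "重慶": "じゅうけい",
    "平壌": "ぴょんやん",
    "台北": "たいぺい",
    "東京": "とうきょう",
}


def katakana_to_hiragana(text):
    """Map katakana letters (U+30A1..U+30F6) to their hiragana counterparts."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def reading_key(name):
    """Sort key: the known reading if there is one, folded to hiragana."""
    return katakana_to_hiragana(READINGS.get(name, name))


names = ["上海", "ラングーン", "香港", "リヤド", "重慶", "平壌", "台北", "東京"]
print(sorted(names, key=reading_key))
```

With readings available, the Kanji names are interleaved with the Katakana names instead of being collected at the bottom. Without the table there is nothing for the sort to go on, which is the problem described above.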

Comment 8 Mike FABIAN 2013-06-04 22:00:50 UTC

(In reply to Noriko Mizumoto from comment #4)
> Created attachment 756617 [details]
> cities localized in Kanji are appeared at the bottom of the list
> 
> OK, I can understand the border disputes for some regions.
> Let me report just one thing I noticed as a Japanese locale user.
> Using the 'Japanese' locale, cities localized in Kanji always come at the
> bottom of the list, while most cities localized in Katakana appear in
> 50-on order as expected. This may be a Japanese-specific issue, though.
> 
> The following cities are localized in Kanji and appear at the bottom of
> each continent:
> Asia - Hong Kong, Chungking, Shanghai, Taipei, Tokyo and Pyongyang (see
> attached screenshot)
> America - Southern Prince'
> Antarctica - Showa Station and South Pole

$ echo -e "上海\nラングーン\n香港\nリヤド\n重慶\n平壌\n台北\n東京\nヴィエンチャン\n" | LC_ALL=ja_JP.UTF-8 sort 

ラングーン
リヤド
ヴィエンチャン
香港
重慶
上海
台北
東京
平壌

So that is just the way glibc sorts this in the ja_JP.UTF-8 locale.
Of course this is not nice for the names written in Kanji, but
how could this be done better without knowing the readings?

But glibc does not even sort kana-only input nicely:

$ echo -e "ウ\nう\nた\nカ\nラ\nら\nワ\nわ\nヲ\nを\nヴ\nゔ\nヤ\nや\nか\nモ\nも\nヒ\nひ\nナ\nな\nア\nあ\nエ\nえ\nイ\nい\nサ\nさ\n" | LC_ALL=ja_JP.UTF-8 sort 

ゔ
あ
い
う
え
か
さ
た
な
ひ
も
や
ら
わ
を
ア
イ
ウ
エ
カ
サ
ナ
ヒ
モ
ヤ
ラ
ワ
ヲ
ヴ

All the hiragana sort above all the katakana, which is not very nice.
In hiragana, ゔ is at the top, above あ, while in katakana ヴ is at the
end, after ヲ. So not even the kana are sorted nicely.

The correct order would be:

1 	あ
2 	ア
3 	い
4 	イ
5 	う
6 	ウ
7 	ゔ
8 	ヴ
9 	え
10 	エ
11 	か
12 	カ
13 	さ
14 	サ
15 	た
16 	な
17 	ナ
18 	ひ
19 	ヒ
20 	も
21 	モ
22 	や
23 	ヤ
24 	ら
25 	ラ
26 	わ
27 	ワ
28 	を
29 	ヲ

(Sorted with libicu using http://minaret.info/test/sort.msp)

So glibc does not even sort the kana correctly.

libicu does a better job sorting Japanese, but to sort words in Kanji
correctly, I think one needs to know the correct readings.
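
For kana-only input, the ordering listed above can be approximated without ICU. The Python sketch below is an assumption-laden illustration, not what libicu or glibc does internally: it folds katakana onto hiragana, uses Unicode NFD decomposition so that ゔ/ヴ sort directly after う/ウ, and breaks ties by the original string so hiragana comes first. The function name kana_sort_key is hypothetical.

```python
import unicodedata


def kana_sort_key(text):
    """Approximate sort key that interleaves hiragana and katakana."""
    # Fold katakana letters (U+30A1..U+30F6) onto hiragana so both scripts
    # share one primary ordering.
    folded = "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )
    # NFD splits voiced kana such as U+3094 (hiragana VU) into base letter +
    # combining mark, which places it directly after the base letter.
    primary = unicodedata.normalize("NFD", folded)
    # The original string breaks ties, so hiragana precedes katakana.
    return (primary, text)


# Same characters as in the sort test above; U+30F4 / U+3094 are katakana
# and hiragana VU, written as escapes to keep them precomposed.
kana = ["ウ", "う", "た", "カ", "ラ", "ら", "ワ", "わ", "ヲ", "を",
        "\u30f4", "\u3094", "ヤ", "や", "か", "モ", "も", "ヒ", "ひ",
        "ナ", "な", "ア", "あ", "エ", "え", "イ", "い", "サ", "さ"]
print(" ".join(sorted(kana, key=kana_sort_key)))
```

This only handles the single-character kana case shown here reasonably well; it does nothing for Kanji, which still need readings.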

Comment 9 Mike FABIAN 2013-06-04 22:07:16 UTC

You can also try this to see how libicu sorts Japanese:

http://demo.icu-project.org/icu-bin/locexp?_=ja&d_=en&x=col

Comment 10 Mike FABIAN 2013-06-04 22:12:05 UTC

Created attachment 756966 [details]
icu-japanese-sorting.png

Comment 11 Nils Philippsen 2013-06-05 12:58:01 UTC

The strings in the list are eventually sorted using g_utf8_collate() from glib (I don't use a custom sorting function), which in turn uses wcscoll() from glibc on current Linux systems (i.e. here). I'm not sure if there is a viable way to fix collation in glibc, even just for kana. An alternative would be to use libicu for collating if it is present (i.e. try to dlopen() it), but that would result in different sorting depending on whether libicu is installed.
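
The trade-off can be sketched in Python rather than C. This is an illustration only: locale.strxfrm stands in for the g_utf8_collate()/wcscoll() path, and an optional import of the PyICU binding stands in for the dlopen() idea. The function names are hypothetical, and PyICU being installed is an assumption that the code checks at run time.

```python
import locale


def sort_with_libc(names, locale_name):
    """Sort with the C library's collation; return None if the locale is missing."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def sort_with_icu(names, locale_id):
    """Sort with ICU via the optional PyICU binding; return None if absent."""
    try:
        import icu  # PyICU, assumed optional
    except ImportError:
        return None
    collator = icu.Collator.createInstance(icu.Locale(locale_id))
    return sorted(names, key=collator.getSortKey)


names = ["上海", "ラングーン", "香港", "リヤド", "東京"]
print("libc:", sort_with_libc(names, "ja_JP.UTF-8"))
print("icu: ", sort_with_icu(names, "ja_JP"))
```

On a system where both paths are available, the two lines may print different orders, which is exactly the inconsistency mentioned above.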

Comment 12 Fedora Admin XMLRPC Client 2013-09-05 01:48:49 UTC

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 14 Noriko Mizumoto 2013-12-11 06:38:06 UTC

Ping.
Is there any progress on this problem?

Comment 15 Nils Philippsen 2013-12-12 14:57:35 UTC

Which issue do you mean exactly?

Talking about the original issue (no country names), this is not a bug as I've explained in comment #1. This is basically just waiting for confirmation from the tzdata maintainer to be closed IMO. Patsy, welcome aboard BTW and what's your take?

If you refer to the Kanji vs. Katakana sorting issue from comment #4 ff., this is a different bug that should be opened (or cloned from this one for reference) against glibc (I guess).

Comment 16 Mike FABIAN 2013-12-12 15:27:34 UTC

(In reply to Nils Philippsen from comment #15)
> Which issue do you mean exactly?
[...]
> If you refer to the Kanji vs. Katakana sorting issue from comment #4 ff.,
> this is a different bug that should be opened (or cloned from this one for
> reference) against glibc (I guess).

Yes, a different bug against glibc for that problem would be nice.

Comment 17 Patsy Griffin 2013-12-13 14:35:07 UTC

Thanks for the input folks - and for the "welcome"!

I'm going to close this and open a glibc bug based on comment 4.

-Patsy

Comment 18 Patsy Griffin 2013-12-13 16:49:36 UTC

Here's a link to the glibc BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1042896
