Bug 1042896 - incorrect sorting of cities localized in Kanji using locale 'Japanese'
Summary: incorrect sorting of cities localized in Kanji using locale 'Japanese'
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 29
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: glibc team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: F19L10nTestday
Depends On: 951321 958376
Blocks: 1042876
 
Reported: 2013-12-13 15:11 UTC by Patsy Griffin
Modified: 2019-09-10 13:52 UTC

Fixed In Version:
Clone Of: 951321
Environment:
Last Closed: 2019-09-10 13:52:11 UTC
Type: Bug
Embargoed:


Attachments
Screenshot submitted by Noriko Mizumoto (893.64 KB, image/png)
2013-12-13 15:11 UTC, Patsy Griffin


Links
System ID Private Priority Status Summary Last Updated
Sourceware 22669 0 None None None 2019-09-10 13:52:11 UTC

Description Patsy Griffin 2013-12-13 15:11:55 UTC
Created attachment 836354 [details]
Screenshot submitted by Noriko Mizumoto

+++ This bug was initially created as a clone of Bug #951321 +++

I cloned this bug from a tzdata bug in order to preserve the discussions/explanations.   -Patsy

Description of problem:
During the F19 L10n QA Test Day event, I found that in the Timezone tab many cities are placed directly under a region with no country name, while 'America' and 'Australia' do appear. Omitting the country names does not look good.

Version-Release number of selected component (if applicable):
system-config-date-1.10.5-2.fc19.noarch

How reproducible:
always

Steps to Reproduce:
1. Run system-config-date
2.
3.
  
Actual results:
Beijing, Tokyo, Seoul, etc. are placed under 'Asia' with no country name.
Monaco, Paris, Madrid, London, etc. are placed under 'Europe' with no country name.

Expected results:
Country names should appear, such as Japan, China, Korea, France, Germany, Spain, etc.

Additional info:

--- Additional comment from Nils Philippsen on 2013-04-12 05:28:16 EDT ---

System-config-date orders cities exactly as listed in /usr/share/zoneinfo/zone.tab and "America" and "Australia" are used as continent names here. This file comes from the tzdata package and any changes would need to be acceptable upstream... I suspect that a proposed change to introduce a "country level" wouldn't be accepted because there are disputes about e.g. to which country some regions belong and so forth, but I'll change the component to tzdata regardless, perhaps its maintainer can provide some more details.

--- Additional comment from Petr Machata on 2013-04-12 07:01:18 EDT ---

The logic behind this is that cities are more stable than country borders.  The border disputes that Nils mentions are also a good reason.  Where there are individual countries mentioned, they are often for backward compatibility with ancient systems.  Sometimes there's grouping where that's practical, e.g. we have a bunch of zones under America/Indiana/ and America/Argentina/.  On the other hand we don't have America/Mexico/ even though there's a bunch of Mexican zones.  It's really quite arbitrary.

The reason this can be arbitrary is that individual users shouldn't need to care about the zone naming at all.  Those are just internal identifiers, much like names of ABI symbols in a library.  We might just as well name those things with UUID strings and it would be perfectly OK.  What users should care about is what's in zone.tab, which is what system-config-date should present to the user.
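
As a rough illustration of why the list shows a region but no country level, here is a minimal Python sketch that groups zone identifiers by their first path component. The sample lines only imitate the tab-separated layout of zone.tab (country code, coordinates, zone identifier, optional comment), and `group_by_region` is a hypothetical helper, not code from system-config-date.

```python
from collections import defaultdict

# Illustrative sample in zone.tab's tab-separated layout (an assumption for
# this sketch): country code, coordinates, zone identifier, optional comment.
SAMPLE_ZONE_TAB = """\
# comment lines start with '#'
JP\t+353916+1394441\tAsia/Tokyo
FR\t+4852+00220\tEurope/Paris
AR\t-3436-05827\tAmerica/Argentina/Buenos_Aires\tBuenos Aires (BA, CF)
"""


def group_by_region(text):
    """Group zone identifiers by the part before the first '/'."""
    regions = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        country, _coords, zone = line.split("\t")[:3]
        region, _, rest = zone.partition("/")
        # The country code is a separate column; it is not part of the
        # identifier, so a region/city tree built from the identifier
        # alone has no country level.
        regions[region].append((rest, country))
    return dict(regions)


for region, zones in group_by_region(SAMPLE_ZONE_TAB).items():
    print(region, zones)
```

Note that "America/Argentina/Buenos_Aires" keeps its extra path level only because the identifier itself contains one; nothing comparable exists for most other zones.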

--- Additional comment from Nils Philippsen on 2013-05-16 04:41:41 EDT ---

(Sorry for the delay, was on vacation.)

(In reply to comment #2)
> what users should care about is what's in zone.tab, which is what
> system-config-date should present to the user.

Which is what it does, the arbitrary identifiers are in zone.tab (as well) ;-).

--- Additional comment from Noriko Mizumoto on 2013-06-04 01:24:26 EDT ---

Ok, I can understand the border disputes for some regions.
Let me report just one thing I noticed as a Japanese locale user.
With the 'Japanese' locale, cities localized in Kanji always come at the bottom of the list, while most cities localized in Katakana appear in 50-on order as expected. This may be a Japanese-specific issue, though.

The following cities are localized in Kanji and appear at the bottom of each continent:
Asia - Hong Kong, Chungking, Shanghai, Taipei, Tokyo and Pyongyang (see attached screenshot)
America - Southern Prince'
Antarctica - Showa station and south pole

--- Additional comment from Petr Machata on 2013-06-04 06:44:12 EDT ---

I'm sorry, but I don't seem to follow.  Are you saying that the ordering of Katakana words should be different with respect to Kanji words (i.e. they should appear at the top of the list, or should be intermixed according to pronunciation, etc.)?  Or that those locations have an existing Kanji rendering, and shouldn't be transcribed using Katakana?

--- Additional comment from Nils Philippsen on 2013-06-04 08:31:08 EDT ---

(In reply to Noriko Mizumoto from comment #4)
> Created attachment 756617 [details]
> cities localized in Kanji are appeared at the bottom of the list
> 
> Ok, I can understand the border disputes for some regions.
> Let me report just one thing I noticed as a Japanese locale user.
> With the 'Japanese' locale, cities localized in Kanji always come at the
> bottom of the list, while most cities localized in Katakana appear in
> 50-on order as expected. This may be a Japanese-specific issue, though.

The sorting (within geographic time zones) is done by the translated name of the time zone, and depends on the collation rules of the locale set -- apparently the rules for Japanese are to sort Kanji after Katakana.

--- Additional comment from Mike FABIAN on 2013-06-04 17:38:11 EDT ---

How would you sort words written in Kanji?  If you want to sort them
phonetically, it would be necessary to know the pronunciation of the
city names written in Kanji.

But Japanese pronunciation is not very regular. For example, to sort a
phonebook with person names phonetically, it is necessary to add the
readings as well.

Is 河野 pronounced かわの or こうの or こおの? How should the sort algorithm
know that if the correct reading is not given as well?

In my Android phone, the Japanese names are only sorted correctly if
I enter not only the Kanji but also the readings in Hiragana. Without
that it is not really possible to sort Japanese well, I think.
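
A minimal Python sketch of that idea, assuming each entry carries a hand-entered reading in hiragana next to its display name. The readings below and the helper `sort_by_reading` are illustrative assumptions, and plain code-point order of hiragana only approximates a real Japanese collation (voiced kana and ゔ, for instance, need more care).

```python
# (display name, reading in hiragana); the readings are typed in by hand
# because they cannot be derived reliably from the Kanji alone.
CITIES = [
    ("上海", "しゃんはい"),
    ("ラングーン", "らんぐーん"),
    ("香港", "ほんこん"),
    ("リヤド", "りやど"),
    ("台北", "たいぺい"),
    ("東京", "とうきょう"),
]


def sort_by_reading(entries):
    """Sort display names by their reading instead of by their spelling."""
    return [name for name, _reading in sorted(entries, key=lambda e: e[1])]


print(sort_by_reading(CITIES))
```

With readings attached, names written in Kanji and names written in Katakana interleave instead of forming two separate blocks. Two people whose names are both written 河野 would also sort apart if one is entered as かわの and the other as こうの.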

--- Additional comment from Mike FABIAN on 2013-06-04 18:00:50 EDT ---

(In reply to Noriko Mizumoto from comment #4)
> Created attachment 756617 [details]
> cities localized in Kanji are appeared at the bottom of the list
> 
> Ok, I can understand the border disputes for some regions.
> Let me report just one thing I noticed as a Japanese locale user.
> With the 'Japanese' locale, cities localized in Kanji always come at the
> bottom of the list, while most cities localized in Katakana appear in
> 50-on order as expected. This may be a Japanese-specific issue, though.
> 
> The following cities are localized in Kanji and appear at the bottom of
> each continent:
> Asia - Hong Kong, Chungking, Shanghai, Taipei, Tokyo and Pyongyang (see
> attached screenshot)
> America - Southern Prince'
> Antarctica - Showa station and south pole

$ echo -e "上海\nラングーン\n香港\nリヤド\n重慶\n平壌\n台北\n東京\nヴィエンチャン\n" | LC_ALL=ja_JP.UTF-8 sort 

ラングーン
リヤド
ヴィエンチャン
香港
重慶
上海
台北
東京
平壌
mfabian@ari:~

So that is just the way glibc sorts this in ja_JP.UTF-8 locale.
Of course this is not nice for the names written in Kanji, but
how to do this better without knowing the readings?

But glibc doesn’t even sort kana only nicely:

mfabian@ari:~
$ echo -e "ウ\nう\nた\nカ\nラ\nら\nワ\nわ\nヲ\nを\nヴ\nゔ\nヤ\nや\nか\nモ\nも\nヒ\nひ\nナ\nな\nア\nあ\nエ\nえ\nイ\nい\nサ\nさ\n" | LC_ALL=ja_JP.UTF-8 sort 

ゔ
あ
い
う
え
か
さ
た
な
ひ
も
や
ら
わ
を
ア
イ
ウ
エ
カ
サ
ナ
ヒ
モ
ヤ
ラ
ワ
ヲ
ヴ
mfabian@ari:~
$

All the hiragana sort above all the katakana, which is not very nice.
And in hiragana, ゔ is at the top above あ, while in katakana ヴ is at the
end after ヲ. So not even the kana are sorted nicely.

Correct order would be:

1 	あ
2 	ア
3 	い
4 	イ
5 	う
6 	ウ
7 	ゔ
8 	ヴ
9 	え
10 	エ
11 	か
12 	カ
13 	さ
14 	サ
15 	た
16 	な
17 	ナ
18 	ひ
19 	ヒ
20 	も
21 	モ
22 	や
23 	ヤ
24 	ら
25 	ラ
26 	わ
27 	ワ
28 	を
29 	ヲ

(Sorted with libicu using http://minaret.info/test/sort.msp)

So glibc does not even sort the kana correctly.

libicu does a better job sorting Japanese, but to sort words in Kanji
correctly, I think one needs to know the correct readings.
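
For illustration, the interleaved kana order listed above can be reproduced with a small multi-level sort key. This is only a Python sketch under stated assumptions, not how libicu or glibc implement collation: `kana_key` is a hypothetical helper that compares the base kana first (katakana folded onto hiragana), then voicing marks, then script.

```python
import unicodedata


def kana_key(text):
    """Three-level key: base kana, then voicing marks, then script."""
    primary, marks, script = [], [], []
    # NFD splits voiced kana into base + combining mark (ゔ -> う + U+3099).
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and marks:
            marks[-1] = ord(ch)  # attach the mark to the preceding kana
            continue
        cp = ord(ch)
        is_katakana = 0x30A1 <= cp <= 0x30F6
        # Katakana sit exactly 0x60 above their hiragana counterparts.
        primary.append(cp - 0x60 if is_katakana else cp)
        marks.append(0)
        script.append(1 if is_katakana else 0)  # hiragana before katakana
    return (primary, marks, script)


kana = list("ウうたカラらワわヲをヴゔヤやかモもヒひナなアあエえイいサさ")
print("".join(sorted(kana, key=kana_key)))
```

The key treats script much like letter case in Latin collation: it only breaks ties between otherwise identical kana.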

--- Additional comment from Mike FABIAN on 2013-06-04 18:07:16 EDT ---

You can also try this
to see how libicu sorts Japanese:

http://demo.icu-project.org/icu-bin/locexp?_=ja&d_=en&x=col

--- Additional comment from Mike FABIAN on 2013-06-04 18:12:05 EDT ---



--- Additional comment from Nils Philippsen on 2013-06-05 08:58:01 EDT ---

The strings in the list are eventually sorted using g_utf8_collate() from glib (I don't use a custom sorting function), which in turn uses wcscoll() from glibc on current Linux systems (i.e. here). I'm not sure if there is a viable way to fix collation in glibc, even just for kana. An alternative would be to use libicu for collating if it is present (i.e. try to dlopen() it), but that would result in different sorting depending on whether libicu is installed.
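
The "use ICU if present, otherwise fall back to libc" idea could look roughly like the following Python sketch. It assumes the optional PyICU binding (`icu`) as a stand-in for dlopen()ing libicu; `collation_key` is a hypothetical helper, and the resulting order is deliberately not asserted because it depends on which backend and locale data are installed.

```python
import locale

try:
    import icu  # PyICU; optional and may not be installed
except ImportError:
    icu = None


def collation_key(locale_name="ja_JP.UTF-8"):
    """Return (backend name, key function) for locale-aware sorting."""
    if icu is not None:
        # ICU locale IDs carry no codeset suffix, so strip ".UTF-8".
        collator = icu.Collator.createInstance(
            icu.Locale(locale_name.split(".")[0]))
        return "icu", collator.getSortKey
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # Requested locale is not installed; keep the active collation.
        pass
    # locale.strxfrm wraps the C library's transformation function.
    return "libc", locale.strxfrm


backend, key = collation_key()
names = ["上海", "ラングーン", "香港", "リヤド", "東京"]
print(backend, sorted(names, key=key))
```

The printed backend name makes the drawback visible: the same list can come out in a different order on two machines that differ only in whether ICU is available.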

--- Additional comment from Fedora Admin XMLRPC Client on 2013-09-04 21:48:49 EDT ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from Fedora Admin XMLRPC Client on 2013-09-04 21:49:53 EDT ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from Noriko Mizumoto on 2013-12-11 01:38:06 EST ---

Ping.
Is there any progress on this problem?

--- Additional comment from Nils Philippsen on 2013-12-12 09:57:35 EST ---

Which issue do you mean exactly?

Talking about the original issue (no country names), this is not a bug as I've explained in comment #1. This is basically just waiting for confirmation from the tzdata maintainer to be closed IMO. Patsy, welcome aboard BTW and what's your take?

If you refer to the Kanji vs. Katakana sorting issue from comment #4 ff., this is a different bug that should be opened (or cloned from this one for reference) against glibc (I guess).

--- Additional comment from Mike FABIAN on 2013-12-12 10:27:34 EST ---

(In reply to Nils Philippsen from comment #15)
> Which issue do you mean exactly?
[...]
> If you refer to the Kanji vs. Katakana sorting issue from comment #4 ff.,
> this is a different bug that should be opened (or cloned from this one for
> reference) against glibc (I guess).

Yes, a different bug against glibc for that problem would be nice.

--- Additional comment from Patsy Franklin on 2013-12-13 09:35:07 EST ---

Thanks for the input folks - and for the "welcome"!

I'm going to close this and open a glibc bug based on comment 4.

-Patsy

Comment 2 Fedora End Of Life 2015-01-09 20:51:07 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 3 Jaroslav Reznik 2015-03-03 15:19:40 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 4 will bradshaw 2015-04-27 14:22:01 UTC
We could define Hiragana and Katakana as the uppercase/lowercase counterparts of each other. This would at least partially solve the sorting issue even if it is not technically correct from a linguistic perspective.

Before any change like this is made, though, it would be advisable to check with a native Japanese speaker.

Comment 5 Fedora End Of Life 2016-07-19 10:47:09 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 6 Jan Kurik 2016-07-26 04:08:36 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 7 Fedora End Of Life 2017-02-28 09:36:06 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 8 Fedora End Of Life 2018-05-03 08:55:17 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 26 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 9 Florian Weimer 2018-05-03 10:56:07 UTC
I'm surprised the CLDR collation update (bug 1551009) did not fix that.  Mike?

Comment 10 Mike FABIAN 2018-05-08 09:29:16 UTC
(In reply to Florian Weimer from comment #9)
> I'm surprised the CLDR collation update (bug 1551009) did not fix that. 
> Mike?

I have not yet touched the Japanese locale; the ja_JP locale does not
base its collation on the iso file.

I’ll try to work on that soon.

What was originally requested in this bug (sorting names of cities
written in Kanji correctly) is impossible, though: one cannot sort them without knowing their pronunciation, which is irregular.

But at least the kana should sort correctly.

There is also an upstream bug requesting an update of the Japanese locale, 
unfortunately again not basing the collation on the iso file.

https://sourceware.org/bugzilla/show_bug.cgi?id=22669

Comment 11 Jan Kurik 2018-08-14 10:24:39 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle.
Changing version to '29'.

Comment 12 Carlos O'Donell 2019-09-10 13:52:11 UTC
The original request to sort the cities is not possible.

However we should sort the kana correctly.

The kana sorting should be fixed upstream.

We are closing this bug and tracking it upstream via this bug: https://sourceware.org/bugzilla/show_bug.cgi?id=22669

