Bug 1042876 - incorrect sorting of cities localized in Kanji using locale 'Japanese'
Summary: incorrect sorting of cities localized in Kanji using locale 'Japanese'
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glibc
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Carlos O'Donell
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard: F19L10nTestday
Depends On: 951321 958376 1042896
Blocks:
Reported: 2013-12-13 14:55 UTC by Patsy Griffin
Modified: 2016-11-24 16:15 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 951321
Environment:
Last Closed: 2013-12-13 16:24:52 UTC
Target Upstream Version:
Embargoed:



Description Patsy Griffin 2013-12-13 14:55:46 UTC
+++ This bug was initially created as a clone of Bug #951321 +++

I cloned this bug from a tzdata bug in order to preserve the discussions/explanations.   -Patsy


Description of problem:
During the F19 L10n QA Test Day event, I found that in the Timezone tab many cities are placed directly under a region without a country name, while "America" and "Australia" do appear. Omitting the country names does not look good.

Version-Release number of selected component (if applicable):
system-config-date-1.10.5-2.fc19.noarch

How reproducible:
always

Steps to Reproduce:
1. Run system-config-date
2.
3.
  
Actual results:
Beijing, Tokyo, Seoul, etc. are placed under 'Asia' with no country name.
Monaco, Paris, Madrid, London, etc. are placed under 'Europe' with no country name.

Expected results:
Country names should appear, such as Japan, China, Korea, France, Germany, Spain, etc.

Additional info:

--- Additional comment from Nils Philippsen on 2013-04-12 05:28:16 EDT ---

System-config-date orders cities exactly as listed in /usr/share/zoneinfo/zone.tab and "America" and "Australia" are used as continent names here. This file comes from the tzdata package and any changes would need to be acceptable upstream... I suspect that a proposed change to introduce a "country level" wouldn't be accepted because there are disputes about e.g. to which country some regions belong and so forth, but I'll change the component to tzdata regardless, perhaps its maintainer can provide some more details.

--- Additional comment from Petr Machata on 2013-04-12 07:01:18 EDT ---

The logic behind this is that cities are more stable than country borders.  The border disputes that Nils mentions are also a good reason.  Where there are individual countries mentioned, they are often for backward compatibility with ancient systems.  Sometimes there's grouping where that's practical, e.g. we have a bunch of zones under America/Indiana/ and America/Argentina/.  On the other hand we don't have America/Mexico/ even though there's a bunch of Mexican zones.  It's really quite arbitrary.

The reason this can be arbitrary is that individual users shouldn't need to care about the zone naming at all.  Those are just internal identifiers, much like names of ABI symbols in a library.  We might just as well name those things with UUID strings and it would be perfectly OK.  What users should care about is what's in zone.tab, which is what system-config-date should present to the user.

--- Additional comment from Nils Philippsen on 2013-05-16 04:41:41 EDT ---

(Sorry for the delay, was on vacation.)

(In reply to comment #2)
> what users should care about is what's in zone.tab, which is what
> system-config-date should present to the user.

Which is what it does, the arbitrary identifiers are in zone.tab (as well) ;-).

--- Additional comment from Noriko Mizumoto on 2013-06-04 01:24:26 EDT ---

Ok, I can understand the border disputes for some regions.
Let me report just one thing I noticed as a Japanese locale user.
With the 'Japanese' locale, cities localized in Kanji always come at the bottom of the list, while most cities localized in Katakana appear in 50-on order as expected. This may be a Japanese-specific issue, though.

The following cities are localized in Kanji and appear at the bottom of each continent:
Asia - Hong Kong, Chungking, Shanghai, Taipei, Tokyo and Pyongyang (see attached screenshot)
America - Southern Prince'
Antarctica - Showa station and south pole

--- Additional comment from Petr Machata on 2013-06-04 06:44:12 EDT ---

I'm sorry, but I don't seem to follow.  Are you saying that the ordering of Katakana words should be different with respect to Kanji words (i.e. they should appear at the top of the list, or should be intermixed according to pronunciation, etc.)?  Or that those locations have an existing Kanji rendering, and shouldn't be transcribed using Katakana?

--- Additional comment from Nils Philippsen on 2013-06-04 08:31:08 EDT ---

(In reply to Noriko Mizumoto from comment #4)
> Created attachment 756617 [details]
> cities localized in Kanji are appeared at the bottom of the list
> 
> Ok, I can understand the border disputes for some regions.
> Let me report just one thing I noticed for Japanese locale user.
> Using locale 'Japanese', Cities localized in Kanji always come at the bottom
> of the list, while most of cities localized in Katakana are appeared in
> 50-on order as expected. This may be Japanese specific issue though.

The sorting (within geographic time zones) is done by the translated name of the time zone, and depends on the collation rules of the locale set -- apparently the rules for Japanese are to sort Kanji after Katakana.

--- Additional comment from Mike FABIAN on 2013-06-04 17:38:11 EDT ---

How would you sort words written in Kanji?  If you want to sort them
phonetically, it would be necessary to know the pronunciation of the
city names written in Kanji.

But Japanese pronunciation is not very regular. For example, to sort a
phonebook with person names phonetically, it is necessary to add the
readings as well.

Is 河野 pronounced かわの or こうの or こおの? How should the sort algorithm
know that if the correct reading is not given as well?

In my Android phone, the Japanese names are only sorted correctly if
I enter not only the Kanji but also the readings in Hiragana. Without
that it is not really possible to sort Japanese well, I think.
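
To illustrate the point about readings, here is a minimal Python sketch. It is not glibc or ICU code: `sort_by_reading` is a hypothetical helper, the name/reading pairs are sample data built from the 河野 example above, and comparing hiragana by code point is only a rough stand-in for real 50-on collation.

```python
# Sketch: sorting Japanese names needs the reading, not just the Kanji.
# sort_by_reading is a hypothetical helper, and the (written form, reading)
# pairs below are sample data, not taken from any real phonebook.

def sort_by_reading(entries):
    """Sort (written_form, hiragana_reading) pairs by the reading.

    Comparing hiragana by code point only approximates 50-on order:
    voiced marks and small kana are not treated as secondary differences.
    """
    return sorted(entries, key=lambda entry: entry[1])


names = [
    ("河野", "こうの"),
    ("木村", "きむら"),
    ("河野", "かわの"),
]

for written, reading in sort_by_reading(names):
    print(written, reading)
```

The two 河野 entries end up on either side of 木村, which a sort that only sees the Kanji cannot reproduce.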

--- Additional comment from Mike FABIAN on 2013-06-04 18:00:50 EDT ---

(In reply to Noriko Mizumoto from comment #4)
> Created attachment 756617 [details]
> cities localized in Kanji are appeared at the bottom of the list
> 
> Ok, I can understand the border disputes for some regions.
> Let me report just one thing I noticed for Japanese locale user.
> Using locale 'Japanese', Cities localized in Kanji always come at the bottom
> of the list, while most of cities localized in Katakana are appeared in
> 50-on order as expected. This may be Japanese specific issue though.
> 
> The following are cities localized in Kanji and appeared at the bottom of
> each continent.
> Asia - Hong Kong, Chungking, Shanghai, Taipei, Tokyo and Pyongyang (see
> attached screenshot)
> America - Southern Prince'
> Antarctica - Showa station and south pole

$ echo -e "上海\nラングーン\n香港\nリヤド\n重慶\n平壌\n台北\n東京\nヴィエンチャン\n" | LC_ALL=ja_JP.UTF-8 sort 

ラングーン
リヤド
ヴィエンチャン
香港
重慶
上海
台北
東京
平壌
mfabian@ari:~

So that is just the way glibc sorts this in ja_JP.UTF-8 locale.
Of course this is not nice for the names written in Kanji, but
how to do this better without knowing the readings?

But glibc does not even sort kana-only input nicely:

mfabian@ari:~
$ echo -e "ウ\nう\nた\nカ\nラ\nら\nワ\nわ\nヲ\nを\nヴ\nゔ\nヤ\nや\nか\nモ\nも\nヒ\nひ\nナ\\nな\nア\nあ\nエ\nえ\nイ\nい\nサ\nさ\n" | LC_ALL=ja_JP.UTF-8 sort 

ゔ
あ
い
う
え
か
さ
た
な
ひ
も
や
ら
わ
を
ア
イ
ウ
エ
カ
サ
ナ
ヒ
モ
ヤ
ラ
ワ
ヲ
ヴ
mfabian@ari:~
$

All the hiragana sort above all the katakana, which is not very nice.
And among the hiragana, ゔ is at the top, above あ, while among the
katakana, ヴ is at the end, after ヲ. So not even the kana are sorted nicely.

Correct order would be:

1 	あ
2 	ア
3 	い
4 	イ
5 	う
6 	ウ
7 	ゔ
8 	ヴ
9 	え
10 	エ
11 	か
12 	カ
13 	さ
14 	サ
15 	た
16 	な
17 	ナ
18 	ひ
19 	ヒ
20 	も
21 	モ
22 	や
23 	ヤ
24 	ら
25 	ラ
26 	わ
27 	ワ
28 	を
29 	ヲ

(Sorted  with libicu using http://minaret.info/test/sort.msp)

So glibc does not even sort the kana correctly.

libicu does a better job sorting Japanese, but to sort words in Kanji
correctly, I think one needs to know the correct readings.
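
The interleaved kana order listed above can be approximated with a small sort key. The following Python sketch is an assumption-laden illustration, not how libicu or glibc implement collation: `fold_to_hiragana` and `kana_sort_key` are hypothetical helpers, and the key is only meant for single-kana input like the test above (it does not handle multi-character words correctly).

```python
import unicodedata

# Katakana U+30A1..U+30F6 map onto hiragana U+3041..U+3096 by a fixed offset.
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
KANA_OFFSET = 0x60


def is_katakana(ch):
    return KATAKANA_START <= ord(ch) <= KATAKANA_END


def fold_to_hiragana(text):
    """Replace each katakana character with its hiragana counterpart."""
    return "".join(chr(ord(ch) - KANA_OFFSET) if is_katakana(ch) else ch for ch in text)


def kana_sort_key(text):
    """Hypothetical key: base kana first, then hiragana before katakana."""
    # NFD splits ゔ into う plus a combining voiced mark, so it sorts right after う.
    primary = unicodedata.normalize("NFD", fold_to_hiragana(text))
    secondary = [is_katakana(ch) for ch in text]
    return (primary, secondary)


kana = ["ウ", "う", "た", "カ", "ラ", "ら", "ワ", "わ", "ヲ", "を", "ヴ", "ゔ",
        "ヤ", "や", "か", "モ", "も", "ヒ", "ひ", "ナ", "な", "ア", "あ", "エ",
        "え", "イ", "い", "サ", "さ"]

print("".join(sorted(kana, key=kana_sort_key)))
```

For this particular input the result matches the order produced by libicu above; for general Japanese text a real collation library is still needed.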

--- Additional comment from Mike FABIAN on 2013-06-04 18:07:16 EDT ---

You can also try this
to see how libicu sorts Japanese:

http://demo.icu-project.org/icu-bin/locexp?_=ja&d_=en&x=col

--- Additional comment from Mike FABIAN on 2013-06-04 18:12:05 EDT ---



--- Additional comment from Nils Philippsen on 2013-06-05 08:58:01 EDT ---

The strings in the list are eventually sorted using g_utf8_collate() from glib (I don't use a custom sorting function), which in turn uses wcscoll() from glibc on current Linux systems (i.e. here). I'm not sure if there is a viable way to fix collation in glibc, even just for kana. An alternative would be to use libicu for collating if it is present (i.e. try to dlopen() it), but that would result in different sorting depending on whether libicu is installed.
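
The "use libicu if present" idea can be sketched as follows. This is a Python analogue of the dlopen() approach, not system-config-date code: `pick_sort_key` is a hypothetical helper, it assumes the optional PyICU module (`icu`) as the ICU binding, and it falls back to the C library's locale collation, then to plain code point order.

```python
import locale


def pick_sort_key(locale_name="ja_JP.UTF-8"):
    """Return (key_function, backend_name) for collating strings.

    Hypothetical helper: prefers ICU when the optional PyICU module is
    importable, otherwise the C library's collation for locale_name.
    """
    try:
        import icu  # PyICU; optional, may not be installed
    except ImportError:
        icu = None

    if icu is not None:
        collator = icu.Collator.createInstance(icu.Locale(locale_name.split(".")[0]))
        return collator.getSortKey, "icu"

    try:
        # Note: this changes process-wide locale state.
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # Locale not installed: fall back to code point order.
        return (lambda text: text), "codepoint"
    return locale.strxfrm, "libc"


cities = ["上海", "ラングーン", "香港", "リヤド", "東京"]
key, backend = pick_sort_key()
print(backend, sorted(cities, key=key))
```

The printed backend name makes the drawback visible: the same list can come out in a different order on two machines, depending only on which backend happens to be available.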

--- Additional comment from Fedora Admin XMLRPC Client on 2013-09-04 21:48:49 EDT ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from Fedora Admin XMLRPC Client on 2013-09-04 21:49:53 EDT ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from Noriko Mizumoto on 2013-12-11 01:38:06 EST ---

Ping,
Is there any progress on this problem?

--- Additional comment from Nils Philippsen on 2013-12-12 09:57:35 EST ---

Which issue do you mean exactly?

Talking about the original issue (no country names), this is not a bug as I've explained in comment #1. This is basically just waiting for confirmation from the tzdata maintainer to be closed IMO. Patsy, welcome aboard BTW and what's your take?

If you refer to the Kanji vs. Katakana sorting issue from comment #4 ff., this is a different bug that should be opened (or cloned from this one for reference) against glibc (I guess).

--- Additional comment from Mike FABIAN on 2013-12-12 10:27:34 EST ---

(In reply to Nils Philippsen from comment #15)
> Which issue do you mean exactly?
[...]
> If you refer to the Kanji vs. Katakana sorting issue from comment #4 ff.,
> this is a different bug that should be opened (or cloned from this one for
> reference) against glibc (I guess).

Yes, a different bug against glibc for that problem would be nice.

--- Additional comment from Patsy Franklin on 2013-12-13 09:35:07 EST ---

Thanks for the input folks - and for the "welcome"!

I'm going to close this and open a glibc bug based on comment 4.

-Patsy

Comment 3 Patsy Griffin 2013-12-13 16:24:52 UTC
This needs more investigation on RHEL 7.0.   I'll re-open if needed.

