Bug 227792

Summary: Chewing - similar glyphs in different locales are all listed in candidate list.
Product: [Fedora] Fedora
Reporter: Caius Chance <me>
Component: ibus-chewing
Assignee: Ding-Yi Chen <dchen>
Status: ASSIGNED
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: dchen, i18n-bugs, mfabian, petersen, tagoh, triage, yshao
Keywords: FutureFeature, i18n, Triaged
Hardware: All
OS: Linux
Whiteboard: bzcl34nup
Doc Type: Enhancement
Bug Depends On: 227473    
Bug Blocks:    

Description Caius Chance 2007-02-08 00:40:15 EST
+++ This bug was initially created as a clone of Bug #227473 +++

Description of problem:


Version-Release number of selected component (if applicable):
scim-chewing-0.3.1-10.el5

How reproducible:
always

Steps to Reproduce:
1. Log in with the zh_TW locale.
2. Start gedit.
3. Press Ctrl-Space to bring up the Chewing IM and type 'gji'.
4. Press SPACE twice to bring up the candidate window.
  
Actual results:
There are two characters that look like 說 in the candidate list.

Expected results:


Additional info:

-- Additional comment from xwang@redhat.com on 2007-02-06 03:31 EST --
Created an attachment (id=147448)
screenshot


-- Additional comment from zhu@redhat.com on 2007-02-06 04:05 EST --
Well, these two characters look the same, but their values are not the same.
In fact, one is for Simplified Chinese and the other is the Traditional Chinese
version.
If you type this in a Simplified Chinese environment, the second one will show
as garbage, which indicates that the two are different.

So this should not be a bug.


-- Additional comment from cchance@redhat.com on 2007-02-06 18:57 EST --
I agree with Hu. They are two different characters that have different code points:

說: U+8AAA
説: U+8AAC

There are many similar cases where Unicode treats characters with different
strokes or from different cultures (China/Taiwan/Japan/Korea/Hong Kong/etc) as
different characters.
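
As a quick check of the code points above, here is a minimal Python sketch
using only the standard unicodedata module. The helper names `describe` and
`distinct_after_nfkc` are made up for illustration.

```python
import unicodedata

# The two look-alike candidates from the report, written as escapes so the
# code points are unambiguous.
FIRST = "\u8aaa"
SECOND = "\u8aac"


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def distinct_after_nfkc(a: str, b: str) -> bool:
    """True if two strings still differ after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) != unicodedata.normalize("NFKC", b)


for ch in (FIRST, SECOND):
    print(ch, describe(ch))

# Normalization does not merge the two, so an application cannot rely on it
# to hide the duplicate-looking candidate.
print("distinct after NFKC:", distinct_after_nfkc(FIRST, SECOND))
```

Both characters are separate CJK unified ideographs, so any de-duplication
would have to come from the input method's own tables.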

So probably this is not a bug.

-- Additional comment from tagoh@redhat.com on 2007-02-07 05:20 EST --
Well, just to add a comment from the usability aspect: what is actually the
target language for chewing? If it is only Traditional Chinese, one of them
should be removed from the dictionary. If both are targets, does chewing really
want to support both in one input style or layout? Since similar glyphs have
different code points, offering a character that does not fit the current
language is not good usability. For example, one assumes one is inputting
U+8AAA but it was actually U+8AAC. One may not notice, since they look similar,
but it might cause trouble sooner or later. How does that sound to you? Is it
still NOTABUG?

We should prevent any confusion as far as possible.


-- Additional comment from cchance@redhat.com on 2007-02-07 17:44 EST --
This behavior exists not only in chewing but also in Changjie, at least
AFAIK. Chinese input involves more than just deciding whether Traditional or
Simplified Chinese is to be used.

Firstly, what about users who do business in both China and Taiwan and only
know the Chewing input method? Picking one language to support is obviously
not a full solution. I am not too sure whether a hotkey or a preference option
that lets the user switch between code ranges is a good idea.

Secondly, some characters are shared by both Simp Chinese and Trad Chinese. If
we need to hide the characters that are not Simp/Trad Chinese, we have to have
the list of which characters are used in each language/locale.

Thirdly, some characters are not Simp/Trad Chinese but are used in both of
them, such as Japanese Hiragana and Katakana, Hangul, etc. Furthermore, Kanji
(Japanese Chinese characters) and Hanja (Korean Chinese characters) are needed
by people who use Chinese together with those languages.

Fourthly, if we create two tables for the two Chinese languages, any change to
the input key combinations may double the maintenance cost.

Wider coverage should be the fundamental direction of improvement for all
input methods (IMEs). Character frequency is generally the current solution
developed by upstream, which tries to be more flexible.

I need to consult with upstream. This should be recognized as a new feature
instead of an issue resolution. It should go in the devel branch but not in
RHEL, IMHO.

-- Additional comment from cchance@redhat.com on 2007-02-07 17:49 EST --
If we analyze this and IF the result is positive for us to go for that, it
might be a good idea to clone this feature request to scim-tables and other IMEs.

-- Additional comment from tagoh@redhat.com on 2007-02-07 22:46 EST --
OK, just dealing with this as a feature would be good too. As a suggestion, if
both characters appear in the candidate list at the same time, how about
showing in the candidate list whether each character is Traditional Chinese or
Simplified Chinese? That should be less confusing, and users could choose one
easily.

Anyway, just treating this as NOTABUG, on the grounds that it is likely to
happen with that usage, doesn't make sense to me. Or do I just feel that way
because I'm not a native speaker?

-- Additional comment from cchance@redhat.com on 2007-02-07 23:21 EST --
Apart from how much we need to invest to achieve the feature, all I am trying
to express is 'Many Chinese/Kanji/Hanja code points are actually the same
word.'

Though they might even have the same meaning, they are treated by the Unicode
organization as different characters only because the strokes have minor
differences.

If we want Chewing to be smart, it has to be smart by:

1. Learning from the user through frequency of usage.

2. Relying on a charset standard for which characters are in which locale.
(Question: should we refer to the Unicode standard's categories of chars, or
should we refer to localized encodings? e.g. big5/gbk/gb18030, ISO-2022-JP,
JIS, etc.)

3. Letting users customize their preferred characters (e.g. some of them might
prefer typing Trad Chinese but with specific academic words in Simp Chinese).

4. Putting a note next to the characters in the candidate list, which is a good
idea. OS X has some icons to indicate to the user that certain characters are
not in the current locale's code range. We just need to spend time researching
how to implement this, especially to discover exactly which characters
currently in the Chewing candidate list are only in Simp Chinese or Trad
Chinese, or are even foreign characters such as Japanese or Hangul. (i.e. we
might need to analyze all characters in Chinese to see whether each character
appears in just one language/locale or is shared by both Trad and Simp Chinese,
or even Japanese/Korean.)
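
On the question in item 2, one rough way to see which localized encodings
contain a character is to try encoding it. The sketch below is only an
illustration, not anything chewing does: it relies on Python's bundled codecs
(whose mapping tables may differ in detail from other implementations of big5,
gbk, etc.), and the names `CHARSETS`, `charsets_containing` and `annotate` are
hypothetical. `annotate` mimics the note-next-to-candidate idea from item 4 by
marking candidates that fall outside a preferred charset.

```python
# Legacy encodings to probe, given by their Python codec names (an assumed,
# non-exhaustive selection).
CHARSETS = ["big5", "gb2312", "gbk", "shift_jis", "euc_kr"]


def charsets_containing(ch, charsets=CHARSETS):
    """Return the codec names from `charsets` that can encode `ch`."""
    found = []
    for name in charsets:
        try:
            ch.encode(name)
        except UnicodeEncodeError:
            continue  # not representable in this charset
        found.append(name)
    return found


def annotate(candidates, preferred):
    """Pair each candidate with '' if `preferred` can encode it, else '*'."""
    return [
        (ch, "" if preferred in charsets_containing(ch, [preferred]) else "*")
        for ch in candidates
    ]


if __name__ == "__main__":
    for ch in ("\u8aaa", "\u8aac", "\u8bf4"):
        print(f"U+{ord(ch):04X}", charsets_containing(ch))
    # Mark candidates that fall outside Big5, as a zh_TW user might want.
    print(annotate(["\u8aaa", "\u8bf4"], "big5"))
```

A lookup like this could be run once over the phonetic table to precompute the
markers, rather than at every keystroke.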

-- Additional comment from cchance@redhat.com on 2007-02-08 00:13 EST --
Bug# 227466 is related to this bug: the range of characters covered by chewing
is wider than the font glyph range.

-- Additional comment from cchance@redhat.com on 2007-02-08 00:38 EST --
Cloning this feature request to a new devel bug; please follow up there.
Comment 1 Bug Zapper 2008-04-03 15:03:19 EDT
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.
Comment 2 Caius Chance 2008-04-30 20:09:51 EDT
IMHO firstly we should separate the table file into smaller ones, one per
charset.

We need some documentation about charset ranges. I am not sure whether charsets
such as big5, gbk/gb18030, iso-2022-jp/shift-jis are purely subsets of Unicode
(UTF-8). If so, it would be a simpler case.
Comment 3 Bug Zapper 2008-05-13 22:35:51 EDT
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 4 Caius Chance 2008-05-15 03:34:40 EDT
FYI,

Currently libchewing-[v-r]/data/phone.cin* contains the table.

Also, there is TC <-> SC conversion filter available in SCIM atm.
Comment 5 Ding-Yi Chen 2008-05-22 01:39:16 EDT
Hi,

I am thinking of a more generalized solution. We can label each character by
the locales it appears in. For example:
說: U+8AAA (zh_TW, kr) variant 説,说
説: U+8AAC (zh_CN, jp) variant 說,说
说: U+8BF4 (zh_CN) variant 說,説

In the SCIM settings, there should be a set of check boxes for the user to
toggle the output he/she desires. What say you?
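
The labelling and check-box idea could be sketched roughly as follows. This is
Python for illustration only, not SCIM or libchewing code; `LabeledChar`,
`TABLE` and `filter_candidates` are hypothetical names, and the locale labels
are copied from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledChar:
    """A candidate character tagged with the locales it is labelled for."""
    char: str
    locales: frozenset


# Labels taken from the three-character example in this comment.
TABLE = [
    LabeledChar("\u8aaa", frozenset({"zh_TW", "kr"})),
    LabeledChar("\u8aac", frozenset({"zh_CN", "jp"})),
    LabeledChar("\u8bf4", frozenset({"zh_CN"})),
]


def filter_candidates(table, enabled):
    """Keep characters labelled with at least one enabled locale.

    `enabled` stands in for the set of ticked check boxes.
    """
    enabled = set(enabled)
    return [entry.char for entry in table if entry.locales & enabled]


if __name__ == "__main__":
    print(filter_candidates(TABLE, {"zh_TW"}))
    print(filter_candidates(TABLE, {"zh_TW", "zh_CN"}))
```

With only zh_TW ticked, a single candidate remains for this reading, which is
the behaviour the original report asked for.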

Comment 6 Caius Chance 2008-05-22 02:35:29 EDT
Exactly.
Comment 7 Akira TAGOH 2008-05-22 02:49:12 EDT
If there is any requirement to see characters that look the same at the same
time, you could add that as one of the options in the IME's preferences. I'm
not sure if there is, but it seems not for the above case at least.

Well, I'd rather the IME itself deal with it based on the current locale or
input layout. Is there any layout that both zh_CN and zh_TW use? If not, I
don't think that option really helps with this kind of problem. Speaking of the
above characters, do people in the zh_TW locale really want to see U+8AAC and
U+8BF4 in their input, according to the option you suggest?
Comment 8 Ding-Yi Chen 2008-07-02 02:47:34 EDT
I agree with Tagoh's view, as the functionality just benefits Hanzi users.

Anyway, what I will do is:
The IME enables the common IRG sources by default for each locale.
For example, the default for CN is G0; TW is T1, T2; HK is T1, T2, H;
JP is J0.

There will also be a GUI in the IME to enable the rest of the common sources.
An "Advanced setting" button in this GUI will show the advanced IRG sources.
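
A rough sketch of that plan in Python, purely illustrative: `DEFAULT_SOURCES`
copies the per-locale defaults listed above, while `enabled_sources`,
`visible`, and the candidate names and source tags in `sample` are
placeholders, not real Unihan data.

```python
# Default IRG sources per region, as listed in this comment.
DEFAULT_SOURCES = {
    "CN": {"G0"},
    "TW": {"T1", "T2"},
    "HK": {"T1", "T2", "H"},
    "JP": {"J0"},
}


def enabled_sources(region, extra=()):
    """Region defaults plus any sources the user ticked in the settings GUI."""
    return set(DEFAULT_SOURCES.get(region, set())) | set(extra)


def visible(candidate_sources, enabled):
    """Return candidates having at least one enabled IRG source.

    `candidate_sources` maps each candidate to its set of IRG source tags.
    """
    return [c for c, srcs in candidate_sources.items() if srcs & enabled]


# Placeholder candidates with made-up source tags (NOT taken from Unihan).
sample = {
    "cand_a": {"T1"},
    "cand_b": {"J0"},
    "cand_c": {"G0", "T1"},
}

if __name__ == "__main__":
    print(visible(sample, enabled_sources("TW")))
    print(visible(sample, enabled_sources("TW", extra={"J0"})))
```

Keeping the defaults in one table means the GUI only has to store the user's
extra selections on top of the locale default.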
Comment 9 Tony Fu 2008-09-09 23:12:30 EDT
requested by Jens Petersen (#27995)
Comment 10 Bug Zapper 2009-06-09 18:26:46 EDT
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 11 Jens Petersen 2009-10-13 02:45:16 EDT
Moving to ibus-chewing.
Comment 12 Bug Zapper 2009-11-16 02:54:24 EST
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 13 Bug Zapper 2010-11-04 08:12:48 EDT
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 14 Akira TAGOH 2010-11-19 03:01:48 EST
Any progress on this?
I guess we should add the FutureFeature keyword to avoid house-keeping?