Bug 1058029 - ibus-table-chinese-cangjie has many corruptions in common characters
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: ibus-table-chinese
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Mike FABIAN
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-26 14:38 UTC by Bo-Yin Yang
Modified: 2015-06-30 01:34 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-06-30 01:34:21 UTC
Type: Bug
Embargoed:



Description Bo-Yin Yang 2014-01-26 14:38:51 UTC
Description of problem:
ibus table/database or character selection corrupted, 
a more accurate description of bug#1050753

Version-Release number of selected component (if applicable):
ibus-table-chinese-cangjie-1.4.6-3.fc20

How reproducible:
100%

Steps to Reproduce:
1. start with a fresh install or uninstall ibus-table-chinese-cangjie,
   remove ~/.ibus/tables/cangjie5-user.db
2. install ibus-table-chinese-cangjie, choose Cangjie5
3. try to type any of the common characters

么	HI	
制	HBLN	
只	RC	
同	BMR	
致	MGOK	
表	QMV	
里	WG

Actual results:
* The correct character does not appear as default (first character)
* The correct character appears after other incomplete sequences,
  for example, for RC, the IM will rank rare characters which have
  encoding RCI, RCL, RCO before the common character
* The correct character does not become the default after you choose it
  a few times; in my testing it still had not after 100 selections.

Expected results:
The correct character should be the default (first character offered).
It should become the default if by some chance it wasn't listed as such.
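
The expected ordering can be sketched as follows. This is only an illustration of "exact matches before longer codes", not ibus-table's actual ranking code; the function name and the placeholder strings for the rare RCI/RCL/RCO characters are hypothetical.

```python
def order_candidates(typed, table):
    """Return (code, character) pairs whose code starts with `typed`.

    Entries whose code equals `typed` exactly come first; longer codes
    follow, shortest first, then alphabetically by code.
    """
    hits = [(code, ch) for code, ch in table if code.startswith(typed)]
    hits.sort(key=lambda hit: (hit[0] != typed, len(hit[0]), hit[0]))
    return hits


# "rc" -> 只 is taken from the report; the other three entries are
# placeholders standing in for the rare characters mentioned above.
TABLE = [
    ("rci", "<rare-1>"),
    ("rcl", "<rare-2>"),
    ("rco", "<rare-3>"),
    ("rc", "只"),
]

for code, ch in order_candidates("rc", TABLE):
    print(code, ch)
```

With this ordering, 只 is offered first for RC regardless of where it sits in the table.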

Additional info:
* MGOK offers up the wrong character (Unicode : 0x26936) instead of 
  the correct character (Unicode : 0x81F4)
* HGI never offers up the correct character (Unicode : 0x4E1F) but
  instead a wrong character (Unicode: 0x4E22) as a very late choice;
  Unicode 0x4E1F is what a traditional Chinese speaker using Zhuyin 
  or Chewing (it is also available under Cangjie5 as MGI or XXMGI)
  would get

Comment 1 Bo-Yin Yang 2014-01-28 14:59:47 UTC
More errors:
板	DHE
向	HBR

Comment 2 Mike FABIAN 2014-03-19 08:59:13 UTC
(In reply to Bo-Yin Yang from comment #0)

> * HGI never offers up the correct character (Unicode : 0x4E1F) but
>   instead a wrong character (Unicode: 0x4E22) as a very late choice;
>   Unicode 0x4E1F is what a traditional Chinese speaker using Zhuyin 
>   or Chewing (it is also available under Cangjie5 as MGI or XXMGI)
>   would get

That HGI *never* offers the correct character looks like a bug
in ibus-table-chinese:

mfabian@ari:/local/mfabian/src/ibus-table-chinese/tables/cangjie (master)
$ grep ^hgi cangjie5.txt 
hgi	丢	1000
hgi	U+25B14	1000
hgii	U+25D3E	1000
hgik	U+24875	1000
hgin	U+25BBB	1000
hgit	篕	1000
mfabian@ari:/local/mfabian/src/ibus-table-chinese/tables/cangjie (master)
$ 

U+4E1F 丟 is not listed under “hgi” in the cangjie5.txt table used by
ibus-table-chinese; only U+4E22 丢 is.
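
A reverse lookup makes it easier to see which codes the table assigns to each of the two similar-looking characters. The sketch below assumes the tab-separated "code, character, frequency" layout visible in the grep output; the sample rows are the hgi entry shown above plus the mgi entry for U+4E1F that cangjie5.txt contains, and the function name is hypothetical.

```python
# Two rows in the layout of cangjie5.txt: code <TAB> character <TAB> frequency.
# Escapes are used so the two look-alike characters cannot be confused.
SAMPLE_TABLE = (
    "hgi\t\u4e22\t1000\n"   # U+4E22
    "mgi\t\u4e1f\t1000\n"   # U+4E1F
)


def codes_for(char, table_text):
    """Return every input code that the table text assigns to `char`."""
    codes = []
    for line in table_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        code, phrase, _freq = fields
        if phrase == char:
            codes.append(code)
    return codes


print("U+4E1F:", codes_for("\u4e1f", SAMPLE_TABLE))
print("U+4E22:", codes_for("\u4e22", SAMPLE_TABLE))
```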

The other problems reported (characters offered in the wrong
order, ibus-table doesn’t seem to learn from user input, ...)
are bugs in ibus-table.

I’ll look into these when I am back from my vacation on 2014-03-31.

Comment 3 Bo-Yin Yang 2014-03-22 00:56:41 UTC
My hands hurt too much to type anything else now, but thank you (or anyone else from Red Hat) for actually paying attention.

Comment 4 Bo-Yin Yang 2014-03-22 01:01:28 UTC
I switched for the time being to ibus-cangjie, but I copy an incomplete listing of common Chinese characters not showing up as the first of the list

么	HI	0x4E48	#25 
刮	HRLN	0x522E	#5 
制	HBLN	0x5236	#2 
厘	MWG	0x5398	#4 
只	RC	0x53EA	#7 
同	BMR	0x540C	#10 
向	HBR	0x5411	#5 
板	DHE	0x677F	#10 
致	MGOK	0x81F4	#4 char_#2(0x26936)_not_81F4
表	QMV	0x8868	#9 
里	WG	0x91CC	#8 
恥	SJP	0x6065	#2 this_one_eventually_flipped to #1
丟	HGI	0x4E1F  NEVER char_#6 is 0x4E22 _not_ 4E1F
压	MGI	0x538B	#15 
个	OL	0x4EF2  #10 
采	BD	0x91C7  #15

Comment 5 Mike FABIAN 2014-05-27 21:16:11 UTC
(In reply to Bo-Yin Yang from comment #4)

Testing your problematic characters using the new packages mentioned in

https://bugzilla.redhat.com/show_bug.cgi?id=1050753#c11


https://admin.fedoraproject.org/updates/ibus-table-1.5.0.20140527-1.fc20

https://admin.fedoraproject.org/updates/ibus-table-chinese-1.4.6-4.fc20

https://admin.fedoraproject.org/updates/ibus-table-others-1.3.0.20140512-2.fc20

I use Chinese mode 5 “All Chinese characters” for this test.

> I switched for the time being to ibus-cangjie, but I copy an incomplete
> listing of common Chinese characters not showing up as the first of the list
> 
> 么	HI	0x4E48	#25

1st match.

> 刮	HRLN	0x522E	#5

1st match.

> 制	HBLN	0x5236	#2

1st match.

> 厘	MWG	0x5398	#4

1st match.

> 只	RC	0x53EA	#7

1st match

> 同	BMR	0x540C	#10

2nd match, 1st match is U+26688

> 向	HBR	0x5411	#5

1st match

> 板	DHE	0x677F	#10

2nd match, 1st match is 皮

> 致	MGOK	0x81F4	#4 char_#2(0x26936)_not_81F4

1st match.

> 表	QMV	0x8868	#9

1st match.

> 里	WG	0x91CC	#8

1st match.

> 恥	SJP	0x6065	#2 this_one_eventually_flipped to #1

2nd match, 1st match is 憵

> 丟	HGI	0x4E1F  NEVER char_#6 is 0x4E22 _not_ 4E1F

丢 U+4E22 is the 1st match. 丟 U+4E1F is in cangjie5.txt as “mgi”:

cangjie5.txt, line 32861> mgi	丟	1000

> 压	MGI	0x538B	#15

1st match.

> 个	OL	0x4EF2  #10

1st match.

> 采	BD	0x91C7  #15

1st match.

So most of your problematic characters work fine now.

======================================================
Remaining problems:
======================================================

Only in 2 cases the characters you mention are the 2nd match:

> 同	BMR	0x540C	#10

2nd match, 1st match is U+26688

> 板	DHE	0x677F	#10

2nd match, 1st match is 皮

See https://bugzilla.redhat.com/show_bug.cgi?id=1050753#c12

for ideas how to make these the first matches by default.

I guess there is no other way but to change the “system frequencies”
manually in cangjie5.txt for characters like these, because the order
of the entries in cangjie5.txt does not always put the character you
want first. Avoiding characters with code points > U+FFFF as first
matches does not solve all such problems either. So we probably really
need to edit cangjie5.txt manually and decide which of the characters
with an identical input sequence is the most common one.
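
To illustrate the two ideas in this paragraph, here is a minimal sketch of ranking candidates that share one input code. It is not the ibus-table implementation: the function name is hypothetical, and the frequency values for bmr and dhe are assumed for illustration (only the value 1000 appears in the table excerpts in this report).

```python
def rank_candidates(rows, code):
    """Order the characters that share one exact input code.

    rows: list of (code, character, system_frequency) tuples.
    Higher frequency sorts first. Among equal frequencies, characters
    at or below U+FFFF sort before those above it, and the original
    table order breaks any remaining tie.
    """
    matches = [(i, ch, freq) for i, (c, ch, freq) in enumerate(rows) if c == code]
    matches.sort(key=lambda m: (-m[2], ord(m[1]) > 0xFFFF, m[0]))
    return [ch for _, ch, _ in matches]


# Assumed frequencies: every entry starts at 1000.
rows = [
    ("bmr", "\U00026688", 1000),
    ("bmr", "同", 1000),
    ("dhe", "皮", 1000),
    ("dhe", "板", 1000),
]

# The code point rule is enough to put 同 first for bmr ...
print(rank_candidates(rows, "bmr"))
# ... but it cannot separate 皮 and 板, which are both below U+FFFF.
print(rank_candidates(rows, "dhe"))

# Raising the frequency of 板 by hand changes the order for dhe.
edited = [r if r[1] != "板" else ("dhe", "板", 2000) for r in rows]
print(rank_candidates(edited, "dhe"))
```

The second call shows why a code point rule alone is not sufficient and a manual frequency edit is still needed in some cases.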

> 丟	HGI	0x4E1F  NEVER char_#6 is 0x4E22 _not_ 4E1F

丢 U+4E22 is the 1st match. 丟 U+4E1F is in cangjie5.txt as “mgi”:

That might be a bug in cangjie5.txt.

Comment 6 Mike FABIAN 2014-05-28 21:08:28 UTC
(In reply to Mike FABIAN from comment #5)

> ======================================================
> Remaining problems:
> ======================================================
> 
> Only in 2 cases the characters you mention are the 2nd match:
> 
> > 同	BMR	0x540C	#10
> 
> 2nd match, 1st match is U+26688

I fixed this with this patch:

https://github.com/mike-fabian/ibus-table/commit/063a61ae9923de14ce880a1a444af81cff22e00e

That does not fix all such cases, but many of them and it has
no disadvantages.

On top of that, I also updated the copy of Unihan_Variants.txt in
ibus-table from “2011-08-08 Unicode 6.1.0” to “2013-02-25 Unicode 6.3.0”:

https://github.com/mike-fabian/ibus-table/commit/93e4781d91e94be3327bf2dae4be4b8859e41bff

The new Unihan_Variants.txt fixes a few traditional/simplified Variants
problems, but it still has 同 as a “simplified only” character.

So I added one more patch on top of the update of Unihan_Variants.txt:

https://github.com/mike-fabian/ibus-table/commit/9598e12ef26407edad1241921eb7750b561d9d2e

which makes ibus-table consider 同 as both, simplified *and* traditional
Chinese now. I should probably try to report that as a bug against Unihan_Variants.txt.

So 同 is the first match when typing “bmr” now and it is shown in
traditional Chinese mode as well.

> > 板	DHE	0x677F	#10
> 
> 2nd match, 1st match is 皮

ibus-cangjie also gives 皮 as the first match when typing “dhe”
and 板 as the second match. I.e. the current behaviour of
ibus-table and ibus-cangjie agree here.

  
> > 丟	HGI	0x4E1F  NEVER char_#6 is 0x4E22 _not_ 4E1F
> 
> 丢 U+4E22 is the 1st match. 丟 U+4E1F is in cangjie5.txt as “mgi”:
> 
> That might be a bug in cangjie5.txt.

Hm, but this also agrees with ibus-cangjie. The table.txt database
used by libcangjie which is used by ibus-cangjie contains:

   丟 丢 1 1 0 0 1 0 0 0 0 hgi mgi,xxmgi NA 22813
   丢 丢 1 0 0 0 1 0 0 0 0 hgi hgi NA 0

The 12th column contains the cangjie3 code, the 13th column the
cangjie5 code.

I.e. when using cangjie5, ibus-cangjie also never matches
丟 U+4E1F when typing “hgi”; it does that only when using cangjie3.
When using cangjie5, one has to type “mgi” or “xxmgi” to get 丟 U+4E1F.
That is the same in the current version of ibus-table with cangjie5.
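
The column layout described above can be checked with a short sketch. It assumes the rows are whitespace-separated exactly as quoted and that multiple codes in one column are comma-separated (as in “mgi,xxmgi”); the function names are hypothetical and this is not libcangjie code.

```python
# The two table.txt rows quoted above, with the look-alike characters
# written as escapes: U+4E1F first, then U+4E22.
ROWS = [
    "\u4e1f \u4e22 1 1 0 0 1 0 0 0 0 hgi mgi,xxmgi NA 22813",
    "\u4e22 \u4e22 1 0 0 0 1 0 0 0 0 hgi hgi NA 0",
]


def codes_by_version(row):
    """Extract the character and its cangjie3/cangjie5 codes from one row."""
    fields = row.split()
    return {
        "char": fields[0],
        "cangjie3": fields[11].split(","),  # 12th column
        "cangjie5": fields[12].split(","),  # 13th column
    }


def matches(code, version, rows):
    """Return the characters whose code list for `version` contains `code`."""
    return [e["char"] for e in map(codes_by_version, rows) if code in e[version]]


for version in ("cangjie3", "cangjie5"):
    found = matches("hgi", version, ROWS)
    print(version, "hgi ->", ["U+%04X" % ord(ch) for ch in found])
```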

Comment 7 Mike FABIAN 2014-05-28 21:19:30 UTC
Here is an update with the fixes I mentioned in comment#6:

https://admin.fedoraproject.org/updates/ibus-table-1.5.0.20140528-1.fc20

The ibus-table-chinese-* packages are unchanged as the database
format is unchanged, you can still use these:

https://admin.fedoraproject.org/updates/ibus-table-chinese-1.4.6-4.fc20

https://admin.fedoraproject.org/updates/ibus-table-others-1.3.0.20140512-2.fc20

Comment 8 Mike FABIAN 2014-06-02 13:55:44 UTC
Dear Bo-Yin Yang,

did you have a chance to try my updates from comment#7 already?

I am happy about any feedback. If these work for you, you can
also leave karma.

Comment 9 Bo-Yin Yang 2014-06-02 15:11:03 UTC
Hi Mike:

I had been sidelined for a few days due to other obligations, and have not tried out those patches (and will do so ASAP) .... but I have to confess that I do not find the location to leave karma.

Bo-Yin

Comment 10 Mike FABIAN 2014-06-02 15:38:03 UTC
(In reply to Bo-Yin Yang from comment #9)
> I had been sidelined for a few days due to other obligations, and have not
> tried out those patches (and will do so ASAP) .... 

Thank you!

> but I have to confess that I do not find the location to leave
> karma.

If you go to this page:

https://admin.fedoraproject.org/updates/FEDORA-2014-6855/ibus-table-1.5.0.20140528-1.fc20

You can download the ibus-table packages there by clicking on
the link after

    Builds:

then on

   Descendants build
                |
		-------- buildArch ...

which will take you to

    http://koji.fedoraproject.org/koji/taskinfo?taskID=6904858
    
where you will find 

    Output
            ...
	    
           ibus-table-1.5.0.20140528-1.fc20.noarch.rpm
           ibus-table-1.5.0.20140528-1.fc20.src.rpm
           ibus-table-devel-1.5.0.20140528-1.fc20.noarch.rpm


Here one can download the packages.

(It is easier to get the updated packages by

    sudo yum --enablerepo=updates-testing install ibus-table

)

On the first page I mentioned, i.e.

https://admin.fedoraproject.org/updates/FEDORA-2014-6855/ibus-table-1.5.0.20140528-1.fc20

there is a link:

     Add a comment >>

at the bottom.

If you have tested the update and it works for you, you can
add a comment like “It works!” and check the “Works for me”
radio button. This is called “giving karma”; when a package
has 3 karma, it can go from testing to stable.

Comment 11 Mike FABIAN 2014-06-03 06:41:17 UTC
New version:

https://admin.fedoraproject.org/updates/ibus-table-1.8.0-1.fc20

No big changes compared to ibus-table-1.5.0.20140528-1.fc20, just a
proper version number to indicate that this release is a major
update.

Comment 12 Mike FABIAN 2014-06-06 17:46:27 UTC
New version:

https://admin.fedoraproject.org/updates/FEDORA-2014-7098/ibus-table-1.8.1-1.fc20

adds wildcard support (For example, you can now type “s*f” or “s?sf” to match
馬 which has the cangjie5 code sqsf).
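
The wildcard semantics in this example (“*” for any run of keys, “?” for exactly one key) can be sketched with Python's standard fnmatch module. This only illustrates the matching rule; it is an assumption that the behaviour is comparable, and it is not how ibus-table implements the lookup. The small code table and the function name are made up for the example.

```python
from fnmatch import fnmatchcase

# A tiny code -> character table using codes mentioned in this report.
CODES = {
    "sqsf": "馬",
    "bmr": "同",
    "dhe": "板",
}


def wildcard_lookup(pattern, table):
    """Return the characters whose full code matches the wildcard pattern."""
    return [ch for code, ch in table.items() if fnmatchcase(code, pattern)]


print(wildcard_lookup("s*f", CODES))
print(wildcard_lookup("s?sf", CODES))
print(wildcard_lookup("s?f", CODES))  # "?" stands for exactly one key
```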

It also fixes the bug that the cangjie prompts 日, 月, 金 ...
were shown instead of Latin even in pinyin mode. For pinyin mode,
of course the Latin prompts should be used.

Comment 13 Mike FABIAN 2014-09-21 15:03:58 UTC
Bo-Yin, does it work well for you now?

Comment 14 Mike FABIAN 2015-05-19 07:44:33 UTC
Bo-Yin, does it work well for you now?

Comment 15 Fedora End Of Life 2015-05-29 10:41:15 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 16 Fedora End Of Life 2015-06-30 01:34:21 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

