488203 – Multi-segment input: －てください ("-te kudasai") form incorrectly split for godan verbs.


Summary: Multi-segment input: －てください ("-te kudasai") form incorrectly split for godan ...

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	anthy
Sub Component:
Version:	10
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Akira TAGOH
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-03-03 06:32 UTC by Peter Gordon
Modified:	2009-03-04 08:44 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-03-03 07:40:25 UTC
Type:	---
Embargoed:
Dependent Products:


Description Peter Gordon 2009-03-03 06:32:14 UTC

[I wasn't quite sure where to file this, as I don't know whether it's an issue with SCIM's handling of the Romaji input or with Anthy's kana parsing (or something else). Please reassign it as appropriate.]

Anthy does not treat the てください form of a verb as the one (or two) segments it should be. With multi-segment input, it is split into three separate segments: the verb stem, てくだ, and さい.

However, it does seem to keep the form correctly as one segment if a sentence-final particle (such as よ or ね) is appended.

For example, 頑張（がんば）ってください is parsed as がんばっ・てくだ・さい, but if you add a particle such as ね, it is correctly seen as one phrase: "頑張ってくださいね". (Here, "・" marks the segment boundaries that Anthy inserts automatically.)

I've noticed that this also happens with other verbs, but only with godan verbs. Ichidan verbs, such as 食べる（たべる）, 見る（みる）, and 上げる（あげる）, behave normally here.

For example, 話（はな）してください is segmented as はなし・てくだ・さい instead of はなしてください (or perhaps 話して・ください), but 話してくださいよ, again with the sentence-final particle, is correctly kept together, with "はなしてください" as one larger segment.

---

NEVRAs of related packages:
scim-libs-1.4.7-35.fc10.x86_64
scim-anthy-1.2.7-1.fc10.x86_64
scim-lang-japanese-1.4.7-35.fc10.x86_64
scim-bridge-gtk-0.4.15-8.fc10.x86_64
scim-bridge-0.4.15-8.fc10.x86_64
scim-tomoe-0.6.0-5.fc10.x86_64
scim-gtk-1.4.7-35.fc10.x86_64
scim-1.4.7-35.fc10.x86_64
anthy-9100h-1.fc10.x86_64

Comment 1 Akira TAGOH 2009-03-03 07:40:25 UTC

The above examples work for me on a fresh install. I suspect you have some bad learning data. Try disabling the IM with im-chooser, removing $HOME/.anthy, and then enabling the IM again.
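As a rough sketch of the reset step suggested above, the Python snippet below moves the per-user learning data aside instead of deleting it outright, so it can be restored if needed. It assumes, as the comment says, that Anthy keeps this data in $HOME/.anthy. The function name backup_anthy_learning_data and the .anthy.bak-<timestamp> naming are hypothetical choices made for illustration. Disable the IM first (e.g. with im-chooser) and re-enable it afterwards.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_anthy_learning_data(home: Path) -> Optional[Path]:
    """Move <home>/.anthy to a timestamped backup directory.

    Returns the backup path, or None if there is no .anthy directory.
    Assumes the IM has already been disabled (e.g. via im-chooser) so it
    does not rewrite the learning data while it is being moved.
    """
    src = home / ".anthy"
    if not src.is_dir():
        return None
    # Hypothetical naming scheme: .anthy.bak-YYYYmmddHHMMSS, with a numeric
    # suffix appended if that name is already taken.
    base = home / f".anthy.bak-{time.strftime('%Y%m%d%H%M%S')}"
    dest = base
    n = 1
    while dest.exists():
        dest = base.with_name(f"{base.name}.{n}")
        n += 1
    # Renaming (rather than rm -rf) keeps the old data recoverable.
    shutil.move(str(src), str(dest))
    return dest
```

Usage: call backup_anthy_learning_data(Path.home()) while the IM is disabled, then re-enable the IM. Anthy should then start with fresh learning data. If the problem persists, the backup can be moved back to ~/.anthy.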

In general, conversion results can break in any IME, commercial IMEs included, if it learns an input or a segmentation that isn't commonly used. For this reason, IMEs usually offer a way to turn off automatic learning, so that users teach them only when they choose to. scim-anthy has this feature too.
Aside from that, I could patch the corpus data to give a specific segmentation priority for you, but that might affect other users if the problem doesn't happen commonly, i.e. on Anthy with a clean dictionary. So if that's the case, I'm afraid I can't fix it.

Otherwise, please feel free to file bugs of this kind.

Comment 2 Peter Gordon 2009-03-04 08:44:45 UTC

Akira-san, I just removed the ~/.anthy directory as you suggested, and that did indeed fix the problem for me, which means it was probably something odd that I had inadvertently trained it with.

Thanks very much, and many apologies for the bug-spam. :)
