Bug 856903

Summary:	The ipa-x-sampa input method does not work at all anymore with newer versions of ibus-table
Product:	[Fedora] Fedora	Reporter:	Mike FABIAN <mfabian>
Component:	ibus-table	Assignee:	Mike FABIAN <mfabian>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	18	CC:	bochecha, i18n-bugs, K9, kent.neo, mfabian, pwu, shawn.p.huang
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-11-28 11:53:56 UTC	Type:	Bug

Description Mike FABIAN 2012-09-13 05:52:54 UTC

The problem is caused by this code in tabcreatedb.py:

    def parse_source (f):
        _attri = []
        _table = []
        _gouci = []
        patt_com = re.compile(r'^###.*')
        patt_blank = re.compile(r'^[ \t]*$')
        patt_conf = re.compile(r'[^\t]*=[^\t]*')
        patt_table = re.compile(r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+ *$')
        patt_gouci = re.compile(r' *[^\s]+ *\t *[^\s]+ *$')
        patt_s = re.compile(r' *([^\s]+) *\t *([\x00-\xff]{3}) *\t *[^\s]+ *$')


The ipa-x-sampa.txt table contains lines like:
        
    ### Begin Table data.
    BEGIN_TABLE
    !	↓	0	# U+2193 DOWNWARDS ARROW
    !\	!	0	# U+0021 EXCLAMATION MARK

and the above patt_table regular expression does not match them because
of the trailing comments: the pattern requires the line to end right
after the third (frequency) column.

Therefore, the generated table is empty.
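
The failure can be reproduced in isolation. What follows is only a minimal sketch, assuming Python 3 and assuming that the columns in ipa-x-sampa.txt are separated by single tabs as in the lines quoted above. Only patt_table is taken from tabcreatedb.py; the names sample_lines and matched are illustrative and do not exist there.

```python
import re

# Original pattern from tabcreatedb.py, as quoted in this report.
patt_table = re.compile(r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+ *$')

# Table lines as quoted from ipa-x-sampa.txt (tab-separated columns assumed),
# plus one variant without the trailing comment for comparison.
sample_lines = [
    '!\t↓\t0\t# U+2193 DOWNWARDS ARROW',
    '!\\\t!\t0\t# U+0021 EXCLAMATION MARK',
    '!\t↓\t0',
]

# True where the pattern accepts the line, False where it rejects it.
matched = [bool(patt_table.match(line)) for line in sample_lines]
print(matched)
```

Under these assumptions, the two lines with trailing comments are rejected and only the line without a comment is accepted.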

Comment 1 Mike FABIAN 2012-09-13 07:33:08 UTC

I asked Yuwei Yu how to fix this best, here is our chat log:

07:56:21

me I found another small problem in ibus-table, tabcreatedb.py
07:56:34

me https://bugzilla.redhat.com/show_bug.cgi?id=856903
07:56:49

me     def parse_source (f):
        print "****mike in parse_source(%s)" %f
        _attri = []
        _table = []
        _gouci = []
        patt_com = re.compile(r'^###.*')
        patt_blank = re.compile(r'^[ \t]*$')
        patt_conf = re.compile(r'[^\t]*=[^\t]*')
        patt_table = re.compile(r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+\s*(#.*)*$')
        patt_gouci = re.compile(r' *[^\s]+ *\t *[^\s]+ *$')
        patt_s = re.compile(r' *([^\s]+) *\t *([\x00-\xff]{3}) *\t *[^\s]+ *$')
07:57:05

me The ipa-x-sampa.txt table contains lines like:
        
    ### Begin Table data.
    BEGIN_TABLE
    !	↓	0	# U+2193 DOWNWARDS ARROW
    !\	!	0	# U+0021 EXCLAMATION MARK

and the above patt_table regular expression does not match because
of the trailing comments.

Therefore, the generated table is empty.
07:57:29

me Ah, sorry, above is already my attempt to fix it:
07:57:37

me patt_table = re.compile(r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+\s*(#.*)*$')
07:57:46

me This does match on the lines with the comments.
07:58:21

me The original regexp was: patt_table = re.compile(r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+ *$')
08:01:25

me Is the allowed format of the tables exactly defined somewhere?
08:02:02

me I think the 3rd column, frequency, is always a decimal number, right?
08:08:01

me Usually the columns are separated by a tab but your regexp allows for extra space. Maybe one 
should allow for any amount and type of extra space then?
08:08:23

me Maybe like this: patt_table = re.compile(r'\s*([^\s]+)\s*\t\s*([^\s]+)\s*\t\s*[^\s]+\s*(#.*)?$')
08:12:45

钰炜 Yuwei 余 YU well, the comments in the table source start with ###
08:13:18

me Yes, this is the patt_com = re.compile(r'^###.*') regexp.
08:14:06

me But the comments after the lines in the table defining phrases used to work as well.
08:14:24

me Since the introduction of the above regexp, this broke.
08:14:43

me I have already tested the improved regexp 
08:14:49

me patt_table = re.compile(r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+\s*(#.*)*$')
08:14:58

me which seems to fix this problem.
08:15:31

me But I wonder whether this is "good enough" or whether the regexp could be done better.
08:16:14

钰炜 Yuwei 余 YU so if we want to support trailing comments, it should match only those which start 
with ###
08:16:26

钰炜 Yuwei 余 YU not the single #
08:16:29

me OK.
08:16:38

me Then maybe the regexp like this:
08:17:31

me  patt_table = re.compile(r'\s*([^\s]+)\s*\t\s*([^\s]+)\s*\t\s*[^\s]+\s*(###.*)?$')
08:17:32

me ?
08:18:14

钰炜 Yuwei 余 YU yes
08:19:11

me I also changed " *\t *" between the columns to "\s*\t\s*".
08:19:26

me Does that make sense or should we leave this as it was?
08:24:49

钰炜 Yuwei 余 YU we'd better leave this as it was, since some phrases are actually a tiny 
sentence with several spaces
08:27:21

me So maybe change the original:

        patt_table = re.compile(r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+ *$')

into:

        patt_table = re.compile(r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+ *(###.*)?$')
08:28:30

me But I think a "tiny sentence with several spaces" would not be matched by the current regexp.
08:29:14

钰炜 Yuwei 余 YU I have not tested it yet
08:29:58

me The phrase is matched by ([^\s]+), which does not allow space.
08:30:23

钰炜 Yuwei 余 YU you can add ' *' before $
08:30:37

me There are tables which have spaces in the phrases? 
08:31:10

me Isn't the " *" before $ redundant?
08:31:51

me I mean "(###.*)?$" and "(###.*)? *$" match the same.
08:32:08

钰炜 Yuwei 余 YU just a little user friendly
08:32:30

me Because the ".*" after the "###" already matches anything, including the spaces.
08:32:53

钰炜 Yuwei 余 YU ok
08:33:39

钰炜 Yuwei 余 YU I was blind and did not see the .
08:42:54

me If there are phrases with spaces inside the phrases, I think they won't work with the current 
regexp.
08:45:44

钰炜 Yuwei 余 YU yes, the current regexp does not allow spaces in phrases
08:47:06

me OK, I just checked that the following phrases from the emoji-table cannot be input at the 
moment:
08:47:14

me shiluo	(。_。) [失落]	0
shui	(′д｀ )…彡…彡[衰]	0
shy	shy~ o(*////▽////*)q	0
08:47:21

me Because they contain spaces in the phrases.
08:48:59

me Can the key contain spaces as well? I guess not, am I right?
09:31:26

me Can I paste our chat into the bugzilla for reference?
09:31:51

钰炜 Yuwei 余 YU yes you can
09:31:58

me OK, thank you!
09:32:06

钰炜 Yuwei 余 YU you are welcome
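
For reference, the candidate expressions from the chat can be compared side by side. This is only a sketch, assuming Python 3 and tab-separated columns. The dictionary names candidates, lines and results are illustrative, and the line labelled triple_hash_comment is a made-up variant of the ipa-x-sampa entry. It makes no claim about which expression ended up in the update.

```python
import re

# Candidate patt_table expressions quoted in the chat above.
candidates = {
    'original': r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+ *$',
    'single_hash': r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+\s*(#.*)*$',
    'triple_hash': r' *([^\s]+) *\t *([^\s]+)\t *[^\s]+ *(###.*)?$',
}

# Sample lines; columns are assumed to be separated by tabs.
lines = {
    'no_comment': '!\t↓\t0',
    'hash_comment': '!\t↓\t0\t# U+2193 DOWNWARDS ARROW',
    # Hypothetical variant using a ### comment after a space.
    'triple_hash_comment': '!\t↓\t0 ### U+2193 DOWNWARDS ARROW',
    # Entry from the emoji table with a space inside the phrase.
    'phrase_with_space': 'shy\tshy~ o(*////▽////*)q\t0',
}

# For every candidate, record which sample lines it accepts.
results = {
    name: {label: bool(re.match(pattern, line)) for label, line in lines.items()}
    for name, pattern in candidates.items()
}

for name, row in results.items():
    print(name, row)
```

Under these assumptions, the "###" variant accepts only trailing comments that begin with three hash marks and are separated from the frequency by spaces, so a line written like the ipa-x-sampa.txt samples (a tab followed by a single "#") is still rejected by it. None of the three expressions accepts a phrase that contains a space.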

Comment 2 Fedora Update System 2012-09-13 14:39:34 UTC

ibus-table-1.4.99.20120907-3.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/ibus-table-1.4.99.20120907-3.fc18

Comment 3 Fedora Update System 2012-09-13 16:47:08 UTC

Package ibus-table-1.4.99.20120907-3.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing ibus-table-1.4.99.20120907-3.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-13927/ibus-table-1.4.99.20120907-3.fc18
then log in and leave karma (feedback).

Comment 4 Fedora Update System 2012-11-12 08:01:44 UTC

ibus-table-1.4.99.20120907-3.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/ibus-table-1.4.99.20120907-3.fc17

Comment 5 Fedora Update System 2012-11-13 12:50:40 UTC

ibus-table-1.4.99.20121113-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/ibus-table-1.4.99.20121113-1.fc17

Comment 6 Fedora Update System 2012-11-28 11:53:58 UTC

ibus-table-1.4.99.20121113-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.