Bug 2125153

Summary: ibus does not support special Arabic compose sequences
Product: Fedora
Reporter: Mike FABIAN <mfabian>
Component: ibus
Assignee: fujiwara <tfujiwar>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 36
CC: avidseeker7, i18n-bugs, mfabian, petersen, shawn.p.huang, tfujiwar
Fixed In Version: ibus-1.5.27-9.fc38
Last Closed: 2023-01-12 10:15:34 UTC
Type: Bug

Description Mike FABIAN 2022-09-08 07:01:31 UTC
The following compose sequences do not work in ibus:

    $ grep -i 'arabic.*ligature' /usr/share/X11/locale/en_US.UTF-8/Compose
    # Arabic Lam-Alef ligatures
    <UFEFB>	:   "لا" # ARABIC LIGATURE LAM WITH ALEF
    <UFEF7>	:   "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
    <UFEF9>	:   "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
    <UFEF5>	:   "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE

They are needed because the Arabic keyboard layout outputs the keysym UFEFB on one of its keys (<AB05>, the key in the position of b on a US layout):

    $ grep -i fefb /usr/share/X11/xkb/symbols/ara 
        key <AB05> {  [           UFEFB,                UFEF5,                  NoSymbol,            NoSymbol ]};  // ‎ﻻ‎ ‎ﻵ‎
        key <AB05> {  [           UFEFB,                UFEF5,                     U06AB,               U06AD ]};  // ‎ﻻ‎ ‎ﻵ‎     ‎ګ‎ ‎ڭ‎

But the character U+FEFB is not what is desired; what one really wants is U+0644 U+0627. An xkb keyboard layout can output only one keysym per key press, not two, so Compose was used as a hack to work around this limitation of xkb:

The keyboard produces UFEFB, and then the compose support replaces it with U+0644 U+0627.

This works when the compose support in Xorg is used but not when the compose support in ibus is used.
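The replacement strings in those four Compose entries match the Unicode compatibility (NFKC) decompositions of the ligature characters. The minimal Python sketch below uses only the standard `unicodedata` module to check this. It only illustrates the mapping. It is not how Xorg or ibus implement compose, and the Compose file simply lists the strings.

```python
import unicodedata

# Presentation-form Lam-Alef ligatures listed in the Compose file above.
LIGATURES = ["\uFEFB", "\uFEF7", "\uFEF9", "\uFEF5"]

# NFKC replaces each ligature by its compatibility decomposition,
# i.e. LAM (U+0644) followed by the matching ALEF variant.
replacements = {lig: unicodedata.normalize("NFKC", lig) for lig in LIGATURES}

for lig, text in replacements.items():
    print(f"U+{ord(lig):04X} ->", " ".join(f"U+{ord(c):04X}" for c in text))
```

Each printed line pairs a ligature with the two code points that the corresponding Compose entry emits, for example U+FEFB with U+0644 U+0627.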

How to reproduce:

1) First show that it works when using the Xorg compose support:

Start xterm like this (to disable ibus and use the Xorg compose support):

    env XMODIFIERS=@im=none xterm &

Then in the xterm, type

    $ echo -n b | iconv -f utf8 -t utf16le | od -x
    0000000 0062
    0000002

and we see that the b produces U+0062, which is correct.

Switch to the Arabic keyboard layout:

    setxkbmap ara

Press “arrow up” to get the echo -n b | iconv -f utf8 -t utf16le | od -x line back, go back to the b with “arrow left”, replace it by typing b again, and now one gets:

    echo -n لا | iconv -f utf8 -t utf16le | od -x 
    0000000 0644 0627
    0000004

That is, even though the keyboard certainly outputs only U+FEFB, the compose support of Xorg transforms it into U+0644 U+0627.
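A note on reading these dumps. `iconv -t utf16le` writes each UTF-16 code unit as two little-endian bytes, and `od -x` prints 2-byte words in host byte order. On a little-endian machine such as x86_64, each word is therefore exactly one BMP code point. The small Python sketch below mimics that pipeline. The function name `od_words` is made up for this illustration, and the sketch leaves out the offset column that `od` also prints.

```python
def od_words(text: str) -> list[str]:
    """Mimic `iconv -f utf8 -t utf16le | od -x` on a little-endian host:
    encode as UTF-16LE, then show each 2-byte unit as a 4-digit hex word."""
    data = text.encode("utf-16-le")
    return [f"{int.from_bytes(data[i:i + 2], 'little'):04x}"
            for i in range(0, len(data), 2)]

print(od_words("b"))             # ['0062']
print(od_words("\u0644\u0627"))  # ['0644', '0627'], the Xorg compose result
print(od_words("\uFEFB"))        # ['fefb'], the ligature without compose
```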

2) Now repeat the same test using the compose support in ibus:

Start xterm like this to use the ibus compose support:

    env XMODIFIERS=@im=ibus xterm &

Then in the xterm, type

    $ echo -n b | iconv -f utf8 -t utf16le | od -x
    0000000 0062
    0000002

and we see that the b produces U+0062, which is correct.

Switch to the Arabic keyboard layout:

    setxkbmap ara

Press “arrow up” to get the echo -n b | iconv -f utf8 -t utf16le | od -x line back, go back to the b with “arrow left”, replace it by typing b again, and now one gets:

    echo -n ﻻ | iconv -f utf8 -t utf16le | od -x 
    0000000 fefb
    0000002

Comment 1 Mike FABIAN 2022-09-09 16:00:06 UTC
For ibus-typing-booster I fixed it like this (it was surprisingly easy):

https://github.com/mike-fabian/ibus-typing-booster/issues/379
https://github.com/mike-fabian/ibus-typing-booster/commit/c788401c794843a6b99c91a51f9cb67b32ffc86e

I only had to allow keys other than Multi_key and dead keys to start a sequence, and to add 0x01000000 when calculating the key value of keysyms of the form <UXXXX>. That was all.

I hope it is equally easy in Gtk and ibus ...
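The two changes described in Comment 1 can be sketched in Python. This is not the actual ibus-typing-booster code. The parser, the tiny keysym name table `NAMED_KEYSYMS`, and the `Composer` class are hypothetical and handle only `<UXXXX>`, printable ASCII, and `Multi_key` names. The `<UXXXX>` conversion is assumed to match what Xlib's `XStringToKeysym` does: Latin-1 code points map to themselves, and all others get 0x01000000 added. Control characters, which Xlib rejects, are not filtered here.

```python
import re

# Hypothetical, deliberately tiny name table; a real parser needs the
# full X11 keysym name table.
NAMED_KEYSYMS = {"Multi_key": 0xFF20}

def keysym_from_name(name: str) -> int:
    """Convert a Compose-file key name (without the <>) to a keysym value."""
    m = re.fullmatch(r"U([0-9A-Fa-f]{4,6})", name)
    if m:
        cp = int(m.group(1), 16)
        # Latin-1 code points are their own keysyms; everything else is
        # offset by 0x01000000, e.g. <UFEFB> -> 0x0100FEFB.
        return cp if cp < 0x100 else cp + 0x01000000
    if len(name) == 1 and 0x20 <= ord(name) <= 0x7E:
        return ord(name)  # printable ASCII, e.g. <m> -> 0x6D
    return NAMED_KEYSYMS[name]

# <key> <key> ... : "result"   (escapes inside the result are not handled)
LINE_RE = re.compile(r'((?:<[^>]+>\s*)+):\s*"([^"]*)"')

def parse_compose(text: str) -> dict[tuple[int, ...], str]:
    table = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # blank lines, comments, include lines
        names = re.findall(r"<([^>]+)>", m.group(1))
        table[tuple(keysym_from_name(n) for n in names)] = m.group(2)
    return table

class Composer:
    """Feeds keysyms one by one. Any keysym may start a sequence, not
    only Multi_key or dead keys, so single-key entries like <UFEFB> work."""

    def __init__(self, table: dict[tuple[int, ...], str]):
        self.table = table
        self.prefixes = {seq[:i] for seq in table for i in range(1, len(seq))}
        self.pending: tuple[int, ...] = ()

    def feed(self, keysym: int):
        """Return the text to commit, "" while a sequence is in progress,
        or None if the key is not part of any sequence."""
        seq = self.pending + (keysym,)
        if seq in self.table:
            self.pending = ()
            return self.table[seq]
        if seq in self.prefixes:
            self.pending = seq
            return ""
        self.pending = ()  # simplification: pending keys are dropped
        return None

table = parse_compose('''
include "/%L"
<UFEFB>\t:   "لا" # ARABIC LIGATURE LAM WITH ALEF
<Multi_key> <m> <o> <u> <s> <e> : "🐭" # U+1F42D MOUSE FACE
''')
composer = Composer(table)
print(composer.feed(0x0100FEFB))  # the Lam-Alef key alone commits "لا"
```

The key design point is in `feed`. A sequence is looked up from the very first keysym, so a one-key entry such as `<UFEFB>` matches immediately instead of being passed through because it is not Multi_key or a dead key.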

Comment 2 Mike FABIAN 2022-09-11 18:02:15 UTC
The problem in ibus seems to be something different: sequences that start with neither Multi_key nor a dead key do work.
Nor is it the length of the sequence: even sequences of length 1 work **if** they are defined in ~/.XCompose.

So adding a ~/.XCompose file like this makes typing the `b` key with the Arabic keyboard layout do the right thing:

    $ cat ~/.XCompose
    include "/%L"
    <UFEFB>	:   "لا" # ARABIC LIGATURE LAM WITH ALEF
    <U263A> : "single char compose sequence worked"
    $ 

But that sequence is already defined in the system-wide Compose file:

    $ grep '^<UFEFB>' /usr/share/X11/locale/en_US.UTF-8/Compose
    <UFEFB>	:   "لا" # ARABIC LIGATURE LAM WITH ALEF

So it should work without having to add a ~/.XCompose file, but it doesn't.

Comment 3 Mike FABIAN 2022-09-12 07:21:23 UTC
The same trick works for Gtk3. I was confused for a moment because I happened to have a ~/.config/gtk-3.0/Compose file, and in that case Gtk3 reads **only** that file; see https://docs.gtk.org/gtk3/class.IMContextSimple.html

But after I deleted ~/.config/gtk-3.0/Compose, the sequences

    $ cat ~/.XCompose
    include "/%L"
    <UFEFB>	:   "لا" # ARABIC LIGATURE LAM WITH ALEF
    <U263A> : "single char compose sequence worked"
    $ 
    
worked in

    env GTK_IM_MODULE=gtk-im-context-simple gedit

as well.

But it **still** does **not** work in Gtk4, although I made sure that ~/.config/gtk-4.0/Compose did not exist and confirmed that other compose sequences like 

    <Multi_key> <m> <o> <u> <s> <e> : "🐭" # U+1F42D MOUSE FACE

defined in ~/.XCompose worked with 

    env GTK_IM_MODULE=gtk-im-context-simple gnome-text-editor

and

    env GTK_IM_MODULE=gtk-im-context-simple gtk4-demo

the compose sequences

    <UFEFB>	:   "لا" # ARABIC LIGATURE LAM WITH ALEF
    <U263A> : "single char compose sequence worked"

still did not work with Gtk4.
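The Gtk3 confusion in this comment comes from the precedence described in the GTK 3 documentation linked above. GtkIMContextSimple reads compose sequences from the first file it finds among ~/.config/gtk-3.0/Compose, ~/.XCompose, and the locale's Compose file. The minimal Python sketch below models that "first file found wins" rule. The function `gtk3_compose_file` and its `exists` parameter are hypothetical, GTK implements this internally in C, and the sketch assumes $XDG_CONFIG_HOME is unset so that the config directory is ~/.config.

```python
from pathlib import Path
from typing import Callable, Optional

def gtk3_compose_file(home: Path, locale_compose: Path,
                      exists: Callable[[Path], bool] = Path.is_file
                      ) -> Optional[Path]:
    """Return the single Compose file GTK 3 would read, per its docs.
    Assumes $XDG_CONFIG_HOME is unset, i.e. the config dir is ~/.config."""
    candidates = [
        home / ".config" / "gtk-3.0" / "Compose",
        home / ".XCompose",
        locale_compose,
    ]
    for path in candidates:
        if exists(path):
            return path  # first hit wins; later files are ignored
    return None

home = Path("/home/user")
system = Path("/usr/share/X11/locale/en_US.UTF-8/Compose")
# A GTK-specific file and ~/.XCompose both exist: only the former is used.
print(gtk3_compose_file(home, system, exists=lambda p: p != system))
```

This is why a stray ~/.config/gtk-3.0/Compose hides every sequence in ~/.XCompose, including the `<UFEFB>` entry.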

Comment 4 fujiwara 2023-01-12 07:41:29 UTC
I think the issue is fixed in rawhide, and I backported the fixes to ibus-1.5.27-11.1.fc37 in the copr repo:
https://copr.fedorainfracloud.org/coprs/fujiwara/ibus/

Could you please test it?

Comment 6 fujiwara 2023-01-12 10:15:34 UTC
Thank you for your quick tests.
Fedora rawhide has this fix, but Fedora 37 does not; for Fedora 37, the fixed ibus is available from the ibus copr repository.