The compose support in ibus has the same problem, see:

https://bugzilla.redhat.com/show_bug.cgi?id=2125153

The following compose sequences do not work in ibus:

    $ grep -i arabic.*ligature /usr/share/X11/locale/en_US.UTF-8/Compose
    # Arabic Lam-Alef ligatures
    <UFEFB> : "لا" # ARABIC LIGATURE LAM WITH ALEF
    <UFEF7> : "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
    <UFEF9> : "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
    <UFEF5> : "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE

They are needed because the Arabic keyboard layout outputs UFEFB on some key:

    $ grep -i fefb /usr/share/X11/xkb/symbols/ara
    key <AB05> { [ UFEFB, UFEF5, NoSymbol, NoSymbol ]}; // ﻻ ﻵ
    key <AB05> { [ UFEFB, UFEF5, U06AB, U06AD ]}; // ﻻ ﻵ ګ ڭ

but the UFEFB character is not what is desired; what one really wants is U+0644 U+0627. But xkb keyboard layouts can only output one keysym when typing a key, not two. So compose was used as a hack to work around this limitation of xkb: the keyboard produces UFEFB and then the compose support replaces this with U+0644 U+0627.

This works when the compose support in Xorg is used but not when the compose support in ibus is used.

How to reproduce:

1) First show that it works when using the Xorg compose support:

Start xterm like this (to disable ibus and use the Xorg compose support):

    env XMODIFIERS=@im=none xterm &

Then in the xterm, type

    $ echo -n b | iconv -f utf8 -t utf16le | od -x
    0000000 0062
    0000002

and we see that the b produces U+0062, which is correct.

Switch to the Arabic keyboard

    setxkbmap ara

type “arrow up” to get the

    echo -n b | iconv -f utf8 -t utf16le | od -x

line back, go back to the b with “arrow left”, type b and now one gets:

    echo -n لا | iconv -f utf8 -t utf16le | od -x
    0000000 0644 0627
    0000004

I.e. even though the keyboard surely outputs only U+FEFB, the Compose support of Xorg transforms this into U+0644 U+0627.

2) Now try to do this in gedit (Gtk3) and gnome-text-editor (Gtk4) using the Gtk compose support

Gtk3: start gedit like this:

    env GTK_IM_MODULE=gtk-im-context-simple gedit

Gtk4: start gnome-text-editor like this:

    env GTK_IM_MODULE=gtk-im-context-simple gnome-text-editor

Switch to the Arabic keyboard layout and type the key which has the label `b` on the US English layout.

In gedit and gnome-text-editor you get the character

    ﻻ U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

You can see that this is one character by moving the cursor over it: only one cursor step is needed, and typing Backspace removes the whole character at once.

The correct result would be the sequence

    ل U+0644 ARABIC LETTER LAM
    ا U+0627 ARABIC LETTER ALEF

which looks like: لا

I.e. this looks very similar to

    ﻻ U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

but it is two characters instead of one. You notice that when moving the cursor over it (two arrow keys are needed to move over that sequence) and when deleting with one Backspace: you delete only ا U+0627 ARABIC LETTER ALEF, and ل U+0644 ARABIC LETTER LAM remains.

I tried to look at the code for Compose support in Gtk4 and did not understand why it does not work. My guess is that only Compose sequences starting with either Multi_key or a dead key work, and Compose sequences starting with other characters do not. But I could not work out from browsing the code why that is. Using the above Arabic example, it is easy to reproduce that it does not work.
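For anyone who wants to check pasted text without the `iconv | od -x` pipeline, here is a minimal Python sketch of the same inspection. The helper name `utf16_units` is made up for illustration. The NFKC line only shows that Unicode lists U+0644 U+0627 as the compatibility decomposition of U+FEFB; it is not a claim about how any compose implementation works.

```python
import unicodedata

LIGATURE = "\uFEFB"        # single code point emitted by the xkb layout
SEQUENCE = "\u0644\u0627"  # LAM followed by ALEF, the desired result


def utf16_units(text: str) -> list[str]:
    """Return the UTF-16 code units of text as 4-digit hex strings (like od -x)."""
    data = text.encode("utf-16-le")
    return [
        f"{int.from_bytes(data[i:i + 2], 'little'):04x}"
        for i in range(0, len(data), 2)
    ]


print(utf16_units("b"))       # Latin b on the US layout
print(utf16_units(LIGATURE))  # what the buggy case inserts
print(utf16_units(SEQUENCE))  # what the Xorg compose support inserts

# U+FEFB and U+0644 U+0627 are related by compatibility decomposition.
print(unicodedata.normalize("NFKC", LIGATURE) == SEQUENCE)
```

Pasting the character from gedit or gnome-text-editor into `utf16_units()` shows directly whether one code point (fefb) or two (0644, 0627) were inserted.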
Related old bugs:

https://bugzilla.gnome.org/show_bug.cgi?id=537457
Bug 537457 - Support compose sequences that produce two+ codepoints

https://gitlab.gnome.org/GNOME/gtk/-/issues/186
Support compose sequences that produce two+ codepoints
For ibus-typing-booster I fixed it like this (was surprisingly easy):

https://github.com/mike-fabian/ibus-typing-booster/issues/379
https://github.com/mike-fabian/ibus-typing-booster/commit/c788401c794843a6b99c91a51f9cb67b32ffc86e

I just had to allow keys other than Multi_key and dead keys to start a sequence, and I had to add 0x01000000 when calculating the key value of keysyms of the type <UXXXX>. That was all.

I hope it is equally easy in Gtk and ibus ...
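The 0x01000000 offset mentioned above is how X11 encodes Unicode keysyms. A minimal sketch of that calculation follows; the function name `unicode_keysym_value` is hypothetical and this is not the code from the ibus-typing-booster commit.

```python
import re

# X11 encodes a Unicode keysym as 0x01000000 plus the code point.
UNICODE_KEYSYM_OFFSET = 0x01000000


def unicode_keysym_value(name: str) -> int:
    """Map a Compose-file keysym such as '<UFEFB>' to its numeric keysym value."""
    match = re.fullmatch(r"<U([0-9A-Fa-f]{4,6})>", name)
    if match is None:
        raise ValueError(f"not a <UXXXX> keysym: {name!r}")
    return UNICODE_KEYSYM_OFFSET + int(match.group(1), 16)


print(hex(unicode_keysym_value("<UFEFB>")))
```

Without the offset, `<UFEFB>` would be looked up as 0xFEFB, which does not match the key value the keyboard layout actually delivers.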
    <UFEFB> <UFEFB> : "لا" # ARABIC LIGATURE LAM WITH ALEF

This sequence actually works with ibus. It doesn’t make sense of course, because it makes you press the 'b' key twice to get the desired result of U+0644 U+0627. But it shows that the problem in ibus is not that it cannot handle sequences starting with keys other than Multi_key and dead keys, and the problem is also not that it cannot handle <UXXXX> keysyms.

The problem in ibus seems to be that sequences which consist of only one single keysym don’t work.

https://github.com/ibus/ibus/blob/main/src/ibuscomposetable.c#L198

has:

    if (g_strv_length (words) < 2) {
        g_warning ("too few words; key sequence format is <a> <b>...: %s",
                   line);
        goto fail;
    }

Hm, maybe that is why sequences of length 1 don’t work?
No, the check

    if (g_strv_length (words) < 2) {

is not the reason why sequences of length 1 don't work. I didn’t read carefully:

    char **words = g_strsplit (seq, "<", -1);
    int i;
    int n = 0;

    if (g_strv_length (words) < 2) {

So g_strv_length (words) is already >= 2 if there is at least one "<" in the sequence.
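The word count can be illustrated in Python, using `str.split` as a stand-in for `g_strsplit (seq, "<", -1)`. That the two agree on non-empty inputs like these is an assumption of this sketch, and `count_words` is a made-up helper name.

```python
def count_words(seq: str) -> int:
    """Count the pieces obtained by splitting a key sequence on '<'."""
    # A leading '<' produces an empty first piece, so a single <key>
    # already yields two pieces.
    return len(seq.split("<"))


print(count_words("<UFEFB> "))          # one key
print(count_words("<UFEFB> <UFEFB> "))  # two keys
print(count_words("UFEFB "))            # no '<' at all
```

So only a sequence without any "<" falls below the threshold of 2 and triggers the "too few words" warning.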
I found that the following workaround works for Gtk3 (and ibus) but **not** for Gtk4:

- create a ~/.XCompose file (and make sure ~/.config/gtk-3.0/Compose does not exist so ~/.XCompose is really read!) with the following contents:

      $ cat ~/.XCompose
      include "/%L"
      <UFEFB> : "لا" # ARABIC LIGATURE LAM WITH ALEF

- Now try to reproduce as described in comment#0 : https://bugzilla.redhat.com/show_bug.cgi?id=2125163#c0

The bug is now "fixed" for Gtk3 and ibus (but not for Gtk4).
Fujiwara San found this suspicious line in Gtk4:

https://gitlab.gnome.org/GNOME/gtk/-/blob/main/gtk/gtkcomposetable.c#L507

    if (sequence[1] == 0)
      {
        remove_sequence = TRUE;
        goto next;
      }

Looks like this is removing compose sequences with a length of only 1 key.
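To make the effect of such a filter concrete, here is a toy Compose parser in Python. It is only an illustration under simplifying assumptions, not a model of gtkcomposetable.c: the regular expression, the `parse_compose` function, and the `keep_single_key` switch are all invented for this sketch.

```python
import re

# Very simplified Compose line: one or more <keysym> tokens, a colon,
# then a quoted result string. Real Compose files allow much more.
COMPOSE_LINE = re.compile(r'^\s*((?:<\w+>\s*)+):\s*"([^"]*)"')


def parse_compose(lines, keep_single_key):
    """Build {key tuple: result}, optionally dropping one-key sequences."""
    table = {}
    for line in lines:
        match = COMPOSE_LINE.match(line)
        if match is None:
            continue  # comment or unsupported syntax
        keys = tuple(re.findall(r"<(\w+)>", match.group(1)))
        if len(keys) == 1 and not keep_single_key:
            continue  # toy equivalent of a "sequence too short" filter
        table[keys] = match.group(2)
    return table


SAMPLE = [
    "# Arabic Lam-Alef ligatures",
    '<UFEFB> : "\u0644\u0627" # ARABIC LIGATURE LAM WITH ALEF',
    '<Multi_key> <a> <e> : "\u00e6"',
]

print(parse_compose(SAMPLE, keep_single_key=False))
print(parse_compose(SAMPLE, keep_single_key=True))
```

With the filter active, the `<UFEFB>` entry never reaches the table, so the ligature keysym passes through unchanged, which matches the behaviour observed in Gtk4.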
Yes, sequences of length 1 are not useful. Except for this hack that I knew nothing about. And it only works recently, since I added support for multi-character results.
Really ugly to use Compose sequences to patch up deficiencies in Unicode and xkb. If all you have is a hammer...
It's not trivial to fix either, because the table compression we use relies on splitting off the first key from the rest of the sequence. For single-key sequences, there's no rest...
(In reply to Matthias Clasen from comment #7)
> Yes, sequences of length 1 are not useful. Except for this hack that I knew
> nothing about. And it only works recently, since I added support for
> multi-character results.

In the X11 compose support it works as well, apparently for a long time already.
(In reply to Matthias Clasen from comment #8)
> Really ugly to use Compose sequences to patch up deficiencies in Unicode and
> xkb. If all you have is a hammer...

Yes, I agree. I was extremely surprised when I discovered this hack.

When I first looked at the original bug that the Arabic keyboard doesn’t work well:

https://bugzilla.redhat.com/show_bug.cgi?id=2122899

I thought that this was unfixable because xkb can only output one char per keystroke.

https://freedesktop.org/wiki/Software/XKeyboardConfig/XKB2Dreams/ (Random collection of ideas related to XKB2) mentions:

    3. Support for scenarios "multiple keypresses - one keysym" and "single keypress - multiple keysyms".

But that is a "dream" which might never really be implemented. Very unlikely any time soon, I guess.

As I thought there was no way to fix this, I suggested using the ar-kbd input method with ibus-m17n instead as a workaround. But that has its own problems:

- It does not help if the Arabic user uses an AZERTY variant of the Arabic layout
- Typing Western digits 0, 1, 2, ... becomes impossible; only Arabic digits can be typed with ar-kbd (the Arabic xkb layouts can type both Western and Arabic digits)

Therefore, when I accidentally discovered this hack, I thought it was better to make this hack work than to change the default input methods for all Arabic locales.
https://bugzilla.redhat.com/show_bug.cgi?id=2125163
Grr, mispaste. What I wanted to say: Fixed upstream in gtk4
I think this is fixed in Fedora 37.
I tested https://download.copr.fedorainfracloud.org/results/fujiwara/ibus/fedora-rawhide-x86_64/05219560-ibus/ibus-1.5.27-11.1.fc38.x86_64.rpm in a Fedora rawhide VM and it fixes the problem.
Now I also tested https://download.copr.fedorainfracloud.org/results/fujiwara/ibus/fedora-37-x86_64/05219560-ibus/ibus-1.5.27-11.1.fc37.x86_64.rpm on Fedora 37; it fixes the problem there as well.
comment#15 and comment#16 were intended for https://bugzilla.redhat.com/show_bug.cgi?id=2125153

ibus had the same problem with Arabic compose sequences; now it is fixed in ibus as well.