Bug 2122899 - [ar] ibus "m17n:ar:kbd" should be preferred Input Source for Arabic
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: gnome-desktop3
Version: 37
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Richard Hughes
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-31 08:40 UTC by AvidSeeker
Modified: 2023-08-21 08:34 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-21 08:24:01 UTC
Type: Bug
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
GNOME Gitlab GNOME gnome-desktop issues 212 0 None opened default-input-sources.h: Arabic should default to "m17n:ar:kbd" 2022-09-01 13:29:59 UTC

Description AvidSeeker 2022-08-31 08:40:06 UTC
SUMMARY

Video: (https://i.imgur.com/mjz9xrc.mp4)

KDE sends [Arabic ligature glyphs](https://en.wikipedia.org/wiki/Arabic_alphabet#Ligatures) as a single code point. For example, the Lam+Alif ligature "لا" (U+0644, U+0627) is sent as "ﻻ" (U+FEFB), and similarly for the other Lam+Alif ligatures (ﻷ، ﻵ، ﻹ).


STEPS TO REPRODUCE
1. Settings > Keyboard > Add default Arabic layout
2. Type "ﻻ" (i.e. the "b" key on QWERTY keyboards)

OBSERVED RESULT
Output is ﻻ (U+FEFB)

EXPECTED RESULT
Output is لا (U+0644, U+0627). 



SOFTWARE/OS VERSIONS
All latest updates

ADDITIONAL INFORMATION
To clarify for English speakers: this is like having a key that types a ligature. Say you want to press "b" to type the two characters "f" and "i", but instead you get the single ligature character "ﬁ" (U+FB01). This is exactly what is happening with Arabic.
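
The difference between the observed and expected output can be checked programmatically. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and not taken from any layout or input method.

```python
import unicodedata

ligature = "\uFEFB"        # observed: one presentation-form code point
sequence = "\u0644\u0627"  # expected: LAM followed by ALEF

# Print the code points and Unicode names of both strings.
for label, text in (("observed", ligature), ("expected", sequence)):
    names = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]
    print(label, names)

# NFKC normalization folds the presentation form back into the two letters.
normalized = unicodedata.normalize("NFKC", ligature)
print(normalized == sequence)
```

Both strings usually render identically, which is why the problem is easy to miss until the text is searched, compared, or copied elsewhere.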

Comment 1 Jens Petersen 2022-08-31 11:57:16 UTC
Is this a regression relative to Fedora 36?

Perhaps you could attach a screenshot too, please.
(I am actually having trouble getting the imgur videos to play)

Comment 2 Mike FABIAN 2022-08-31 12:38:51 UTC
This is a limitation of xkb keyboard layouts: each key can output only a single keysym. It is therefore impossible to output U+0644, U+0627 with a single key, so the ligature U+FEFB was used in the layout as the workaround that comes closest to the desired result.

As an alternative that does not have this limitation, I suggest using /usr/share/m17n/ar-kbd.mim with either ibus-typing-booster or ibus-m17n.

ar-kbd.mim simulates an Arabic keyboard layout on top of a US English layout as the basis.

It does not have this limitation of xkb.

$ grep '("b"' /usr/share/m17n/ar-kbd.mim 
  ("b" "لا")
$ grep '("b"' /usr/share/m17n/ar-kbd.mim | iconv -f utf-8 -t utf16le | od -x
0000000 0020 0020 0028 0022 0062 0022 0020 0022
0000020 0644 0627 0022 0029 000a
0000032

You can see the desired U+0644, U+0627 in the grep output converted to hexadecimal.
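
The same check can be done without iconv and od. This is a small Python sketch that assumes the grep output line is exactly the one shown above (two leading spaces and a trailing newline); it encodes the line as UTF-16LE and lists the 16-bit code units.

```python
# The line printed by grep, with LAM and ALEF written as escapes.
line = '  ("b" "\u0644\u0627")\n'

data = line.encode("utf-16-le")

# Split the byte string into 16-bit little-endian code units, as od -x does.
units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
hex_units = [f"{u:04x}" for u in units]

print(" ".join(hex_units))
print(len(data))  # byte count; od prints this as an octal offset on its last line
```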

Comment 3 AvidSeeker 2022-08-31 15:42:20 UTC
(In reply to Jens Petersen from comment #1)
> Is this a regression relative to Fedora 36?

I don't think so. There's an old bug report related to this problem: https://bugzilla.redhat.com/show_bug.cgi?id=1076945.

> 
> Perhaps you could attach a screenshot too, please.
> (I am actually having trouble getting the imgur videos to play)

Here's another mirror for video: https://reddit.com/link/x1tqqe/video/er935s9wvwk91/player
Here's a screenshot: https://i.imgur.com/qXU3wgN.png

Comment 4 Mike FABIAN 2022-08-31 15:49:21 UTC
Yes, this problem has been there *forever*. And it cannot be fixed with the current implementation of xkb because it can output only **one** keysym per key.

I doubt that this will ever be fixed in xkb, therefore I suggested using ibus-m17n or ibus-typing-booster with ar-kbd.mim and an English US keyboard layout.

That works and it fixes the problem.

Comment 5 Jens Petersen 2022-08-31 16:00:03 UTC
(Okay, thanks, I just wanted to check that it is not a rendering issue,
since for example the newly added VazirMatn font in F37 seems to be
affecting Arabic rendering, in Firefox at least.)

Comment 6 AvidSeeker 2022-08-31 16:05:58 UTC
(In reply to Mike FABIAN from comment #2)
> This is a limitation of the xkb keyboard layouts, each key can output only a
> single keysym, not more. So it is impossible to output U+0644, U+0627 with a
> single key, therefore the ligature U+FEFB was used in the layout as the
> workaround which comes closest to what one wants.

Can you please elaborate more on how to use ibus-m17n?

Comment 7 AvidSeeker 2022-08-31 16:17:22 UTC
It could be easier if we can chat elsewhere (e.g: https://matrix.to/#/@avidseeker:matrix.org)

Comment 8 Jens Petersen 2022-08-31 16:22:14 UTC
1. Open GNOME Settings (gnome-control-center)
2. Select the Keyboard pane
3. Under Input Sources, click on "+" to add
4. Choose Arabic
5. Scroll to the bottom and select ar-kbd(m17n).
6. Press Add
7. The input method is now available in the Input Source menu (use Super+Space to switch).
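
For scripted setups, the steps above roughly correspond to setting the GNOME input sources through gsettings. The Python sketch below only builds the command and prints it; it does not run anything. It assumes the `org.gnome.desktop.input-sources` schema with its `sources` key, that ibus-m17n is installed, and that the engine ID is the "m17n:ar:kbd" named in this bug's title. The helper name `input_sources_command` is hypothetical.

```python
import shlex

def input_sources_command(sources):
    """Hypothetical helper: build a gsettings command for a list of (type, id) pairs."""
    value = "[" + ", ".join(f"('{kind}', '{name}')" for kind, name in sources) + "]"
    return ["gsettings", "set", "org.gnome.desktop.input-sources", "sources", value]

# Keep a US layout first and add the m17n Arabic input method after it.
argv = input_sources_command([("xkb", "us"), ("ibus", "m17n:ar:kbd")])

# Print the command for review; run it manually if it looks right.
print(shlex.join(argv))
```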

If you still have trouble you can ask us in the matrix Fedora i18n room.

Comment 9 Jens Petersen 2022-08-31 16:23:54 UTC
We should make ar-kbd(m17n) appear first in the list.

Comment 10 AvidSeeker 2022-08-31 16:41:49 UTC
Beautiful. It worked well. Why isn't ar-kbd(m17n) the default, then?

and can you send the matrix room URL for Fedora i18n?

Comment 12 Mike FABIAN 2022-09-07 15:32:06 UTC
While discussing with AvidSeeker in Matrix we found very surprising new information. I wrote it up in detail here:

https://github.com/mike-fabian/ibus-typing-booster/issues/379

In short:

/usr/share/X11/locale/en_US.UTF-8/Compose

contains

<UFEFB>	:   "لا" # ARABIC LIGATURE LAM WITH ALEF

replacing the U+FEFB produced by the Arabic keyboard with U+0644 U+0627.

But this hack works **only** in the original compose support of X11, it does **not** work in the Compose support in Gtk3, Gtk4, ibus, and ibus-typing-booster.

If we could improve these Compose implementations, Arabic XKB keyboard layout would work fine.
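
To illustrate what such a Compose rule does, here is a simplified Python model. It is only a sketch: real Compose implementations match keysym events rather than characters in a string, and the helper names `parse_rule` and `apply_single_key_rules` are hypothetical.

```python
import re

# The rule quoted above, with the result written as escapes.
RULE = '<UFEFB>\t:   "\u0644\u0627" # ARABIC LIGATURE LAM WITH ALEF'

def parse_rule(line):
    """Split one Compose rule into (tuple of keysym names, result string)."""
    left, _, right = line.partition(":")
    keys = tuple(re.findall(r"<([^>]+)>", left))
    result = re.search(r'"([^"]*)"', right)
    if not keys or result is None:
        return None
    return keys, result.group(1)

def apply_single_key_rules(text, rules):
    """Replace each character that has a one-key rule of the form <UXXXX>."""
    return "".join(rules.get((f"U{ord(ch):04X}",), ch) for ch in text)

rules = dict([parse_rule(RULE)])
fixed = apply_single_key_rules("\uFEFB", rules)
print([f"U+{ord(ch):04X}" for ch in fixed])
```

The unusual part is that the sequence has length one and does not start with Multi_key or a dead key, which is what the other Compose implementations did not expect.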

Comment 13 Mike FABIAN 2022-09-09 19:49:33 UTC
For ibus-typing-booster I fixed it like this (was surprisingly easy):

https://github.com/mike-fabian/ibus-typing-booster/issues/379
https://github.com/mike-fabian/ibus-typing-booster/commit/c788401c794843a6b99c91a51f9cb67b32ffc86e

I just had to allow keys other than Multi_key and dead keys to start a sequence, and I had to add 0x01000000 when calculating the key value of keysyms of the type <UXXXX>. That was all.
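
The offset part of that change can be sketched as follows. This is not the code from the commit above, just an illustration of the X11 convention that a Unicode keysym is the code point plus 0x01000000; the helper name `unicode_keysym` is hypothetical.

```python
import re

UNICODE_KEYSYM_OFFSET = 0x01000000

def unicode_keysym(name):
    """Map a keysym name like 'UFEFB' to its value; return None for other names."""
    match = re.fullmatch(r"U([0-9A-Fa-f]{4,6})", name)
    if match is None:
        return None
    return UNICODE_KEYSYM_OFFSET + int(match.group(1), 16)

print(hex(unicode_keysym("UFEFB")))
print(unicode_keysym("Multi_key"))
```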

I hope it is equally easy in Gtk and ibus ...

Comment 14 Michael Catanzaro 2022-09-10 02:02:12 UTC
Do you still want to change the default input method, then?

Comment 15 Mike FABIAN 2022-09-10 06:50:29 UTC
I am inclined not to change it *if* we can fix the problem in ibus and gtk. I think that should be the top priority and the thing to do first.

But I am also confused about the current contents of default-input-sources.h. We might need to make changes there as well, I am not sure:

As Jens wrote, currently it contains only an entry for ar_DZ:

{ "ar_DZ",    "xkb",          "ara+azerty" },

glibc has these Arabic locales:

ar_AE.utf8 ar_BH.utf8 ar_DZ.utf8 ar_EG.utf8 ar_IN.utf8 ar_IQ.utf8 ar_JO.utf8 ar_KW.utf8 ar_LB.utf8 ar_LY.utf8 ar_MA.utf8 ar_OM.utf8 ar_QA.utf8 ar_SA.utf8 ar_SD.utf8 ar_SS.utf8 ar_SY.utf8 ar_TN.utf8 ar_YE.utf8

With only a line for ar_DZ in default-input-sources.h, what would be the default for the others?

Maybe no default is necessary because anaconda already sets the keyboard layout?

But why is ar_DZ there then in default-input-sources.h specifying a keyboard layout? Shouldn’t this be anaconda’s job as well then?

And another problem for ar_DZ, it specifies the “AZERTY” variation of the Arabic layout (because of Algeria’s past as a French colony?) and the ar-kbd.mim is based on the English US QWERTY layout. So we *cannot* replace "ara+azerty" with "m17n:ar:kbd" because this would change it to need a QWERTY layout underneath.

Comment 16 Mike FABIAN 2022-09-10 07:00:34 UTC
Here are the keyboard layouts anaconda currently uses for the Arabic locales (anaconda uses langtable for this):

$ python3
Python 3.10.6 (main, Aug  2 2022, 00:00:00) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import langtable
>>> for lang in ("ar_AE", "ar_BH", "ar_DZ", "ar_EG", "ar_IN", "ar_IQ", "ar_JO", "ar_KW", "ar_LB", "ar_LY", "ar_MA", "ar_OM", "ar_QA", "ar_SA", "ar_SD", "ar_SS", "ar_SY", "ar_TN", "ar_YE"):
...     print(f'Layout for {lang} is {langtable.list_keyboards(languageId=lang)}')
... 

Layout for ar_AE is ['ara']
Layout for ar_BH is ['ara']
Layout for ar_DZ is ['ara(azerty)']
Layout for ar_EG is ['ara']
Layout for ar_IN is ['in(eng)', 'ara', 'ara(azerty)', 'iq', 'ma', 'sy']
Layout for ar_IQ is ['iq']
Layout for ar_JO is ['ara']
Layout for ar_KW is ['ara']
Layout for ar_LB is ['ara']
Layout for ar_LY is ['ara']
Layout for ar_MA is ['ma']
Layout for ar_OM is ['ara']
Layout for ar_QA is ['ara']
Layout for ar_SA is ['ara']
Layout for ar_SD is ['ara']
Layout for ar_SS is ['ara']
Layout for ar_SY is ['sy']
Layout for ar_TN is ['ara']
Layout for ar_YE is ['ara']

Comment 17 Mike FABIAN 2022-09-10 07:09:17 UTC
So anaconda/langtable also set 'ara(azerty)' for ar_DZ just like default-input-sources.h

'iq' is actually the same as 'ara', /usr/share/X11/xkb/symbols/iq starts with:

    // Iraque keyboard layout,
    
    // 3-Level layout
    
    default partial alphanumeric_keys
    xkb_symbols "basic" {
        include "ara(basic)"
        name[Group1]= "Iraqi";
    };

'sy' is also the same as 'ara', /usr/share/X11/xkb/symbols/sy starts with:

    default partial alphanumeric_keys
    xkb_symbols "basic" {
        include "ara(basic)"
        name[Group1]= "Arabic (Syria)";
    };

'ma' is the same as 'ara(azerty)',  /usr/share/X11/xkb/symbols/ma starts with:

    // Arabic AZERTY with modern Latin digits
    default partial alphanumeric_keys
    xkb_symbols "arabic" {
        include "ara(azerty)"

        name[Group1]="Arabic (Morocco)";
    };

'ar_IN' defaults to 'in(eng)',  i.e. the Indian English layout. I wonder whether that is a mistake which I should fix in langtable.
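
Putting the langtable output and the include relationships together, the Arabic locales fall into very few effective layouts. The Python sketch below hardcodes the first layout langtable listed for each locale and the three equivalences described in this comment, so it runs without langtable installed; the names `DEFAULT_LAYOUT`, `SAME_AS` and `group_by_effective_layout` are made up for this illustration.

```python
from collections import defaultdict

# First layout listed by langtable for each Arabic locale (from the output above).
DEFAULT_LAYOUT = {
    "ar_AE": "ara", "ar_BH": "ara", "ar_DZ": "ara(azerty)", "ar_EG": "ara",
    "ar_IN": "in(eng)", "ar_IQ": "iq", "ar_JO": "ara", "ar_KW": "ara",
    "ar_LB": "ara", "ar_LY": "ara", "ar_MA": "ma", "ar_OM": "ara",
    "ar_QA": "ara", "ar_SA": "ara", "ar_SD": "ara", "ar_SS": "ara",
    "ar_SY": "sy", "ar_TN": "ara", "ar_YE": "ara",
}

# Layouts that only include another layout, per the symbols files quoted above.
SAME_AS = {"iq": "ara", "sy": "ara", "ma": "ara(azerty)"}

def group_by_effective_layout(defaults, same_as):
    """Group locales by the layout they effectively end up with."""
    groups = defaultdict(list)
    for locale, layout in defaults.items():
        groups[same_as.get(layout, layout)].append(locale)
    return dict(groups)

groups = group_by_effective_layout(DEFAULT_LAYOUT, SAME_AS)
for layout, locales in groups.items():
    print(layout, len(locales), locales)
```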

Comment 18 Mike FABIAN 2022-09-10 07:33:23 UTC
*If* we cannot fix ibus and Gtk to handle the Arabic compose sequence correctly, *then* we probably need to change default-input-sources.h. We could use "m17n:ar:kbd" for most (or even all) Arabic locales, but we would need to check the differences between these layouts carefully and maybe even create new variants of "m17n:ar:kbd". The current "m17n:ar:kbd" would not work right if the underlying layout is not US English, so a variant that works on top of a French AZERTY layout might need to be added.

*And* langtable/anaconda would then need to set the US English layout for Arabic locales, assuming that the "m17n:ar:kbd" input method on top of that would handle the Arabic.

Also, the current 'ara' Arabic keyboard layout uses western digits in the number row:

    key <AE01> {  [               1,               exclam,                  Arabic_1,            NoSymbol ]};  // 1 !     ١


The Arabic digit one is only produced by typing AltGr+1.

This is different from what "m17n:ar:kbd" does, /usr/share/m17n/ar-kbd.mim contains:

(map
 (generic
  ("1" "١")
  ("2" "٢")
  ("3" "٣")
  ("4" "٤")
  ("5" "٥")
  ("6" "٦")
  ("7" "٧")
  ("8" "٨")
  ("9" "٩")
  ("0" "٠")

i.e. it *always* replaces the Western digit "1" with the Arabic digit "١". When using "m17n:ar:kbd" you can no longer type Western digits, which might be a problem.
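
The effect of that digit map can be modelled in a few lines of Python. This is only a sketch of the mapping quoted above, not of how m17n processes key events, and the function name `as_typed_with_ar_kbd` is hypothetical.

```python
WESTERN = "0123456789"
# Arabic-Indic digits occupy U+0660..U+0669, in the same order as 0..9.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
TO_ARABIC = str.maketrans(WESTERN, ARABIC_INDIC)

def as_typed_with_ar_kbd(keys):
    """Return what the digit keys would produce under the map shown above."""
    return keys.translate(TO_ARABIC)

typed = as_typed_with_ar_kbd("2022")
print([f"U+{ord(ch):04X}" for ch in typed])

# Python's int() accepts any Unicode decimal digit, so the numeric value survives,
# but software that expects ASCII digits will not recognise the text.
print(int(typed), typed.isascii())
```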

Comment 19 Mike FABIAN 2022-09-10 07:38:29 UTC
Changing default-input-sources.h to use m17n input methods as a workaround for bugs in the Arabic keyboard support in ibus and Gtk seems to open another can of worms; it does not look easy.

If ibus and Gtk are fixed, we probably don’t need to change default-input-sources.h anymore.

The "normal" Arabic keyboard layouts "ara" and "ara(azerty)" should just be made to work correctly. It is weird to have these keyboard layouts in a broken state.

Comment 20 Mike FABIAN 2022-09-13 09:33:17 UTC
Matthias Clasen fixed it already for Gtk4: https://gitlab.gnome.org/GNOME/gtk/-/issues/5172 😄

Comment 21 Mike FABIAN 2022-09-19 13:12:38 UTC
(In reply to Michael Catanzaro from comment #14)
> Do you still want to change the default input method, then?

I think we should close this bug here and do all necessary fixes to get the compose support working instead.

Comment 22 Luna Jernberg 2023-08-21 08:23:15 UTC
This was closed in today's Fedora i18n meeting.

Comment 23 Mike FABIAN 2023-08-21 08:24:01 UTC
ibus-1.5.28 now supports the single-character Arabic compose sequences. So we should **not** change the default input method for Arabic.

Closing this bug.

Comment 24 Mike FABIAN 2023-08-21 08:34:58 UTC
It is still broken in Gtk3 and Gtk4 though, so if Arabic users don’t add any input sources, then gtk-im-context-simple will be used and that still has the bug.

See: https://gitlab.gnome.org/GNOME/gtk/-/issues/5172#note_1811070

