Bug 1974452 - python3-enchant doesn’t work in rawhide for Czech
Summary: python3-enchant doesn’t work in rawhide for Czech
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: python-enchant
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Charalampos Stratakis
QA Contact: Fedora Extras Quality Assurance
Reported: 2021-06-21 17:44 UTC by Mike FABIAN
Modified: 2021-08-03 19:41 UTC (History)
Last Closed: 2021-08-03 19:41:30 UTC
Type: Bug

Description Mike FABIAN 2021-06-21 17:44:55 UTC
In Fedora 34:

    $ python3 
    Python 3.9.5 (default, May 14 2021, 00:00:00) 
    [GCC 11.1.1 20210428 (Red Hat 11.1.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import enchant
    >>> d = enchant.Dict("cs_CZ")
    >>> d.suggest('Praha')
    ['Praha', 'Prah', 'Praho', 'Prahu', 'Prahy', 'Vraha', 'Prah a', 'Prha', 'Prahla', 'Prahna', 'Prahů', 'Prahl', 'Prala']
    >>> 
    $ cat /etc/fedora-release
    Fedora release 34 (Thirty Four)
    $ rpm -q python3-enchant
    python3-enchant-3.2.0-3.fc34.noarch
    $ rpm -q enchant
    enchant-1.6.0-27.fc34.x86_64
    $ rpm -q hunspell-cs
    hunspell-cs-20080822-14.fc34.noarch

In rawhide:

    bash-5.1# python3
    Python 3.10.0b2 (default, Jun  1 2021, 00:00:00) [GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import enchant
    >>> d = enchant.Dict("cs_CZ")
    >>> d.suggest('Praha')
    []
    >>> 
    bash-5.1# cat /etc/fedora-release 
    Fedora release 35 (Rawhide)
    bash-5.1# rpm -q python3-enchant
    python3-enchant-3.2.0-4.fc35.noarch
    bash-5.1# rpm -q enchant
    enchant-1.6.0-27.fc34.x86_64
    bash-5.1# rpm -q hunspell-cs
    hunspell-cs-20080822-14.fc34.noarch
    bash-5.1#

For some other languages it still seems to work in rawhide, for example for German:

    bash-5.1# python3
    Python 3.10.0b2 (default, Jun  1 2021, 00:00:00) [GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import enchant
    >>> d = enchant.Dict("de_DE")
    >>> d.suggest('Prag')
    ['Prags', 'Trag', 'Frag', 'Prag', '-prag']
    >>>

Comment 1 Miro Hrončok 2021-06-21 17:57:40 UTC
Installing glibc-gconv-extra makes this work. See also bz1973663. I think this is a problem in glibc packaging.

Florian, I think this packaging change in glibc needs more communication in Fedora. WDYT?
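
One quick way to tell whether a given environment is affected is to ask iconv for the conversion directly. The sketch below assumes that the empty suggestion list comes from a missing glibc charset converter and that the `iconv` command-line tool uses the same gconv modules as the library; neither assumption is confirmed in this report. The function name `iconv_can_convert` is hypothetical.

```python
import subprocess


def iconv_can_convert(source_charset, target_charset="UTF-8"):
    """Return True if the iconv tool converts a short ASCII sample between the charsets."""
    sample = b"Praha"  # pure ASCII, so the bytes are identical in ISO-8859-2 and UTF-8
    try:
        result = subprocess.run(
            ["iconv", "-f", source_charset, "-t", target_charset],
            input=sample,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=False,
        )
    except OSError:
        # The iconv executable is missing or cannot be started.
        return False
    return result.returncode == 0 and result.stdout == sample


if __name__ == "__main__":
    print("ISO-8859-2 -> UTF-8 available:", iconv_can_convert("ISO-8859-2"))
```

Under those assumptions, a buildroot without glibc-gconv-extra would be expected to report False here and True once the package is installed.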

Comment 2 Florian Weimer 2021-06-21 18:04:39 UTC
What does this line do?

>>> d = enchant.Dict("cs_CZ")

In particular I'm interested in the meaning of the string "cs_CZ". Is it a locale identifier? Where does it come from?

As a locale identifier, cs_CZ refers to a non-UTF-8 locale:

$ LC_ALL=cs_CZ locale -k charmap
charmap="ISO-8859-2"

I'm not aware how to set up a system with such a locale (anaconda will use cs_CZ.utf8 if instructed, I believe). Furthermore, a default installation follows weak dependencies and will therefore include the glibc-gconv-extra package.

If this change is too disruptive, maybe we need to revert it.

Comment 3 Miro Hrončok 2021-06-21 18:09:56 UTC
> If this change is too disruptive, maybe we need to revert it.

I wouldn't go as far as reverting it (not yet, anyway), but announcing it loudly on devel(-announce) might be a good idea.

Comment 4 Miro Hrončok 2021-06-21 18:13:15 UTC
(In reply to Florian Weimer from comment #2)
> What does this line do?
> 
> >>> d = enchant.Dict("cs_CZ")
> 
> In particular I'm interested in the meaning of the string "cs_CZ". Is it a
> locale identifier? Where does it come from?

I believe it is a hunspell dictionary identifier. I.e. it comes from:

/usr/share/myspell/cs_CZ.aff
/usr/share/myspell/cs_CZ.dic

Comment 5 Florian Weimer 2021-06-21 18:20:59 UTC
(In reply to Miro Hrončok from comment #4)
> (In reply to Florian Weimer from comment #2)
> > What does this line do?
> > 
> > >>> d = enchant.Dict("cs_CZ")
> > 
> > In particular I'm interested in the meaning of the string "cs_CZ". Is it a
> > locale identifier? Where does it come from?
> 
> I believe it is a hunspell dictionary identifier. I.e. it comes from:
> 
> /usr/share/myspell/cs_CZ.aff
> /usr/share/myspell/cs_CZ.dic

The file starts with:

SET ISO8859-2

Presumably that means it uses this charset. This is very worrying. When making this change, we assumed that all distribution files would be encoded as UTF-8.

Comment 6 Mike FABIAN 2021-06-21 21:56:16 UTC
Unfortunately, many of the hunspell dictionaries do not seem to be in UTF-8 yet.

It should be rather easy to convert them to UTF-8 though.
Probably we should do that.
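
A minimal sketch of what such a conversion could look like for one file, assuming it is enough to re-encode the text and retarget the `SET` line. This has not been checked against real dictionaries, which may need further adjustments; the function name `convert_dictionary_text` is hypothetical.

```python
import re


def convert_dictionary_text(raw, source_charset):
    """Decode dictionary bytes from source_charset and return them re-encoded as UTF-8.

    If the text contains a SET line (as .aff files do), the first one is
    rewritten to declare UTF-8. Files without a SET line, such as .dic
    word lists, are only re-encoded.
    """
    text = raw.decode(source_charset)
    text = re.sub(r"^SET[ \t]+\S+", "SET UTF-8", text, count=1, flags=re.MULTILINE)
    return text.encode("utf-8")


if __name__ == "__main__":
    original = "SET ISO8859-2\nTRY Prahů\n".encode("iso8859-2")
    print(convert_dictionary_text(original, "iso8859-2").decode("utf-8"))
```

The source charset is passed in explicitly rather than guessed, so the caller stays responsible for reading it from the `SET` line of the matching .aff file.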

Comment 7 Mike FABIAN 2021-06-21 21:59:13 UTC
$ grep ^SET /usr/share/myspell/*.aff | grep -c UTF-8 
135
mfabian@taka:~
$ grep ^SET /usr/share/myspell/*.aff | grep -c -v UTF-8 
98
mfabian@taka:~
$
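
The same count can be broken down by declared charset, which shows which converters the non-UTF-8 dictionaries actually need. This is a rough sketch: the directory is the one used in the commands above, the helper names `declared_charset` and `tally_charsets` are hypothetical, and only the first `SET` line of each file is considered.

```python
import codecs
from collections import Counter
from pathlib import Path


def declared_charset(aff_path):
    """Return the charset named on the first SET line of a .aff file, or None."""
    with open(aff_path, "rb") as handle:
        for raw_line in handle:
            # Some UTF-8 files may start with a byte order mark; skip it if present.
            if raw_line.startswith(codecs.BOM_UTF8):
                raw_line = raw_line[len(codecs.BOM_UTF8):]
            # Latin-1 maps every byte to a character, so decoding never fails.
            fields = raw_line.decode("latin-1").split()
            if len(fields) >= 2 and fields[0] == "SET":
                return fields[1]
    return None


def tally_charsets(directory="/usr/share/myspell"):
    """Count the .aff files in a directory by the charset they declare."""
    aff_files = sorted(Path(directory).glob("*.aff"))
    return Counter(declared_charset(path) for path in aff_files)


if __name__ == "__main__":
    for charset, count in tally_charsets().most_common():
        print(f"{count:4d}  {charset}")
```

Files with no `SET` line are counted under `None`.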

Comment 8 Siddhesh Poyarekar 2021-06-22 01:53:24 UTC
(In reply to Florian Weimer from comment #2)
> I'm not aware how to set up a system with such a locale (anaconda will use
> cs_CZ.utf8 if instructed, I believe). Furthermore, a default installation
> follows weak dependencies and will therefore include the glibc-gconv-extra
> package.

This really is the key question. Perhaps the dependency on glibc-all-langpacks is not sufficient and we need a stronger dependency. Mike, could you please describe how you installed this system?

(In reply to Miro Hrončok from comment #1)
> Florian, I think this packaging change in glibc needs more communication in
> Fedora. WDYT?

I'll send out an email, but first I'd like to know how we end up with such an installation: if it is common enough, then maybe it's not feasible to keep this split, and the announcement would be pointless.

Comment 9 Siddhesh Poyarekar 2021-06-22 02:51:11 UTC
(In reply to Siddhesh Poyarekar from comment #8)
> I'll send out an email but before I'd like to know how we end up with such
> an installation because if it is common enough then maybe it's not feasible
> to keep this split and the announcement would be pointless.

Well, I sent this out anyway, because I discovered a use case (bug 1974466) that needs an explicit BuildRequires on glibc-gconv-extra only, not on glibc-*langpacks.

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/JQTTSNHMSFV63KIDDPW4M7WV7CI6KZYW/

Comment 10 Mike FABIAN 2021-06-22 07:43:05 UTC
(In reply to Siddhesh Poyarekar from comment #8)
> (In reply to Florian Weimer from comment #2)
> > I'm not aware how to set up a system with such a locale (anaconda will use
> > cs_CZ.utf8 if instructed, I believe). Furthermore, a default installation
> > follows weak dependencies and will therefore include the glibc-gconv-extra
> > package.
> 
> This really is the key question.  Perhaps the dependency on
> glibc-all-langpack is not sufficient and maybe we need a stronger
> dependency.  Mike could you please describe how you installed this system? 

This was a buildroot.

I was building ibus-typing-booster for rawhide with “fedpkg mockbuild”.
Then a test case for python3-enchant for Czech failed.
I did a chroot into the buildroot and investigated and found that python3-enchant didn’t work at all for Czech.
I didn’t understand why, so I reported this bug.
After adding 

# to make the python3-enchant test work for hunspell dictionaries which are not yet UTF-8:
BuildRequires:   glibc-gconv-extra

it works again.

Comment 11 Florian Weimer 2021-06-22 07:46:53 UTC
(In reply to Mike FABIAN from comment #10)
> (In reply to Siddhesh Poyarekar from comment #8)
> > (In reply to Florian Weimer from comment #2)
> > > I'm not aware how to set up a system with such a locale (anaconda will use
> > > cs_CZ.utf8 if instructed, I believe). Furthermore, a default installation
> > > follows weak dependencies and will therefore include the glibc-gconv-extra
> > > package.
> > 
> > This really is the key question.  Perhaps the dependency on
> > glibc-all-langpack is not sufficient and maybe we need a stronger
> > dependency.  Mike could you please describe how you installed this system? 
> 
> This was a buildroot.

Ahh!

> After adding 
> 
> # to make the python3-enchant test work for hunspell dictionaries which are
> not yet UTF-8:
> BuildRequires:   glibc-gconv-extra
> 
> it works again.

Using glibc-all-langpacks should work as well. For non-English language processing, that's a good addition for other reasons as well. If redhat-rpm-config is installed, it will also install glibc-gconv-extra.

Comment 12 Siddhesh Poyarekar 2021-06-25 02:35:30 UTC
I've modified the dependency on glibc-gconv-extra to Requires until we can figure out a nicer solution that does not need all packages to add the dependency. The next build should fix this.

Comment 13 Charalampos Stratakis 2021-08-03 19:41:30 UTC
This is fixed on rawhide.

