In Fedora 34:

$ python3
Python 3.9.5 (default, May 14 2021, 00:00:00)
[GCC 11.1.1 20210428 (Red Hat 11.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict("cs_CZ")
>>> d.suggest('Praha')
['Praha', 'Prah', 'Praho', 'Prahu', 'Prahy', 'Vraha', 'Prah a', 'Prha', 'Prahla', 'Prahna', 'Prahů', 'Prahl', 'Prala']
>>>
$ cat /etc/fedora-release
Fedora release 34 (Thirty Four)
$ rpm -q python3-enchant
python3-enchant-3.2.0-3.fc34.noarch
$ rpm -q enchant
enchant-1.6.0-27.fc34.x86_64
$ rpm -q hunspell-cs
hunspell-cs-20080822-14.fc34.noarch

In rawhide:

bash-5.1# python3
Python 3.10.0b2 (default, Jun 1 2021, 00:00:00)
[GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict("cs_CZ")
>>> d.suggest('Praha')
[]
>>>
bash-5.1# cat /etc/fedora-release
Fedora release 35 (Rawhide)
bash-5.1# rpm -q python3-enchant
python3-enchant-3.2.0-4.fc35.noarch
bash-5.1# rpm -q enchant
enchant-1.6.0-27.fc34.x86_64
bash-5.1# rpm -q hunspell-cs
hunspell-cs-20080822-14.fc34.noarch
bash-5.1#

For some other languages it still seems to work in rawhide, for example for German:

bash-5.1# python3
Python 3.10.0b2 (default, Jun 1 2021, 00:00:00)
[GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict("de_DE")
>>> d.suggest('Prag')
['Prags', 'Trag', 'Frag', 'Prag', '-prag']
>>>
Installing glibc-gconv-extra makes this work. See also bz1973663. I think this is a problem in glibc packaging. Florian, I think this packaging change in glibc needs more communication in Fedora. WDYT?
What does this line do?

>>> d = enchant.Dict("cs_CZ")

In particular I'm interested in the meaning of the string "cs_CZ". Is it a locale identifier? What does it come from?

As a locale identifier, cs_CZ refers to a non-UTF-8 locale:

$ LC_ALL=cs_CZ locale -k charmap
charmap="ISO-8859-2"

I'm not aware how to set up a system with such a locale (anaconda will use cs_CZ.utf8 if instructed, I believe). Furthermore, a default installation follows weak dependencies and will therefore include the glibc-gconv-extra package.

If this change is too disruptive, maybe we need to revert it.
> If this change is too disruptive, maybe we need to revert it.

I wouldn't go as far as reverting it (not yet anyway), but announcing it loudly on devel(-announce) might be a good idea.
(In reply to Florian Weimer from comment #2)
> What does this line do?
>
> >>> d = enchant.Dict("cs_CZ")
>
> In particular I'm interested in the meaning of the string "cs_CZ". Is it a
> locale identifier? What does it come from?

I believe it is a hunspell dictionary identifier. I.e. it comes from:

/usr/share/myspell/cs_CZ.aff
/usr/share/myspell/cs_CZ.dic
(In reply to Miro Hrončok from comment #4)
> (In reply to Florian Weimer from comment #2)
> > What does this line do?
> >
> > >>> d = enchant.Dict("cs_CZ")
> >
> > In particular I'm interested in the meaning of the string "cs_CZ". Is it a
> > locale identifier? What does it come from?
>
> I believe it is a hunspell dictionary identifier. I.e. it comes from:
>
> /usr/share/myspell/cs_CZ.aff
> /usr/share/myspell/cs_CZ.dic

The file starts with:

SET ISO8859-2

Presumably that means it uses this charset.

This is very worrying. When making this change, we assumed that all distribution files would be encoded as UTF-8.
Unfortunately many of the hunspell dictionaries seem to be not yet in UTF-8. It should be rather easy to convert them to UTF-8 though. Probably we should do that.
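A minimal sketch of what such a conversion could look like for a single .aff file, assuming the charset named on the SET line is one that Python's built-in codecs recognize (true for ISO8859-2, not necessarily for every name hunspell accepts). The function name aff_to_utf8 is hypothetical, and a real conversion would also have to re-encode the matching .dic file with the same source charset and be checked against hunspell itself.

```python
import re


def aff_to_utf8(raw: bytes) -> bytes:
    """Re-encode the bytes of a hunspell .aff file as UTF-8.

    Hypothetical helper: it relies on Python's own codecs, not on
    glibc's gconv modules, and only rewrites the first SET line.
    """
    # The SET line is plain ASCII, so it can be located in the raw bytes.
    match = re.search(rb"^SET[ \t]+(\S+)", raw, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no SET line found in .aff data")
    charset = match.group(1).decode("ascii")

    # Decode with the declared charset, then declare UTF-8 instead.
    text = raw.decode(charset)
    text = re.sub(r"^SET[ \t]+\S+", "SET UTF-8", text,
                  count=1, flags=re.MULTILINE)
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "SET ISO8859-2\nTRY ěščřžýáíéů\n".encode("iso8859-2")
    print(aff_to_utf8(sample).decode("utf-8"))
```

The sketch works on bytes rather than file paths so that it can be tried on a copy of the data without touching anything under /usr/share/myspell.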
$ grep ^SET /usr/share/myspell/*.aff | grep -c UTF-8
135
$ grep ^SET /usr/share/myspell/*.aff | grep -c -v UTF-8
98
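For a breakdown by charset rather than just UTF-8 versus everything else, the same count can be sketched in Python. The helper names aff_charset and tally_charsets are made up for illustration, and the default directory is the one used in the grep commands above. Unlike grep ^SET, this sketch also accepts a SET line preceded by a UTF-8 byte order mark, so its totals may differ slightly from the grep counts.

```python
from collections import Counter
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def aff_charset(path):
    """Return the charset named on the first SET line of an .aff file, or None."""
    with open(path, "rb") as handle:
        for line in handle:
            # Drop a leading UTF-8 byte order mark, if there is one.
            if line.startswith(UTF8_BOM):
                line = line[len(UTF8_BOM):]
            fields = line.split()
            if len(fields) >= 2 and fields[0] == b"SET":
                return fields[1].decode("ascii", errors="replace")
    return None


def tally_charsets(directory="/usr/share/myspell"):
    """Count the .aff files in a directory by declared charset."""
    return Counter(aff_charset(p) for p in sorted(Path(directory).glob("*.aff")))


if __name__ == "__main__":
    for charset, count in tally_charsets().most_common():
        print(f"{count:4d}  {charset}")
```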
(In reply to Florian Weimer from comment #2)
> I'm not aware how to set up a system with such a locale (anaconda will use
> cs_CZ.utf8 if instructed, I believe). Furthermore, a default installation
> follows weak dependencies and will therefore include the glibc-gconv-extra
> package.

This really is the key question. Perhaps the dependency on glibc-all-langpack is not sufficient and maybe we need a stronger dependency. Mike could you please describe how you installed this system?

(In reply to Miro Hrončok from comment #1)
> Florian, I think this packaging change in glibc needs more communication in
> Fedora. WDYT?

I'll send out an email but before I'd like to know how we end up with such an installation because if it is common enough then maybe it's not feasible to keep this split and the announcement would be pointless.
(In reply to Siddhesh Poyarekar from comment #8)
> I'll send out an email but before I'd like to know how we end up with such
> an installation because if it is common enough then maybe it's not feasible
> to keep this split and the announcement would be pointless.

Well I sent this out anyway because I discovered a use case (bug 1974466) that needs an explicit BuildRequires on only glibc-gconv-extra and not glibc-*langpacks.

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/JQTTSNHMSFV63KIDDPW4M7WV7CI6KZYW/
(In reply to Siddhesh Poyarekar from comment #8)
> (In reply to Florian Weimer from comment #2)
> > I'm not aware how to set up a system with such a locale (anaconda will use
> > cs_CZ.utf8 if instructed, I believe). Furthermore, a default installation
> > follows weak dependencies and will therefore include the glibc-gconv-extra
> > package.
>
> This really is the key question. Perhaps the dependency on
> glibc-all-langpack is not sufficient and maybe we need a stronger
> dependency. Mike could you please describe how you installed this system?

This was a buildroot.

I was building ibus-typing-booster for rawhide with “fedpkg mockbuild”. Then a test case for python3-enchant for Czech failed. I did a chroot into the buildroot and investigated and found that python3-enchant didn’t work at all for Czech. I didn’t understand why not, therefore I reported this bug.

After adding

# to make the python3-enchant test work for hunspell dictionaries which are not yet UTF-8:
BuildRequires: glibc-gconv-extra

it works again.
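A quick way to check a buildroot for this situation might be to ask the iconv command whether it can convert from the dictionary's charset at all. This is only a sketch under an assumption the thread does not confirm in detail, namely that the empty suggestion list comes from a charset conversion that is unavailable without glibc-gconv-extra. The function name iconv_can_convert is hypothetical; it relies only on the standard -f and -t options of iconv.

```python
import shutil
import subprocess


def iconv_can_convert(charset="ISO-8859-2", target="UTF-8"):
    """Return True or False depending on whether iconv supports the conversion.

    Returns None when no iconv executable is found on PATH.
    """
    iconv = shutil.which("iconv")
    if iconv is None:
        return None
    # iconv exits with a non-zero status if the conversion is not supported.
    proc = subprocess.run(
        [iconv, "-f", charset, "-t", target],
        input=b"Praha\n",
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return proc.returncode == 0


if __name__ == "__main__":
    print("ISO-8859-2 -> UTF-8 available:", iconv_can_convert())
```

If this reports False inside the chroot and True after installing glibc-gconv-extra, that would be consistent with the behaviour described above, although it does not prove which code path python3-enchant actually takes.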
(In reply to Mike FABIAN from comment #10)
> (In reply to Siddhesh Poyarekar from comment #8)
> > (In reply to Florian Weimer from comment #2)
> > > I'm not aware how to set up a system with such a locale (anaconda will use
> > > cs_CZ.utf8 if instructed, I believe). Furthermore, a default installation
> > > follows weak dependencies and will therefore include the glibc-gconv-extra
> > > package.
> >
> > This really is the key question. Perhaps the dependency on
> > glibc-all-langpack is not sufficient and maybe we need a stronger
> > dependency. Mike could you please describe how you installed this system?
>
> This was a buildroot.

Ahh!

> After adding
>
> # to make the python3-enchant test work for hunspell dictionaries which are
> not yet UTF-8:
> BuildRequires: glibc-gconv-extra
>
> it works again.

Using glibc-all-langpacks should work as well. For non-English language processing, that's a good addition for other reasons as well.

If redhat-rpm-config is installed, it will also install glibc-gconv-extra.
I've modified the dependency on glibc-gconv-extra to Requires until we can figure out a nicer solution that does not need all packages to add the dependency. The next build should fix this.
This is fixed on rawhide.