Red Hat Bugzilla – Bug 1316359
grep update broke German spell checking
Last modified: 2016-04-13 17:35:11 EDT
Updating from hunspell-de-0.20151222-2.fc24 to hunspell-de-0.20151222-3.fc24 breaks German spell checking.
Looking at the rpm files there is a big difference in file sizes. The .dic files in 0.20151222-3 are ~45kB while those in 0.20151222-2 are ~1.1MB. So probably large parts of the dictionaries are missing.
You are right, the file size has dropped to just ~45kB. I tried a scratch build of the older -2 release for F24 (see http://koji.fedoraproject.org/koji/taskinfo?taskID=13292731) and the results are the same, so something required to build this package is causing the problem. The update change itself looks okay to me.
I quickly looked into this and noticed a small difference between
https://kojipkgs.fedoraproject.org//packages/hunspell-de/0.20151222/2.fc24/data/logs/noarch/build.log (the old build that works) and https://kojipkgs.fedoraproject.org//packages/hunspell-de/0.20151222/3.fc24/data/logs/noarch/build.log (the one that does not). In the latter, the sections following "Capital prefixes will be expanded: " are much shorter. That might only be a symptom, but maybe it rings a bell for someone here…
if i build the rawhide package with "fedpkg local" on F23, it works.
with "fedpkg mockbuild --no-clean-all" it fails.
a diff contains a lot of stuff like:
+Binary file (standard input) matches
+Binary file hunspell-capmain-small_de_AT.tmp.tmp matches/m
smells like something invokes grep and rawhide grep thinks input is binary, some of the time.
apparently the problem is in bin/hunspell-capmain ...
on Fedora 23 this results in the expected output:
LC_ALL=C LANG=C grep "^[A-ZÄÖÜÉ]" igerman98-20151222/hunspell-capmain-small_de_AT.tmp.tmp
whereas using rawhide's grep produces the "Binary file hunspell-capmain-small_de_AT.tmp.tmp matches" at the first line with a non-ASCII char, "Praliné/S":
LC_ALL=C LANG=C /var/lib/mock/fedora-rawhide-x86_64/root/usr/bin/grep "^[A-ZÄÖÜÉ]" igerman98-20151222/hunspell-capmain-small_de_AT.tmp.tmp
so apparently the interpretation of the "C" locale changed between
grep-2.22-6.fc23.x86_64 and grep-2.24-1.fc25.x86_64
* Noteworthy changes in release 2.23 (2016-02-04) [stable]
** Bug fixes
Binary files are now less likely to generate diagnostics and more
likely to yield text matches. grep now reports "Binary file FOO
matches" and suppresses further output instead of outputting a line
containing an encoding error; hence grep can now report matching text
before a later binary match. Formerly, grep reported FOO to be
binary when it found an encoding error in FOO before generating
output for FOO, which meant it never reported both matching text and
matching binary data; this was less useful for searching text
containing encoding errors in non-matching lines.
[bug introduced in grep-2.21]
grep no longer outputs encoding errors in unibyte locales.
For example, if the byte '\x81' is not a valid character in a
unibyte locale, grep treats the byte as binary data.
[bug introduced in grep-2.21]
... sounds like an intentional change, and the build worked by accident before.
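One way to make such a build step independent of grep's binary-file heuristic is GNU grep's -a (--text) option, which treats the input as text regardless of encoding errors. The following is only a minimal sketch with a made-up two-line Latin-1 sample; it is not the fix that was applied to the package.

```shell
#!/bin/sh
# Hypothetical sample file: "test" and "täst", with ä as the Latin-1 byte 0xE4
# (octal 344), so the second line is not valid ASCII or UTF-8.
sample=$(mktemp)
printf 'test\nt\344st\n' > "$sample"

# -a (--text) makes grep process the file as text, so matching lines are
# printed instead of a "Binary file ... matches" notice.
count=$(LC_ALL=C grep -a 'st' "$sample" | wc -l | tr -d ' ')
echo "lines printed with -a: $count"

rm -f "$sample"
```

Without -a, what is printed for the second line depends on the grep version, which is exactly the difference observed between F23 and rawhide.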
try to fix it by forcing more use of en_US.iso885915 locale.
> Fixed In Version: hunspell-de-0.20151222-4.fc24
Works for me. Many thx for taking care of this!
The C locale is the fallback locale and should always give reasonable results. The result that GNU grep gives back does not seem to be in line with what you should expect according to http://pubs.opengroup.org/onlinepubs/009604499/utilities/grep.html
Setting some legacy locale to be able to work with ISO-8859-1 encoded files will only work when the given locale definitions exist on the system. But you cannot assume any locale definition to pre-exist except for C.
# echo -e "test\ntäst" | iconv -f utf8 -t latin1 | LC_ALL=C grep st ; echo $?
Binary file (standard input) matches
IMHO GNU grep is heavily broken here. I will require a different grep implementation with the next igerman98 release. You might also switch to busybox grep, which seems to work better.
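The concern about legacy locales can be checked at build time: `locale -a` lists the locales installed on the system, so a script can test for one before relying on it. This is a rough sketch; the helper name `have_locale` and the fallback to C are assumptions for illustration, not part of the package's build.

```shell
#!/bin/sh
# have_locale NAME: succeed if NAME is listed by `locale -a`.
# -F treats the name as a fixed string, -x requires a whole-line match,
# -i ignores case, -q suppresses output.
have_locale() {
    locale -a 2>/dev/null | grep -qixF -- "$1"
}

wanted="en_US.iso885915"
if have_locale "$wanted"; then
    chosen="$wanted"
else
    # Only the C locale can be assumed to exist everywhere.
    chosen="C"
fi
echo "building with LC_ALL=$chosen"
```

If the `locale` command itself is missing, the helper simply reports the locale as unavailable and the script falls back to C.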
(In reply to Thorsten Leemhuis from comment #4)
> > Fixed In Version: hunspell-de-0.20151222-4.fc24
> Works for me. Many thx for taking care of this!
Any chance of getting this pushed to Fedora 24 repositories? The -4 update only got pushed to Fedora 25.
Or just update igerman98 to 20160407. That release now detects a broken grep version and allows using busybox's grep for the build.
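A check of this kind could look roughly like the sketch below, which feeds the Latin-1 test input from the earlier comment to a candidate grep and compares the result byte for byte. It is an illustration only and not the actual igerman98 detection code; the function name `grep_handles_latin1` and the variable `GREP` are made up here.

```shell
#!/bin/sh
# grep_handles_latin1 CMD [ARGS...]: succeed if CMD, run in the C locale,
# passes both lines of a Latin-1 sample through unchanged.
# od -c renders the bytes as ASCII so the comparison is encoding-safe.
grep_handles_latin1() {
    expected=$(printf 'test\nt\344st\n' | od -An -c)
    actual=$(printf 'test\nt\344st\n' | LC_ALL=C "$@" 'st' 2>/dev/null | od -An -c)
    [ "$actual" = "$expected" ]
}

if grep_handles_latin1 grep; then
    GREP="grep"
elif command -v busybox >/dev/null 2>&1 && grep_handles_latin1 busybox grep; then
    GREP="busybox grep"
else
    GREP=""
    echo "no grep that handles Latin-1 input in the C locale was found" >&2
fi
echo "using: ${GREP:-none}"
```

Because `GREP` may hold two words, a caller would expand it unquoted, for example `$GREP pattern file`.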
hunspell-de-0.20151222-4.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-e2e32afb32
sorry i usually only push to rawhide so i forgot that i need to "fedpkg update" for f24...
hunspell-de-0.20151222-4.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-e2e32afb32
hunspell-de-0.20151222-4.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.