1316359 – grep update broke German spell checking

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	hunspell-de
Sub Component:
Version:	24
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Michael Stahl
QA Contact:	Fedora Extras Quality Assurance

Reported:	2016-03-10 03:56 UTC by Sebastian Keller
Modified:	2016-04-13 21:35 UTC
CC List:	5 users
Fixed In Version:	hunspell-de-0.20151222-4.fc24
Last Closed:	2016-04-13 21:35:11 UTC
Type:	Bug

Description Sebastian Keller 2016-03-10 03:56:23 UTC

Updating from hunspell-de-0.20151222-2.fc24 to hunspell-de-0.20151222-3.fc24 breaks German spell checking.

Looking at the rpm files there is a big difference in file sizes. The .dic files in 0.20151222-3 are ~45kB while those in 0.20151222-2 are ~1.1MB. So probably large parts of the dictionaries are missing.

Comment 1 Parag Nemade 2016-03-10 05:08:57 UTC

You are right, the file size has dropped to just ~45kB. I tried a scratch build of the older -2 release for F24, and the results are the same; see http://koji.fedoraproject.org/koji/taskinfo?taskID=13292731. So something required to build this package is causing the problem. The change in the update itself looks okay to me.

Comment 2 Thorsten Leemhuis 2016-04-01 13:04:15 UTC

I took a quick look at this and noticed a small difference between
https://kojipkgs.fedoraproject.org//packages/hunspell-de/0.20151222/2.fc24/data/logs/noarch/build.log (the old build that works) and https://kojipkgs.fedoraproject.org//packages/hunspell-de/0.20151222/3.fc24/data/logs/noarch/build.log (the one that does not). In the latter, the sections with "Capital prefixes will be expanded: " are much shorter. That might only be a symptom, but maybe it rings a bell for someone here…

Comment 3 Michael Stahl 2016-04-01 16:58:02 UTC

if i build the rawhide package with "fedpkg local" on F23, it works.
with "fedpkg mockbuild --no-clean-all" it fails.

a diff contains a lot of stuff like:

-Binaries/m
-Binary/m
-Binde/ijm
-Binden/SJm
-Binder/NFSm
-Binderei/Pm
-Bindestrick/dEPS
-Bindfäden/m
-Bindfaden/Sm
+Binary file (standard input) matches
+Binary file hunspell-capmain-small_de_AT.tmp.tmp matches/m
+Binde/hij
Bindungs/hij
-Bingen/Sm

smells like something invokes grep and rawhide grep thinks input is binary, some of the time.
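
One way to catch this failure mode early is to scan a generated dictionary for grep's diagnostic text. The following is a minimal sketch, not part of the igerman98 build; the function name check_dic_for_grep_noise is hypothetical, and the pattern is only an assumption based on the two diagnostic lines visible in the diff above.

```shell
#!/bin/sh
# Hypothetical helper: report an error if a generated .dic file contains
# grep's "Binary file ... matches" diagnostic instead of dictionary entries.
check_dic_for_grep_noise() {
    dic=$1
    [ -r "$dic" ] || { echo "$dic: not readable" >&2; return 2; }
    # -a: treat the file as text even if it contains non-ASCII bytes.
    # -c: print only the number of matching lines.
    # grep -c exits non-zero when the count is 0 but still prints "0",
    # so the exit status is deliberately ignored here.
    noise=$(LC_ALL=C grep -a -c '^Binary file .* matches' "$dic") || true
    if [ "$noise" -gt 0 ]; then
        echo "$dic: $noise line(s) of grep diagnostics found" >&2
        return 1
    fi
    return 0
}
```

The pattern requires the words "Binary file" followed later by "matches", so legitimate entries such as Binary/m are not flagged, while a diagnostic with a trailing affix flag (matches/m) still is.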

apparently the problem is in bin/hunspell-capmain ...

on Fedora 23 this results in the expected output:

LC_ALL=C LANG=C grep "^[A-ZÄÖÜÉ]" igerman98-20151222/hunspell-capmain-small_de_AT.tmp.tmp

whereas using rawhide's grep produces the "Binary file hunspell-capmain-small_de_AT.tmp.tmp matches" at the first line with a non-ASCII char, "Praliné/S":

LC_ALL=C LANG=C /var/lib/mock/fedora-rawhide-x86_64/root/usr/bin/grep "^[A-ZÄÖÜÉ]" igerman98-20151222/hunspell-capmain-small_de_AT.tmp.tmp

so apparently the interpretation of the "C" locale changed between
grep-2.22-6.fc23.x86_64 and grep-2.24-1.fc25.x86_64

http://git.savannah.gnu.org/cgit/grep.git/tree/NEWS says:

* Noteworthy changes in release 2.23 (2016-02-04) [stable]

** Bug fixes

Binary files are now less likely to generate diagnostics and more
likely to yield text matches. grep now reports "Binary file FOO
matches" and suppresses further output instead of outputting a line
containing an encoding error; hence grep can now report matching text
before a later binary match. Formerly, grep reported FOO to be
binary when it found an encoding error in FOO before generating
output for FOO, which meant it never reported both matching text and
matching binary data; this was less useful for searching text
containing encoding errors in non-matching lines.
[bug introduced in grep-2.21]

grep no longer outputs encoding errors in unibyte locales.
For example, if the byte '\x81' is not a valid character in a
unibyte locale, grep treats the byte as binary data.
[bug introduced in grep-2.21]

... sounds like an intentional change, and the build worked by accident before.

try to fix it by forcing more use of en_US.iso885915 locale.
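
For illustration, forcing a Latin-1 capable locale around a grep call could look roughly like the sketch below. This is not the actual change made in the package; the wrapper name latin1_grep is hypothetical, the lookup assumes the locale shows up in `locale -a` under a name like en_US.iso885915, and the `-a` fallback for systems without that locale is an added assumption. The sketch uses [[:upper:]] rather than the [A-ZÄÖÜÉ] bracket expression from the commands above so that the script itself stays pure ASCII.

```shell
#!/bin/sh
# Hypothetical wrapper: run grep so that ISO-8859-15 input is treated as text.
latin1_grep() {
    # Look for an installed ISO-8859-15 locale. glibc usually lists it as
    # "en_US.iso885915"; other systems may spell it differently (assumption).
    loc=$(locale -a 2>/dev/null |
          LC_ALL=C grep -i '^en_US\.iso-\{0,1\}8859-\{0,1\}15$' |
          head -n 1)
    if [ -n "$loc" ]; then
        # Bytes such as 0xE9 are valid characters in this locale.
        LC_ALL=$loc grep "$@"
    else
        # Fallback: stay in the C locale and force text mode with -a.
        LC_ALL=C grep -a "$@"
    fi
}

# Usage, with the file name taken from the commands above:
# latin1_grep '^[[:upper:]]' igerman98-20151222/hunspell-capmain-small_de_AT.tmp.tmp
```

Using the name reported by `locale -a` avoids guessing how the system spells the codeset, and the fallback keeps the call usable when only the C locale is installed.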

Comment 4 Thorsten Leemhuis 2016-04-04 05:38:30 UTC

> Fixed In Version: hunspell-de-0.20151222-4.fc24

Works for me. Many thx for taking care of this!

Comment 5 Björn Jacke 2016-04-05 23:24:42 UTC

The C locale is the fallback locale and should always give reasonable results. The result that GNU grep gives does not seem to be in line with what you should expect according to http://pubs.opengroup.org/onlinepubs/009604499/utilities/grep.html

Setting some legacy locale to be able to work with ISO-8859-1 encoded files will only work when the given locale definitions exist on the system. But you cannot assume that any locale definition other than C exists.

# echo -e "test\ntäst" | iconv -f utf8 -t latin1 | LC_ALL=C grep st ; echo $?
Binary file (standard input) matches
0
#

IMHO GNU grep is heavily broken here. I will require a different grep implementation with the next igerman98 release. You might also switch to busybox grep, which seems to work better.

Comment 6 Christian Stadelmann 2016-04-08 13:03:03 UTC

(In reply to Thorsten Leemhuis from comment #4)
> > Fixed In Version: hunspell-de-0.20151222-4.fc24
> 
> Works for me. Many thx for taking care of this!

Any chance of getting this pushed to Fedora 24 repositories? The -4 update only got pushed to Fedora 25.

Comment 7 Björn Jacke 2016-04-08 13:07:47 UTC

or just update igerman98 to 20160407. This one now detects whether we have a broken grep version and allows using busybox's grep to build.
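
A detection of this kind could be sketched as follows. This is not the check shipped in igerman98 20160407; the function names grep_handles_latin1 and pick_grep are hypothetical, and the candidate list (plain grep, busybox grep, GNU grep with -a) is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical probe: does the given grep command pass Latin-1 text through
# unchanged in the C locale, or does it report the input as binary?
grep_handles_latin1() {
    # "t\344st" is "täst" encoded as ISO-8859-1 (octal 344 = 0xE4).
    sample=$(printf 'test\nt\344st')
    # $1 is left unquoted on purpose so that "busybox grep" splits into words.
    result=$(printf 'test\nt\344st\n' | LC_ALL=C $1 st 2>/dev/null)
    [ "$result" = "$sample" ]
}

# Print the first candidate command that handles Latin-1 input as text.
pick_grep() {
    for candidate in "grep" "busybox grep" "grep -a"; do
        if grep_handles_latin1 "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (a build script would then call $GREP instead of grep):
# GREP=$(pick_grep) || { echo "no suitable grep found" >&2; exit 1; }
```

Probing the behaviour directly, rather than comparing version numbers, keeps the check independent of which grep releases are affected.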

Comment 8 Fedora Update System 2016-04-08 13:12:43 UTC

hunspell-de-0.20151222-4.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-e2e32afb32

Comment 9 Michael Stahl 2016-04-08 13:16:52 UTC

sorry i usually only push to rawhide so i forgot that i need to "fedpkg update" for f24...

Comment 10 Fedora Update System 2016-04-09 18:52:42 UTC

hunspell-de-0.20151222-4.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-e2e32afb32

Comment 11 Fedora Update System 2016-04-13 21:35:06 UTC

hunspell-de-0.20151222-4.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
