Version-Release number of selected component (if applicable):
grep-2.21-5.fc22.x86_64

How reproducible:
easily

Steps to Reproduce:
1. curl -O https://raw.githubusercontent.com/bagder/curl/curl-7_29_0/src/tool_getparam.c
2. file tool_getparam.c
3. grep ftpport tool_getparam.c

Actual results:
$ file tool_getparam.c
tool_getparam.c: C source, ISO-8859 text
$ grep ftpport tool_getparam.c
Binary file tool_getparam.c matches

Expected results:
$ grep ftpport tool_getparam.c
{"P", "ftpport", TRUE},
/* 'ftpport' old version */
Curl_safefree(config->ftpport);
GetStr(&config->ftpport, nextarg);

Additional info:
grep-2.18-1.fc20 works fine.
I ran into the same issue and have already opened a report with GNU grep upstream: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20526
Affected: grep-2.21-3.fc21.x86_64
Not affected: grep-2.20-6.fc21.x86_64
Thanks for the reference! IMO the upstream commit that changed the behavior should be reverted. I use grep to search for (usually ASCII) content in a huge number of text files of mixed encoding. It is technically impossible to set the LC_ALL variable for each of them, as the suggested workaround requires. If GCC compiles a file in the same environment with no warning, the file hardly qualifies as a binary file.
You can alias grep to 'grep -a' to treat all files as text. I am not going to revert the fix unless upstream does.
(In reply to Jaroslav Škarvada from comment #3)
> You can set grep alias to 'grep -a' and treat all files as text.

That would not restore the behavior of grep-2.18-1.fc20. I want grep to treat text files as text files, regardless of their encoding. grep -a makes grep treat binary files as text files too, which is not that useful.
A text file, by POSIX definition, is one in which ALL bytes in the file comprise valid characters in the current locale encoding. A file may be text in one locale, and binary in another. If you try to grep a file that is binary because you did not use 'grep -a' and your locale differs from the encoding used by the file, then all bets are off, because you are attempting something non-portable. This is not a regression, but a documented change in undefined behavior, and arguably a bug fix (as grep outputting invalid encoding sequences when grepping a binary file that differs from the current locale may in turn result in the terminal displaying bogus content when it encounters those encoding errors). I see no reason to change anything in grep - people deserve to be educated about locale-dependence of their files.
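To make the definition above concrete, here is a minimal Python sketch. It is not grep's implementation, and it covers only the encoding part of the definition. It treats a byte string as text under a given encoding only if every byte decodes cleanly. The function name `is_text_in` and the sample bytes are hypothetical, and Python codec names stand in for locale encodings.

```python
def is_text_in(data: bytes, encoding: str) -> bool:
    """Return True if every byte of `data` decodes in `encoding`."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: a C comment containing byte 0xE9 ("é" in ISO-8859-1).
sample = b"/* caf\xe9 */ GetStr(&config->ftpport, nextarg);\n"

print(is_text_in(sample, "iso-8859-1"))  # checked against a Latin-1 encoding
print(is_text_in(sample, "utf-8"))       # checked against UTF-8
```

The same bytes pass the check under ISO-8859-1 and fail it under UTF-8, which is the sense in which a file may be text in one locale and binary in another.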
(In reply to Kamil Dudka from comment #4)
> (In reply to Jaroslav Škarvada from comment #3)
> > You can set grep alias to 'grep -a' and treat all files as text.
>
> That would not restore the behavior of grep-2.18-1.fc20. I want grep to
> treat text files as text files, regardless of their encoding. grep -a will
> cause grep to treat also binary files as text files, which is not that
> useful.

The problem, though, is that not all "text files" are text files across all locales. What heuristics would you propose to determine that "this file, which has encoding errors in the current locale and is therefore not a text file, is nevertheless a text file in some other locale, so treat it as text instead of binary" without adding the expense of grepping over two locales instead of one for every file?
I usually need to grep all C source files, Makefiles, shell scripts etc. for some pattern in a project directory recursively while skipping object files and the like. The heuristic used in grep prior to commit cd36abd4 worked just fine for me. Source files in many SW projects simply are of mixed encoding. Even if you force all projects to unify the encoding now, it will remain mixed in the SCM history, making grep less useful during git-bisect etc.
(In reply to Kamil Dudka from comment #7)
> I usually need to grep all C source files, Makefiles, shell scripts etc. for
> some pattern in a project directory recursively while skipping object files
> and the like. The heuristic used in grep prior to commit cd36abd4 worked
> just fine for me.
>
> Source files in many SW projects simply are of mixed encoding. Even if you
> force all projects to unify the encoding now, it will remain mixed in the
> SCM history, making grep less useful during git-bisect etc.

Maybe some new option or environment variable for the "compatibility mode"? I think this discussion should move upstream.
(In reply to Kamil Dudka from comment #7)
> I usually need to grep all C source files, Makefiles, shell scripts etc. for
> some pattern in a project directory recursively while skipping object files
> and the like. The heuristic used in grep prior to commit cd36abd4 worked
> just fine for me.

But you were getting lucky, and relying on undefined behavior. The old heuristic was "encoding error? oh well, you get to keep the pieces if it hangs"; the new heuristic is "encoding error? file is binary, so treat it as such". But the point remains that either way, you are relying on undefined behavior.

> Source files in many SW projects simply are of mixed encoding. Even if you
> force all projects to unify the encoding now, it will remain mixed in the
> SCM history, making grep less useful during git-bisect etc.

Ideally, it would be nice if files had attributes in the metadata stating what encoding the file is in, and if SCMs had metadata that tracked the encoding (and encoding changes, when a file is recoded). But that's a much bigger project to take on; one that would need cooperation across the entire software stack to bring to fruition.
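The two heuristics contrasted above can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not grep's code. The "old" check is modeled as a plain NUL-byte test, which is an assumption about the earlier behavior. The "new" check adds the encoding-error test described in this comment. All function names and sample bytes are hypothetical.

```python
def looks_binary_old(data: bytes) -> bool:
    # Assumption: model the older heuristic as "a NUL byte means binary".
    return b"\x00" in data


def looks_binary_new(data: bytes, encoding: str = "utf-8") -> bool:
    # Newer heuristic as described above: an encoding error in the
    # current locale's encoding also marks the data as binary.
    if b"\x00" in data:
        return True
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical inputs: Latin-1 encoded C source and an ELF-like header.
latin1_source = b"/* caf\xe9 */ int x;\n"
object_header = b"\x7fELF\x02\x01\x01\x00"

for name, data in [("latin1_source", latin1_source),
                   ("object_header", object_header)]:
    print(name, looks_binary_old(data), looks_binary_new(data))
```

Under these assumptions both checks flag the ELF-like header, but only the newer one flags the Latin-1 source when the check runs against UTF-8. That difference is what the reporter observes as "Binary file tool_getparam.c matches".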
The new behaviour is not that great from the user's point of view, and the old one was not correct enough. So what now: a config file for grep with a list of (user-chosen) locales that should be considered "text"? A bit of a crazy idea.
upstream commit: http://git.savannah.gnu.org/cgit/grep.git/commit/?id=85210016
(In reply to Kamil Dudka from comment #11)
> upstream commit:
>
> http://git.savannah.gnu.org/cgit/grep.git/commit/?id=85210016

Thanks for the info.
grep-2.22-4.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-14956838f8
grep-2.21-7.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-8a95ebc411
grep-2.21-7.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-8a95ebc411
grep-2.22-4.fc23 has been pushed to the Fedora 23 testing repository. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-14956838f8
grep-2.22-5.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-313013441f
grep-2.21-8.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-d41031d2d0
grep-2.22-5.fc23 has been pushed to the Fedora 23 testing repository. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-313013441f
grep-2.21-8.fc22 has been pushed to the Fedora 22 testing repository. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-d41031d2d0
grep-2.22-5.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
grep-2.21-9.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-8883879d28
grep-2.21-9.fc22 has been pushed to the Fedora 22 testing repository. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-8883879d28
grep-2.21-9.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.