Bug 1172804
| Summary: | grep behavior in processing binary files is changed in Fedora 21 | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Hedayat Vatankhah <hedayatv> |
| Component: | grep | Assignee: | Jaroslav Škarvada <jskarvad> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 21 | CC: | eggert, jskarvad, lkundrak |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-12-12 09:15:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1172405 | | |
Description
Hedayat Vatankhah
2014-12-10 19:31:22 UTC
This seems to be an optimization feature. grep now treats NULs as line ends in binary mode. It was added by this commit:
```
commit 8cc20c82a747460991305b0d8d72faf6830298f4
Author: Paul Eggert <eggert.edu>
Date:   Mon Sep 15 17:15:06 2014 -0700

    grep: non-text bytes in binary data may be treated as line ends

    * NEWS, doc/grep.texi (File and Directory Selection):
    Document this change.
    * src/grep.c (zap_nuls): New function.
    (grep): Use it.
    * tests/null-byte: Relax to allow new behavior.
```
From the NEWS:
```
When searching binary data, grep now may treat non-text bytes as
line terminators.  This can boost performance significantly.
```
I think this feature should at least be documented in the manual page, and there should be an option to disable (or enable?) it. Forwarding upstream.
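To make the complaint concrete, here is a simplified Python model of the two behaviors. It is a sketch under stated assumptions, not grep's implementation: the function `matching_segments` and the sample `blob` are hypothetical, and the model only reproduces the observable effect described in the NEWS entry (NUL bytes acting as extra line terminators in binary data) versus newline-only line splitting, which is roughly what -a restores. A pattern that itself contains a NUL byte can never match once NULs split the data into separate lines.

```python
import re


def matching_segments(data: bytes, pattern: bytes, nul_is_eol: bool) -> list[bytes]:
    """Return the 'lines' of data in which pattern is found.

    Simplified model, not grep's code: with nul_is_eol=True, NUL bytes
    end a line just like a newline does (the binary-data behavior the
    grep 2.21 NEWS entry describes); with nul_is_eol=False, only a
    newline ends a line (roughly what grep -a does).
    """
    terminators = b"[\n\x00]" if nul_is_eol else b"\n"
    regex = re.compile(pattern)
    return [line for line in re.split(terminators, data) if regex.search(line)]


# Hypothetical binary blob whose interesting bytes straddle a NUL.
blob = b"header\x00LOADER\x00v2\n"

# A "non-text" pattern: it can only match across the NUL byte.
pattern = b"LOADER\x00v2"

print(matching_segments(blob, pattern, nul_is_eol=False))
# prints [b'header\x00LOADER\x00v2']  (newline-only splitting: one long line)
print(matching_segments(blob, pattern, nul_is_eol=True))
# prints []  (NULs split the data, so no single line contains the pattern)
```

With `nul_is_eol=True` no line can contain the NUL that the pattern needs, so the search comes back empty; this mirrors why a NUL-spanning pattern stops matching in binary mode unless the data is treated as text.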
Thank you for the comment. I checked the man page and searched the net, but didn't check the NEWS file. Currently, it seems that the only way around this 'feature' is to use '-a' (treat binary files as text), but that is very odd: you have to treat a binary file as text in order to match non-text patterns! Or perhaps the intent is that grep is not supposed to be used for matching binary patterns at all.

It's documented in the grep manual, under the node "File and Directory Selection", which says "When matching binary data, `grep' may treat non-text bytes as line terminators." With -a, grep bypasses its heuristics about whether a file is binary data or text, so it treats null bytes as text. With -a, grep should work just fine for os-prober (the application in question), which searches non-text files as if they were text. I have filed a bug report for os-prober accordingly, at: http://bugs.debian.org/772901

The reason for the change in behavior in grep 2.21, by the way, was to forestall some denial-of-service attacks. For example:

```
$ echo x | dd obs=1GiB seek=8000000000 >big
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.0124091 s, 0.2 kB/s
$ ls -l big
-rw-rw-r-- 1 eggert faculty 8589934592000000002 Dec 11 21:50 big
$ time grep-2.21 x big
Binary file big matches

real    0m0.007s
user    0m0.001s
sys     0m0.005s
```

Not too shabby, huh? grep 2.21 processed over 1 zettabyte per second here. Given the same input, grep 2.20 keels over and dies a painful death.

Thanks for the explanation. I should have looked at more documentation than just 'man grep' (IMHO it would be useful to add a note there too). And finally, thanks for creating and sending the patch upstream! @Jaroslav, thank you too for following this up. This bug can be closed now; I'll apply the changes to os-prober.

OK, closing per previous comments.
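The "over 1 zettabyte per second" figure in the grep 2.21 timing transcript can be checked from the transcript's own numbers: dd seeks past 8,000,000,000 output blocks of 1 GiB each and then writes 2 bytes ("x" and a newline), and grep reports 0.007 s of real time. The rate is nominal: the file is sparse, so almost all of it is a hole rather than data actually stored on disk.

$$
\begin{aligned}
\text{size} &= 2^{30}\ \text{B} \times 8 \times 10^{9} + 2\ \text{B} = 8\,589\,934\,592\,000\,000\,002\ \text{B} \approx 8.59 \times 10^{18}\ \text{B} \\
\text{rate} &\approx \frac{8.59 \times 10^{18}\ \text{B}}{0.007\ \text{s}} \approx 1.23 \times 10^{21}\ \text{B/s} > 10^{21}\ \text{B/s} = 1\ \text{ZB/s}
\end{aligned}
$$

The computed size matches the byte count printed by `ls -l big` exactly.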