Bug 1172804 - grep behavior in processing binary files is changed in Fedora 21
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: grep
Version: 21
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Jaroslav Škarvada
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1172405
Reported: 2014-12-10 19:31 UTC by Hedayat Vatankhah
Modified: 2014-12-12 09:15 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-12 09:15:03 UTC
Type: Bug
Embargoed:



Description Hedayat Vatankhah 2014-12-10 19:31:22 UTC
Description of problem:
grep in Fedora 21 behaves differently when searching inside binary files. For example, create a 3-byte file which contains: 'W', '\0' (zero byte) and 'i'.
With grep from Fedora 20 (and earlier versions), the pattern "W.i" matches this file. However, grep from Fedora 21 doesn't match this pattern in the file. If I add '-a' to the grep command line, it does match the pattern in this file.

The following is a sample run of these commands. The binary file is called t3, and the Fedora 20 rootfs is mounted at /mnt:

# File contents:
[hedayat@hvlap ~]% hexdump -C t3   
00000000  57 00 69                                          |W.i|
00000003

# Running F 21 grep normally: no match.
[hedayat@hvlap ~]% grep -qs "W.i" t3
[hedayat@hvlap ~]% echo $?
1

# Running F 20 grep with the same options: it matches.
[hedayat@hvlap ~]% /mnt/usr/bin/grep -qs "W.i" t3
[hedayat@hvlap ~]% echo $?                       
0

# Running F 21 grep with -a to treat the file as text: it matches.
[hedayat@hvlap ~]% grep -qsa "W.i" t3            
[hedayat@hvlap ~]% echo $?           
0
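
For anyone who wants to reproduce the sample run above, here is a minimal sketch. It assumes a POSIX shell with printf, od and mktemp available; the workdir variable is only for illustration, and od is used in place of hexdump. The exit status of the call without -a depends on the installed grep version, so none is asserted here.

```shell
#!/bin/sh
# Recreate the 3-byte file from the report: 'W', a NUL byte, 'i'.
workdir=$(mktemp -d)
printf 'W\000i' > "$workdir/t3"

# Show the bytes in hex (should list 57 00 69).
od -An -tx1 "$workdir/t3"

# Without -a: the result depends on the grep version
# (1 with F 21 grep and 0 with F 20 grep in the report above).
grep -qs 'W.i' "$workdir/t3"
echo "without -a: exit status $?"

# With -a: the file is treated as text (0 in the report above).
grep -qsa 'W.i' "$workdir/t3"
echo "with -a: exit status $?"
```

Running this against both an old and a new grep binary (for example one from a mounted older rootfs, as in the report) should show the difference in the first exit status.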

Comment 1 Jaroslav Škarvada 2014-12-11 15:05:32 UTC
This seems to be an optimization feature. grep now treats NULs as line ends in binary mode. It was added by this commit:

commit 8cc20c82a747460991305b0d8d72faf6830298f4
Author: Paul Eggert <eggert.edu>
Date:   Mon Sep 15 17:15:06 2014 -0700

    grep: non-text bytes in binary data may be treated as line ends
    
    * NEWS, doc/grep.texi (File and Directory Selection):
    Document this change.
    * src/grep.c (zap_nuls): New function.
    (grep): Use it.
    * tests/null-byte: Relax to allow new behavior.

From the NEWS:
When searching binary data, grep now may treat non-text bytes as
line terminators.  This can boost performance significantly.

I think this feature should at least be documented in the manual page, and there should be an option to disable (or enable?) it. Forwarding upstream.
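
To see why "W.i" stops matching once NULs act as line ends, the effect can be emulated by hand. This is only an illustration, not what grep does internally: tr replaces the NUL with a newline before grep sees the data, so 'W' and 'i' land on separate lines. The variable names are made up for this sketch.

```shell
#!/bin/sh
# Emulation only: turn NUL bytes into newlines, then search line by line.
# The input becomes two lines, "W" and "i", so a pattern that has to
# span the former NUL byte has nothing to match.
printf 'W\000i' | tr '\000' '\n' | grep -q 'W.i'
spanning_status=$?
echo "pattern spanning the NUL: exit status $spanning_status"

# A pattern that stays on one side of the NUL is unaffected.
printf 'W\000i' | tr '\000' '\n' | grep -q 'W'
one_side_status=$?
echo "pattern on one side of the NUL: exit status $one_side_status"
```

The first search fails because '.' never matches a line end, which is the same reason the original pattern fails when grep treats the NUL as a line terminator.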

Comment 2 Hedayat Vatankhah 2014-12-11 20:39:08 UTC
Thank you for the comment. I checked the man page and tried to find something on the net, but didn't check the NEWS file.

Currently, it seems that the only way around this 'feature' is using '-a' (treating binary files as text), which is very weird: you have to treat a binary file as text so that you can match non-text patterns! Or maybe the point is that you are not supposed to match binary patterns using grep at all.

Comment 3 Paul Eggert 2014-12-12 06:01:37 UTC
It's documented in the grep manual, under the node "File and Directory Selection", which says "When matching binary data, `grep' may treat non-text bytes as line terminators."

With -a, grep bypasses its heuristics about whether a file is binary data or text, so that it treats null bytes as text.  With -a, grep should work just fine for os-prober (the application in question), which searches non-text files as if they were text.  I have filed a bug report for os-prober accordingly, at:

http://bugs.debian.org/772901
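
A script that searches non-text files as if they were text could wrap the -a call in a small helper. The sketch below is hypothetical and not taken from os-prober: the function name binary_file_matches and the sample variable are invented for illustration, GNU grep is assumed, and LC_ALL=C is an extra assumption so that every byte counts as a single character regardless of the user's locale.

```shell
#!/bin/sh
# Hypothetical helper (not os-prober code): succeed if the regex in $1
# matches somewhere in the file $2, even if the file contains NUL bytes.
binary_file_matches() {
    # -a: treat the file as text, -q: no output, -s: no error messages.
    # LC_ALL=C is an assumption of this sketch, not part of the report.
    LC_ALL=C grep -qsa -- "$1" "$2"
}

# Usage with the 3-byte sample from this report.
sample=$(mktemp)
printf 'W\000i' > "$sample"
if binary_file_matches 'W.i' "$sample"; then
    echo "match"
else
    echo "no match"
fi
```

Keeping the flag in one place means the caller does not depend on grep's own heuristics about whether a file is binary.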

The reason for the change in behavior in grep 2.21, by the way, was to forestall some denial-of-service attacks.  For example:

$ echo x | dd obs=1GiB seek=8000000000 >big
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.0124091 s, 0.2 kB/s
$ ls -l big
-rw-rw-r-- 1 eggert faculty 8589934592000000002 Dec 11 21:50 big
$ time grep-2.21 x big
Binary file big matches

real    0m0.007s
user    0m0.001s
sys     0m0.005s

Not too shabby, huh?  grep 2.21 processed over 1 zettabyte per second here.  Given the same input, grep 2.20 keels over and dies a painful death.

Comment 4 Hedayat Vatankhah 2014-12-12 07:29:03 UTC
Thanks for the explanation. I should have looked into more manuals rather than just 'man grep' (IMHO it'd be useful to add a note there too). And finally, thanks for creating and sending the patch upstream!

@Jaroslav thank you too for following this. This bug can be closed now; I'll apply the changes to os-prober.

Comment 5 Jaroslav Škarvada 2014-12-12 09:15:03 UTC
OK, closing per previous comments.

