Description of problem:
grep in Fedora 21 behaves differently when searching inside binary files. For example, create a 3-byte file containing 'W', '\0' (a zero byte), and 'i'. With grep from Fedora 20 (and earlier versions), the pattern "W.i" matches this file. The Fedora 21 grep does not match the pattern, but if I add '-a' to the grep command line, it does.

The following is a sample run of these commands. The binary file is called t3, and the Fedora 20 rootfs is mounted at /mnt:

# File contents:
[hedayat@hvlap ~]% hexdump -C t3
00000000  57 00 69                                          |W.i|
00000003

# Running (F 21) grep normally: no match
[hedayat@hvlap ~]% grep -qs "W.i" t3
[hedayat@hvlap ~]% echo $?
1

# Running F 20 grep with the same options: matches
[hedayat@hvlap ~]% /mnt/usr/bin/grep -qs "W.i" t3
[hedayat@hvlap ~]% echo $?
0

# Running F 21 grep with -a to treat the file as ASCII: matches
[hedayat@hvlap ~]% grep -qsa "W.i" t3
[hedayat@hvlap ~]% echo $?
0
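For anyone who wants to reproduce this, the test file can be recreated roughly as follows. This is only a sketch: the temporary directory and the variable names `tmpdir` and `t3` are made up for illustration, and the exit statuses printed depend on the installed grep version (the report above saw 1 without -a and 0 with -a on Fedora 21).

```shell
# Recreate the 3-byte file from the report in a scratch directory.
# tmpdir and t3 are illustrative names, not part of the original report.
tmpdir=$(mktemp -d)
t3="$tmpdir/t3"

# Write the bytes 0x57 ('W'), 0x00 (NUL), 0x69 ('i'); \000 is an octal escape.
printf 'W\000i' > "$t3"

# Show the bytes to confirm the file contents.
od -An -tx1 "$t3"

# Search without -a; the result varies with the grep version.
grep -qs 'W.i' "$t3"
echo "without -a: exit status $?"

# Search with -a, which makes grep treat the file as text.
grep -qsa 'W.i' "$t3"
echo "with -a: exit status $?"
```

Comparing the two printed statuses shows whether the installed grep has the behaviour described in this report.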
This seems to be an optimization feature: grep now treats NULs as line ends in binary mode. It was added by this commit:

commit 8cc20c82a747460991305b0d8d72faf6830298f4
Author: Paul Eggert <eggert.edu>
Date:   Mon Sep 15 17:15:06 2014 -0700

    grep: non-text bytes in binary data may be treated as line ends

    * NEWS, doc/grep.texi (File and Directory Selection): Document this change.
    * src/grep.c (zap_nuls): New function.
    (grep): Use it.
    * tests/null-byte: Relax to allow new behavior.

From the NEWS:

    When searching binary data, grep now may treat non-text bytes as
    line terminators. This can boost performance significantly.

I think this feature should at least be documented in the manual page, and there should be an option to disable (or enable?) it. Forwarding upstream.
Thank you for the comment. I checked the man page and searched the net, but didn't check the NEWS file. Currently, it seems that the only way around this 'feature' is to use '-a' (treat binary files as text), which is very weird: you have to treat a binary file as text so that you can match non-text patterns! Or maybe the point is that you are not supposed to match binary patterns with grep at all.
It's documented in the grep manual, under the node "File and Directory Selection", which says "When matching binary data, `grep' may treat non-text bytes as line terminators." With -a, grep bypasses its heuristics about whether a file is binary data or text, so that it treats null bytes as text. With -a, grep should work just fine for os-prober (the application in question), which searches non-text files as if they were text. I have filed a bug report for os-prober accordingly, at: http://bugs.debian.org/772901

The reason for the change in behavior in grep 2.21, by the way, was to forestall some denial-of-service attacks. For example:

$ echo x | dd obs=1GiB seek=8000000000 >big
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.0124091 s, 0.2 kB/s
$ ls -l big
-rw-rw-r-- 1 eggert faculty 8589934592000000002 Dec 11 21:50 big
$ time grep-2.21 x big
Binary file big matches

real    0m0.007s
user    0m0.001s
sys     0m0.005s

Not too shabby, huh? grep 2.21 processed over 1 zettabyte per second here. Given the same input, grep 2.20 keels over and dies a painful death.
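A much smaller variant of the sparse-file test above can be tried safely on an ordinary machine. This is a scaled-down sketch, not the original experiment: the 16 MiB hole size and the names `big_dir`, `big`, and `big_status` are assumptions chosen for illustration, and whether the file is actually stored sparsely depends on the filesystem.

```shell
# Build a file that is a 16 MiB hole followed by "x\n".
# big_dir, big and big_status are illustrative names.
big_dir=$(mktemp -d)
big="$big_dir/big"

# bs=1 makes seek count in bytes, so dd skips 16777216 bytes of output
# before writing the two bytes read from echo.
echo x | dd of="$big" bs=1 seek=16777216 2>/dev/null

# Apparent size should be the hole size plus the 2 bytes written.
ls -l "$big"

# Search for the trailing "x"; -q suppresses the "binary file matches" message.
grep -q x "$big"
big_status=$?
echo "grep exit status: $big_status"
```

Prefixing the grep command with `time` (where the shell provides it) and increasing the seek value gives a rough feel for how the search time scales with the size of the hole.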
Thanks for the explanation. I should have looked into more manuals rather than just 'man grep' (IMHO it'd be useful to add a note there too). And finally, thanks for creating and sending the patch upstream! @Jaroslav, thank you too for following this. This bug can be closed now; I'll apply the changes to os-prober.
OK, closing per previous comments.