Red Hat Bugzilla – Bug 198165
grep should not take all memory
Last modified: 2009-01-27 10:54:26 EST
When grep is given a file that is very long and has no newline characters,
its memory use grows without bound.
For example, create a sparse file 100G in size and try grepping it. Any machine
that is commonly available will crash in such a situation.
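A minimal sketch of how such a test file could be produced, assuming Python is available. The helper name make_sparse_file and the path used below are made up for illustration. Whether the file really is sparse on disk depends on the filesystem, but in either case it reads back as NUL bytes with no newline, so grep sees it as a single "line".

```python
def make_sparse_file(path, size_bytes):
    """Create a file with the given apparent size without writing any data.

    Hypothetical helper: truncate() extends the file, and on filesystems
    that support holes the new region occupies (almost) no disk space.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)


if __name__ == "__main__":
    # 100G as in the report; use a much smaller size for a harmless trial run.
    make_sparse_file("sparse-test.bin", 100 * 1024**3)
```

Running "grep foo sparse-test.bin" afterwards should show the memory growth described here. Try it with a smaller size first.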
While it is possible that grep could miss a potential match if it breaks a
line into smaller chunks, I believe that is a better situation than having the
program crash entirely (it could even display a warning message about breaking
a line due to memory constraints).
A problem I have is the occasional sparse file in a directory full of
non-sparse files. The command "grep foo *" is thus impossible to run because
grep would abort at the first big sparse file.
I suggest making grep's buffer stop at 500M in size.
Fixed in update: grep-2.5.1-54.1.2.fc6.
First, the grep-mem-exhausted.patch used in Fedora from comment #2 until now was not a correct implementation of the idea presented in comment #0. See bug #481765 for details.
Second, the idea to take the risk that a possible match on the huge "line" is missed is IMHO not optimal:
Grep is a line-oriented tool to process text files. When processing general binary data (even in the so-called binary mode), grep searches for the occurrences of the newline character, and processes the "lines" delimited by those occurrences.
Grep is not meant to process binary files. Indeed, this bug shows that its implementation is not ready to process them.
In particular, grep's internal matchers work only with lines which are fully loaded into memory. Unless that assumption is relaxed, grep cannot correctly process files containing "lines" whose size is close to or bigger than the amount of available virtual memory (and it is slow to process lines longer than the amount of available RAM). But relaxing the assumption would require a substantial redesign of the matchers.
It is acceptable if grep exits with an error message and exit code 2 in that situation, "giving up".
But it is less acceptable if grep prints an incorrect result (even if only in rare situations), without any indication that a problem occurred.
Consequently, grep might exit with an error as soon as the buffer size reaches the limit, or it might simply allocate as much memory as the OS allows.
For Fedora rawhide, the latter seems better aligned with the GNU credo "no arbitrary limits". (No matter that 500 MB seems reasonable today, it may become ridiculous over time. Traditional UNIX defines text files as having a maximum line length of 1024. And 640K must be enough for everybody.)
IOW, as of grep-2.5.3-3, I'm removing grep-mem-exhausted.patch.
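The two behaviours weighed above (give up with exit code 2 once a "line" exceeds a limit, or keep growing the buffer until the OS refuses) can be illustrated with a rough Python model. This is only a sketch and not grep's actual code. The names read_lines, LineTooLong, grep_exit_code and the limit parameter are invented for the example. Only the exit codes 0, 1 and 2 follow grep's documented convention.

```python
import io


class LineTooLong(Exception):
    """Raised when an unterminated line outgrows the configured limit."""


def read_lines(stream, limit=None, chunk_size=32 * 1024):
    """Yield complete lines; each line must fit in the buffer as a whole.

    limit=None models "no arbitrary limits": the buffer simply keeps growing.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Hand out every complete line currently in the buffer.
        while (nl := buf.find(b"\n")) != -1:
            yield bytes(buf[:nl])
            del buf[: nl + 1]
        # What remains is an unterminated line that is still growing.
        if limit is not None and len(buf) > limit:
            raise LineTooLong(f"line longer than {limit} bytes")
    if buf:
        yield bytes(buf)  # final line without a trailing newline


def grep_exit_code(stream, needle, limit=None):
    """Return 0 if any line contains needle, 1 if none does, 2 on giving up."""
    found = False
    try:
        for line in read_lines(stream, limit):
            if needle in line:
                found = True
    except LineTooLong:
        return 2  # explicit failure instead of a silently wrong answer
    return 0 if found else 1


if __name__ == "__main__":
    huge_line = io.BytesIO(b"\0" * 1_000_000)
    print(grep_exit_code(huge_line, b"foo", limit=500_000))
```

With a limit, the model never reports "no match" for data it has not fully examined. Without a limit, it behaves like grep after the patch removal and is bounded only by available memory.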