198165 – grep should not take all memory

Summary: grep should not take all memory

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	grep
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tim Waugh
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	198167 FC6Update

Reported:	2006-07-10 12:39 UTC by Russell Coker
Modified:	2009-01-27 15:54 UTC
CC List:	2 users
Fixed In Version:	2.5.1-54.1.2.fc6
Clone Of:
Environment:
Last Closed:	2006-12-12 16:22:30 UTC
Type:	---
Embargoed:
Dependent Products:

Description Russell Coker 2006-07-10 12:39:45 UTC

When grep is given a file that is very long and has no newline characters, 
its memory use grows without bound.

For example, create a sparse file 100G in size and try grepping it; any 
commonly available machine will crash in such a situation.
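
A minimal Python sketch of the reproduction setup described above. The helper names (make_sparse_file, has_newline) are hypothetical and not part of grep or of this report, and the demo size is deliberately small (16 MiB rather than 100G) so that it is harmless to run. It only shows that such a file is a single "line" with no newline in it; it does not invoke grep.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file of `size` bytes consisting only of a hole (reads back as NUL bytes).

    Whether the file is actually stored sparsely depends on the filesystem.
    """
    with open(path, "wb") as f:
        f.truncate(size)  # extend the file without writing any data


def has_newline(path, chunk_size=1 << 20):
    """Scan the file in fixed-size chunks, so the check itself uses bounded memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            if b"\n" in chunk:
                return True


if __name__ == "__main__":
    # Assumption: 16 MiB stands in for the 100G file from the report.
    path = os.path.join(tempfile.mkdtemp(), "sparse.bin")
    make_sparse_file(path, 16 * 1024 * 1024)
    print(os.path.getsize(path), has_newline(path))
```

A line-oriented reader has to hold the whole file in memory before it sees the end of the first "line", which is the behaviour this report is about.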

While it is possible that grep could lose a potential match if it breaks a 
line into a small chunk, I believe that is a better situation than having the 
program crash entirely (it could even display a warning message about breaking 
a line due to memory constraints).

A problem I have is the occasional sparse file in a directory full of 
non-sparse files.  The command "grep foo *" is thus impossible to run because 
grep would abort at the first big sparse file.

I suggest making grep's buffer stop at 500M in size.
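
The proposal can be sketched roughly as follows, in Python rather than grep's C. This is an illustration of the idea only, not grep's actual code and not the patch that was later shipped; capped_grep, max_line and chunk_size are hypothetical names. An unfinished line is flushed in pieces once it exceeds the cap, and the function reports whether any line was broken so that a caller could print a warning.

```python
import io


def capped_grep(stream, pattern, max_line=500 * 1024 * 1024, chunk_size=64 * 1024):
    """Return (matching_pieces, broke_a_line) for a bytes `pattern`.

    Lines longer than `max_line` bytes are cut into pieces and searched
    piece by piece, so a match that straddles a cut is missed.
    """
    matches = []
    buf = b""
    broke_a_line = False

    def search(line):
        nonlocal broke_a_line
        # Search an over-long line in pieces of at most max_line bytes.
        for start in range(0, len(line), max_line):
            piece = line[start:start + max_line]
            if pattern in piece:
                matches.append(piece)
        if len(line) > max_line:
            broke_a_line = True

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        *complete, buf = buf.split(b"\n")
        for line in complete:
            search(line)
        # Flush full-size pieces of an unfinished line so the buffer stays bounded.
        while len(buf) > max_line:
            search(buf[:max_line])
            buf = buf[max_line:]
            broke_a_line = True
    if buf:
        search(buf)
    return matches, broke_a_line


if __name__ == "__main__":
    data = b"aaaafoobbbb"  # one "line" with no newline
    # Tiny cap for demonstration: "foo" straddles the cut and is lost.
    print(capped_grep(io.BytesIO(data), b"foo", max_line=6, chunk_size=4))
    # Cap larger than the line: the match is found.
    print(capped_grep(io.BytesIO(data), b"foo", max_line=100, chunk_size=4))
```

The first call shows the trade-off mentioned above: memory stays bounded, but a match that crosses a break point goes unreported.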

Comment 2 Fedora Update System 2006-12-12 16:11:42 UTC

Fixed in update: grep-2.5.1-54.1.2.fc6.

Comment 3 Stepan Kasal 2009-01-27 15:54:26 UTC

First, the grep-mem-exhausted.patch used in Fedora since comment #2 until now was not a correct implementation of the idea presented in comment #0. See bug #481765 for details.

Second, the idea to take the risk that a possible match on the huge "line" is missed is IMHO not optimal:

Grep is a line-oriented tool to process text files. When processing general binary data (even in the so-called binary mode), grep searches for the occurrences of the newline character, and processes the "lines" delimited by those occurrences.
Grep is not meant to process binary files. Indeed, this bug shows that its implementation is not ready to process them.

In particular, grep's internal matchers work only with lines which are fully loaded into memory. Unless that assumption is relaxed, grep cannot correctly process files with "lines" whose size is close to or bigger than the amount of available virtual memory (and it is slow to process lines longer than the amount of available RAM). But relaxing the assumption would require a substantial redesign of the matchers.

It is acceptable for grep to exit with an error message and exit code 2 in that situation, "giving up".
But it is worse if grep prints an incorrect result (even if only in rare situations) without any indication that a problem occurred.

Consequently, grep might err out as soon as the buffer size reaches the limit, or it might simply allocate as much memory as the OS allows.
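
The first of these two options (err out at the limit) can be sketched as below, again in Python and only as an illustration; strict_grep, LineTooLong and run are hypothetical names, not grep internals. The exit statuses follow grep's convention: 0 for a match, 1 for no match, 2 for an error. The second option (let the OS decide) needs no extra logic and is not shown.

```python
import io
import sys


class LineTooLong(Exception):
    """Raised when a single line exceeds the configured limit."""


def strict_grep(stream, pattern, max_line, chunk_size=64 * 1024):
    """Return the lines containing `pattern`; never search a partial line."""
    matches = []
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        *complete, buf = buf.split(b"\n")
        for line in complete:
            if len(line) > max_line:
                raise LineTooLong(f"line longer than {max_line} bytes")
            if pattern in line:
                matches.append(line)
        if len(buf) > max_line:
            # Give up instead of cutting the line and risking a wrong answer.
            raise LineTooLong(f"line longer than {max_line} bytes")
    if buf and pattern in buf:
        matches.append(buf)
    return matches


def run(stream, pattern, max_line, chunk_size=64 * 1024):
    """Map the outcome to a grep-style exit status."""
    try:
        matches = strict_grep(stream, pattern, max_line, chunk_size)
    except LineTooLong as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2
    return 0 if matches else 1


if __name__ == "__main__":
    print(run(io.BytesIO(b"aaaafoobbbb"), b"foo", max_line=6, chunk_size=4))
    print(run(io.BytesIO(b"x\nfoo\n"), b"foo", max_line=6))
```

Unlike a variant that silently breaks long lines, this one never reports "no match" for input it could not fully examine.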

For Fedora rawhide, the latter seems better aligned with the GNU credo "no arbitrary limits". (No matter that 500 MB seems reasonable today, it may become ridiculous over time. Traditional UNIX defines text files with lines of maximal length of 1024. And 640K must be enough for everybody.)

IOW, as of grep-2.5.3-3, I'm removing grep-mem-exhausted.patch.
