Bug 198165 - grep should not take all memory
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: grep
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
Blocks: 198167 FC6Update
Reported: 2006-07-10 08:39 EDT by Russell Coker
Modified: 2009-01-27 10:54 EST
CC List: 2 users

Fixed In Version: 2.5.1-54.1.2.fc6
Doc Type: Bug Fix
Last Closed: 2006-12-12 11:22:30 EST
Attachments: None
Description Russell Coker 2006-07-10 08:39:45 EDT
When grep is given a file that is very long and has no newline characters,
its memory use grows without bound.

For example, create a sparse file 100G in size and try grepping it: any
commonly available machine will crash in this situation.
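
A rough way to reproduce, for the record (truncate is the GNU coreutils tool;
any method of producing a huge file with no newlines, e.g. dd with seek=,
does the same):

  truncate -s 100G bigsparse    # sparse file that reads back as 100G of NUL bytes
  grep foo bigsparse            # no newline is ever found, so grep's line buffer
                                # grows until memory is exhausted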

While grep could miss a potential match if it broke such a line into smaller 
chunks, I believe that is a better outcome than having the program crash 
entirely (it could even print a warning when it breaks a line due to memory 
constraints).

A problem I have is the occasional sparse file in a directory full of 
non-sparse files.  The command "grep foo *" thus becomes impossible to run, 
because grep aborts at the first big sparse file.
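
As a stopgap I can filter out the oversized files before grep sees them; a
sketch (the 100M cutoff is arbitrary):

  find . -maxdepth 1 -type f -size -100M -exec grep foo {} +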

I suggest capping grep's buffer at 500M.
Comment 2 Fedora Update System 2006-12-12 11:11:42 EST
Fixed in update: grep-2.5.1-54.1.2.fc6.
Comment 3 Stepan Kasal 2009-01-27 10:54:26 EST
First, the grep-mem-exhausted.patch used in Fedora from the update in comment #2 until now was not a correct implementation of the idea presented in comment #0.  See bug #481765 for details.

Second, the idea of accepting the risk that a possible match on the huge "line" is missed is IMHO not optimal:

Grep is a line-oriented tool for processing text files.  When processing general binary data (even in the so-called binary mode), grep searches for occurrences of the newline character and processes the "lines" delimited by those occurrences.
Grep is not meant to process binary files.  Indeed, this bug shows that its implementation is not ready for them.

In particular, grep's internal matchers work only on lines that are fully loaded into memory.  Unless that assumption is relaxed, grep cannot correctly process files whose "lines" are close to or bigger than the amount of available virtual memory (and it is slow on lines longer than the amount of available RAM).  But relaxing the assumption would require a substantial redesign of the matchers.
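
(As an aside, when the data is really a stream of NUL-delimited records rather
than text, the "lines" can be cut down to size before grep ever sees them, for
example:

  tr '\0' '\n' < bigfile | grep foo

Each NUL then becomes a line boundary, so only the individual records need to
fit in memory.)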

It is acceptable for grep to exit with an error message and exit code 2 in that situation, "giving up".
It is less acceptable for grep to print an incorrect result (even if only in rare situations) without any indication that a problem occurred.

Consequently, grep might error out as soon as the buffer size reaches the limit, or it might simply allocate as much memory as the OS allows.

For Fedora rawhide, the latter seems better aligned with the GNU credo "no arbitrary limits".  (No matter that 500 MB seems reasonable today, it may become ridiculous over time.  Traditional UNIX defined text files as having lines at most 1024 bytes long.  And 640K ought to be enough for everybody.)
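
(If a hard cap is desired, the OS can impose one per invocation instead of grep
hard-coding it; the 512 MB figure below is just an example:

  ( ulimit -v 524288; grep foo bigsparse )

Under such a limit grep fails with a "memory exhausted" error and a non-zero
exit code instead of dragging the whole machine down.)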

IOW, as of grep-2.5.3-3, I'm removing grep-mem-exhausted.patch.
