Bug 198165

Summary: grep should not take all memory
Product: Fedora
Component: grep
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Russell Coker <russell>
Assignee: Tim Waugh <twaugh>
CC: kasal, staubach
Fixed In Version: 2.5.1-54.1.2.fc6
Doc Type: Bug Fix
Last Closed: 2006-12-12 16:22:30 UTC
Bug Blocks: 198167, 207681

Description Russell Coker 2006-07-10 12:39:45 UTC
When grep is given a very long file that contains no newline characters,
its memory use grows without bound.

For example, create a sparse file 100GB in size and try grepping it: any
commonly available machine will exhaust its memory and crash in such a situation.
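
A minimal reproduction (the file name is hypothetical; assumes GNU dd and a
filesystem with sparse-file support):

  dd if=/dev/zero of=sparse.bin bs=1 count=1 seek=100G  # ~100GB sparse file, no newlines
  grep foo sparse.bin   # grep keeps growing its buffer looking for a newline

The file occupies almost no disk space, but grep reads it back as a single
100GB "line".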

While it is possible that grep could miss a potential match if it breaks a 
long line into smaller chunks, I believe that is a better outcome than having 
the program crash entirely (it could even display a warning message when it 
breaks a line due to memory constraints).

A problem I have is the occasional sparse file in a directory full of 
non-sparse files.  The command "grep foo *" is thus impossible to run because 
grep would abort at the first big sparse file.
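
A workaround for this case (my sketch, not from the report; assumes GNU find
with the -size suffixes) is to filter out oversized files before they reach grep:

  find . -maxdepth 1 -type f -size -100M -exec grep foo {} +

Unlike "grep foo *", this skips any file of 100MB or more, so a huge sparse
file is never opened.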

I suggest capping grep's buffer at 500M.

Comment 2 Fedora Update System 2006-12-12 16:11:42 UTC
Fixed in update: grep-2.5.1-54.1.2.fc6.

Comment 3 Stepan Kasal 2009-01-27 15:54:26 UTC
First, the grep-mem-exhausted.patch carried in Fedora from comment #2 until now was not a correct implementation of the idea presented in comment #0.  See bug #481765 for details.

Second, the idea of accepting the risk that a possible match on the huge "line" is missed is, IMHO, not optimal:

Grep is a line-oriented tool for processing text files.  When given general binary data (even in the so-called binary mode), grep searches for occurrences of the newline character and processes the "lines" delimited by those occurrences.
Grep is not meant to process binary files.  Indeed, this bug shows that its implementation is not ready to process them.
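
For data where NUL rather than newline delimits records, one practical way
around that (a sketch, not part of this report) is to restore newlines before
grep sees the input:

  tr '\0' '\n' < sparse.bin | grep foo  # every NUL becomes a newline, so no "line" outgrows memory

This keeps memory use bounded, at the cost of streaming the whole file through tr.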

In particular, grep's internal matchers work only on lines that are fully loaded into memory.  Unless that assumption is relaxed, grep cannot correctly process files whose "lines" are close to or bigger than the amount of available virtual memory (and it is slow on lines longer than the amount of available RAM).  But relaxing the assumption would require a substantial redesign of the matchers.

It is acceptable for grep to exit with an error message and exit code 2 in that situation, "giving up".
But it is worse for grep to print an incorrect result (even if only in rare situations) without any indication that a problem occurred.

Consequently, grep can either error out as soon as the buffer reaches a size limit, or simply allocate as much memory as the OS allows.
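
The "giving up" behavior can already be approximated without patching grep, by
bounding its address space from the shell (a sketch; the 512MB limit and the
message wording are illustrative):

  ( ulimit -v 524288; grep foo sparse.bin )  # cap the subshell at 512MB of virtual memory
  # on a newline-free 100GB file, grep fails with "memory exhausted" and exits with status 2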

For Fedora rawhide, the latter seems better aligned with the GNU credo "no arbitrary limits".  (No matter that 500 MB seems reasonable today, it may become ridiculous over time.  Traditional UNIX defined text files with lines at most 1024 bytes long.  And 640K must be enough for everybody.)

IOW, as of grep-2.5.3-3, I'm removing grep-mem-exhausted.patch.