Bug 495521

Summary: grep segfaults when grepping files of certain sizes
Product:          Red Hat Enterprise Linux 5
Component:        grep
Version:          5.3
Hardware:         x86_64
OS:               Linux
Status:           CLOSED DUPLICATE
Severity:         medium
Priority:         low
Reporter:         Mattias Slabanja <slabanja>
Assignee:         Stepan Kasal <kasal>
QA Contact:       BaseOS QE <qe-baseos-auto>
CC:               redhat-bugzilla
Target Milestone: rc
Doc Type:         Bug Fix
Last Closed:      2009-09-22 12:31:34 UTC

Description Mattias Slabanja 2009-04-13 16:25:45 UTC
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.8) Gecko/2009032608 Firefox/3.0.8

When using grep on multiple files (e.g. "grep something file1 file2 ..."), grep segfaults when the files (file1, file2, ...) have certain combinations of sizes.

Reproducible: Always

Steps to Reproduce:
1. dd if=/dev/zero of=file1 bs=1 count=0 seek=31715200 
2. dd if=/dev/zero of=file2 bs=1 count=0 seek=289401200
3. grep something file1 file2
Actual Results:  

$ grep something file1 file2
Segmentation fault
$

Expected Results:  

$ grep something file1 file2
$

The files do not need to be sparse (as in the example above); the bug triggers equally well with ordinary files.


The bug was first encountered when searching through files that were not identically zero.


If file1 is 31195133 bytes or less, no segfault occurs.
If file1 is 31195134 bytes, the bug is triggered.
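The boundary can be probed with a small loop over the reported sizes. This is a sketch following the dd invocations from the reproduction steps; on an affected build the second iteration segfaults, while on a fixed build both iterations print an ordinary exit status:

```shell
# Recreate file2 at the size from the report (sparse, all zero bytes)
dd if=/dev/zero of=file2 bs=1 count=0 seek=289401200 2>/dev/null

# Try file1 one byte below and exactly at the reported threshold
for size in 31195133 31195134; do
    dd if=/dev/zero of=file1 bs=1 count=0 seek="$size" 2>/dev/null
    grep something file1 file2 >/dev/null 2>&1
    echo "file1=$size bytes -> grep exit status $?"
done
```

A segfault shows up here as exit status 139 (128 + SIGSEGV); a clean "no match" is exit status 1.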


The bug is also reproducible on CentOS systems, on RHEL 4 systems, and on i386 systems.


From gdb:
(gdb) set args something file1 file2
(gdb) run
Starting program: /bin/grep something file1 file2

Program received signal SIGSEGV, Segmentation fault.
grepfile (file=0x7fff35a93b92 "second", stats=0x6145a0) at grep.c:790
790	      oldc = beg[-1];

Comment 1 Vesa-Matti Kari 2009-08-28 15:03:53 UTC
It is simply UNBELIEVABLY PATHETIC that YEARS have passed, Red Hat still hasn't fixed this!! Please see Bug #237518.

Fedora 11 does ship a newer version of grep, and more importantly, they have removed the TOTALLY ABSURD patch 

  grep-mem-exhausted.patch

that RHEL 5.3 still (proudly???) ships!! 

How on earth can there be patches like that? That worthless piece of "fix" just truncates the requested amount of memory and then returns "OKAY, I have allocated the amount you requested. Good luck." 

It is no wonder these greps are totally BROKEN when they apply patches like this and won't remove them when people tell them they are BROKEN!!

How is one supposed to do any serious sysadmin job on RHEL when, for example, one cannot use grep to scan big log files?

This is like breaking a hammer or a screwdriver in someone's toolbox. These basic utilities are FUNDAMENTAL, they're supposed to ALWAYS WORK!!!

Shame, shame, shame!!!

Comment 2 Stepan Kasal 2009-09-22 11:35:08 UTC
(In reply to comment #1)
> basic utilities are FUNDAMENTAL, they're supposed to ALWAYS WORK!!!

grep is a text processing utility, thus it is supposed to work on text files.

If a file contains overly long stretches with no newline character, grep, as a line-oriented tool, cannot work reliably.
(A reliable way to process such a file might be a pipeline consisting of the commands strings and grep.)
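As a sketch of the workaround suggested above (the file name binfile and the NEEDLE marker are made up for illustration):

```shell
# Build a small newline-free "binary" file: printable text separated
# by NUL bytes
printf 'foo\000NEEDLE\000bar' > binfile

# grep alone would treat the whole file as one long binary "line";
# strings first extracts the printable runs (4+ characters by default)
# onto separate, newline-terminated lines, which grep can scan safely
strings binfile | grep NEEDLE
```

Because strings inserts a newline after each printable run, the line length grep sees is bounded by the runs' lengths, not the file size.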

Comment 3 Stepan Kasal 2009-09-22 12:31:34 UTC
(In reply to comment #0)
> Actual Results:  
> $ grep something file1 file2
> Segmentation fault
> $
> 
> Expected Results:  
> $ grep something file1 file2
> $

First, please note that grep is a text processing tool, thus it has to process a 30 MB or 276 MB "line".  This is far, far beyond what you would call a "text file".

Anyway, a segfault is not adequate behavior.

If we followed the GNU mantra "no arbitrary limits", then grep would happily process the multi-megabyte line, loading it whole into memory, and after a substantial amount of time, the correct answer would come.
If the virtual memory (swap space) is not big enough, the computation would end with an error message that the memory was exhausted.
This would be the natural behavior for a text processing tool when it is fed with binary data.
And this is also the approach used in recent Fedora releases.  (Recent Fedora builds of grep do not contain the grep-mem-exhausted.patch mentioned in comment #1.)

But this behavior, though correct from the theoretical viewpoint, has caused problems previously, see RHEL4 bug #198167.  Consequently, a limit was imposed on the memory consumed.  Unfortunately, earlier versions of the patch could cause segmentation faults.  An updated version of the patch (grep-2.5.1-55.el5, released as part of RHEL 5.4, see bug #483073) eliminates this issue.

With this latest update, the result is:

$ grep something file1 file2; echo $?
grep: line too long
2
$

And this answer is printed promptly; grep does not spend ages allocating excessive amounts of memory.  This is the optimal solution for RHEL 4 and 5 with respect to backward compatibility.
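For scripts, the new failure mode is distinguishable from an ordinary non-match by the exit status. A sketch, using a made-up file name sample.txt (per POSIX, grep exits 0 on a match, 1 on no match, and greater than 1 on an error such as the "line too long" condition above):

```shell
# Prepare a small ordinary text file for the demonstration
printf 'no needles here\n' > sample.txt

grep -q something sample.txt
case $? in
    0) echo "match found" ;;
    1) echo "no match" ;;
    *) echo "error (e.g. line too long)" ;;
esac
```

A script that only checks for exit status 0 vs non-zero would silently lump the error case in with "no match", so the explicit case split matters here.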

*** This bug has been marked as a duplicate of bug 483073 ***

Comment 4 Vesa-Matti Kari 2009-09-24 10:26:19 UTC
Hello! 

I have already sent a private apology to Stepan, but I think it is necessary to repeat it here. My moronic ranting was a very bad bug report indeed. There is no excuse for such behavior. I should have taken the time to cool down and then write a neutral bug report. 

Sorry about the inappropriate outburst, and many thanks to the Red Hat crew for replying and fixing grep.

Regards,
vmk