Bug 210876

Summary: 'grep' takes minutes to complete the operation if using invert-match and "or"
Product: Red Hat Enterprise Linux 4
Reporter: theresa.chin
Component: grep
Assignee: Tim Waugh <twaugh>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 4.2
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-10-17 13:39:42 UTC

Description theresa.chin 2006-10-16 13:45:20 UTC
Description of problem:
To be clear, this appears to be a RHEL4 Intel issue. It has been reproduced on IA64 servers and an EM64T workstation. It does *not* occur on an AMD Opteron workstation.

If a grep command combines an extended-regex "or" (alternation) with invert-match so that a large number of lines are omitted, the operation takes minutes to complete even though it should take less than a second.

Below are some examples in which each individual operation completes in well under a second, but when the "or" is added, it takes far longer (about 40 seconds in the run shown below). This was done against a 5 MB text file.
On the AMD Opteron, the same command on the same file takes 0.3 seconds!

[root@can16 ~]# time egrep -iv '100' /boot/System.map|wc
     61     183    2601

real    0m0.232s
user    0m0.228s
sys     0m0.003s
[root@can16 ~]# time egrep -iv '200' /boot/System.map|wc
  21183   63549  837563

real    0m0.164s
user    0m0.267s
sys     0m0.002s
[root@can16 ~]# time egrep -iv '100|200' /boot/System.map|wc
     60     180    2553

real    0m40.504s
user    0m40.398s
sys     0m0.018s
[root@can16 ~]#

We need a fix for this from Red Hat.

Version-Release number of selected component (if applicable):
grep-2.5.1-31

How reproducible:
Always


Steps to Reproduce:
1. Create a large text file (e.g. 5 MB) to search.
2. Run 'egrep' on the file with an "or" pattern and invert-match so that a large number of lines are omitted.
3. Time the operation.  See the examples in the bug Description.
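
For anyone trying to reproduce this without a suitable file at hand, the steps above can be sketched as a small script. This is only an illustration under stated assumptions: the function names (make_sample, count_inverted, timed), the sample line layout, and the line count are made up here and are not from the report. The layout is chosen so that almost every line contains "100", which means the inverted match omits nearly everything.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Function names, the sample line layout,
# and the line count are illustrative assumptions, not taken from the report.

# Write $2 lines to file $1. Every 1000th line contains no digits; all other
# lines contain "100", so an inverted match on '100' omits almost everything.
make_sample() {
    awk -v n="$2" 'BEGIN {
        for (i = 1; i <= n; i++) {
            if (i % 1000 == 0)
                print "ffffffffffffffff t other_symbol"
            else
                printf "a000000100000000 T symbol_%d\n", i
        }
    }' > "$1"
}

# Count the lines that do NOT match extended regex $1 in file $2.
count_inverted() {
    grep -E -i -v -c "$1" "$2"
}

# Run a command and report its wall-clock time in whole seconds on stderr.
timed() {
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "elapsed: $((end - start))s: $*" >&2
}

sample="${TMPDIR:-/tmp}/grep-invert-sample.$$"
make_sample "$sample" 160000          # roughly 5 MB with this line layout
timed count_inverted '100' "$sample"
timed count_inverted '200' "$sample"
timed count_inverted '100|200' "$sample"
rm -f "$sample"
```

Whole-second timing is coarse, but it is enough to tell a sub-second run from one that takes tens of seconds or minutes. On an unaffected system all three commands should finish almost immediately.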

Actual Results:
The grep operation with -E (egrep), "or", and invert-match took minutes to complete.

Expected Results:
The operation should have taken less than a second.

Additional info:

Comment 1 Tim Waugh 2006-10-16 16:14:35 UTC
Please try the package from the latest update (Update 4), which contains several
changes to address performance problems.

Alternatively, if your search pattern contains no '.' or non-ASCII characters,
you can speed the operation up significantly by getting grep to process the
input as ASCII instead of the default UTF-8.  Just set 'LC_CTYPE=C' in the
environment ('export LC_CTYPE=C' in bash).
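
If changing the locale for the whole session is not desirable, the same setting can be applied to a single grep invocation. A minimal sketch follows; the wrapper name grep_ascii is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper (the name grep_ascii is an assumption): run grep with
# LC_CTYPE=C for this one invocation only, leaving the session locale alone.
# Note: if LC_ALL is set in the environment, it takes precedence over LC_CTYPE.
grep_ascii() {
    LC_CTYPE=C grep "$@"
}

# Small self-contained demonstration on three lines of input.
printf 'a100\nb200\nc300\n' | grep_ascii -E -i -v '100|200'
# prints: c300
```

With this in place, the slow command from the Description could be run as grep_ascii -E -i -v '100|200' /boot/System.map | wc to compare timings against the default locale.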

Comment 2 theresa.chin 2006-10-17 13:09:17 UTC
Thank you for the solution.  We tried the latest updated grep package,
'grep-2.5.1-32.2.ia64.rpm', and the slow operation problem has been
solved.

Best regards,

Theresa Chin