Bug 133932 - grep is very slow when searching for ASCII text
Summary: grep is very slow when searching for ASCII text
Keywords:
Status: CLOSED DUPLICATE of bug 142807
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: grep
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tim Waugh
QA Contact: Mike McLean
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-09-28 15:27 UTC by Paul Zmuda
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-12-14 16:34:17 UTC
Target Upstream Version:
Embargoed:



Description Paul Zmuda 2004-09-28 15:27:50 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461; .NET CLR 1.1.4322)

Description of problem:
When grep searches for ASCII text in a large file (~500M), it can 
take as long as 2 minutes before the command completes.

When LANG is set to en_US.UTF-8, the command takes approximately 2 
minutes to complete.

When LANG is set to en_US, the command takes approximately 22 seconds 
to complete.

Version-Release number of selected component (if applicable):
grep-2.5.1-24.1

How reproducible:
Always

Steps to Reproduce:
1. Ensure LANG is set to en_US.UTF-8
2. Have available a large ASCII text file
3. time grep <text string> <filename>
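
For anyone without a suitable log file at hand, the steps above can be sketched as a small script. This is only an illustration under stated assumptions: the helper names make_ascii_log and count_matches, the line contents, and the file path are all made up here and are not taken from the report. LC_ALL is used instead of LANG so that the chosen locale takes effect even if other LC_* variables are set.

```shell
# Hypothetical helper: write an ASCII-only test file.
# $1 = output file, $2 = number of lines
make_ascii_log() {
    awk -v n="$2" 'BEGIN {
        for (i = 1; i <= n; i++) {
            if (i % 1000 == 0) {
                print "line " i ": Excess retries"
            } else {
                print "line " i ": ok"
            }
        }
    }' > "$1"
}

# Hypothetical helper: count matching lines under a given locale.
# $1 = locale name, $2 = search string, $3 = file
count_matches() {
    LC_ALL="$1" grep -c -- "$2" "$3"
}
```

In bash, where time is a shell keyword, the two cases from this report could then be compared with `time count_matches en_US.UTF-8 Excess /tmp/sample.log` and `time count_matches en_US Excess /tmp/sample.log` after creating the file with `make_ascii_log /tmp/sample.log 2000000`. The match counts should be identical; only the elapsed time is expected to differ on an affected grep.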
    

Actual Results:
[root]# ls -l tsp04.log
-rw-r--r--    1 cpstsp   tsp      520034711 Sep 24 11:57 tsp04.log

[root]# export LANG=en_US.UTF-8 
[root]# echo $LANG
en_US.UTF-8

[root]# time grep Excess tsp04.log
real    2m3.489s
user    2m1.850s
sys     0m0.850s


Expected Results:
[root]# ls -l tsp04.log
-rw-r--r--    1 cpstsp   tsp      520034711 Sep 24 11:57 tsp04.log

[root]# export LANG=en_US
[root]# echo $LANG
en_US

[root]# time grep Excess tsp04.log
real    0m21.371s
user    0m0.950s
sys     0m1.260s


Additional info:


The system has been updated with all of the latest available packages 
using up2date.

Comment 1 Tim Waugh 2004-09-28 15:38:27 UTC
To put it more precisely: grep -F is slower than it need be.  Have I
understood correctly?

(If there are any special characters, for example ".", there is extra
processing to do in UTF-8 which cannot really be avoided.  For fixed
strings I think it might be possible to avoid UTF-8 processing.)
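
The reason a fixed ASCII string can skip UTF-8 processing is a property of the encoding: every byte of a multibyte UTF-8 sequence has its high bit set, so an ASCII byte can never appear inside one, and a byte-wise search for an ASCII string finds the same lines as a character-wise search. As a minimal sketch of that idea (a user-side workaround, not the patch discussed in this bug), grep can be forced into the C locale for such searches. The wrapper name grep_ascii_fixed is hypothetical, and the sketch assumes the search string is pure ASCII and that no locale-dependent options such as -i or -w are needed.

```shell
# Hypothetical wrapper: byte-wise search for a fixed ASCII string.
# $1 = ASCII search string, remaining arguments = files
grep_ascii_fixed() {
    pattern=$1
    shift
    # LC_ALL=C makes grep treat the input as single bytes;
    # -F treats the pattern as a fixed string, not a regex.
    LC_ALL=C grep -F -- "$pattern" "$@"
}
```

For example, `grep_ascii_fixed Excess tsp04.log` would run the search from this report without multibyte handling, regardless of the value of LANG.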


Comment 2 Paul Zmuda 2004-09-28 17:34:41 UTC
That is correct.

Issuing the above grep command on a Red Hat Linux 7.2 system produces 
basically the same results as Red Hat Enterprise Linux 3 with LANG=en_US.  
That is, it completes in approximately 22 seconds.  

Comment 3 mike 2004-10-05 20:31:20 UTC
See bug 116145: the erratum was supposed to resolve this but does not, at 
least in this case. I can always reproduce it (on any system with rhel3):

rhel3, grep-2.5.1-24.1

eg:
3.3M file which contains "word" a lot

$ export LANG=en_US
$ time grep -w word file >/dev/null
real    0m0.027s
user    0m0.030s
sys     0m0.000s

$ export LANG=en_US.UTF-8
$ time grep -w word file >/dev/null
real    0m47.790s
user    0m47.770s
sys     0m0.000s


Comment 4 Tim Waugh 2004-11-09 12:38:59 UTC
Please try the experimental package 2.5.1-24.1.0.1, located here:

  ftp://people.redhat.com/twaugh/tmp/grep/rhel3/

Thanks.


Comment 5 Paul Zmuda 2004-11-09 13:43:57 UTC

[root@tspes3dev2 up2date]# rpm -Uv grep-2.5.1-24.1.0.1.i386.rpm
Preparing packages for installation...
grep-2.5.1-24.1.0.1
[root@tspes3dev2 up2date]# rpm -q -a |grep grep
grep-2.5.1-24.1.0.1

[root@tspes3dev2 cpstsp]# echo $LANG
en_US.UTF-8

[root@tspes3dev2 cpstsp]# time grep Excess tsp04.log

real    0m35.962s
user    0m0.910s
sys     0m1.630s

[root@tspes3dev2 cpstsp]# time grep Excess tsp04.log

real    0m1.691s
user    0m0.980s
sys     0m0.710s

I loaded the patch and "grep" is working as expected.  The 
initial "grep" completed in 35.962 seconds whereas subsequent 
requests within a short period of time completed in 1.691 seconds.

I am satisfied with this patch.

Comment 6 mike 2004-11-09 16:15:01 UTC
works for me;
one test that takes 90s with previous grep finishes in 1.3s with this 
version.

Comment 7 Tim Waugh 2004-11-10 11:38:14 UTC
2.5.1-24.1.0.2 fixes a bug that can cause false matches:

  ftp://people.redhat.com/twaugh/tmp/grep/rhel3/


Comment 8 Tim Waugh 2004-12-14 16:34:17 UTC

*** This bug has been marked as a duplicate of 142807 ***

