Bug 133932 - grep is very slow when searching for ASCII text
Status: CLOSED DUPLICATE of bug 142807
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: grep
Version: 3.0
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
QA Contact: Mike McLean
 
Reported: 2004-09-28 11:27 EDT by Paul Zmuda
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2004-12-14 11:34:17 EST


Description Paul Zmuda 2004-09-28 11:27:50 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461; .NET CLR 1.1.4322)

Description of problem:
When grep searches for ASCII text in a large file (~500M), it can 
take as long as 2 minutes before the command completes.

When LANG is set to en_US.UTF-8, the command takes approximately 2 
minutes to complete.

When LANG is set to en_US, the command takes approximately 22 seconds 
to complete.

Version-Release number of selected component (if applicable):
grep-2.5.1-24.1

How reproducible:
Always

Steps to Reproduce:
1. Ensure LANG is set to en_US.UTF-8
2. Have available a large ASCII text file
3. time grep <text string> <filename>
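
The steps above can be sketched as a small script. This is only an illustration: the helper names `make_ascii_log` and `count_in_locale` are hypothetical, the generated file is far smaller than the ~500M log in this report, and the C locale stands in for the single-byte en_US locale in case that one is not installed (check with `locale -a`). The slowdown is reported here against grep-2.5.1-24.1, so other grep versions may show little or no difference.

```shell
#!/bin/sh
# Hypothetical helper: write $2 lines of ASCII text to $1, with the word
# "Excess" on every 1000th line.
make_ascii_log() {
    awk -v n="$2" 'BEGIN {
        for (i = 1; i <= n; i++) {
            if (i % 1000 == 0) { print "line " i " Excess retries" }
            else               { print "line " i " ordinary entry" }
        }
    }' > "$1"
}

# Count lines matching $2 in file $3 with the locale set to $1.
count_in_locale() {
    env LC_ALL="$1" grep -c -- "$2" "$3"
}

log=$(mktemp)
trap 'rm -f "$log"' EXIT
make_ascii_log "$log" 200000   # illustrative size, not the 520034711 bytes reported

# Wall-clock time per locale, in whole seconds.
for loc in C en_US.UTF-8; do
    start=$(date +%s)
    n=$(count_in_locale "$loc" Excess "$log")
    end=$(date +%s)
    echo "LC_ALL=$loc: $n matching lines in $((end - start))s"
done
```

The match count should be identical in both locales; only the elapsed time is expected to differ on an affected grep.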
    

Actual Results:
[root]# ls -l tsp04.log
-rw-r--r--    1 cpstsp   tsp      520034711 Sep 24 11:57 tsp04.log

[root]# export LANG=en_US.UTF-8 
[root]# echo $LANG
en_US.UTF-8

[root]# time grep Excess tsp04.log
real    2m3.489s
user    2m1.850s
sys     0m0.850s


Expected Results:
[root]# ls -l tsp04.log
-rw-r--r--    1 cpstsp   tsp      520034711 Sep 24 11:57 tsp04.log

[root]# export LANG=en_US
[root]# echo $LANG
en_US

[root]# time grep Excess tsp04.log
real    0m21.371s
user    0m0.950s
sys     0m1.260s


Additional info:


The system has been updated with all of the latest available packages 
using up2date.
Comment 1 Tim Waugh 2004-09-28 11:38:27 EDT
To put it more precisely: grep -F is slower than it need be.  Have I
understood correctly?

(If there are any special characters, for example ".", there is extra
processing to do in UTF-8 which cannot really be avoided.  For fixed
strings I think it might be possible to avoid UTF-8 processing.)
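
As a rough illustration of why fixed strings need no UTF-8 processing: in UTF-8, the bytes of one valid character never occur inside the encoding of another, so a byte-wise search for a valid needle cannot match in the middle of a multibyte character. The sketch below forces the C locale for a single invocation, which is a possible workaround on an affected system and not the change made in any package. The wrapper name `grep_bytes` is hypothetical. Note that options such as -i and -w treat non-ASCII text differently in the C locale.

```shell
#!/bin/sh
# Hypothetical wrapper: fixed-string search in the C locale, so grep compares
# bytes and skips multibyte character handling. LC_ALL overrides LANG.
grep_bytes() {
    LC_ALL=C grep -F "$@"
}

# Sample data containing UTF-8 text (octal escapes keep this script ASCII-only).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'na\303\257ve Excess caf\303\251\n' >  "$sample"
printf 'plain ASCII line\n'                >> "$sample"
printf 'Excess at line start\n'            >> "$sample"

# Two of the three lines contain "Excess".
grep_bytes -c Excess "$sample"   # prints 2
```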
Comment 2 Paul Zmuda 2004-09-28 13:34:41 EDT
That is correct.

Issuing the above grep command on a Red Hat Linux 7.2 system produces 
basically the same results as Red Hat Enterprise Linux 3 with LANG=en_US.  
That is, it completes in approximately 22 seconds.  
Comment 3 mike 2004-10-05 16:31:20 EDT
See bug 116145. The errata was supposed to resolve this but does not, at 
least in this case. I can always reproduce it (on any system with rhel3):

rhel3, grep-2.5.1-24.1

e.g., with a 3.3M file in which "word" occurs many times:

$ export LANG=en_US
$ time grep -w word file >/dev/null
real    0m0.027s
user    0m0.030s
sys     0m0.000s

$ export LANG=en_US.UTF-8
$ time grep -w word file >/dev/null
real    0m47.790s
user    0m47.770s
sys     0m0.000s
Comment 4 Tim Waugh 2004-11-09 07:38:59 EST
Please try the experimental package 2.5.1-24.1.0.1, located here:

  ftp://people.redhat.com/twaugh/tmp/grep/rhel3/

Thanks.
Comment 5 Paul Zmuda 2004-11-09 08:43:57 EST

[root@tspes3dev2 up2date]# rpm -Uv grep-2.5.1-24.1.0.1.i386.rpm
Preparing packages for installation...
grep-2.5.1-24.1.0.1
[root@tspes3dev2 up2date]# rpm -q -a |grep grep
grep-2.5.1-24.1.0.1

[root@tspes3dev2 cpstsp]# echo $LANG
en_US.UTF-8

[root@tspes3dev2 cpstsp]# time grep Excess tsp04.log

real    0m35.962s
user    0m0.910s
sys     0m1.630s

[root@tspes3dev2 cpstsp]# time grep Excess tsp04.log

real    0m1.691s
user    0m0.980s
sys     0m0.710s

I loaded the patch and "grep" is working as expected.  The 
initial "grep" completed in 35.962 seconds whereas subsequent 
requests within a short period of time completed in 1.691 seconds.

I am satisfied with this patch.
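
The two runs above report similar user and sys times but very different real times, which is consistent with the first run waiting on disk reads and the second being served from the page cache. A minimal sketch of separating the two effects when timing, assuming a hypothetical helper `warm_cache` and a tiny stand-in file in place of the real log:

```shell
#!/bin/sh
# Hypothetical helper: read the file once so that it is likely served from
# the page cache on the next access.
warm_cache() {
    cat -- "$1" > /dev/null
}

# Illustrative stand-in for the log file; substitute the real file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Excess retries\nordinary entry\n' > "$sample"

warm_cache "$sample"
matches=$(grep -c Excess "$sample")
echo "matching lines: $matches"

# POSIX "times" prints accumulated CPU time: the first line is for the shell,
# the second for its child processes (here cat and grep).
times
```

Comparing CPU time rather than wall-clock time, or warming the cache before every timed run, keeps disk latency out of a before/after comparison of two grep packages.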
Comment 6 mike 2004-11-09 11:15:01 EST
works for me;
one test that takes 90s with previous grep finishes in 1.3s with this 
version.
Comment 7 Tim Waugh 2004-11-10 06:38:14 EST
2.5.1-24.1.0.2 fixes a bug that can cause false matches:

  ftp://people.redhat.com/twaugh/tmp/grep/rhel3/
Comment 8 Tim Waugh 2004-12-14 11:34:17 EST

*** This bug has been marked as a duplicate of 142807 ***
