Bug 176488 - grep SLOW on multibyte LC_CTYPE
Status: CLOSED DUPLICATE of bug 179636
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: grep
Version: 4.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
QA Contact: Mike McLean
Depends On: 121313
Reported: 2005-12-23 08:27 EST by Tim Waugh
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2006-02-13 06:08:30 EST


Description Tim Waugh 2005-12-23 08:27:27 EST
+++ This bug was initially created as a clone of Bug #121313 +++

Description of problem:

grep is painfully slow on multibyte locales. Slowdown factor >30 observed.

Version-Release number of selected component (if applicable):

grep-2.5.1-26

How reproducible:

~ $ time LC_CTYPE=en_US.UTF-8 grep  '^//PS ' /tmp/r3.log  | wc -l
90304
grep : 97.31s user 0.17s system 87% cpu 1:51.15 total

~ $ time LC_CTYPE=C grep  '^//PS ' /tmp/r3.log  | wc -l
90304
grep : 0.22s user 0.04s system 83% cpu 0.312 total


Test file attached later on, and also downloadable from:

http://www.loria.fr/~thome/vrac/r3.log.gz

It's 40KB gzipped, 2.6MB gunzipped.
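
For anyone without the test file, a rough way to compare the two locales
is sketched below.  The generated file and the helper names make_sample
and count_in_locale are made up for illustration, and the sketch assumes
the en_US.UTF-8 locale is installed.  It only checks that both locales
return the same count; timings will depend on the grep version.

```shell
#!/bin/sh
# Hypothetical helper: write N lines, each starting with "//PS ", to a file.
make_sample() {
  # $1 = output file, $2 = number of lines
  awk -v n="$2" 'BEGIN { for (i = 0; i < n; i++) printf "//PS line %d\n", i }' > "$1"
}

# Hypothetical helper: count pattern matches under a given LC_CTYPE.
# LC_ALL is emptied for this command so it cannot override LC_CTYPE.
count_in_locale() {
  # $1 = locale name, $2 = file
  LC_ALL= LC_CTYPE="$1" grep -c '^//PS ' "$2"
}

sample=$(mktemp)
make_sample "$sample" 1000
utf8_count=$(count_in_locale en_US.UTF-8 "$sample")
c_count=$(count_in_locale C "$sample")
echo "UTF-8: $utf8_count  C: $c_count"
rm -f "$sample"
```

To compare speed rather than counts, prefix each count_in_locale call
with time and use a larger line count.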

-- Additional comment from Emmanuel.Thome@inria.fr on 2004-04-20 08:31 EST --
Created an attachment (id=99558)
Test file I used


-- Additional comment from twaugh@redhat.com on 2004-04-20 12:38 EST --
(2.5.1-26 is a devel package; changing version.)

-- Additional comment from twaugh@redhat.com on 2004-04-20 12:48 EST --
The longer-term solution is to make grep use the system regex for
multibyte encodings.  GNU libc now has quite an efficient
implementation.

-- Additional comment from twaugh@redhat.com on 2004-11-08 09:33 EST --
Please try grep-2.5.1-36, available at:

http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/Fedora/RPMS/


-- Additional comment from Emmanuel.Thome@inria.fr on 2004-11-08 09:58 EST --

I'm happy with it.

With GREP_USE_DFA set, I observe a 2x slowdown.

E.

-- Additional comment from twaugh@redhat.com on 2004-11-10 06:39 EST --
grep-2.5.1-37 fixes a problem that can cause false matches.  It will
be available in the Fedora development tree tomorrow, or at:

  ftp://people.redhat.com/twaugh/tmp/grep/fedora-core-3/

-- Additional comment from alex@milivojevic.org on 2005-12-22 12:35 EST --
Nice to know the problem was fixed in Fedora Core.  However, it seems that
grep-2.5.1-31 (RHEL4) still suffers from this problem.  Any chance of fixing
that one too?  Looking at the dates in the comments, I kinda expected that a
new version of grep would be released as part of U1, or U2 at the latest.

One additional thing.  I found that grep is slow if there are many matches.  If
there are no matches (or just a few matches), it is fast.

For example:

LANG=en_US.UTF-8   # Should be default
export LANG
a=0
while [ $a -lt 30000 ]; do
  printf "%.9d0\n" $a; a=$(( $a + 1 ))
done > testfile.txt
echo "Going to be sloooow...  Get yourself some coffee"
time grep -c '0$' testfile.txt
echo "However, this one is fast.  Sorry, no time for coffee"
time grep -c '1$' testfile.txt

The first grep takes about 25 seconds on a 2.8GHz Pentium D (jeeez).
The second grep (which doesn't match any lines in the file) is fast.  Of
course, setting LANG to C or en_US solves the problem.
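
Until a fixed package is available, the locale override can be limited to
the grep invocation itself rather than the whole session.  A minimal
sketch, assuming the pattern and data can safely be treated as plain
bytes; the wrapper name cgrep and the file cgrep_test.txt are made up
for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: run grep in the C locale for this one call only.
# Only appropriate when pattern and data can be treated as plain bytes.
cgrep() {
  LC_ALL=C grep "$@"
}

# Same shape of data as the test file above, but much smaller.
a=0
while [ "$a" -lt 100 ]; do
  printf '%.9d0\n' "$a"; a=$((a + 1))
done > cgrep_test.txt

# Every generated line ends in 0, so every line should match.
matches=$(cgrep -c '0$' cgrep_test.txt)
echo "$matches"
rm -f cgrep_test.txt
```

The rest of the shell keeps its UTF-8 locale; only the wrapped grep sees
LC_ALL=C.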
Comment 3 Tim Waugh 2006-02-13 06:08:30 EST

*** This bug has been marked as a duplicate of 179636 ***
