+++ This bug was initially created as a clone of Bug #121313 +++

Description of problem:
grep is painfully slow on multibyte locales. Slowdown factor >30 observed.

Version-Release number of selected component (if applicable):
grep-2.5.1-26

How reproducible:
~ $ time LC_CTYPE=en_US.UTF-8 grep '^//PS ' /tmp/r3.log | wc -l
90304
grep : 97.31s user 0.17s system 87% cpu 1:51.15 total
~ $ time LC_CTYPE=C grep '^//PS ' /tmp/r3.log | wc -l
90304
grep : 0.22s user 0.04s system 83% cpu 0.312 total

Test file attached later on, and also downloadable from:
http://www.loria.fr/~thome/vrac/r3.log.gz
It's 40KB gzipped, 2.6MB gunzipped.

-- Additional comment from Emmanuel.Thome on 2004-04-20 08:31 EST --
Created an attachment (id=99558)
Test file I used

-- Additional comment from twaugh on 2004-04-20 12:38 EST --
(2.5.1-26 is a devel package; changing version.)

-- Additional comment from twaugh on 2004-04-20 12:48 EST --
The longer-term solution is to make grep use the system regex for multibyte encodings. The GNU libc implementation has quite an efficient implementation now.

-- Additional comment from twaugh on 2004-11-08 09:33 EST --
Please try grep-2.5.1-36, available at:
http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/Fedora/RPMS/

-- Additional comment from Emmanuel.Thome on 2004-11-08 09:58 EST --
I'm happy with it. with GREP_USE_DFA set, I observe a 2x slowdown.

E.

-- Additional comment from twaugh on 2004-11-10 06:39 EST --
grep-2.5.1-37 fixes a problem that can cause false matches. It will be available in the Fedora development tree tomorrow, or at:
ftp://people.redhat.com/twaugh/tmp/grep/fedora-core-3/

-- Additional comment from alex on 2005-12-22 12:35 EST --
Nice to know the problem was fixed in Fedora Core. However it seems that grep-2.5.1-31 (RHEL4) still suffers from this problem. Any chance of fixing that one too? Looking at the dates in comments, I kinda expected that there would be new version of grep released as part of U1 or at latest U2.

One additional thing. I found that grep is slow if there are many matches. If there are no matches (or just a few matches), it is fast. For example:

LANG=en_US.UTF-8 # Should be default
export LANG
a=0
while [ $a -lt 30000 ]; do
    printf "%.9d0\n" $a;
    a=$(( $a + 1 ))
done > testfile.txt
echo "Going to be sloooow... Get yourself some coffee"
time grep -c '0$' testfile.txt
echo "However, this one is fast. Sorry, no time for coffee"
time grep -c '1$' testfile.txt

It takes about 25 seconds on a 2.8GHz Pentium D to run the first grep (jeeez). The second grep (that doesn't match any lines from the file) is fast. Of course, setting LANG to C or en_US solves the problem.
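
To separate the locale effect from the match-count effect, the reproducer above could be run as a small matrix: both patterns under both locales. This is only a sketch. The helper name time_grep is made up for illustration, it assumes the en_US.UTF-8 locale is installed (if it is not, grep will most likely fall back to the C locale and no difference will show), and date +%s only gives one-second resolution, so it can only expose slowdowns on the scale reported here.

```shell
#!/bin/sh
# Sketch: run the many-matches and no-matches patterns under both locales.
# Assumption: en_US.UTF-8 is installed on the test machine.
testfile=$(mktemp)

# Same data as the reproducer: 30000 lines, every one ending in "0".
a=0
while [ "$a" -lt 30000 ]; do
    printf "%.9d0\n" "$a"
    a=$((a + 1))
done > "$testfile"

# Hypothetical helper: $1 = locale, $2 = pattern.
# Prints the match count and the elapsed wall-clock time in whole seconds.
time_grep() {
    start=$(date +%s)
    count=$(LC_ALL=$1 grep -c "$2" "$testfile")
    end=$(date +%s)
    echo "LC_ALL=$1 pattern=$2 matches=$count elapsed=$((end - start))s"
}

for loc in C en_US.UTF-8; do
    time_grep "$loc" '0$'   # matches every line
    time_grep "$loc" '1$'   # matches no line
done

# The temporary file is left in place; remove it with: rm -f "$testfile"
```

On an affected grep, only the UTF-8 run with the '0$' pattern should stand out; on a fixed build all four lines should report similar times.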
*** This bug has been marked as a duplicate of 179636 ***