From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461; .NET CLR 1.1.4322)

Description of problem:
When grep searches for ASCII text in a large file (~500M), it can take as long as 2 minutes before the command completes. When LANG is set to en_US.UTF-8, the command takes approximately 2 minutes to complete. When LANG is set to en_US, the command takes approximately 22 seconds to complete.

Version-Release number of selected component (if applicable):
grep-2.5.1-24.1

How reproducible:
Always

Steps to Reproduce:
1. Ensure LANG is set to en_US.UTF-8
2. Have available a large ASCII text file
3. time grep <text string> <filename>

Actual Results:
[root]# ls -l tsp04.log
-rw-r--r-- 1 cpstsp tsp 520034711 Sep 24 11:57 tsp04.log
[root]# export LANG=en_US.UTF-8
[root]# echo $LANG
en_US.UTF-8
[root]# time grep Excess tsp04.log
real 2m3.489s
user 2m1.850s
sys 0m0.850s

Expected Results:
[root]# ls -l tsp04.log
-rw-r--r-- 1 cpstsp tsp 520034711 Sep 24 11:57 tsp04.log
[root]# export LANG=en_US
[root]# echo $LANG
en_US
[root]# time grep Excess tsp04.log
real 0m21.371s
user 0m0.950s
sys 0m1.260s

Additional info:
The system has been updated with all of the latest available packages using up2date.
To put it more precisely: grep -F is slower than it need be. Have I understood correctly? (If there are any special characters, for example ".", there is extra processing to do in UTF-8 which cannot really be avoided. For fixed strings I think it might be possible to avoid UTF-8 processing.)
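To illustrate why a fixed string might not need UTF-8 processing, here is a minimal sketch in Python. It is not grep's code and not the eventual patch; the helper names `find_fixed` and `is_char_boundary` and the sample text are made up for this example. It relies on one property of UTF-8: continuation bytes (10xxxxxx) never look like the first byte of a character, so a valid UTF-8 needle found by plain byte comparison in valid UTF-8 input always starts and ends on a character boundary.

```python
from typing import List


def find_fixed(haystack: bytes, needle: bytes) -> List[int]:
    """Return the byte offset of every occurrence of needle in haystack.

    Plain byte comparison only; no decoding of multibyte characters.
    """
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


def is_char_boundary(data: bytes, offset: int) -> bool:
    """True if offset is the end of data or not a UTF-8 continuation byte."""
    return offset == len(data) or (data[offset] & 0xC0) != 0x80


# Hypothetical sample input: an ASCII needle surrounded by multibyte characters.
text = "naïve Excess café Excess".encode("utf-8")
needle = "Excess".encode("utf-8")

hits = find_fixed(text, needle)

# Every byte-level hit begins and ends on a character boundary.
assert all(
    is_char_boundary(text, h) and is_char_boundary(text, h + len(needle))
    for h in hits
)
print(hits)
```

This only covers the plain fixed-string case. Options that look at the characters around a match, or patterns with special characters such as ".", would still need some awareness of the encoding.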
That is correct. Issuing the above grep command on a Red Hat Linux 7.2 system produces basically the same results as Red Hat Enterprise Linux 3 with LANG=en_US; that is, it completes in approximately 22 seconds.
See bug 116145. The errata was supposed to resolve this but does not, at least in this case. I can always reproduce it (on any system with RHEL 3).

rhel3, grep-2.5.1-24.1

e.g. a 3.3M file which contains "word" a lot:

$ export LANG=en_US
$ time grep -w word file >/dev/null
real 0m0.027s
user 0m0.030s
sys 0m0.000s

$ export LANG=en_US.UTF-8
$ time grep -w word file >/dev/null
real 0m47.790s
user 0m47.770s
sys 0m0.000s
Please try the experimental package 2.5.1-24.1.0.1, located here: ftp://people.redhat.com/twaugh/tmp/grep/rhel3/ Thanks.
[root@tspes3dev2 up2date]# rpm -Uv grep-2.5.1-24.1.0.1.i386.rpm
Preparing packages for installation...
grep-2.5.1-24.1.0.1
[root@tspes3dev2 up2date]# rpm -q -a |grep grep
grep-2.5.1-24.1.0.1
[root@tspes3dev2 cpstsp]# echo $LANG
en_US.UTF-8
[root@tspes3dev2 cpstsp]# time grep Excess tsp04.log
real 0m35.962s
user 0m0.910s
sys 0m1.630s
[root@tspes3dev2 cpstsp]# time grep Excess tsp04.log
real 0m1.691s
user 0m0.980s
sys 0m0.710s

I loaded the patch and "grep" is working as expected. The initial "grep" completed in 35.962 seconds, whereas subsequent requests within a short period of time completed in 1.691 seconds. I am satisfied with this patch.
Works for me; one test that takes 90s with the previous grep finishes in 1.3s with this version.
2.5.1-24.1.0.2 fixes a bug that can cause false matches: ftp://people.redhat.com/twaugh/tmp/grep/rhel3/
*** This bug has been marked as a duplicate of 142807 ***