+++ This bug was initially created as a clone of Bug #499220 +++

The problem initially reported as bug 499220 is still present in Fedora 11, grep version "grep-2.5.3-4.fc11.i586".

The problem can be reproduced as follows:
--------------------------------------
$ for n in `seq 10000`
> do
> echo "0" >>test.txt
> done
$ export LANG=en_US.UTF-8
$ time grep [01] test.txt >/dev/null

real 0m9.102s
user 0m8.419s
sys 0m0.021s
--------------------------------------
while without utf8 the result is OK:

$ export LANG=en_US
$ time grep [01] test.txt >/dev/null

real 0m0.018s
user 0m0.004s
sys 0m0.001s
--------------------------------------

We get the same results with "[0]" in place of "[01]" as the regular expression.

This is a nasty bug because it could impact a lot of system scripts.

One note: the same grep command without the '[' and ']' does not have the problem:

$ export LANG=en_US.utf8
$ time grep 0 test.txt >/dev/null

real 0m0.009s
user 0m0.004s
sys 0m0.002s
--------------------

It seems that the rpm package includes, among others, the patch "grep-2.5.3-egf-speedup.patch", which is the most relevant in this respect. It fixes some unicode problems, but *not* the one that I am reporting. Here is an extract from that patch:

--- extract from grep-2.5.3-egf-speedup.patch ---
From aac37e1939632dbc7d2ade6f991af7ce103b0cba Mon Sep 17 00:00:00 2001
From: Tim Waugh <twaugh>
Date: Sun, 23 Nov 2008 17:30:59 +0100
Subject: [PATCH] EGF Speedup

The full story behind this patch is that grep-2.5.1a does not handle
UTF-8 gracefully at all. The basic plan with handling UTF-8 in 2.5.1a
is:

* whenever a buffer is parsed, go through the entire buffer deciding
  how many bytes make up each character

* use this information when necessary

This patch changes that to:

* when information about how many bytes make up a character is needed,
  work it out on demand

[...]
The problem is still present in Fedora 12; the comment above applies without any change.
Fixed in grep-2.5.4-1 in rawhide.
(In reply to comment #2)
> Fixed in grep-2.5.4-1 in rawhide.

grep-2.5.4-1 solves the problem for me! What I did is:
- download the .src.rpm of fedora 13
- rpmbuild -ba ...
- rpm -U ...
- test with utf8 locale --> OK

Thank you very much!
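For anyone who wants to repeat the last step ("test with utf8 locale"), here is a minimal sketch of how the check could be scripted. It is not taken from the report: the temporary file, the loop over locales and the use of `LC_ALL` instead of `LANG` are assumptions made for illustration, and it assumes bash (for the `time` keyword) and that the `en_US.UTF-8` locale is installed. The pattern is quoted so the shell cannot expand `[01]` as a glob.

```shell
#!/bin/bash
# Hypothetical re-check script, not part of the original report.
# Builds the same 10000-line file of "0" used in the reproduction steps.
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
yes 0 | head -n 10000 > "$tmpfile"

# Time the same bracket expression under a single-byte and a UTF-8 locale.
# On an affected grep the UTF-8 run is expected to be far slower;
# on a fixed grep both runs should take roughly the same time.
for loc in C en_US.UTF-8; do
    echo "locale: $loc"
    time LC_ALL="$loc" grep -c '[01]' "$tmpfile"
done
```

Each run prints the number of matching lines followed by the timing, so a large gap between the two "real" values would indicate that the slowdown is still present.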