Bug 326131 - 'egrep' takes a long time to execute
Summary: 'egrep' takes a long time to execute
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: grep
Version: 4.0
Hardware: All
OS: Linux
Assignee: Stepan Kasal
Reported: 2007-10-10 11:08 UTC by Equipe SMART
Modified: 2007-11-30 22:07 UTC (History)
Doc Type: Bug Fix
Last Closed: 2007-11-05 19:41:37 UTC

Description Equipe SMART 2007-10-10 11:08:22 UTC
Description of problem:

The following command:

  egrep -hv -f ./flt/flt.$BASE.grep ${REP}/20$3/$2/poster$3$2$1* | ./prg.pl >> ./tmp/$BASE.txt

takes about 7 minutes to execute on Red Hat Enterprise Linux WS release 4
(Nahant Update 4)

In comparison, it takes less than 1 minute on Red Hat Linux release 7.1 (Seawolf).

In addition, in Red Hat Enterprise Linux WS release 4, 'egrep' is simply a
symbolic link to 'grep':
$ ll /bin/egrep
lrwxrwxrwx  1 root root 4 aoû 10 13:06 /bin/egrep -> grep

In Red Hat Linux release 7.1, 'egrep' and 'grep' are two distinct executable
files:
$ ll /bin/egrep
-rwxr-xr-x    1 root     root        49244 fév 27  2001 /bin/egrep

We tried running the command on Red Hat Enterprise Linux WS release 4 with
the older 'egrep' binary (2.4.2) instead of the egrep->grep link (2.5.1). It
ran much faster: only 3 seconds to execute, with the same result.

Version-Release number of selected component (if applicable):


How reproducible:

By using 'egrep' under Red Hat Enterprise Linux WS release 4 with the -hv and -f options.

Actual results:

'egrep' takes a long time (more than 5 minutes) to execute.

Expected results:

'egrep' takes a few seconds to execute.

Comment 1 Stepan Kasal 2007-11-05 19:41:37 UTC
grep version 2.5.1, which is contained in RHEL 4, contains a new matcher which
can correctly handle texts encoded in multibyte character sets, esp. UTF-8.
(The support for this in version 2.4.2 was very limited and buggy.)
Unfortunately, the new code is correct, but slow.
And since RHEL 4 uses UTF-8 locale by default, you are actually using the
correct-but-slow code even though you do not need it.
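
A quick way to see whether a script is affected is to look at the effective
character-type locale. The following is only a sketch: the helper name
is_utf8_locale is made up for illustration, and the check is a simple string
match on the locale name using the usual LC_ALL > LC_CTYPE > LANG precedence,
not a query of grep itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep or RHEL): report whether a locale
# name looks like a UTF-8 locale. With no argument it inspects the
# environment, using the precedence LC_ALL > LC_CTYPE > LANG.
is_utf8_locale() {
    name=${1:-${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}}
    case "$name" in
        *.UTF-8*|*.utf-8*|*.UTF8*|*.utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: warn when the current environment selects a UTF-8 locale.
if is_utf8_locale; then
    echo "UTF-8 locale in effect: grep 2.5.1 will use the multibyte matcher"
else
    echo "non-UTF-8 locale in effect"
fi
```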

To work around this problem, please use the C locale.  For example, put
  export LC_ALL=C
at the top of your shell scripts.
Or you can call
  LC_ALL=C egrep
instead of the original "egrep"; or you can use bash aliases, for example.
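
As a sketch of the per-command variant: a small shell function can confine
the C locale to the grep call, so the rest of the script keeps its own
locale. The function name egrep_c and the sample file names are made up for
illustration; "grep -E" is the equivalent of "egrep".

```shell
#!/bin/sh
# Hypothetical wrapper: run extended-regex grep in the C locale only.
# The LC_ALL assignment applies to this one grep invocation.
egrep_c() {
    LC_ALL=C grep -E "$@"
}

# Usage sketch with throwaway files (illustrative names), using the same
# -hv -f options as in the report.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/input.txt"
printf 'beta\n' > "$tmpdir/filter.grep"
egrep_c -hv -f "$tmpdir/filter.grep" "$tmpdir/input.txt"
rm -rf "$tmpdir"
```

Note that the C locale treats input as single bytes, so this is only
appropriate when the patterns and data do not depend on multibyte characters.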

We believe that the performance will improve in future builds of grep, but we do
not plan to include those improvements in a RHEL Update, since stability (in
the sense of correctness) is much more important.  We hope you will understand this.

A side note:
you mentioned that in version 2.4.2, egrep was a separate binary, while in 2.5.1
egrep is a symlink.  This is only a minor difference:
- previously, egrep and grep were almost identical binaries, compiled from the
same sources, and the only difference was one global constant defining the
default mode of operation;
- with the symlink, the grep binary sets this "constant" according to the name
of the binary.
As you see, in both cases "grep" and "egrep" are effectively the same, and this
change cannot have any performance effects.
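
To illustrate the idea of choosing the default mode from the invocation
name, here is a rough shell sketch. It is not grep's actual source: the
function name matcher_for_name and the mode labels are invented for this
example.

```shell
#!/bin/sh
# Hypothetical illustration: pick a default matching mode from the name
# the program was invoked under, the way a single binary reached through
# several links can behave differently per name.
matcher_for_name() {
    case "$(basename "$1")" in
        egrep) echo "extended" ;;   # behaves like grep -E
        fgrep) echo "fixed" ;;      # behaves like grep -F
        *)     echo "basic" ;;      # plain grep
    esac
}

# Usage: the same code path, different default depending on the name.
matcher_for_name /bin/egrep
matcher_for_name /bin/grep
```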
