Bug 326131 - 'egrep' takes a long time to execute
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: grep
Version: 4.0
Hardware: All Linux
Priority: low Severity: high
Assigned To: Stepan Kasal
Reported: 2007-10-10 07:08 EDT by Equipe SMART
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-11-05 14:41:37 EST
Description Equipe SMART 2007-10-10 07:08:22 EDT
Description of problem:

The following command:
  egrep -hv -f ./flt/flt.$BASE.grep ${REP}/20$3/$2/poster$3$2$1* | ./prg.pl >> ./tmp/$BASE.txt

takes about 7 minutes to execute on Red Hat Enterprise Linux WS release 4
(Nahant Update 4)

In comparison, it takes less than 1 minute on Red Hat Linux release 7.1 (Seawolf).

In addition, in Red Hat Enterprise Linux WS release 4, 'egrep' is a symbolic
link to 'grep':
$ ll /bin/egrep
lrwxrwxrwx  1 root root 4 aoû 10 13:06 /bin/egrep -> grep

In Red Hat Linux release 7.1, 'egrep' and 'grep' are two distinct executable
files:
$ ll /bin/egrep
-rwxr-xr-x    1 root     root        49244 fév 27  2001 /bin/egrep


We tried running the command on Red Hat Enterprise Linux WS release 4 with
this version of 'egrep' (2.4.2) instead of the egrep -> grep link (2.5.1). It
ran much faster: only 3 seconds, with the same result.


Version-Release number of selected component (if applicable):

2.5.1

How reproducible:

By using 'egrep' under Red Hat Enterprise Linux WS release 4, with -hv and -f
options. 

Actual results:

'egrep' takes a long time (more than 5 minutes) to execute.

Expected results:

'egrep' takes a few seconds to execute.
Comment 1 Stepan Kasal 2007-11-05 14:41:37 EST
grep version 2.5.1, which ships in RHEL 4, contains a new matcher that can
correctly handle text encoded in multibyte character sets, especially UTF-8.
(Support for this in version 2.4.2 was very limited and buggy.)
Unfortunately, the new code is correct but slow.
Since RHEL 4 uses a UTF-8 locale by default, you are actually using the
correct-but-slow code even though you do not need it.
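
To check whether a given script is affected, it may help to see which locale is
actually in effect. The sketch below is an illustration only and is not part of
grep: the helper names effective_ctype and is_utf8_locale_name are hypothetical,
and it relies on the standard precedence of LC_ALL over LC_CTYPE over LANG.

```shell
#!/bin/sh
# Hypothetical helpers (not part of grep) for inspecting the character-type locale.

# Print the locale name that governs character handling: the first non-empty
# value of LC_ALL, LC_CTYPE, LANG. If all are empty, the system default applies;
# this sketch assumes that default is "C".
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Succeed if the given locale name looks like a UTF-8 locale.
is_utf8_locale_name() {
    case "$1" in
        *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

current=$(effective_ctype)
if is_utf8_locale_name "$current"; then
    echo "UTF-8 locale in effect ($current): grep 2.5.1 would use the multibyte matcher"
else
    echo "non-UTF-8 locale in effect ($current)"
fi
```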

To work around this problem, please use the C locale.  For example, put
  export LC_ALL=C
at the top of your shell scripts.
Or you can call
  LC_ALL=C egrep
instead of plain "egrep"; or you can use bash aliases, for example.
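
As a sketch of the per-call variant, the wrapper below confines the C locale to
the grep invocation, so the rest of the script keeps its own locale. The
function name egrep_c is hypothetical, and grep -E is used as the equivalent of
egrep. It assumes the patterns and the data do not need multibyte handling.

```shell
#!/bin/sh
# Hypothetical wrapper: run extended-regex grep in the C locale for this call only.
# The assignment prefixes a single external command, so it does not leak into
# the calling script's environment.
egrep_c() {
    LC_ALL=C grep -E "$@"
}

# Small demonstration with inline data: -v drops the matching line.
printf 'keep this line\ndrop this line\n' | egrep_c -v 'drop'
# prints: keep this line
```

The original egrep -hv -f invocation would then call egrep_c with the same
arguments.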

We believe that the performance will improve in future builds of grep, but we do
not plan to include those improvements in a RHEL Update, since stability (in
the sense of correctness) is much more important.  We hope you will understand
this decision.

A side note:
you mentioned that in version 2.4.2, egrep was a separate binary, while in 2.5.1
egrep is a symlink.  This is only a minor difference:
- previously, egrep and grep were almost identical binaries, compiled from the
same sources, and the only difference was one global constant defining the
default mode of operation;
- with the symlink, the grep binary sets this "constant" according to the name
of the binary.
As you see, in both cases "grep" and "egrep" are effectively the same, and this
change cannot have any performance effects.
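
The name-based dispatch described above can be sketched in a few lines of
shell. This only illustrates the idea and is not grep's actual source: the
function matcher_for_name and the mode labels "basic" and "extended" are
hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration: pick the default matching mode from the name the
# program was invoked as, the way a single binary behind a symlink can.
matcher_for_name() {
    # Strip any leading directory components, e.g. /bin/egrep -> egrep.
    case "${1##*/}" in
        egrep) echo "extended" ;;
        *)     echo "basic" ;;
    esac
}

matcher_for_name /bin/grep    # prints: basic
matcher_for_name /bin/egrep   # prints: extended
```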
