Bug 111800 - grep probs w/ regex and utf-8
Product: Fedora
Classification: Fedora
Component: grep
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Tim Waugh
QA Contact: Mike McLean
Reported: 2003-12-10 04:58 EST by Ralf Corsepius
Modified: 2007-11-30 17:10 EST (History)
Doc Type: Bug Fix
Last Closed: 2003-12-11 05:08:38 EST

Attachments
input data to reproduce the bug mentioned in the PR (71.99 KB, text/plain)
2003-12-10 05:00 EST, Ralf Corsepius

External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2004:079 | normal | SHIPPED_LIVE | Updated grep package speeds UTF-8 searching | 2004-09-01 00:00:00 EDT
Red Hat Product Errata | RHBA-2004:083 | normal | SHIPPED_LIVE | Updated grep package speeds UTF-8 searching | 2004-03-18 00:00:00 EST

Description Ralf Corsepius 2003-12-10 04:58:01 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)

Description of problem:
grep '^[bs][ir][nc]' <file>

occasionally reports incorrect results: some matching lines are
missing from the output.

Version-Release number of selected component (if applicable):
grep > 2.5.1-17

How reproducible:

Steps to Reproduce:
1. Download the file foo from the attachment.
2. Run
grep '^[bs][ir][nc]' foo > out

Every line in the file foo is supposed to match the regex, so
wc -l foo
wc -l out
should report the same number of lines.
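
To see which lines are dropped rather than only how many, the
inverse match can be printed. This is a minimal sketch, assuming the
attachment has been saved as foo; unmatched_lines is a hypothetical
helper name, not something grep provides.

```shell
# Hypothetical helper (not part of grep): print, with line numbers,
# every line of FILE that PATTERN does not match.
# Usage: unmatched_lines PATTERN FILE
unmatched_lines() {
    # -v inverts the match, -n prefixes each line with its line number,
    # -- ends option parsing so a pattern starting with '-' is safe.
    grep -v -n -- "$1" "$2"
}

# Example call with the file from step 1:
#   unmatched_lines '^[bs][ir][nc]' foo
```

If every line matches, nothing is printed and the exit status is 1,
because grep selected no lines.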

Actual Results:
# grep '^[bs][ir][nc]' foo > out && wc -l foo out
    445 foo
    397 out
    842 total

In a LANG=en_US.UTF-8 environment, the line counts differ;
apparently the regex match fails on some lines.

In a C-locale, this issue does not occur:
# LANG=C grep '^[bs][ir][nc]' foo > out && wc -l foo out
    445 foo
    445 out
    890 total
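
The two runs above can be folded into a single check. The following
is only a sketch: compare_locales is a hypothetical helper name, and
it assumes the en_US.UTF-8 locale is installed (check with
`locale -a`). It uses LC_ALL rather than LANG because LC_ALL takes
precedence over any other locale variable that may already be set.

```shell
# Hypothetical helper: count matching lines under the C locale and
# under a second locale, print both counts, and succeed only if they agree.
# Usage: compare_locales PATTERN FILE [LOCALE]
compare_locales() {
    pattern=$1
    file=$2
    locale=${3:-en_US.UTF-8}   # assumption: this locale is installed

    # $(( ... )) strips the padding some wc implementations add.
    total=$(($(wc -l < "$file")))

    # LC_ALL overrides LANG and all LC_* variables for one command.
    c_count=$(LC_ALL=C grep -c -- "$pattern" "$file")
    u_count=$(LC_ALL="$locale" grep -c -- "$pattern" "$file")

    echo "total=$total C=$c_count $locale=$u_count"
    [ "$c_count" -eq "$u_count" ]
}

# Example call with the attached file:
#   compare_locales '^[bs][ir][nc]' foo || echo "locale-dependent result"
```

On an affected grep, the two counts would be expected to differ for
the attached file; on a fixed one they should be equal.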

Expected Results:  grep should match every line. The input file is
plain ASCII and does not contain any multibyte UTF-8 characters.

Additional info:
grep-2.5.1-17 was not affected.

In FC1, grep-2.5.1-17.2 was the first version of grep to expose this
issue. All versions in rawhide / FC1/development up to and including
2.5.1-22 are affected.
Comment 1 Ralf Corsepius 2003-12-10 05:00:19 EST
Created attachment 96442
input data to reproduce the bug mentioned in the PR
Comment 2 Tim Waugh 2003-12-10 11:37:02 EST
Please try this package:


and let me know whether it fixes the problem for you.
Comment 3 Ralf Corsepius 2003-12-11 01:30:39 EST
Yes, it seems to fix my problem. At least my testcases don't fail anymore.
Comment 4 Jay Turner 2004-09-01 22:13:27 EDT
An erratum has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

