Red Hat Bugzilla – Bug 111800
grep probs w/ regex and utf-8
Last modified: 2007-11-30 17:10:34 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)
Description of problem:
grep '^[bs][ir][nc]' <file>
occasionally reports incorrect results: some lines that match the regex are missing from the output.
Version-Release number of selected component (if applicable):
grep > 2.5.1-17
Steps to Reproduce:
1. Download the file foo from the attachment.
2. grep '^[bs][ir][nc]' foo > out
3. wc -l foo
   wc -l out
Every line in foo matches the regex, so both wc -l commands should report the same number of lines.
Actual Results:
# grep '^[bs][ir][nc]' foo > out && wc -l foo out
In a LANG=en_US.UTF-8 environment, the two line counts differ; the regex apparently fails to match some lines.
In the C locale, the issue does not occur:
# LANG=C grep '^[bs][ir][nc]' foo > out && wc -l foo out
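The line-count comparison above can be wrapped in a small shell function and run once per locale. This is a minimal sketch, not part of the original report: the function name `check_locale` is hypothetical, and it sets `LC_ALL` rather than `LANG` because `LC_ALL` overrides both `LANG` and the individual `LC_*` variables, so the requested locale applies even if the caller's environment already sets one of them. It counts matching lines with `grep -c` instead of redirecting to a file.

```shell
#!/bin/sh
# check_locale FILE LOCALE (hypothetical helper)
# Every line of FILE is expected to match '^[bs][ir][nc]', so the number of
# lines grep counts as matching should equal the total number of lines.
check_locale() {
    target_file=$1
    target_locale=$2
    # Strip padding that some wc implementations print before the count.
    total=$(wc -l < "$target_file" | tr -d ' ')
    # LC_ALL takes precedence over LANG and LC_CTYPE for this one command.
    matched=$(LC_ALL=$target_locale grep -c '^[bs][ir][nc]' "$target_file")
    if [ "$total" -eq "$matched" ]; then
        echo "$target_locale: OK ($matched of $total lines matched)"
    else
        echo "$target_locale: MISMATCH ($matched of $total lines matched)"
        return 1
    fi
}
```

With the attached file saved as foo, running `check_locale foo en_US.UTF-8` and then `check_locale foo C` repeats the two experiments above; according to this report, only the en_US.UTF-8 run would disagree on an affected grep build.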
Expected Results: Every line matches, so both line counts are equal in any locale. The input file is plain ASCII and does
not contain any multibyte UTF-8 characters.
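The statement that the input is plain ASCII can be checked by counting bytes outside the 7-bit range. This is a minimal sketch assuming a POSIX `tr`; the helper name `non_ascii_bytes` is hypothetical. Running `tr` under `LC_ALL=C` makes it treat the input as raw bytes, so the octal range `\000-\177` covers exactly the 128 ASCII code points and whatever survives the deletion is non-ASCII.

```shell
# non_ascii_bytes FILE (hypothetical helper)
# Deletes every byte in the ASCII range (octal 000-177) and counts what remains.
# A result of 0 means FILE contains only 7-bit ASCII.
non_ascii_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}
```

A result of 0 for foo would confirm that the file contains no UTF-8 multibyte sequences.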
grep-2.5.1-17 was not affected.
In FC1, grep-2.5.1-17.2 was the first version of grep to expose this bug.
All versions in rawhide / FC1/development up to and including 2.5.1-22
suffer from this issue.
Created attachment 96442
input data to reproduce the bug mentioned in the PR
Please try this package:
and let me know whether it fixes the problem for you.
Yes, it seems to fix my problem. At least my testcases don't fail anymore.
An errata has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.