Bug 147259 - grep and UTF-8 don't play nicely together.
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: grep
Version: 4.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
QA Contact: Mike McLean
Reported: 2005-02-05 10:46 EST by Charlie Brady
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2005-02-08 05:56:51 EST

Description Charlie Brady 2005-02-05 10:46:26 EST
grep and UTF-8 don't play nicely together: in a UTF-8 locale, the range
[A-C] also matches lower-case letters. There's an errata package for
RHEL 3, but by the looks of it the fix hasn't been propagated to HEAD.

[charlieb@charlieb SOURCES]$ cat /tmp/test
a log
b log
c log
d log
A log
B log
C log
[charlieb@charlieb SOURCES]$ grep '[A-C]' /tmp/test
b log
c log
A log
B log
C log
[charlieb@charlieb SOURCES]$ echo $LANG
en_AU.UTF-8
[charlieb@charlieb SOURCES]$ rpm -q grep
grep-2.5.1-31
[charlieb@charlieb SOURCES]$

Hmmm, errata package seems to be broken as well:

[root@charlieb charlieb]# rpm -Uhv --oldpackage grep-2.5.1-24.1.i386.rpm
Preparing...                ########################################### [100%]
   1:grep                   ########################################### [100%]
[root@charlieb charlieb]# grep '[A-C]' /tmp/test
b log
c log
A log
B log
C log
[root@charlieb charlieb]# unset LANG
[root@charlieb charlieb]# grep '[A-C]' /tmp/test
A log
B log
C log
[root@charlieb charlieb]#
Comment 1 Tim Waugh 2005-02-08 05:56:51 EST
The behaviour you cite is correct.  For matching upper-case letters, you need to
use [[:upper:]], or list them explicitly in a class such as [ABC].

ISO/IEC 14651, the international standard for string ordering (collation),
specifies this behaviour.  You can also find some information in the strcoll
documentation.

IEEE Std 1003.1, 2003 Edition says that grep uses the current locale's
LC_COLLATE setting as the "locale for the behavior of ranges".
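
The output in the description is consistent with a collation order in which lower- and upper-case letters interleave (a, A, b, B, c, C, ...), so that [A-C] covers b and c but not a. The following is a minimal sketch of the three usual ways to match only upper-case letters regardless of the session locale. The temporary file stands in for /tmp/test from the report, and the variable names are illustrative only.

```shell
# Recreate the test file from the report in a temporary location.
tmp=$(mktemp)
printf '%s\n' 'a log' 'b log' 'c log' 'd log' 'A log' 'B log' 'C log' > "$tmp"

# 1. POSIX character class: matches upper-case letters in any locale.
upper=$(grep '[[:upper:]]' "$tmp")

# 2. Explicit list: no range, so collation order does not matter.
listed=$(grep '[ABC]' "$tmp")

# 3. Force the C locale for this one command, so that the range
#    A-C is interpreted by byte value (ASCII order).
c_range=$(LC_ALL=C grep '[A-C]' "$tmp")

printf '%s\n' "$c_range"
rm -f "$tmp"
```

Each of the three commands should print only the "A log", "B log" and "C log" lines. Setting LC_ALL=C per command avoids having to unset LANG for the whole session, as was done in the description.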
