Bug 138648 - UTF-8 grep character range bug?
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: grep
Version: 3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
QA Contact: Mike McLean
Reported: 2004-11-10 09:54 EST by Joe Orton
Modified: 2007-11-30 17:10 EST (History)
Last Closed: 2004-11-10 10:11:42 EST
Description Joe Orton 2004-11-10 09:54:58 EST
Is this some weird UTF-8 thing or am I being stupid?  Stock FC3 grep:

$ rpm -q grep
grep-2.5.1-31.i386
$ cat out
         U CRYPTO_add_lock
         w __cxa_finalize
         U d2i_PrivateKey
         U d2i_PrivateKey_bio
$ grep ' [A-TV-Z] ' out

Actual behaviour:

$ grep ' [A-TV-Z] ' out
         w __cxa_finalize
$

Expected behaviour, as with LANG=C:

$ LANG=C grep ' [A-TV-Z] ' out
$
Comment 1 Tim Waugh 2004-11-10 10:11:42 EST
It's not UTF-8 but locale collation order.  Try LANG=en_GB, for example.

For the particular case you're after I think it's most portably
described by ' [ABCDEFGHIJKLMNOPQRSTVWXYZ] ', believe it or not.

See 'man grep', "Regular Expressions", paragraph 5.
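
The two workarounds mentioned in this report can be sketched as follows. The background, as far as the comment above implies, is that in locales such as en_GB the collation order can place lowercase letters between uppercase ones, so a range like V-Z may also cover w. This is only a sketch: the temporary file stands in for the reporter's "out" file, and LC_ALL is used instead of LANG on the assumption that LC_ALL or LC_COLLATE might already be set in the environment, in which case LANG=C alone would have no effect on collation.

```shell
# Recreate the sample input from the report in a temporary file
# (a stand-in for the reporter's "out" file).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
         U CRYPTO_add_lock
         w __cxa_finalize
         U d2i_PrivateKey
         U d2i_PrivateKey_bio
EOF

# Option 1: force the C locale for this one command, so that A-T and V-Z
# are ranges over byte values and cover uppercase ASCII letters only.
# "|| true" keeps grep's "no match" exit status from aborting a strict shell.
c_range=$(LC_ALL=C grep ' [A-TV-Z] ' "$tmp" || true)

# Option 2: spell out the set, which does not depend on collation order.
explicit=$(grep ' [ABCDEFGHIJKLMNOPQRSTVWXYZ] ' "$tmp" || true)

# Both results should be empty for this input, because the only symbol
# types present are U and w.
printf 'C locale range: [%s]\n' "$c_range"
printf 'explicit list:  [%s]\n' "$explicit"

rm -f "$tmp"
```

Option 1 is shorter, but it changes how the whole command treats characters; option 2 is the form suggested in comment 1 and behaves the same regardless of locale.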
