Bug 138648

Summary: UTF-8 grep character range bug?
Product: [Fedora] Fedora
Reporter: Joe Orton <jorton>
Component: grep
Assignee: Tim Waugh <twaugh>
Status: CLOSED NOTABUG
QA Contact: Mike McLean <mikem>
Severity: medium
Priority: medium
Version: 3   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-11-10 15:11:42 UTC

Description Joe Orton 2004-11-10 14:54:58 UTC
Is this some weird UTF-8 thing or am I being stupid?  Stock FC3 grep:

$ rpm -q grep
grep-2.5.1-31.i386
$ cat out
         U CRYPTO_add_lock
         w __cxa_finalize
         U d2i_PrivateKey
         U d2i_PrivateKey_bio
$ grep ' [A-TV-Z] ' out

Actual behaviour:

$ grep ' [A-TV-Z] ' out
         w __cxa_finalize
$

Expected behaviour, as seen with LANG=C:

$ LANG=C grep ' [A-TV-Z] ' out
$

Comment 1 Tim Waugh 2004-11-10 15:11:42 UTC
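It's not UTF-8 but locale collation order: a range such as V-Z is
evaluated in the locale's collating sequence, which can interleave
upper and lower case, so a lowercase 'w' may fall inside it.  Try
LANG=en_GB, for example.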
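
One way to see the effect directly is to list which ASCII letters a bracket expression matches under a given locale. The helper below is only a sketch: the name range_members is hypothetical and not part of grep, and the result for any locale other than C depends on the installed locale data and the grep version.

```shell
# range_members LOCALE BRACKET_EXPR
# Hypothetical helper: prints the ASCII letters matched by BRACKET_EXPR
# when grep runs under LOCALE. Each letter is fed to grep on its own line.
range_members() {
  printf '%s\n' A B C D E F G H I J K L M N O P Q R S T U V W X Y Z \
                a b c d e f g h i j k l m n o p q r s t u v w x y z |
    LC_ALL="$1" grep "^$2\$" | tr -d '\n'
  echo
}

# In the C locale the range follows byte order, so only uppercase matches.
range_members C '[V-Z]'   # prints VWXYZ
```

Substituting en_GB (or the locale in use when the problem shows up) for C should reveal whether lowercase letters such as w are being pulled into the range on that system.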

For the particular case you're after I think it's most portably
described by ' [ABCDEFGHIJKLMNOPQRSTVWXYZ] ', believe it or not.

See 'man grep', "Regular Expressions", paragraph 5.
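
The two workarounds mentioned in this report can be tried side by side. This is a minimal sketch that recreates the sample file in a temporary location; the use of mktemp and the variable names are assumptions for illustration. LC_ALL is used instead of LANG because it overrides the other locale variables.

```shell
# Recreate the sample input from the description.
tmpfile=$(mktemp)
printf '%s\n' \
  '         U CRYPTO_add_lock' \
  '         w __cxa_finalize' \
  '         U d2i_PrivateKey' \
  '         U d2i_PrivateKey_bio' > "$tmpfile"

# Workaround 1: force the C locale for this one command, so the range
# is interpreted in byte order. "|| true" absorbs grep's exit status 1
# when nothing matches.
c_locale_matches=$(LC_ALL=C grep ' [A-TV-Z] ' "$tmpfile" || true)

# Workaround 2: spell out every letter, which does not rely on how the
# current locale orders ranges.
explicit_matches=$(grep ' [ABCDEFGHIJKLMNOPQRSTVWXYZ] ' "$tmpfile" || true)

printf 'C locale: [%s]\n' "$c_locale_matches"
printf 'explicit: [%s]\n' "$explicit_matches"

rm -f "$tmpfile"
```

Both variables should come back empty for this input, since the only symbol types present are U and w, neither of which is an uppercase letter other than U.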