138648 – UTF-8 grep character range bug?


Summary: UTF-8 grep character range bug?

Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	grep
Version:	3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tim Waugh
QA Contact:	Mike McLean

Reported:	2004-11-10 14:54 UTC by Joe Orton
Modified:	2007-11-30 22:10 UTC
CC List:	0 users
Last Closed:	2004-11-10 15:11:42 UTC
Type:	---


Description Joe Orton 2004-11-10 14:54:58 UTC

Is this some weird UTF-8 thing or am I being stupid?  Stock FC3 grep:

$ rpm -q grep
grep-2.5.1-31.i386
$ cat out
         U CRYPTO_add_lock
         w __cxa_finalize
         U d2i_PrivateKey
         U d2i_PrivateKey_bio

Actual behaviour:

$ grep ' [A-TV-Z] ' out
         w __cxa_finalize
$

Expected behaviour, as with LANG=C:

$ LANG=C grep ' [A-TV-Z] ' out
$

Comment 1 Tim Waugh 2004-11-10 15:11:42 UTC

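It's not UTF-8 but locale collation order: outside the C locale, a range such as [A-Z] matches every character that sorts between A and Z in the locale's collating sequence, and that can include lowercase letters.  Try LANG=en_GB (a non-UTF-8 locale), for example.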
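To make the collation point concrete, here is a minimal sketch that models how a bracket range is resolved against a collating sequence. It assumes a simplified, hypothetical dictionary-style ordering in which each lowercase letter sorts immediately before its uppercase counterpart (aAbB...zZ); real locale tables are more complex. The SEQ variable and the range helper are illustrative names, not part of grep or glibc.

```shell
#!/bin/sh
# Hypothetical, simplified collating sequence (an assumption of this sketch):
# each lowercase letter sorts just before its uppercase counterpart.
SEQ='aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ'

# range FIRST LAST (hypothetical helper): print the characters of SEQ from
# FIRST to LAST inclusive, i.e. what [FIRST-LAST] covers under this ordering.
range() {
  before=${SEQ%%"$1"*}          # everything that sorts before FIRST
  from_first=${SEQ#"$before"}   # SEQ starting at FIRST
  after=${from_first#*"$2"}     # everything that sorts after LAST
  printf '%s\n' "${from_first%"$after"}"
}

range A T   # prints AbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStT
range V Z   # prints VwWxXyYzZ
```

Under this model [V-Z] picks up lowercase w, which is consistent with "w __cxa_finalize" matching, while both u and U sort between T and V and so fall outside [A-TV-Z], consistent with the "U" lines not matching.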

For the particular case you're after, I think it's most portably
described by ' [ABCDEFGHIJKLMNOPQRSTVWXYZ] ', believe it or not.
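Both workarounds can be sketched as a small script. It recreates the sample listing from the report in a temporary file and adds one hypothetical line ("T main") so the filters have something to match; the temporary file and that extra line are assumptions of this sketch. It uses LC_ALL=C rather than LANG=C because LC_ALL overrides both LANG and LC_COLLATE, so a stray LC_COLLATE setting cannot bring the locale-dependent range back.

```shell
#!/bin/sh
# Recreate the sample listing from the report, plus one hypothetical
# extra line ("T main") so that the filters below have something to match.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
printf '%s\n' \
  '         U CRYPTO_add_lock' \
  '         w __cxa_finalize' \
  '         U d2i_PrivateKey' \
  '         U d2i_PrivateKey_bio' \
  '         T main' > "$tmp"

# Option 1: list every letter explicitly, so no range (and therefore no
# collation order) is involved. Works the same in any locale.
grep ' [ABCDEFGHIJKLMNOPQRSTVWXYZ] ' "$tmp"

# Option 2: keep the range but force the C locale for this one command.
LC_ALL=C grep ' [A-TV-Z] ' "$tmp"
```

Each grep should print only the hypothetical "T main" line: the "w" line is excluded because the explicit list and the C-locale range contain only uppercase letters.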

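See 'man grep', "Regular Expressions", paragraph 5, which explains that a range expression matches any character that sorts between its endpoints in the locale's collating sequence.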
