grep and UTF-8 don't play nicely together. There's an errata package for RHEL 3 - fix hasn't been propagated to HEAD, by the looks.

[charlieb@charlieb SOURCES]$ cat /tmp/test
a log
b log
c log
d log
A log
B log
C log
[charlieb@charlieb SOURCES]$ grep '[A-C]' /tmp/test
b log
c log
A log
B log
C log
[charlieb@charlieb SOURCES]$ echo $LANG
en_AU.UTF-8
[charlieb@charlieb SOURCES]$ rpm -q grep
grep-2.5.1-31
[charlieb@charlieb SOURCES]$

Hmmm, errata package seems to be broken as well:

[root@charlieb charlieb]# rpm -Uhv --oldpackage grep-2.5.1-24.1.i386.rpm
Preparing...                ########################################### [100%]
   1:grep                   ########################################### [100%]
[root@charlieb charlieb]# grep '[A-C]' /tmp/test
b log
c log
A log
B log
C log
[root@charlieb charlieb]# unset LANG
[root@charlieb charlieb]# grep '[A-C]' /tmp/test
A log
B log
C log
[root@charlieb charlieb]#
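The behaviour you cite is correct. In a locale such as en_AU.UTF-8, a range in a bracket expression follows the locale's collation order rather than the byte order, and your output suggests the letters collate as aAbBcC..., so [A-C] covers A, b, B, c and C but not a or d. With LANG unset, grep falls back to the C locale, where the range is in byte order and only A, B and C match. To match upper-case letters regardless of locale, you need to use [[:upper:]], or list them explicitly in a class such as [ABC]. ISO 14651, which is the sorting standard, specifies this behaviour. You can also find some information in the strcoll documentation. IEEE Std 1003.1, 2003 Edition says that grep uses the current locale (LC_COLLATE) as the "locale for the behavior of ranges".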
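
For reference, here is a small sketch of the locale-independent alternatives, reusing the sample data from the report. It assumes a POSIX shell and a POSIX-conforming grep. The `sample` function is only a made-up helper for this illustration, standing in for /tmp/test.

```shell
# Hypothetical helper: prints the same seven lines as /tmp/test in the report.
sample() {
    printf '%s\n' 'a log' 'b log' 'c log' 'd log' 'A log' 'B log' 'C log'
}

# 1. Character class: matches upper-case letters in any locale.
sample | grep '[[:upper:]]'

# 2. Explicit list: no range, so collation order does not matter.
sample | grep '[ABC]'

# 3. Force the C locale for this one command, so [A-C] is a byte range.
sample | LC_ALL=C grep '[A-C]'
```

Each of the three commands should print only "A log", "B log" and "C log". The third form overrides the locale for that single grep invocation, so it is worth using only where byte-order ranges are really what you want.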