Red Hat Bugzilla – Bug 40571
Last modified: 2016-11-24 10:06:19 EST
Description of Problem:
It seems that 'O' is treated as falling within the range a-z.
Steps to Reproduce:
[kad@bofh kad]$ set|grep -E 'LANG|LC_'
[kad@bofh kad]$ echo O|grep [a-z]
[kad@bofh kad]$ echo O|LANG= grep [a-z]
[kad@bofh kad]$ echo O|LANG=C grep [a-z]
[kad@bofh kad]$ echo O|LANG=ru_RU.KOI8-R grep [a-z]
[kad@bofh kad]$ echo O|LANG=de_DE grep [a-z]
[kad@bofh kad]$ echo O|LANG=uk_UA grep [a-z]
[kad@bofh kad]$ echo O|LANG=netu grep [a-z]
[kad@bofh kad]$ echo O|LANG=en_GB grep [a-z]
[kad@bofh kad]$ echo O|LC_CTYPE= grep [a-z]
O must not be present in any output.
sort orders it the same way.
Why do you think it is a bug?
In several locales, O is after n and before p.
If you expect the POSIX locale collation, you should use the POSIX locale,
or e.g. use [[:lower:]] if you really mean all lower case letters.
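A minimal sketch of the two workarounds suggested above, assuming a POSIX shell and a grep that honours the locale environment variables. The `check` helper is hypothetical and exists only for illustration. It sets LC_ALL rather than LANG because LC_ALL overrides LANG and every LC_* variable. The pattern is quoted so the shell cannot expand it as a filename glob.

```shell
# Hypothetical helper: report whether a character matches a bracket expression
# under a given locale.
#   $1 = locale name, $2 = input character, $3 = bracket expression
check() {
    if printf '%s\n' "$2" | LC_ALL="$1" grep -q -e "$3"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Workaround 1: force the POSIX (C) locale, where ranges follow ASCII order.
check C O '[a-z]'          # no match
check C o '[a-z]'          # match

# Workaround 2: use a character class, which does not depend on collation.
check C O '[[:lower:]]'    # no match
```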
But why does it match [a-z]? A capital 'O' should be matched by [A-Z], not by [a-z].
Because if in a particular locale `O' comes between `a' and `z', it fits into
this range and thus matches `[a-z]' regular expression.
E.g. see what `info grep' says about regular expressions:
For example, `[[:alnum:]]' means `[0-9A-Za-z]', except the latter
depends upon the POSIX locale and the ASCII character encoding, whereas
the former is independent of locale and character set. (Note that the
brackets in these class names are part of the symbolic names, and must
be included in addition to the brackets delimiting the bracket list.)
As you can see, [a-z] is dependent on the locale (and whether grep has been
built with NLS support).
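One way to see where a locale places `O' relative to `a' and `z' is to run sort, which orders its input by LC_COLLATE. This is only a sketch: the `collation_order` helper is hypothetical, and it assumes the named locale is installed (if it is not, the C locale is typically used instead). Whether a particular grep build uses the same ordering for ranges is also an assumption here.

```shell
# Hypothetical helper: print the characters a, O and z in the order
# that the given locale's collation rules put them.
#   $1 = locale name
collation_order() {
    printf '%s\n' a O z | LC_ALL="$1" sort | paste -s -d ' ' -
}

# In the C locale the order is by byte value, so O sorts before a.
collation_order C        # O a z

# In other locales the result depends on the installed collation rules.
collation_order en_GB
```

If `O' appears between `a' and `z' in the second output, a collation-based range [a-z] in that locale would include it.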
Looking at the regex.c file, it seems that the pattern is converted to lower
case unless (probably) told otherwise.