Bug 40571

Summary: locale bug?
Product: [Retired] Red Hat Linux
Reporter: Alexander Kanevskiy <kad>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: Aaron Brown <abrown>
Severity: medium
Priority: medium
Version: 7.1
CC: fweimer
Hardware: i386   
OS: Linux   
Last Closed: 2001-05-15 09:52:29 UTC

Description Alexander Kanevskiy 2001-05-14 18:02:28 UTC
Description of Problem:
It seems that 'O' falls within the range a-z.

How Reproducible:

Steps to Reproduce:

[kad@bofh kad]$ set|grep -E 'LANG|LC_'
LANG=en_US
LC_CTYPE=ru_RU.KOI8-R
[kad@bofh kad]$ echo O|grep [a-z]
O
[kad@bofh kad]$ echo O|LANG= grep [a-z]
[kad@bofh kad]$ echo O|LANG=C grep [a-z]
[kad@bofh kad]$ echo O|LANG=ru_RU.KOI8-R grep [a-z]
O
[kad@bofh kad]$ echo O|LANG=de_DE grep [a-z]
O
[kad@bofh kad]$ echo O|LANG=uk_UA grep [a-z]
O
[kad@bofh kad]$ echo O|LANG=netu grep [a-z]
[kad@bofh kad]$ echo O|LANG=en_GB grep [a-z]
O
[kad@bofh kad]$ echo O|LC_CTYPE= grep [a-z]
O

Expected Results:
O must not be present in any output.

Additional Information:
sort orders the letters like this:
a
...
l
m
n
O
o
p
...
z

Comment 1 Jakub Jelinek 2001-05-15 09:43:34 UTC
Why do you think it is a bug?
In several locales, O is after n and before p.
If you expect the POSIX locale collation, you should use the POSIX locale,
or e.g. use [[:lower:]] if you really mean all lower case letters.
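
A minimal sketch of the two workarounds suggested above, assuming a POSIX
shell and a grep that supports bracket character classes. The function names
are hypothetical and used only for illustration. The pattern is quoted so the
shell does not expand it as a glob, and LC_ALL is used because it overrides
both LANG and the individual LC_* variables, including the LC_CTYPE set in
the report.

```shell
# Hypothetical helper: match [a-z] using POSIX (byte-value) collation,
# regardless of what LANG or LC_* are set to in the environment.
match_posix_range() { LC_ALL=C grep '[a-z]'; }

# Hypothetical helper: match lower-case letters via a character class,
# which does not depend on where the letters fall in the collation order.
match_lower_class() { grep '[[:lower:]]'; }

printf 'O\no\n' | match_posix_range   # prints only: o
printf 'O\no\n' | match_lower_class   # prints only: o
```

Setting LC_ALL only on the grep command line, as here, leaves the locale of
the rest of the session unchanged.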

Comment 2 Alexander Kanevskiy 2001-05-15 09:52:24 UTC
But why does it match [a-z]?! A capital 'O' can be matched by [A-Z], but not by [a-z].


Comment 3 Jakub Jelinek 2001-05-15 09:59:03 UTC
Because if in a particular locale `O' comes between `a' and `z', it fits into
this range and thus matches `[a-z]' regular expression.
E.g. read info grep about regular expressions:
   For example, `[[:alnum:]]' means `[0-9A-Za-z]', except the latter
depends upon the POSIX locale and the ASCII character encoding, whereas
the former is independent of locale and character set.  (Note that the
brackets in these class names are part of the symbolic names, and must
be included in addition to the brackets delimiting the bracket list.)
As you can see, [a-z] is dependent on the locale (and whether grep has been
built with NLS support).
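
To see which characters fall inside a range in a given locale, one can list
the collation order with sort. This is a small sketch, assuming a POSIX shell;
posix_order is a hypothetical helper name. In the POSIX locale the order
follows byte values, so every upper-case ASCII letter sorts before the
lower-case ones and 'O' lies outside a-z. Results in other locales depend on
the installed locale data and tool versions, and may differ from the 2001
output shown in this report.

```shell
# Hypothetical helper: sort stdin using POSIX (byte-value) collation.
posix_order() { LC_ALL=C sort; }

# POSIX locale: prints O, n, o, p (one per line); 'O' precedes every
# lower-case letter, so it is not between 'a' and 'z'.
printf '%s\n' n O o p | posix_order

# Current environment locale: the order is whatever LC_COLLATE defines.
# If 'O' is listed between 'n' and 'p' here, it lies inside the a-z range
# in that locale's collation order.
printf '%s\n' n O o p | sort
```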

Comment 4 Eugene Kanter 2001-06-24 02:54:51 UTC
Looking at the regex.c file, it seems that the pattern is converted to lower
case unless (probably) told otherwise.