Bug 40571

Summary:	locale bug?
Product:	[Retired] Red Hat Linux	Reporter:	Alexander Kanevskiy <kad>
Component:	glibc	Assignee:	Jakub Jelinek <jakub>
Status:	CLOSED NOTABUG	QA Contact:	Aaron Brown <abrown>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.1	CC:	fweimer
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2001-05-15 09:52:29 UTC	Type:	---

Description Alexander Kanevskiy 2001-05-14 18:02:28 UTC

Description of Problem:
It seems that uppercase 'O' is treated as lying in the range a-z, so [a-z] matches it.

How Reproducible:

Steps to Reproduce:

[kad@bofh kad]$ set|grep -E 'LANG|LC_'
LANG=en_US
LC_CTYPE=ru_RU.KOI8-R
[kad@bofh kad]$ echo O|grep [a-z]
O
[kad@bofh kad]$ echo O|LANG= grep [a-z]
[kad@bofh kad]$ echo O|LANG=C grep [a-z]
[kad@bofh kad]$ echo O|LANG=ru_RU.KOI8-R grep [a-z]
O
[kad@bofh kad]$ echo O|LANG=de_DE grep [a-z]
O
[kad@bofh kad]$ echo O|LANG=uk_UA grep [a-z]
O
[kad@bofh kad]$ echo O|LANG=netu grep [a-z]
[kad@bofh kad]$ echo O|LANG=en_GB grep [a-z]
O
[kad@bofh kad]$ echo O|LC_CTYPE= grep [a-z]
O

Expected Results:
O must not be present in any output.

Additional Information:
sort orders the letters like this:
a
...
l
m
n
O
o
p
...
z

Comment 1 Jakub Jelinek 2001-05-15 09:43:34 UTC

Why do you think it is a bug?
In several locales, O is after n and before p.
If you expect the POSIX locale collation, you should use the POSIX locale,
or e.g. use [[:lower:]] if you really mean all lower case letters.
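
A minimal sketch of the two workarounds suggested above, assuming a POSIX shell and a grep that supports -q. The helper names in_ascii_range and has_lower are made up for illustration. Setting LC_ALL (rather than LANG alone) is a deliberate choice here, because LC_ALL overrides LANG and every individual LC_* variable. The patterns are quoted so the shell cannot expand them as globs.

```shell
# Hypothetical helper: succeeds if $1 contains a character in [a-z]
# under the POSIX ("C") locale, where ranges follow ASCII byte order.
in_ascii_range() { printf '%s\n' "$1" | LC_ALL=C grep -q '[a-z]'; }

# Hypothetical helper: succeeds if $1 contains a lowercase letter
# according to the current locale's character classification.
has_lower() { printf '%s\n' "$1" | grep -q '[[:lower:]]'; }

in_ascii_range O || echo "O is outside [a-z] in the C locale"
has_lower O      || echo "O is not [[:lower:]]"
has_lower o      && echo "o is [[:lower:]]"
```

The character class is usually the safer choice in scripts, since it does not depend on how the active locale collates upper and lower case.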

Comment 2 Alexander Kanevskiy 2001-05-15 09:52:24 UTC

But why does it match [a-z]? Capital 'O' should be matched by [A-Z], not by [a-z].

Comment 3 Jakub Jelinek 2001-05-15 09:59:03 UTC

Because if in a particular locale `O' comes between `a' and `z', it fits into
this range and thus matches `[a-z]' regular expression.
E.g. read info grep about regular expressions:
   For example, `[[:alnum:]]' means `[0-9A-Za-z]', except the latter
depends upon the POSIX locale and the ASCII character encoding, whereas
the former is independent of locale and character set.  (Note that the
brackets in these class names are part of the symbolic names, and must
be included in addition to the brackets delimiting the bracket list.)
As you can see, [a-z] is dependent on the locale (and whether grep has been
built with NLS support).
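
The collation order that decides range membership can be inspected with sort, as the reporter did. The following is a rough sketch, assuming a POSIX shell; the helper name c_order is made up for illustration. Only the C locale result is predictable. What the second command prints depends on the installed locale data and tool versions.

```shell
# Hypothetical helper: sort four letters under the POSIX ("C") locale and
# join them on one line. Byte order puts every uppercase ASCII letter
# before every lowercase one, so O sorts first.
c_order() { printf '%s\n' n O o p | LC_ALL=C sort | tr -d '\n'; }
c_order; echo    # prints: Onop

# Same input under whatever locale is currently active. In locales that
# interleave cases, O lands next to o, which puts it between a and z.
printf '%s\n' n O o p | sort
```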

Comment 4 Eugene Kanter 2001-06-24 02:54:51 UTC

Looking at the regex.c file, it seems that the pattern is converted to lower
case unless (probably) told otherwise.