Bug 506525 - [RHEL5] using REG_ICASE with regcomp() can break ranges
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: glibc
Hardware: All Linux
Priority: low Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Andreas Schwab
Depends On:
Reported: 2009-06-17 12:06 EDT by Jeff Bastian
Modified: 2009-10-28 14:06 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 506521
Last Closed: 2009-10-28 14:06:27 EDT

Attachments
test case (716 bytes, text/x-csrc)
2009-06-17 12:08 EDT, Jeff Bastian

External Trackers
Tracker     ID     Priority  Status  Summary  Last Updated
Sourceware  10290  None      None    None     Never

Description Jeff Bastian 2009-06-17 12:06:36 EDT
+++ This bug was initially created as a clone of Bug #506521 +++

Description of problem:
Using a regular expression range like [C-a] works fine if compiled with
regcomp() with just the REG_EXTENDED flag, but if the REG_ICASE flag is added
too, regcomp() returns an error "Invalid range end".

Testing other ranges with REG_ICASE reveals:
    [A-Z^-z] is invalid: Invalid range end (11)
    [A-Z^_`a-z] is ok
    [C-a] is invalid: Invalid range end (11)
    [C-f] is ok
    [_-a] is invalid: Invalid range end (11)
    [<-a] is ok
    [z-{] is ok
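
The checks listed above can be reproduced with a small helper along the following lines. This is only a sketch and not the attached test case; try_compile() and report() are hypothetical names introduced here, and the results depend on the installed C library and the active locale.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` with `cflags` and return the
 * regcomp() status. On failure, the regerror() text is stored in errbuf. */
int try_compile(const char *pattern, int cflags, char *errbuf, size_t errlen)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc != 0) {
        regerror(rc, &re, errbuf, errlen);
        return rc;          /* nothing to free after a failed regcomp() */
    }
    if (errlen > 0)
        errbuf[0] = '\0';   /* no error text on success */
    regfree(&re);
    return 0;
}

/* Hypothetical helper: print the outcome for one pattern under one flag set. */
void report(const char *pattern, int cflags, const char *label)
{
    char errbuf[128];
    int rc = try_compile(pattern, cflags, errbuf, sizeof errbuf);

    if (rc == 0)
        printf("%-12s %-6s is ok\n", pattern, label);
    else
        printf("%-12s %-6s is invalid: %s (%d)\n", pattern, label, errbuf, rc);
}
```

Calling report("[C-a]", REG_EXTENDED, "plain") and report("[C-a]", REG_EXTENDED | REG_ICASE, "icase") from main() would show whether the two flag sets disagree on a given system.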

It appears that regcomp() is capitalizing the range endpoints if the REG_ICASE
flag is used, thus [C-a] becomes [C-A] and since A comes before C, the range is
invalid. Likewise, in locales that match ASCII, ^ comes before z but after Z, so
[A-Z^-z] becomes invalid, and _ comes after A but before a, so [_-a] becomes
invalid.
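
The suspected behavior can be modeled in a few lines. The sketch below is not glibc's code; range_ok_after_upcase() is a hypothetical function that assumes both endpoints are upper-cased before the start <= end check, and it assumes the default "C" locale with ASCII ordering.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical model of the suspected behavior: upper-case both range
 * endpoints first, then apply the usual "start must not exceed end" check.
 * Assumes the "C" locale, where toupper() only affects a-z. */
bool range_ok_after_upcase(unsigned char start, unsigned char end)
{
    int s = toupper(start);
    int e = toupper(end);

    return s <= e;
}
```

Under this model, C-a, ^-z and _-a are rejected while C-f, <-a and z-{ are accepted, which agrees with the results listed in the report.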

If this is not considered a bug, then at the very least, the regex(3) man page
should note the side-effects of using REG_ICASE.

Version-Release number of selected component (if applicable):

How reproducible:
every time

Steps to Reproduce:
1. Compile a regex range such as [C-a] or [_-a] with regcomp() and the REG_ICASE flag.

Actual results:
regcomp() returns an "Invalid range end" error

Expected results:
regcomp() accepts the same ranges with or without the REG_ICASE flag

Additional info:
If this is considered not a bug, then please update the regcomp(3) man page to note the side effects of using REG_ICASE.

Reported upstream at http://sourceware.org/bugzilla/show_bug.cgi?id=10290
Comment 1 Jeff Bastian 2009-06-17 12:08:54 EDT
Created attachment 348287
test case
Comment 2 Ulrich Drepper 2009-07-14 18:41:32 EDT
Paolo, could you perhaps look at this?  Or even take ownership of the bug?
Comment 3 Paolo Bonzini 2009-07-14 19:12:58 EDT
If someone can explain to me what the correct behavior would be, okay. :-)

For example, [C-a] would be something like [AC-Z]?

One possible test could be to compile GNU sed 4.2.1 with --with-included-regex and see how it behaves (for example with s/[C-a]/foo/i).  Jeff, can you do this?
Comment 4 Paolo Bonzini 2009-07-15 02:23:17 EDT
From http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html

When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched.

So there _is_ a bug.  Fixing it is not easy because regex instead turns everything to lowercase as soon as possible.
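
One way to read the quoted rule for a single range is sketched below. It is an illustration only, not a proposed patch; in_range() and icase_range_match() are hypothetical names, and the code assumes single-byte characters in the "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: plain range test using character codes. */
bool in_range(unsigned char lo, unsigned char hi, int c)
{
    return c >= lo && c <= hi;
}

/* Hypothetical sketch of the quoted rule: the range keeps its original
 * endpoints, and a character matches if either it or its case counterpart
 * (if any) falls inside the range. */
bool icase_range_match(unsigned char lo, unsigned char hi, unsigned char c)
{
    /* For characters without case, tolower()/toupper() return c unchanged. */
    int counterpart = isupper(c) ? tolower(c) : toupper(c);

    return in_range(lo, hi, c) || in_range(lo, hi, counterpart);
}
```

Under this reading, [C-a] in ASCII would match A and a, C-Z and c-z, and the punctuation between Z and a, but not B or b.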
Comment 5 Paolo Bonzini 2009-07-15 12:09:59 EDT
(BTW testing sed is useless.  The bug is in code that is shared by both the glibc and non-glibc paths).
Comment 6 RHEL Product and Program Management 2009-10-28 14:06:27 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
