Bug 757887 - memory corruption from regex in some locales
Summary: memory corruption from regex in some locales
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: glibc
Version: 5.8
Hardware: x86_64
OS: Linux
low
low
Target Milestone: rc
: ---
Assignee: Jeff Law
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On: 730952
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-11-28 21:37 UTC by Paolo Bonzini
Modified: 2016-11-24 15:51 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Same as RHEL 6.3 BZ 757888
Clone Of: 730952
Environment:
Last Closed: 2013-01-08 03:45:29 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:0022 0 normal SHIPPED_LIVE glibc bug fix and enhancement update 2013-01-08 08:38:20 UTC

Description Paolo Bonzini 2011-11-28 21:37:36 UTC
+++ This bug was initially created as a clone of Bug #730952 +++

The bug happens when a collating symbol exists in the current locale that is composed of the same character multiple times (e.g. aa in nb_NO locales).  Attachment 537445 [details] is a zip file with reproducers from Terje Braten.

In this case, you have something like this:

   %fourier-alt-itaalic -s -0.168exnansi
   0         1
   012345678901234^

with cur_idx pointing to the "a" at &mctx->input.mbs[15], which is also the last character (valid_len = 16).  Bytes after the first "a" are leftovers from previous matching attempts.

"aa" is a multicharacter collation element in the bokmal locale, so re_string_elem_size_at returns 2 and check_node_accept_bytes matches 2 bytes even though there is only one byte in the string.  clean_state_log_if_needed then accesses one item past the allocated memory.

I haven't tested the reproducer on RHEL5/6, but the out-of-bounds access is clear and the code has been mostly unchanged for years; attachment 537575 [details] should apply more or less to all even not-so-recent glibc versions.

Comment 1 Jeff Law 2011-11-29 04:48:38 UTC
So this is the first time I've ever looked at this code, so if my questions are dumb, please bear with me.

First, why not verify things in re_string_elem_size_at in a manner similar to re_string_char_size_at?  Something like:

  findidx (&p)
  ret = p - pstr->mbs - idx;
  if (idx + ret < pstr->valid_len)
    return ret;
  else
    return 1;

I'm guessing we can't check before findidx because we can't know the length of the weight prior to the call to findidx.

Presumably the reason to fix this in findidx to to prevent findidx itself from reading unnecessary (and possibly out of range) chars?

Assuming that's true, then why aren't the other users of findidx potentially buggy in the same way?  (strxfrm_l, strcoll_l, fnmatch_loop, etc).

Comment 2 Paolo Bonzini 2011-11-29 09:13:20 UTC
> So this is the first time I've ever looked at this code, so if my
> questions are dumb, please bear with me.

No problem. :)

> Presumably the reason to fix this in findidx to to prevent findidx itself from
> reading unnecessary (and possibly out of range) chars?

Yes.  For what we know, after the pstr->valid_len-th byte you might get a segv.

Also, in theory you could have collation elements that are more than two character wide.  For example you could have "ab" and "abb".  In that case, here:

  findidx (&p)
  ret = p - pstr->mbs - idx;
  if (idx + ret < pstr->valid_len)
    return ret;
  else
    return 1;

you'd need to return 2 (for "ab").  The simplest way is to reject "abb" in findidx.

> Assuming that's true, then why aren't the other users of findidx potentially
> buggy in the same way?  (strxfrm_l, strcoll_l, fnmatch_loop, etc).

str* accepts a NULL-terminated string, so it will always compare the NULL character before falling out of range.

fnmatch_loop seems buggy.  First of all, there is no guarantee that X in [=X=] is a single byte; it is a single character, but it can be multi-byte.  However, that would be a separate bug.  Second, it can indeed access str[1] out-of-bounds when it calls findidx.  This is fixed with the following patch (likely to be mangled by BZ, but simple enough):

diff --git a/posix/fnmatch_loop.c b/posix/fnmatch_loop.c
index 67c0ee4..e8fc2ef 100644
--- a/posix/fnmatch_loop.c
+++ b/posix/fnmatch_loop.c
@@ -343,7 +343,7 @@
 #ifdef _LIBC
 		else if (c == L('[') && *p == L('='))
 		  {
-		    UCHAR str[1];
+		    UCHAR str[2];
 		    uint32_t nrules =
 		      _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
 		    const CHAR *startp = p;
@@ -356,6 +356,7 @@
 			goto normal_bracket;
 		      }
 		    str[0] = c;
+		    str[1] = 0;
 
 		    c = *++p;
 		    if (c != L('=') || p[1] != L(']'))

Comment 3 Jeff Law 2011-11-30 05:05:49 UTC
So just to make sure I understand the code within findidx...  The test
if (cp[cnt] != usrc[cnt]) in the loops within the major THEN/ELSE clauses is always going to read beyond str in the fnmatch_loop referenced above.

Furthermore, NULL termination will cause those loops to exit at the appropriate time.  Thus NULL terminating str in fnmatch_loop per your patch avoids the out of bounds reference and causes findidx to DTRT.

[ Just want to get my head wrapped around that problem, then I'll look at the patch to fix the calls from regex proper. ]

BTW, if you wanted to champion this upstream with Uli, that'd be great since clearly you know the code better than I.  I'm just trying to learn enough that I can make a sensible short-term decision for F16.

jeff

Comment 4 Paolo Bonzini 2011-11-30 08:20:42 UTC
Yes, the analysis is correct.  (Except for one detail: fnmatch_loop's str will not be accessed beyond its end in the common case of findidx exiting before the loop).

I did plan to send the patches upstream sometime this week, a bit swamped right now.

Comment 5 Jeff Law 2011-11-30 21:07:41 UTC
It looks like Andreas fixed this upstream; implementation is different, but it appears to accomplish the same thing.

I'll get it into a testbuild shortly.

Comment 7 RHEL Program Management 2012-04-02 13:09:49 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 10 Patsy Griffin 2012-06-12 02:28:17 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Same as RHEL 6.3 BZ 757888

Comment 12 errata-xmlrpc 2013-01-08 03:45:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0022.html


Note You need to log in before you can comment on or make changes to this bug.