757887 – memory corruption from regex in some locales

Bug 757887 - memory corruption from regex in some locales

Summary: memory corruption from regex in some locales

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	5.8
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Jeff Law
QA Contact:	qe-baseos-tools-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:	730952
Blocks:
TreeView+	depends on / blocked

Reported:	2011-11-28 21:37 UTC by Paolo Bonzini
Modified:	2016-11-24 15:51 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Same as RHEL 6.3 BZ 757888
Clone Of:	730952
Environment:
Last Closed:	2013-01-08 03:45:29 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2013:0022	0	normal	SHIPPED_LIVE	glibc bug fix and enhancement update	2013-01-08 08:38:20 UTC

Description Paolo Bonzini 2011-11-28 21:37:36 UTC

+++ This bug was initially created as a clone of Bug #730952 +++

The bug happens when a collating symbol exists in the current locale that is composed of the same character multiple times (e.g. aa in nb_NO locales).  Attachment 537445 [details] is a zip file with reproducers from Terje Braten.

In this case, you have something like this:

   %fourier-alt-itaalic -s -0.168exnansi
   0         1
   012345678901234^

with cur_idx pointing to the "a" at &mctx->input.mbs[15], which is also the last character (valid_len = 16).  Bytes after the first "a" are leftovers from previous matching attempts.

"aa" is a multicharacter collation element in the bokmal locale, so re_string_elem_size_at returns 2 and check_node_accept_bytes matches 2 bytes even though there is only one byte in the string.  clean_state_log_if_needed then accesses one item past the allocated memory.

I haven't tested the reproducer on RHEL5/6, but the out-of-bounds access is clear and the code has been mostly unchanged for years; attachment 537575 [details] should apply more or less to all even not-so-recent glibc versions.

Comment 1 Jeff Law 2011-11-29 04:48:38 UTC

So this is the first time I've ever looked at this code, so if my questions are dumb, please bear with me.

First, why not verify things in re_string_elem_size_at in a manner similar to re_string_char_size_at?  Something like:

  findidx (&p)
  ret = p - pstr->mbs - idx;
  if (idx + ret < pstr->valid_len)
    return ret;
  else
    return 1;

I'm guessing we can't check before findidx because we can't know the length of the weight prior to the call to findidx.

Presumably the reason to fix this in findidx to to prevent findidx itself from reading unnecessary (and possibly out of range) chars?

Assuming that's true, then why aren't the other users of findidx potentially buggy in the same way?  (strxfrm_l, strcoll_l, fnmatch_loop, etc).

Comment 2 Paolo Bonzini 2011-11-29 09:13:20 UTC

> So this is the first time I've ever looked at this code, so if my
> questions are dumb, please bear with me.

No problem. :)

> Presumably the reason to fix this in findidx to to prevent findidx itself from
> reading unnecessary (and possibly out of range) chars?

Yes.  For what we know, after the pstr->valid_len-th byte you might get a segv.

Also, in theory you could have collation elements that are more than two character wide.  For example you could have "ab" and "abb".  In that case, here:

  findidx (&p)
  ret = p - pstr->mbs - idx;
  if (idx + ret < pstr->valid_len)
    return ret;
  else
    return 1;

you'd need to return 2 (for "ab").  The simplest way is to reject "abb" in findidx.

> Assuming that's true, then why aren't the other users of findidx potentially
> buggy in the same way?  (strxfrm_l, strcoll_l, fnmatch_loop, etc).

str* accepts a NULL-terminated string, so it will always compare the NULL character before falling out of range.

fnmatch_loop seems buggy.  First of all, there is no guarantee that X in [=X=] is a single byte; it is a single character, but it can be multi-byte.  However, that would be a separate bug.  Second, it can indeed access str[1] out-of-bounds when it calls findidx.  This is fixed with the following patch (likely to be mangled by BZ, but simple enough):

diff --git a/posix/fnmatch_loop.c b/posix/fnmatch_loop.c
index 67c0ee4..e8fc2ef 100644
--- a/posix/fnmatch_loop.c
+++ b/posix/fnmatch_loop.c
@@ -343,7 +343,7 @@
 #ifdef _LIBC
 		else if (c == L('[') && *p == L('='))
 		  {
-		    UCHAR str[1];
+		    UCHAR str[2];
 		    uint32_t nrules =
 		      _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
 		    const CHAR *startp = p;
@@ -356,6 +356,7 @@
 			goto normal_bracket;
 		      }
 		    str[0] = c;
+		    str[1] = 0;
 
 		    c = *++p;
 		    if (c != L('=') || p[1] != L(']'))

Comment 3 Jeff Law 2011-11-30 05:05:49 UTC

So just to make sure I understand the code within findidx...  The test
if (cp[cnt] != usrc[cnt]) in the loops within the major THEN/ELSE clauses is always going to read beyond str in the fnmatch_loop referenced above.

Furthermore, NULL termination will cause those loops to exit at the appropriate time.  Thus NULL terminating str in fnmatch_loop per your patch avoids the out of bounds reference and causes findidx to DTRT.

[ Just want to get my head wrapped around that problem, then I'll look at the patch to fix the calls from regex proper. ]

BTW, if you wanted to champion this upstream with Uli, that'd be great since clearly you know the code better than I.  I'm just trying to learn enough that I can make a sensible short-term decision for F16.

jeff

Comment 4 Paolo Bonzini 2011-11-30 08:20:42 UTC

Yes, the analysis is correct.  (Except for one detail: fnmatch_loop's str will not be accessed beyond its end in the common case of findidx exiting before the loop).

I did plan to send the patches upstream sometime this week, a bit swamped right now.

Comment 5 Jeff Law 2011-11-30 21:07:41 UTC

It looks like Andreas fixed this upstream; implementation is different, but it appears to accomplish the same thing.

I'll get it into a testbuild shortly.

Comment 7 RHEL Program Management 2012-04-02 13:09:49 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 10 Patsy Griffin 2012-06-12 02:28:17 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Same as RHEL 6.3 BZ 757888

Comment 12 errata-xmlrpc 2013-01-08 03:45:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0022.html

Note You need to log in before you can comment on or make changes to this bug.