Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 182621

Summary: Allow larger regex buffer to enable long substring filters
Product: [Retired] 389 Reporter: Nathan Kinder <nkinder>
Component: Database - Indexes/SearchesAssignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA QA Contact: Chandrasekar Kannan <ckannan>
Severity: medium Docs Contact:
Priority: high    
Version: 1.0CC: benl, pauloviolada, ulf.weltman
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2008-0268 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-05-09 17:23:38 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 249650, 444712    
Attachments:
Description Flags
regex.c
none
string.c
none
stacktraces of the server when the srch scripts are executed
none
cvs commit messages
none
valgrind output when the above testcases are run
none
cvs commit message (Directory71RtmBranch)
none
Patch result none

Description Nathan Kinder 2006-02-23 18:36:32 UTC
If you search for ("sn=abcdefg...500 characters here...*"), DS returns no
entries and the result was LDAP_SUCCESS (0) even when there are entries with sn
that have a long value that matches the substring.  An error is logged in the
errors log.

Analysis:
The substring filter code uses a BUFSIZ char array to hold the pattern it
constructs from the filter value.  For example, searching for ("sn=foo*oo*bar*")
will set the pattern buffer to "^foo.*ooo.*bar.*".  This buffer is then passed
to the old regular expression compilation function slapd_re_comp, which compiles
it into a finite automaton that it sticks in a static buffer.  Then it calls
slapd_re_exec with values found in the entries, for example,
slapd_re_exec("fooABCooDEFbarXYZ").  If this matches, the filter was true and
the entry is returned to the client.

The reason the pattern buffer is of fixed size is because the static buffer in
the old regex code is of fixed size.  Simply expanding our pattern buffer will
cause the regex buffer to clobber the stack.  Enhancing the old regex code to
use a dynamic buffer isn't trivial, it's a little tricky to figure out how much
space we will need.  For example, a pattern like "foo[a-z]bar" uses something
like 22 bytes: 2 for each regular character, 20 for the character class. 

Instead of using POSIX regex, below are updates to the old regex to make the old
global buffer a global malloced buffer.  As I iterate over the pattern I check
whether we're approaching the end and double its size when we're 20 bytes away
(which is how much the NFA might grow in one iteration).

I moved the slapd_re_init function to the end so the globals I need for
initializing the buffer will be declared.  They were cozy all around the file
and I didn't want to move them up top.  Maybe they should be moved to regex.h,
or maybe this stuff should just be scrapped for a more standard implementation
when we're bored sometime...

Index: slapd/regex.c
===================================================================
RCS file: /repository/source/ldap2/ldapserver/ldap/servers/slapd/regex.c,v
retrieving revision 1.1.1.1
diff -U8 -r1.1.1.1 regex.c
--- regex.c     2005/07/08 23:25:38     1.1.1.1
+++ regex.c     2006/02/07 02:43:03
@@ -33,41 +33,16 @@
 *
 * Copyright (C) 2001 Sun Microsystems, Inc. Used by permission.
 * Copyright (C) 2005 Red Hat, Inc.
 * All rights reserved.
 * END COPYRIGHT BLOCK **/
#include "slap.h"      /* must come before regex.h */
#include "portable.h"

-static PRLock  *regex_mutex = NULL;
-
-int
-slapd_re_init( void )
-{
-       if ( NULL == regex_mutex ) {
-               regex_mutex = PR_NewLock();
-       }
-       return( NULL == regex_mutex ? -1 : 0 );
-}
-
-void
-slapd_re_lock( void )
-{
-       PR_ASSERT( NULL != regex_mutex );
-       PR_Lock( regex_mutex );
-}
-
-int
-slapd_re_unlock( void )
-{
-       PR_ASSERT( NULL != regex_mutex );
-       return( PR_Unlock( regex_mutex ) );
-}
-
#if defined( MACOS ) || defined( DOS ) || defined( _WIN32 ) || defined(
NEED_BSDREGEX )
#include "regex.h"

/*
 * regex - Regular expression pattern matching  and replacement
 *
 * By:  Ozan S. Yigit (oz)
 *      Dept. of Computer Science
@@ -437,17 +412,18 @@
#define ASCIIB 0177

typedef unsigned char UCHAR;
/* char, on the other hand, may be signed or unsigned;
 * it's platform-dependent.  A hard fact of life, in C.
 */

static int  tagstk[MAXTAG];             /* subpat tag stack..*/
-static UCHAR nfa[MAXNFA];              /* automaton..       */
+static UCHAR *nfa = NULL;              /* automaton..       */
+static int  nfasize = MAXNFA;          /* tracks size of nfa buffer */
static int  sta = NOP;                 /* status of lastpat */

static UCHAR bittab[BITBLK];           /* bit table for CCL */
                                       /* pre-set bits...   */
static UCHAR bitarr[] = {1,2,4,8,16,32,64,128};

#ifdef DEBUG
static void nfadump( UCHAR *ap);
@@ -481,16 +457,31 @@
               if (sta)
                       return 0;
               else
                       return badpat("No previous regular expression");
        }
       sta = NOP;

       for (p = (UCHAR*)pat; *p; p++) {
+               /* Check if we are approaching end of nfa buffer. 20 bytes is
+                  the max we might add to the nfa per loop. */
+               if (mp - (UCHAR*)nfa + 20 >= nfasize) {
+                       /* Save offsets */
+                       long mppos = mp - nfa;
+                       long sppos = sp - nfa;
+
+                       /* Double the nfa buffer size */
+                       nfasize *= 2;
+                       nfa = (UCHAR*)slapi_ch_realloc((char*)nfa, nfasize);
+
+                       /* Restore pointers into realloced space */
+                       mp = nfa + mppos;
+                       sp = nfa + sppos;
+               }
               lp = mp;
               switch(*p) {

               case '.':               /* match any char..  */
                       store(ANY);
                       break;

               case '^':               /* match beginning.. */
@@ -1082,8 +1073,40 @@
               default:
                       printf("bad nfa. opcode %o\n", ap[-1]);
                       exit(1);
                       break;
               }
}
#endif
#endif /* MACOS or DOS or NEED_BSDREGEX */
+
+static PRLock  *regex_mutex = NULL;
+
+int
+slapd_re_init( void )
+{
+       if ( NULL == regex_mutex ) {
+               regex_mutex = PR_NewLock();
+       }
+
+       if ( NULL == nfa ) {
+               nfa = (UCHAR*)slapi_ch_malloc( MAXNFA );
+       }
+
+       return( NULL == regex_mutex ? -1 : 0 );
+}
+
+void
+slapd_re_lock( void )
+{
+       PR_ASSERT( NULL != regex_mutex );
+       PR_Lock( regex_mutex );
+}
+
+int
+slapd_re_unlock( void )
+{
+       PR_ASSERT( NULL != regex_mutex );
+       return( PR_Unlock( regex_mutex ) );
+}
+
+

Index: plugins/syntaxes/string.c
===================================================================
RCS file:
/repository/source/ldap2/ldapserver/ldap/servers/plugins/syntaxes/string.c,v
retrieving revision 1.1.1.1
diff -U8 -r1.1.1.1 string.c
--- string.c    2005/07/08 23:25:37     1.1.1.1
+++ string.c    2006/02/07 02:26:58
@@ -178,87 +178,96 @@

       return( rc );
}

int
string_filter_sub( Slapi_PBlock *pb, char *initial, char **any, char *final,
    Slapi_Value **bvals, int syntax )
{
-       int             i, j, rc;
-       char            *p, *end, *realval, *tmpbuf;
+       int             i, j, rc, size=0;
+       char            *p, *end, *realval, *tmpbuf, *bigpat = NULL;
       size_t          tmpbufsize;
       char            pat[BUFSIZ];
       char            buf[BUFSIZ];
       char            ebuf[BUFSIZ];

       LDAPDebug( LDAP_DEBUG_FILTER, "=> string_filter_sub\n",
           0, 0, 0 );
-
       /*
        * construct a regular expression corresponding to the
        * filter and let regex do the work for each value
        * XXX should do this once and save it somewhere XXX
        */
       pat[0] = '\0';
       p = pat;
       end = pat + sizeof(pat) - 2;    /* leave room for null */
+
       if ( initial != NULL ) {
-               value_normalize( initial, syntax, 1 /* trim leading blanks */ );
-               strcpy( p, "^" );
-               p = strchr( p, '\0' );
-               /* 2 * in case every char is special */
-               if ( p + 2 * strlen( initial ) > end ) {
-                       LDAPDebug( LDAP_DEBUG_ANY, "not enough pattern space\n",
-                           0, 0, 0 );
-                       return( -1 );
+               size = strlen( initial ) + 1; /* add 1 for "^" */
+       }
+
+       if ( any != NULL ) {
+               i = 0;
+               while( any[i]) {
+                       size += strlen(any[i++]) + 2; /* add 2 for ".*" */
               }
+       }
+
+       if ( final != NULL ) {
+                size += strlen( final ) + 3; /* add 3 for ".*" and "$" */
+       }
+
+       size *= 2; /* doubled in case all filter chars need escaping */
+       size++; /* add 1 for null */
+
+       if ( p + size > end ) {
+               bigpat = slapi_ch_malloc( size );
+               p = bigpat;
+       }
+
+       if ( initial != NULL ) {
+               value_normalize( initial, syntax, 1 /* trim leading blanks */ );
+               *p++ = '^';
               filter_strcpy_special( p, initial );
               p = strchr( p, '\0' );
       }
       if ( any != NULL ) {
               for ( i = 0; any[i] != NULL; i++ ) {
                       value_normalize( any[i], syntax, 0 /* DO NOT trim leading
blanks */ );
                       /* ".*" + value */
-                       if ( p + 2 * strlen( any[i] ) + 2 > end ) {
-                               LDAPDebug( LDAP_DEBUG_ANY,
-                                   "not enough pattern space\n", 0, 0, 0 );
-                               return( -1 );
-                       }
                       strcpy( p, ".*" );
-                       p = strchr( p, '\0' );
+                       *p++ = '.';
+                       *p++ = '*';
                       filter_strcpy_special( p, any[i] );
                       p = strchr( p, '\0' );
               }
       }
       if ( final != NULL ) {
               value_normalize( final, syntax, 0 /* DO NOT trim leading blanks */ );
-               /* ".*" + value */
-               if ( p + 2 * strlen( final ) + 2 > end ) {
-                       LDAPDebug( LDAP_DEBUG_ANY, "not enough pattern space\n",
-                           0, 0, 0 );
-                       return( -1 );
-               }
-               strcpy( p, ".*" );
-               p = strchr( p, '\0' );
+               *p++ = '.';
+               *p++ = '*';
               filter_strcpy_special( p, final );
-               p = strchr( p, '\0' );
-               strcpy( p, "$" );
+               strcat( p, "$" );
       }

       /* compile the regex */
+       p = (bigpat) ? bigpat : pat;
       slapd_re_lock();
-       if ( (p = slapd_re_comp( pat )) != 0 ) {
+       if ( (tmpbuf = slapd_re_comp( p )) != 0 ) {
               LDAPDebug( LDAP_DEBUG_ANY, "re_comp (%s) failed (%s)\n",
-                   pat, p, 0 );
+                   pat, tmpbuf, 0 );
               slapd_re_unlock();
-               return( -1 );
+               if( bigpat != NULL ) {
+                       slapi_ch_free((void**)&bigpat );
+               }
+               return( LDAP_OPERATIONS_ERROR );
       } else {
-               LDAPDebug( LDAP_DEBUG_TRACE, "re_comp (%s)\n",
-                                  escape_string( pat, ebuf ), 0, 0 );
+               LDAPDebug( LDAP_DEBUG_TRACE, "re_comp (%s)\n",
+                                  escape_string( p, ebuf ), 0, 0 );
       }

       /*
        * test the regex against each value
        */
       rc = -1;
       tmpbuf = NULL;
       tmpbufsize = 0;
@@ -289,16 +298,19 @@
                       rc = 0;
                       break;
               }
       }
       slapd_re_unlock();
       if ( tmpbuf != NULL ) {
               slapi_ch_free((void**)&tmpbuf );
       }
+       if( bigpat != NULL ) {
+               slapi_ch_free((void**)&bigpat );
+       }

       LDAPDebug( LDAP_DEBUG_FILTER, "<= string_filter_sub %d\n",
           rc, 0, 0 );
       return( rc );
}

int
string_values2keys( Slapi_PBlock *pb, Slapi_Value **bvals,

Comment 1 Ulf Weltman 2006-02-23 23:48:54 UTC
Created attachment 125147 [details]
regex.c

Updated fix to regex.c, use ptrdiff_t to store the offsets to be restored after
the realloc, and use a constant for the value of "how much the NFA buffer can
grow in one iteration on the pattern".

Comment 2 Ulf Weltman 2006-02-23 23:51:29 UTC
Created attachment 125148 [details]
string.c

Updated fix to string.c, used wrong pointer (pat instead of p) in a debug
message, and performed an unneeded strcat of ".*"

Comment 5 Noriko Hosoi 2008-04-28 20:50:21 UTC
*** Bug 443955 has been marked as a duplicate of this bug. ***

Comment 6 Noriko Hosoi 2008-04-28 20:56:59 UTC
Steps to reproduce the problem
1. script to create a test entry; add the entry to the server
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
echo dn: uid=tuser1, dc=example,dc=com
echo objectClass: top
echo objectClass: person
echo objectClass: organizationalPerson
echo objectClass: inetOrgPerson
echo cn: test user1
echo sn: user0123456789012345678901234567890123456789012345678901234567890123456789
i=0
while [ $i -lt 100 ]
do
echo "
01234567890123456789012345678901234567890123456789012345678901234567890123456789"
i=`expr $i + 1`
done
echo uid: tuser1
echo givenName: test
echo description: test user 0
echo userPassword: tuser1tuser1
echo mail: tuser1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

2. scripts to generate search commands:
# 2-1 output: srch10.sh
t="0123456789"
T=$t$t$t$t$t$t$t$t$t$t
FILTER="$T$T$T$T$T$T*"
echo ldapsearch -T -p \$\1 -D \"cn=Directory Manager\" -w \$2 -b
\"dc=example,dc=com\" \"\(sn=*$t*\)\"
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# 2-2 output: srch100.sh
t="0123456789"
T=$t$t$t$t$t$t$t$t$t$t
FILTER="$T$T$T$T$T$T$T$T$T$T*"
echo ldapsearch -T -p \$\1 -D \"cn=Directory Manager\" -w \$2 -b
\"dc=example,dc=com\" \"\(sn=*$t*\)\"
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# 2-3 output: srchcomplex.sh
t="0123456789"
T=$t$t$t$t$t$t$t$t$t$t
FILTER="$T$T$T$T$T$T$T$T$T$T*"
echo ldapsearch -T -p \$\1 -D \"cn=Directory Manager\" -w \$2 -b
\"dc=example,dc=com\"
\"\(\&\(\|\(objectClass=inetorgperson\)\(objectClass=person\)\)\(\|\(sn=*$FILTER*\)\(mail=*bogus*\)\)\)\"

3. run the srch scripts.
Attaching the stacktraces of the server next.

Comment 7 Noriko Hosoi 2008-04-28 20:58:19 UTC
Created attachment 304042 [details]
stacktraces of the server when the srch scripts are executed

Comment 8 Noriko Hosoi 2008-04-28 21:00:47 UTC
Applied Ulf's patches found in comment #1 and #2, the test cases in the comment
#6 do not crash the server and the server returns the matched entry.

Comment 9 Noriko Hosoi 2008-04-29 00:40:22 UTC
Created attachment 304058 [details]
cvs commit messages

Checked in the Ulf's patches into HEAD as well as Directory_Server_8_0_Branch.

Comment 10 Noriko Hosoi 2008-04-29 19:43:09 UTC
Created attachment 304153 [details]
valgrind output when the above testcases are run

No memory corruption nor leak from regex are observed.

Comment 13 Noriko Hosoi 2008-04-30 21:03:25 UTC
Created attachment 304274 [details]
cvs commit message (Directory71RtmBranch)

The patches have been checked in with the patch inf files.

Comment 14 Rich Megginson 2008-05-05 21:15:28 UTC
Paulo, would you be able to build from source and try to reproduce your crash
with the proposed patch?

Comment 15 Paulo Alberto 2008-05-06 17:35:19 UTC
Created attachment 304665 [details]
Patch result

Comment 16 Paulo Alberto 2008-05-06 17:36:51 UTC
I don`t know if I`m doing something wrong, but the regex.c patch result was:

--------------------------
|Index: string.c
|===================================================================
|RCS file:
/repository/source/ldap2/ldapserver/ldap/servers/plugins/syntaxes/string.c,v
|retrieving revision 1.1.1.1
|diff -U8 -r1.1.1.1 string.c
|--- string.c   2005/07/08 23:25:37     1.1.1.1
|+++ string.c   2006/02/23 23:47:17
--------------------------
File to patch:
/rpm/sources/fedora-ds-base-1.1.0/ldap/servers/plugins/syntaxes/string.c
patching file
/rpm/sources/fedora-ds-base-1.1.0/ldap/servers/plugins/syntaxes/string.c
Hunk #1 succeeded at 184 (offset 6 lines).





--------------------------
|Index: regex.c
|===================================================================
|RCS file: /repository/source/ldap2/ldapserver/ldap/servers/slapd/regex.c,v
|retrieving revision 1.1.1.1
|diff -U8 -r1.1.1.1 regex.c
|--- regex.c    2005/07/08 23:25:38     1.1.1.1
|+++ regex.c    2006/02/08 21:39:11
--------------------------
File to patch: /rpm/sources/fedora-ds-base-1.1.0/ldap/servers/slapd/regex.c
patching file /rpm/sources/fedora-ds-base-1.1.0/ldap/servers/slapd/regex.c
Hunk #1 FAILED at 33.
Hunk #2 succeeded at 386 (offset 12 lines).
Hunk #4 succeeded at 475 (offset 12 lines).
1 out of 5 hunks FAILED -- saving rejects to file
/rpm/sources/fedora-ds-base-1.1.0/ldap/servers/slapd/regex.c.rej

Comment 23 errata-xmlrpc 2008-05-09 17:23:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0268.html


Comment 24 Rich Megginson 2008-06-24 17:01:09 UTC
*** Bug 448900 has been marked as a duplicate of this bug. ***