Bug 472457

Summary: Specially crafted Server Side Sort crashes directory server or makes it unresponsive
Product: [Retired] 389
Component: Database - Indexes/Searches
Version: 1.1.3
Hardware: i686
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: high
Reporter: Andrey Ivanov <andrey.ivanov>
Assignee: Noriko Hosoi <nhosoi>
QA Contact: Chandrasekar Kannan <ckannan>
CC: benl, jgalipea, nhosoi, nkinder, rmeggins
Fixed In Version: 8.1
Doc Type: Bug Fix
Last Closed: 2009-04-29 23:07:59 UTC
Bug Blocks: 249650, 467277, 493682
Attachments:
The perl script causing the server crash
The example ldif that should be imported to reproduce the bug
Output from Bugzilla-FDS-Sort.pl
Output of valgrind on RHEL5.2 32 bit during the server crash
Output of strace for the same RHEL5 32 crash case
GDB Stack Trace of the crash
cvs diff ldap/servers/slapd/back-ldbm/sort.c
cvs diff ldap/servers/slapd/back-ldbm/sort.c
cvs commit message

Description Andrey Ivanov 2008-11-20 22:34:39 UTC
Using a critical reversed server side sort control on two (or more?) attributes with a special ordering rule crashes the server or makes it completely unresponsive. The search filter contains an indexed attribute, the sorting attributes are different from this one.

Reproducible: Always

Steps to Reproduce:
1. Install the fds server from an rpm or recompile it (i tried both)
2. Import the example.ldif file attached to this bug (generated by dbgen.pl with "-n 10000")
3. Run the attached perl script. This script makes a search with a server side sort control (order => "-sn:2.16.840.1.113730.3.3.2.18.1.6 -givenName:2.16.840.1.113730.3.3.2.18.1.6", critical => 1 ). So far, everything seems to be fine. The result is:

Binding with simple bind
...Bound...


===========================> Error over here : Initial ldapsearch
        Error : Sort Response Control
        Error name : LDAP_UNAVAILABLE_CRITICAL_EXT
        Error text : A control or matching rule specified in the request is not supported by
the server
        Error description : Critical extension not available
Sort Response Control at Bugzilla-FDS-Sort.pl line 32, <DATA> line 704.

4. Now let us index the ou attribute. Stop the server. Add this to dse.ldif :
dn: cn=ou,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: nsIndex
nsSystemIndex: false
cn: ou
nsIndexType: eq
nsIndexType: pres

create the index on this attribute :
db2index -n userRoot -t ou

Start the ds.

5. Launch again the perl script.

Actual Results:  
Binding with simple bind
...Bound...


===========================> Error over here : Initial ldapsearch
        Error : Unexpected EOF
        Error name : LDAP_OPERATIONS_ERROR
        Error text : Server encountered an internal error
        Error description : Operations error
Unexpected EOF at Bugzilla-FDS-Sort.pl line 32, <DATA> line 680.

In other words, with the indexed 'ou' attribute the server crashes immediately.

Expected Results:  
The server should neither crash nor become unresponsive. Either an error message or the sorted result should be returned.

I tested this on CentOS 5.2 i686 and x86_64 with all the latest patches.

If we reverse the sort order for at least one attribute, the crash does not happen. If the number of entries in the result is not very large, it does not happen either. Nor does it happen when we use less "exotic" ordering rules such as the standard ones.

On i386/i686 platforms the server simply crashes without any message in the error or access logs. On x86_64 platforms the server most of the time becomes unresponsive, with a strange line in the access log such as "conn=1 op=-1 fd=64 @3+" (the symbols differ each time and start from @). Meanwhile, a telnet to port 389 connects fine, but the server does not return anything.

The server usually restarts and recovers its database after this crash. However on x86_64 architecture once the database was so corrupted that the server did not start at all.

Comment 1 Andrey Ivanov 2008-11-20 22:36:45 UTC
Created attachment 324248 [details]
The perl script causing the server crash

Comment 2 Andrey Ivanov 2008-11-20 22:39:18 UTC
Created attachment 324250 [details]
The example ldif that should be imported to reproduce the bug

Any ldif file sufficiently large should work. This one was generated by "dbgen.pl -g -n 10000 -o example.ldif"

Comment 3 Noriko Hosoi 2008-11-21 00:27:30 UTC
Created attachment 324261 [details]
Output from Bugzilla-FDS-Sort.pl

I followed the steps in comment#1, but I could not reproduce the problem.

I ran the test on Fedora-9 x86_64.  The FDS is my local build from the latest source code.  I'm testing FDS 1.1.3 from the download page on fedoraproject.org.

Comment 4 Noriko Hosoi 2008-11-21 00:35:20 UTC
The perl script works just fine with fedora-ds-base-1.1.3-2.fc9.x86_64 from fedoraproject.org on Fedora-9 x86_64, too.

Comment 5 Noriko Hosoi 2008-11-21 00:49:17 UTC
I tested on RHEL5 x86_64.  It worked fine, too...

Comment 6 Andrey Ivanov 2008-11-21 12:49:05 UTC
I confirm that everything is ok on FC9 x86_64, i have just verified it (using the rpm).

The systems where i tested were CentOS5.2 i686 and x86_64. I'll try RHEL5 and FC9 i686 with the latest patches and then i'll tell you the results.

As for the bug, i can make an .ovf appliance in a virtual machine and upload it somewhere (~1 GB) (CentOS5 minimum install + rpm as indicated at http://directory.fedoraproject.org/wiki/Download).

The bug may be really CentOS-specific but CentOS is almost exactly RHEL so it would be strange...

Comment 7 Andrey Ivanov 2008-11-21 15:51:29 UTC
I have verified the bug on RHEL5.2 i386 (minimum installation from DVD, no patches, then the instructions http://directory.fedoraproject.org/wiki/Download). After the rpm installation by "yum install fedora-ds-admin fedora-ds-base" the versions are :

fedora-ds-admin.i386                     1.1.6-1.fc6            installed       
fedora-ds-base.i386                      1.1.3-2.fc6            installed       

[root@ldap-rpm DEVEL]# uname -a
Linux ldap-rpm.polytechnique.fr 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:12 EDT 2008 i686 i686 i386 GNU/Linux

[root@ldap-rpm DEVEL]# rpm -qa |grep release
redhat-release-notes-5Server-12
redhat-release-5Server-5.2.0.4


After that i used setup-ds-admin.pl to install the ds, the admin server and to import the example.ldif.

Then i stopped the server and changed dse.ldif (vim /etc/dirsrv/slapd-ldap-rpm/dse.ldif) as in comment#1. I started and stopped the server, then ran the indexing with "/usr/lib/dirsrv/slapd-ldap-rpm/db2index -n userRoot -t ou"

After this point the server crashes while using the supplied script.
SELinux may be disabled or enabled, it does not change the result. There are also some interesting error messages in the cases that run without problems; however, they do not concern the "problem" oid:

[21/Nov/2008:16:40:24 +0100] - collation_indexer_create: could not set the collator strength for oid 2.16.840.1.113730.3.3.2.0.1 to 0: err -128
[21/Nov/2008:16:40:24 +0100] - collation_indexer_create: could not set the collator decomposition mode for oid 2.16.840.1.113730.3.3.2.0.1 to 17: err -128
[21/Nov/2008:16:40:24 +0100] - collation_indexer_create: could not set the collator strength for oid 2.16.840.1.113730.3.3.2.0.1 to 0: err -128
[21/Nov/2008:16:40:24 +0100] - collation_indexer_create: could not set the collator decomposition mode for oid 2.16.840.1.113730.3.3.2.0.1 to 17: err -128


I patched the system by yum update. The server still crashes.

It's a virtual machine though my first tests were on physical hardware. So i can make an .ovf appliance if you cannot reproduce the problem

I will try to test RHEL5 x86_64 when i have time.

Comment 8 Noriko Hosoi 2008-11-21 18:17:26 UTC
Thank you, Andrey, for more input!  Since I don't have a RHEL5 32-bit machine handy, I tested on a RHEL4 32-bit machine.  Your perl script worked just fine on RHEL4 i386, too...  I'm very curious about your test result on RHEL5 x86_64, a platform on which I could not duplicate the bug...
NOTE: this is the content of my /etc/redhat-release on my RHEL5.
Red Hat Enterprise Linux Server release 5.2 (Tikanga)

In the meantime, could it be possible for you to install fedora-ds-base-debuginfo and attach the ns-slapd process to gdb, then run your test on RHEL5 i386?  When the server crashes, you can get the server stacktrace and the crashed point in gdb.

The second idea is running the server via valgrind.  I usually create a start-slapd.val script applying this diff.  Then, instead of start-slapd, start the server with start-slapd.val.  When some forbidden memory is touched, the utility reports it.  The information is quite helpful for debugging.
--- start-slapd	2008-11-20 16:28:09.000000000 -0800
+++ start-slapd.val	2008-11-21 10:07:29.000000000 -0800
@@ -35,17 +35,17 @@
     PID=`cat $PIDFILE`
     if kill -0 $PID > /dev/null 2>&1 ; then
         echo There is an ns-slapd running: $PID
         exit 2;
     else
         rm -f $PIDFILE
     fi
 fi
-cd /usr/sbin; ./ns-slapd -D /etc/dirsrv/slapd-kiki10 -i $PIDFILE -w $STARTPIDFILE "$@"
+cd /usr/sbin; valgrind --num-callers=32 --tool=memcheck --leak-check=full --show-reachable=yes --leak-resolution=high ./ns-slapd -D /etc/dirsrv/slapd-kiki10 -i $PIDFILE -w $STARTPIDFILE "$@"
 if [ $? -ne 0 ]; then
     exit 1
 fi
 
 loop_counter=1
 # wait for 10 seconds for the start pid file to appear
 max_count=${STARTPID_TIME:-10}
 while test $loop_counter -le $max_count; do

Thank you so much for your help!
--noriko

Comment 9 Andrey Ivanov 2008-11-21 21:45:14 UTC
Maybe for your systems you need to generate a larger example.ldif? When you say the script is working OK you mean you have the sorted result or the error "LDAP_UNAVAILABLE_CRITICAL_EXT"? If it is the second one then you have not indexed on ou by db2index.



I passed valgrind on the server as you asked, the log file is in the attachment, the interesting part concerns the process ==4174==.




I am not a specialist in gdb. Here is what i did:

[root@ldap-rpm slapd-ldap-rpm]# /usr/bin/gdb /usr/sbin/ns-slapd 
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) set detach-on-fork off
(gdb) run -D /etc/dirsrv/slapd-ldap-rpm -i /var/run/dirsrv/slapd-ldap-rpm.pid -w /var/run/dirsrv/slapd-ldap-rpm.startpid
Starting program: /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap-rpm -i /var/run/dirsrv/slapd-ldap-rpm.pid -w /var/run/dirsrv/slapd-ldap-rpm.startpid
[Thread debugging using libthread_db enabled]
[New Thread -1208383280 (LWP 4412)]
warning: Lowest section in /usr/lib/libicudata.so.36 is .gnu.hash at 000000b4

Program exited normally.
[Switching to process 4415]
(gdb) 

Unfortunately after gdb reconnects to the forked child ([Switching to process 4415]) the ds does not listen on 389 port, so i can't make any tests. Do you have an idea of what i am doing wrong?

Comment 10 Andrey Ivanov 2008-11-21 21:46:11 UTC
Created attachment 324351 [details]
Output of valgrind on RHEL5.2 32 bit during the server crash

Comment 11 Andrey Ivanov 2008-11-21 22:05:14 UTC
Continuing about the gdb :  if i use in gdb "attach procnum" gdb attaches successfully to the process and the server continues to listen to the port 389 but even a simplest ldapsearch never returns...

Comment 12 Andrey Ivanov 2008-11-21 22:16:38 UTC
Oh! I think i've found the "magic" gdb command :)))) It's "continue". But it does not help a lot :

(gdb) continue
Continuing.




Program received signal SIGABRT, Aborted.
[Switching to Thread -1340167280 (LWP 4749)]
0x00f96402 in __kernel_vsyscall ()
(gdb) 
Continuing.

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
(gdb) 
The program is not being run.
(gdb) 
The program is not being run.
(gdb) thread apply all bt full
(gdb) bt
No stack.

Comment 13 Noriko Hosoi 2008-11-21 22:25:43 UTC
Wow...  No stack...  Also, valgrind does not show any memory error...

BTW, I could borrow RHEL5.2 32-bit VM.  Still, your perl script runs just fine.  I'm puzzled.

Comment 14 Andrey Ivanov 2008-11-21 22:36:44 UTC
I could send you my RHEL5.2 32bit VM where it crashes perfectly:)  Try maybe a larger example.ldif? My VM has 1Gb of memory.

The most intriguing is that i use only standard distribution packages, no unusual 'personalisation'. Btw here is the setup.inf, just in case :

[General]
AdminDomain = polytechnique.fr
SuiteSpotGroup = nobody
ConfigDirectoryLdapURL = ldap://ldap-rpm.polytechnique.fr:389/o=NetscapeRoot
ConfigDirectoryAdminID = admin
SuiteSpotUserID = nobody
ConfigDirectoryAdminPwd = admin
FullMachineName = ldap-rpm.polytechnique.fr

[admin]
ServerAdminID = admin
ServerAdminPwd = {SHA}0DPiKuNIrrVmD8IUCuw1hQxNqZc=\

SysUser = nobody
Port = 9830


[slapd]
InstallLdifFile = /DEVEL/example.ldif
ServerIdentifier = ldap-rpm
ServerPort = 389
AddOrgEntries = No
RootDN = cn=Directory Manager
RootDNPwd = manager101
SlapdConfigForMC = yes
Suffix = dc=example,dc=com
UseExistingMC = 0
AddSampleEntries = Yes



As for RHEL 5.2 64 bit, i'll try to reproduce the bug next week. Maybe the stack and valgrind will be more cooperative... :)

Comment 15 Andrey Ivanov 2008-11-21 22:42:38 UTC
Maybe we can use strace in some way to find out what's happening?

Comment 16 Andrey Ivanov 2008-11-21 22:51:14 UTC
Here are the results of strace -f ./ns-slapd -D /etc/dirsrv/slapd-ldap-rpm -i /var/run/dirsrv/slapd-ldap-rpm.pid -w /var/run/dirsrv/slapd-ldap-rpm.startpid

At the end i see *** stack smashing detected ***... Maybe that's a hint...

Comment 17 Andrey Ivanov 2008-11-21 22:52:45 UTC
Created attachment 324356 [details]
Output of strace for the same RHEL5 32 crash case

Comment 18 Andrey Ivanov 2008-11-21 22:59:15 UTC
It seems to be a gcc stack protection feature, that's what i've found about it after some googling : http://www.de-brauwer.be/wiki/wikka.php?wakka=StackSmash
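
For readers unfamiliar with this feature, the idea can be sketched in plain C. This is only a simulation under stated assumptions, not how gcc implements it: the names `sim_frame`, `frame_enter`, `frame_write` and `frame_guard_intact`, the 8-byte buffer and the 4-byte guard value are all made up for illustration. With the real stack protector, the compiler places a guard value next to local buffers, checks it before the function returns, and aborts with "*** stack smashing detected ***" if it has changed.

```c
#include <assert.h>
#include <string.h>

#define BUF_LEN 8   /* hypothetical local buffer size */
#define GUARD_LEN 4 /* hypothetical guard size */

/* Hypothetical guard pattern; a real canary is chosen at program start. */
static const unsigned char GUARD[GUARD_LEN] = {0xDE, 0xAD, 0xBE, 0xEF};

/* A simulated stack frame: a local buffer followed by a guard value. */
struct sim_frame {
    unsigned char bytes[BUF_LEN + GUARD_LEN];
};

/* Place the guard right after the buffer, as a function prologue would. */
void frame_enter(struct sim_frame *f)
{
    assert(f != NULL);
    memset(f->bytes, 0, BUF_LEN);
    memcpy(f->bytes + BUF_LEN, GUARD, GUARD_LEN);
}

/*
 * Unchecked copy into the "buffer". len is clamped to the whole simulated
 * frame so the demo itself stays in bounds, but anything beyond BUF_LEN
 * lands on the guard.
 */
void frame_write(struct sim_frame *f, const char *src, size_t len)
{
    if (len > sizeof f->bytes) {
        len = sizeof f->bytes;
    }
    memcpy(f->bytes, src, len);
}

/* Epilogue check: returns 1 if the guard is intact, 0 if it was overwritten. */
int frame_guard_intact(const struct sim_frame *f)
{
    return memcmp(f->bytes + BUF_LEN, GUARD, GUARD_LEN) == 0;
}
```

A write that stays within `BUF_LEN` leaves the guard intact; a longer one overwrites it, which is the condition the real check reports.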

Comment 19 Noriko Hosoi 2008-11-21 23:09:47 UTC
(In reply to comment #9)
> Maybe for your systems you need to generate a larger example.ldif? When you say
> the script is working OK you mean you have the sorted result 

I get the sorted result.  Please see the attachment in comment #3.  I get the
same result on the other platforms I tested.

> or the error
> "LDAP_UNAVAILABLE_CRITICAL_EXT"? If it is the second one then you have not
> indexed on ou by db2index.

As you suggested, I increased the data size to 100K entries.  (I also needed to
update the nsslapd-idlistscanlimit to the larger value 100000 to cover my data
size.)  I ran your script and got the reverse sorted result on RHEL5.2 32-bit.

Comment 20 Noriko Hosoi 2008-11-21 23:16:50 UTC
(In reply to comment #18)
> It seems to be a gcc stack protection feature, that's what i've found about it
> after some googling : http://www.de-brauwer.be/wiki/wikka.php?wakka=StackSmash

Great!  So you are experiencing stack overflow?  Do you happen to know your system stack size?

Comment 21 Andrey Ivanov 2008-11-22 00:00:59 UTC
By default in RHEL5.2 the core dumps are forbidden in /etc/profile, that's why i could not produce any stack traces with gdb. I have changed the ulimit -c, now it produces cores. Here are the new data from gdb :

(gdb) continue
Continuing.

Program received signal SIGABRT, Aborted.
[Switching to Thread -1339835504 (LWP 1969)]
0x008be402 in __kernel_vsyscall ()
(gdb) bt
#0  0x008be402 in __kernel_vsyscall ()
#1  0x002b9d10 in raise () from /lib/libc.so.6
#2  0x002bb621 in abort () from /lib/libc.so.6
#3  0x002f1e5b in __libc_message () from /lib/libc.so.6
#4  0x00377551 in __stack_chk_fail () from /lib/libc.so.6
#5  0x00ab4154 in __stack_chk_fail_local () from /usr/lib/dirsrv/plugins/libback-ldbm.so
#6  0x00aad8be in sort_log_access (pb=0x86de1f0, s=0x86df710, candidates=0x943b320) at ldap/servers/slapd/back-ldbm/sort.c:138
#7  0x00aa356d in ldbm_back_search (pb=0x86de1f0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:485
#8  0x00648153 in op_shared_search (pb=0x86de1f0, send_result=1) at ldap/servers/slapd/opshared.c:547
#9  0x08069efe in do_search (pb=0x86de1f0) at ldap/servers/slapd/search.c:350
#10 0x0805781f in connection_threadmain () at ldap/servers/slapd/connection.c:532
#11 0x0013b6ed in PR_JoinThread () from /usr/lib/libnspr4.so
#12 0x0015345b in start_thread () from /lib/libpthread.so.0
#13 0x00361c4e in clone () from /lib/libc.so.6


The full stack trace of all the threads is in the attachment. The most interesting seems to be the thread 31 :
Thread 31 (Thread -1339835504 (LWP 1969)):
#0  0x008be402 in __kernel_vsyscall ()
No symbol table info available.
#1  0x002b9d10 in raise () from /lib/libc.so.6
No symbol table info available.
#2  0x002bb621 in abort () from /lib/libc.so.6
No symbol table info available.
#3  0x002f1e5b in __libc_message () from /lib/libc.so.6
No symbol table info available.
#4  0x00377551 in __stack_chk_fail () from /lib/libc.so.6
No symbol table info available.
#5  0x00ab4154 in __stack_chk_fail_local () from /usr/lib/dirsrv/plugins/libback-ldbm.so
No symbol table info available.
#6  0x00aad8be in sort_log_access (pb=0x86de1f0, s=0x86df710, candidates=0x943b320) at ldap/servers/slapd/back-ldbm/sort.c:138
        stack_buffer = "SORT -sn;2.16.840.1.113730.3.3.2.18.1.6 -givenName;2.16.840.1.113730.3.3.2.18.1.6 (206"
        buffer = 0xb0236412 "SORT -sn;2.16.840.1.113730.3.3.2.18.1.6 -givenName;2.16.840.1.113730.3.3.2.18.1.6 (2060)"
        ret = <value optimized out>
        size = 77
#7  0x00aa356d in ldbm_back_search (pb=0x86de1f0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:485
        idl = <value optimized out>
        be = (backend *) 0x82ee0d0
        li = (struct ldbminfo *) 0x8284db0
        e = (struct backentry *) 0x84a1c30
        candidates = (IDList *) 0x943b320
        base = 0x82a0980 "dc=example,dc=com"
        basesdn = {flag = 0 '\0', dn = 0x82a0980 "dc=example,dc=com", ndn = 0x82a0980 "dc=example,dc=com", ndn_len = 17}
        scope = 2
        controls = (LDAPControl **) 0x8459108
        operation = (Slapi_Operation *) 0x86e1370
        addr = (entry_address *) 0x86e13dc
        sort = 1
        vlv = 0
        sort_spec = (struct berval *) 0x845253c
        is_sorting_critical = -1
        is_sorting_critical_orig = -1
        sort_control = (sort_spec_thing *) 0x86df710
        virtual_list_view = 0
        vlv_spec = (struct berval *) 0x0
        is_vlv_critical = 0
        vlv_request_control = {beforeCount = 0, afterCount = 0, tag = 0, index = 0, contentCount = 0, value = {bv_len = 0, bv_val = 0x0}}
        sr = (back_search_result_set *) 0x844bfb8
        tmp_err = <value optimized out>
        tmp_desc = <value optimized out>
        lookup_returned_allids = 0
        backend_count = 0
        print_once = 1
#8  0x00648153 in op_shared_search (pb=0x86de1f0, send_result=1) at ldap/servers/slapd/opshared.c:547
        be_suffix = (const Slapi_DN *) 0x825b100
        err = 0
        base = 0x8464c90 "dc=example,dc=com"
        fstr = 0x844f840 "(ou=Product Development)"
        scope = 2
        be = (Slapi_Backend *) 0x82ee0d0
        be_single = (Slapi_Backend *) 0x0
        be_list = {0x82ee0d0, 0x0 <repeats 79 times>, 0x2f9c21, 0x0 <repeats 11 times>, 0x3d1170, 0x0, 0x0, 0x28, 0x3d1154, 0x0, 0x0, 0x0}
        referral_list = {0x0 <repeats 100 times>}
        ebuf = '\0' <repeats 84 times>, "?230w", '\0' <repeats 13 times>, "|\034x\0000?bm?\000\b\213#°?w\000T\213#°°\001", '\0' <repeats 11 times>, "°\000\000\000\000?230w", '\0' <repeats 12 times>, "\004|\034x\0000?b\004\000\000\0008\213#°¶mw\000T\213#°+\213#°\001", '\0' <repeats 18 times>, "\023|\034x\0000?bm?\000\210\214#°*sw\000T\213#°,?b", '\0' <repeats 56 times>, "!\234/", '\0' <repeats 17 times>, "K?, '\0' <repeats 17 times>, "\237\025n\b4\026n\bp\021=\000"...
        attrlistbuf = "\"cn\"", '\0' <repeats 74 times>, "[22/Nov/2008:00:53:41 +0100] ", '\0' <repeats 169 times>, "?230w", '\0' <repeats 13 times>, "|\034x\000@%E\b?h\000?#°?w\000\024¬#°»«#°\001", '\0' <repeats 11 times>, "»«#°\000\000\000\000?230w\000\000\000\000\000K?000\000\000\000\004|\034x\000@%E\b\004\000\000\000w\000\024¬#°?°\001\000\000\000(¬#°\\¬#°\000\000\000\000\000\000\000[|\034x\000@%E\b?h\000H­#°*sw\000\024¬#"...
        attrliststr = 0xb023aa58 "\"cn\""
        attrs = (char **) 0x86de350
        rc = -1
        internal_op = 0
        sdn = {flag = 4 '\004', dn = 0x8464c90 "dc=example,dc=com", ndn = 0x8466b48 "dc=example,dc=com", ndn_len = 17}
        operation = (Slapi_Operation *) 0x86e1370
        referral = <value optimized out>
        errorbuf = "\000\000\000\000Pµ\024\000 \000\000\000xKh\000\210j#°\bg\022\000 ", '\0' <repeats 15 times>, "Pµ\024\000\001\000\000\000?j#°?\022\000 ", '\0' <repeats 19 times>, "\030\000\000\000 \000\000\000Pµ\024\000\177Kh\000\177Kh\000?°S&\022\00030000\001", '\0' <repeats 31 times>, "\\m#°00\000\000\000\000\000\000\00000\000\000\000\002", '\0' <repeats 47 times>, "P\"E\b\023", '\0' <repeats 15 times>, "\177Kh\000\\m#°", '\0' <repeats 40 times>, "?(\000?(\b?#°'", '\0' <repeats 17 times>...
        nentries = 0
        pnentries = 0
        flag_search_base_found = 0
        flag_no_such_object = 0
        flag_referral = <value optimized out>
        flag_psearch = 0
        err_code = 0
        ctrlp = (LDAPControl **) 0x8459108
        ctl_value = (struct berval *) 0x0
        iscritical = 0
        be_name = 0x0
        index = 0
        sent_result = 0
#9  0x08069efe in do_search (pb=0x86de1f0) at ldap/servers/slapd/search.c:350
        operation = (Slapi_Operation *) 0x86e1370
        ber = (BerElement *) 0x86e1444
        err = <value optimized out>
        attrsonly = 0
        scope = 2
        deref = 2
        sizelimit = 0
        timelimit = 0
        base = 0x8464c90 "dc=example,dc=com"
        fstr = 0x844f840 "(ou=Product Development)"
        filter = (struct slapi_filter *) 0x82eed18
        attrs = (char **) 0x86de350
        gerattrs = (char **) 0x0
        psearch = 0
        psbvp = (struct berval *) 0x0
        changetypes = 0
        send_entchg_controls = 0
        changesonly = 0
        rc = -1
        original_base = 0x8464c90 "dc=example,dc=com"
        new_base = 0x0
#10 0x0805781f in connection_threadmain () at ldap/servers/slapd/connection.c:532
        i = 1
        ret = <value optimized out>
        pb = (Slapi_PBlock *) 0x86de1f0
        interval = 10000
        conn = (Connection *) 0xb0c3fc08
        op = (Operation *) 0x86e1370
        tag = 99
        thread_turbo_flag = 0
        ret = <value optimized out>
        more_data = 0
        replication_connection = 0
#11 0x0013b6ed in PR_JoinThread () from /usr/lib/libnspr4.so
No symbol table info available.
#12 0x0015345b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#13 0x00361c4e in clone () from /lib/libc.so.6
No symbol table info available.

Comment 22 Andrey Ivanov 2008-11-22 00:02:52 UTC
Created attachment 324368 [details]
GDB Stack Trace of the crash

Comment 23 Andrey Ivanov 2008-11-22 00:08:59 UTC
As for the stack size, i think this is it (10M) :

[root@ldap-rpm slapd-ldap-rpm]# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 16384
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16384
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Comment 24 Noriko Hosoi 2008-11-22 00:46:41 UTC
My RHEL5 i386 has the same set of values...  I wonder why only your server hits the stack_chk_fail...???
ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 16383
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16383
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Did you happen to make any changes in /etc/security/limits.conf?

While I was running your perl script, I stopped ns-slapd in the function sort_log_access in gdb.  The stack trace looks just like yours.  But it just goes through without any problem.

When you install the server, you are using the default configuration values?  Did you make any changes other than adding an index on "ou"?

Comment 25 Noriko Hosoi 2008-11-22 03:04:38 UTC
Created attachment 324395 [details]
cvs diff ldap/servers/slapd/back-ldbm/sort.c

Thank you so much for the good test case, Andrey.  It was a buffer overflow.  The length of the 2 sort specs "-sn;2.16.840.1.113730.3.3.2.18.1.6 -givenName;2.16.840.1.113730.3.3.2.18.1.6 " is just about the prepared buffer size, which is unfortunate since there is no space for the candidate size, e.g., "(1944)" being added later.  By adding the "(1944)" to the static buffer, it caused buffer overflow and crashed your server, I think.  My server should have crashed, too.  For some reason, it did not... :(
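
This class of overflow, and one way to avoid it, can be illustrated with a minimal sketch. It is not the actual sort.c code and not the attached patch: the function name `format_sort_log`, the 64-byte `STACK_CAP` and the exact message layout are assumptions chosen for illustration. The idea is to measure the complete line, including the trailing candidate count, before deciding whether the fixed buffer is large enough.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical capacity of the fixed buffer; not the value used in sort.c. */
#define STACK_CAP 64

/*
 * Build "SORT <spec>(<candidates>)" without ever writing past a buffer.
 * Returns `fixed` if the whole line fits there, otherwise a malloc'ed
 * buffer that the caller must free. Returns NULL on failure.
 */
char *format_sort_log(char fixed[STACK_CAP], const char *spec,
                      unsigned long candidates)
{
    assert(fixed != NULL && spec != NULL);

    /* First pass: snprintf with size 0 only measures the full line. */
    int needed = snprintf(NULL, 0, "SORT %s(%lu)", spec, candidates);
    if (needed < 0) {
        return NULL;
    }

    size_t cap = (size_t)needed + 1; /* +1 for the terminating NUL */
    char *out = fixed;
    if (cap > STACK_CAP) {
        /* The spec plus the candidate count does not fit: use the heap. */
        out = malloc(cap);
        if (out == NULL) {
            return NULL;
        }
    }

    /* Second pass: cap is always within the chosen buffer's capacity. */
    snprintf(out, cap, "SORT %s(%lu)", spec, candidates);
    return out;
}
```

A caller would free the result only when it differs from the fixed buffer, for example `if (line != fixed) free(line);`.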

Anyway, could you please try the attached patch?

Once again, thank you for the report and helping us to debug this sticky bug!
--noriko

Comment 26 Andrey Ivanov 2008-11-22 14:22:05 UTC
Yes, now i see why it was tricky - the sort specification length should be very close to the buffer size and the number of characters designating the number of returned entries should be large enough. That's why when i deleted the "-" or took a shorter length oid the crash did not take place.
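
The arithmetic behind this can be checked with a small sketch. The helper names `decimal_digits` and `sort_log_length` are hypothetical, and the "SORT <spec>(<count>)" layout is taken from the `buffer` value shown in the stack trace in comment 21; nothing is assumed here about the actual buffer size in sort.c.

```c
#include <assert.h>
#include <string.h>

/* Number of decimal digits needed to print n. */
size_t decimal_digits(unsigned long n)
{
    size_t digits = 1;
    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return digits;
}

/*
 * Length, excluding the terminating NUL, of the access-log line
 * "SORT <spec>(<candidates>)". Hypothetical helper for illustration.
 */
size_t sort_log_length(const char *spec, unsigned long candidates)
{
    assert(spec != NULL);
    return strlen("SORT ") + strlen(spec)
           + 2 /* the two parentheses */
           + decimal_digits(candidates);
}
```

For the sort specification in this report, the "SORT " prefix and the spec take 82 characters, and "(2060)" adds 6 more, giving 88 characters plus a terminating NUL. Removing a "-" or using a shorter oid shortens the line accordingly.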

As for why your server does not crash, i have examined the results of the script you have attached. It seems that you have generated your own example.ldif, that's why, in particular, in my stack trace you can see that the number of concerned entries is "(2060)", not "(1944)" as in your case. So, the sequence of functions and the structure of stack was different in your case, too. Try to import the bzipped example.ldif that i have attached in the very beginning (https://bugzilla.redhat.com/attachment.cgi?id=324250) in lieu of your own example.ldif and make the test in RHEL5.2 32bit. I did not make any changes to the system and /etc/security/limits.conf. Everything is in its "original" rpm post-installation state.
And as i described earlier, on our x86_64 CentOS servers it was more often a hung server than a crash.

I have just tested your patch on CentOS5.2 x86_64. The patch eliminates the crashes and hangs, so it seems to be good. I will also test it on RHEL5.2 i386 and then i'll tell you the results.

Comment 27 Andrey Ivanov 2008-11-22 15:35:11 UTC
I have rebuilt the new rpm from fedora-ds-base-1.1.3-2.fc6.src.rpm  with the patched sort.c and tested it on RHEL5.2 i386. It works!

Thank you for your patience and persistence while trying to reproduce the bug! And thank you for the amazing responsiveness and the lightning fast availability of the proper patch.

Comment 28 Noriko Hosoi 2008-11-23 18:13:07 UTC
I'm so glad to hear my patch worked on your machines!  Thank you for testing it in this fast manner.  I'm going to test some more cases and send out a review request to the community.  It'll be included in the next release of FDS and RHDS.

Comment 29 Noriko Hosoi 2008-11-24 17:15:52 UTC
Created attachment 324508 [details]
cvs diff ldap/servers/slapd/back-ldbm/sort.c

Cleaned up the previous patch.

Comment 30 Noriko Hosoi 2008-11-24 20:28:10 UTC
Created attachment 324535 [details]
cvs commit message

Reviewed by Rich and Nathan (Thank you!!)

Checked in into CVS HEAD.

Comment 31 Jenny Severance 2009-03-31 19:21:16 UTC
I've imported the attached example.ldif file, but can not execute the attached perl script

[root@jennyv2 472457]# ./Bugzilla-FDS-Sort.pl 
-bash: ./Bugzilla-FDS-Sort.pl: /usr/bin/perl^M: bad interpreter: No such file or directory

Please advise.

Comment 32 Noriko Hosoi 2009-03-31 19:29:02 UTC
Could you attach your Bugzilla-FDS-Sort.pl?

Comment 33 Rich Megginson 2009-03-31 19:29:37 UTC
(In reply to comment #31)
> I've imported the attached example.ldif file, but can not execute the attached
> perl script
> 
> [root@jennyv2 472457]# ./Bugzilla-FDS-Sort.pl 
> -bash: ./Bugzilla-FDS-Sort.pl: /usr/bin/perl^M: bad interpreter: No such file
> or directory
> 
> Please advise.  

dos2unix ./Bugzilla-FDS-Sort.pl

Comment 34 Jenny Severance 2009-03-31 19:31:29 UTC
[root@jennyv2 472457]# dos2unix ./Bugzilla-FDS-Sort.pl
dos2unix: converting file ./Bugzilla-FDS-Sort.pl to UNIX format ...
[root@jennyv2 472457]# ./Bugzilla-FDS-Sort.pl 
Can't locate Net/LDAP.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.7/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.7/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 .) at ./Bugzilla-FDS-Sort.pl line 3.
BEGIN failed--compilation aborted at ./Bugzilla-FDS-Sort.pl line 3.

The perl script is the same one that is attached - I only added execute permissions.

Comment 35 Rich Megginson 2009-03-31 19:38:36 UTC
yum install perl-LDAP

Comment 36 Jenny Severance 2009-03-31 19:56:04 UTC
Verified DS 8.1 RHEL 5 32 Bit

Before indexing ou:

[root@jennyv2 472457]# ./Bugzilla-FDS-Sort.pl 
Binding with simple bind


===========================> Error over here : Unknown
        Error : Invalid credentials
        Error name : LDAP_INVALID_CREDENTIALS
        Error text : The wrong password was supplied or the SASL credentials could not be processed
        Error description : Invalid credentials
Invalid credentials at ./Bugzilla-FDS-Sort.pl line 32, <DATA> line 740.
[root@jennyv2 472457]# vi Bugzilla-FDS-Sort.pl 
[root@jennyv2 472457]# ./Bugzilla-FDS-Sort.pl 
Binding with simple bind
...Bound...


===========================> Error over here : Initial ldapsearch
        Error : Sort Response Control
        Error name : LDAP_UNAVAILABLE_CRITICAL_EXT
        Error text : A control or matching rule specified in the request is not supported by
the server
        Error description : Critical extension not available
Sort Response Control at ./Bugzilla-FDS-Sort.pl line 32, <DATA> line 704.

after indexing:

..........

dn:uid=UAbello5102, ou=People, dc=example,dc=com

cn: Ursa Abello
------------------------------------------------------------------------
dn:uid=SAbdalla6621, ou=People, dc=example,dc=com

cn: Shirlee Abdalla
------------------------------------------------------------------------
dn:uid=HAberneth1678, ou=People, dc=example,dc=com

cn: Hannie Abernethy
------------------------------------------------------------------------
dn:uid=GAbdul-No2077, ou=People, dc=example,dc=com

cn: Gerty Abdul-Nour
------------------------------------------------------------------------
dn:uid=CAbell1644, ou=People, dc=example,dc=com

cn: Carter Abell
------------------------------------------------------------------------
dn:ou=Product Development, dc=example,dc=com

Quitting...

Comment 37 Chandrasekar Kannan 2009-04-29 23:07:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0455.html