Description of problem: To debug bug 459440 "rhds71sp3 - ns-slapd crash on 'too many' ABANDON targetop=NOTFOUND", we need a tool that sends abandon requests to the server.
Created attachment 323274 [details]
cvs diff ldclt files

Files: ldap/servers/slapd/tools/ldapfct.c ldclt.c ldclt.h ldcltU.c threadMain.c
Description: added "-e abandon"

Sample usage:
ROOTDN="cn=Directory Manager"
ROOTDNPW="password"
BASEDN="dc=example,dc=com"
HOST="host.example.com"
PORT=389
DAT=/export/data
ldclt -D "$ROOTDN" -w "$ROOTDNPW" -h "$HOST" -p "$PORT" \
    -e abandon -e random -b "$BASEDN" -f "uid=emailperson_XXXXX" \
    -q -n 2 -r 10000 -R 99999 -I 32 -e inetOrgPerson -e imagesdir=$DAT

Sample access log:
[..] conn=68 op=16297 SRCH base="dc=example,dc=com" scope=2 filter="(uid=emailperson_78683)" attrs=ALL
[..] conn=68 op=16297 RESULT err=0 tag=101 nentries=1 etime=6
[..] conn=68 op=16298 ABANDON targetop=NOTFOUND msgid=16298
[..] conn=68 op=16299 SRCH base="dc=example,dc=com" scope=2 filter="(uid=emailperson_15947)" attrs=ALL
[..] conn=68 op=16299 RESULT err=0 tag=101 nentries=1 etime=11
[..] conn=68 op=16300 ABANDON targetop=NOTFOUND msgid=16300
The code looks fine. Your sample access logs show the abandons arriving after the operation has already completed. With your changes, will the abandons actually abandon active operations if we put enough load on the server? It is important to be able to stress that code path in the server as well. I suppose this will also depend on whether you pass in a search complex enough that the server takes some time to process it (an unindexed subtree search, maybe).
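One way to answer that question from a test run is to tally the ABANDON records in the access log. The following is a minimal sketch, not part of the attached patch. The struct and function names are made up for illustration, and it assumes that an abandon which found its target is logged with something other than NOTFOUND after "targetop=" (only the NOTFOUND form appears in the logs in this bug).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Tally of ABANDON records seen in an access log (illustrative names). */
struct abandon_stats {
    unsigned long not_found; /* target had already finished or was unknown */
    unsigned long matched;   /* assumption: any other targetop= value      */
};

/* Classify a single access-log line and update the tally. */
void count_abandon_line(const char *line, struct abandon_stats *st)
{
    static const char key[] = " ABANDON targetop=";
    const char *p;

    assert(line != NULL && st != NULL);
    p = strstr(line, key);
    if (p == NULL)
        return;                      /* not an ABANDON record */
    p += sizeof(key) - 1;            /* skip past "targetop=" */
    if (strncmp(p, "NOTFOUND", 8) == 0)
        st->not_found++;
    else
        st->matched++;
}

/* Read a log stream line by line and return the totals. */
struct abandon_stats count_abandons(FILE *fp)
{
    struct abandon_stats st = {0, 0};
    char line[4096];

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL)
        count_abandon_line(line, &st);
    return st;
}
```

Calling count_abandons() on the opened access log after an ldclt run gives both counters. If matched stays at zero, the load never reached an operation that was still in progress.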
I'm issuing the abandon request between the async search and receiving the result. Actually, it is pretty close to the search request in the code. As you wrote, an exact match search rarely leaves enough time to abandon the operation.

The case I wanted to simulate was also one where the searches are fast and the abandons come after the results:
[..] conn=22840 op=94993 SRCH base="ou=Autofs5, ou=Application Services, ou=engineering, o=example, c=US" scope=2 filter="(&(|(objectClass=automount))(|(automountKey=\2a)))" attrs="automountKey automountInformation automountKey"
[..] conn=22840 op=94993 RESULT err=0 tag=101 nentries=2 etime=0
[..] conn=22840 op=94994 ABANDON targetop=NOTFOUND msgid=94994
[..] conn=22840 op=94995 SRCH base="ou=Autofs5, ou=Application Services, ou=engineering, o=example, c=US" scope=2 filter="(&(|(objectClass=automount))(|(automountKey=.DS_Store)))" attrs="automountKey automountInformation automountKey"
[..] conn=22840 op=94995 RESULT err=0 tag=101 nentries=0 etime=0
[..] conn=22840 op=94996 ABANDON targetop=NOTFOUND msgid=94996

So, I think this is good for the test case. I think I'm going to introduce another option (maybe?) to replace the search filter with an unindexed search. Thanks for the useful suggestion, Nathan!
(In reply to comment #5)
> So, I think this is good for the test case. I think I'm going to introduce
> another option(maybe?) to replace the search filter with the unindexed search.

I think we'd be able to just pass a different filter to the "-f" option, one that uses an unindexed attribute. Even though it is an exact match, we should still see some abandons take place before the server has finished processing, provided the database is large enough and we have enough load. With the current ldclt options, is it possible to do a search using a wildcard in the filter?
Looks good. Do we have a way to send ABANDON requests for bogus msgids?
Thank you, Nathan and Rich, for reviewing the code.

I'm trying to run these cases, which somehow "hang" in either ber_flush or poll. :(

ldclt -D "$ROOTDN" -w "$ROOTDNPW" -h "$HOST" -p "$PORT" \
    -e abandon -e random -b "$BASEDN" -f "uid=emailperson_*" \
    -q -n 2 -r 10000 -R 99999 -I 32 -e inetOrgPerson -e imagesdir=$DAT

ldclt -D "$ROOTDN" -w "$ROOTDNPW" -h "$HOST" -p "$PORT" \
    -e abandon -e random -b "$BASEDN" -f "jpegphoto=*" \
    -q -n 2 -r 10000 -R 99999 -I 32 -e inetOrgPerson -e imagesdir=$DAT

I'm hoping it can be solved by tuning the timeout parameter and/or the ldap_result options...

> Do we have a way to send ABANDON requests for bogus msgids?
Not now. I'm going to add it.
(In reply to comment #8)
> > Do we have a way to send ABANDON requests for bogus msgids?
> Not now. I'm going to add it.

I tried to add the feature, but unfortunately the LDAP C SDK is smart enough that it does not send bogus requests. The library looks up the msgid to abandon in the request history kept in the LDAP handle. If the msgid is not in that history, the request is not sent.

static int
do_abandon( LDAP *ld, int origid, int msgid, LDAPControl **serverctrls,
    LDAPControl **clientctrls )
{
[...]
    /*
     * Find the request that we are abandoning.  Don't send an
     * abandon message unless there is something to abandon.
     */
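To make that behaviour concrete, here is a toy model of the check. It is not the SDK source: struct handle, track_request and try_abandon are hypothetical names, and the fixed-size array only stands in for whatever request history the real LDAP handle keeps.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for the request history inside an LDAP handle. */
struct handle {
    int    pending[MAX_PENDING]; /* msgids of outstanding requests          */
    size_t npending;
    int    abandons_sent;        /* abandon requests that would be sent out */
};

/* Remember a msgid as outstanding. Returns 0 on success, -1 if full. */
int track_request(struct handle *h, int msgid)
{
    assert(h != NULL);
    if (h->npending == MAX_PENDING)
        return -1;
    h->pending[h->npending++] = msgid;
    return 0;
}

/* Abandon msgid only if it is outstanding.
 * Returns 1 if an abandon would be sent, 0 if it is suppressed. */
int try_abandon(struct handle *h, int msgid)
{
    size_t i;

    assert(h != NULL);
    for (i = 0; i < h->npending; i++) {
        if (h->pending[i] == msgid) {
            /* Drop the entry by moving the last one into its slot. */
            h->pending[i] = h->pending[--h->npending];
            h->abandons_sent++;
            return 1;
        }
    }
    return 0; /* nothing to abandon, so nothing goes on the wire */
}
```

In this model a bogus msgid never increments abandons_sent, which matches what I saw: a client built on the SDK cannot produce an ABANDON for a msgid it never issued.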
Ok. If we need to test sending bogus abandons, we can do that with the ldapbertest.pl thing.
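For reference, a bogus abandon is small enough to build by hand. The sketch below is not taken from ldapbertest.pl; it only shows the wire format as I understand it from RFC 4511, where an LDAPMessage is a SEQUENCE of a messageID INTEGER followed by the operation, and AbandonRequest is [APPLICATION 16] (tag 0x50) wrapping the msgid to abandon. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of content octets for a non-negative BER INTEGER (1 to 4). */
size_t ber_uint_len(uint32_t v)
{
    size_t n = 1;

    /* Grow until the top bit of the leading octet is clear. */
    while (n < 4 && (v >> (8 * n - 1)) != 0)
        n++;
    return n;
}

/* Write v big-endian in exactly n octets starting at buf[pos]. */
size_t put_uint(unsigned char *buf, size_t pos, uint32_t v, size_t n)
{
    size_t i;

    for (i = n; i > 0; i--)
        buf[pos++] = (unsigned char)((v >> (8 * (i - 1))) & 0xff);
    return pos;
}

/* Encode an LDAPMessage carrying an AbandonRequest, without controls.
 * Returns the number of bytes written, or 0 on bad input or short buffer. */
size_t encode_abandon(unsigned char *buf, size_t cap,
                      uint32_t msgid, uint32_t target)
{
    size_t idlen, tlen, body, pos = 0;

    assert(buf != NULL);
    if (msgid > 0x7fffffffu || target > 0x7fffffffu)
        return 0;                        /* MessageID is 0..2^31-1 */
    idlen = ber_uint_len(msgid);
    tlen = ber_uint_len(target);
    body = 2 + idlen + 2 + tlen;         /* at most 12: short-form length */
    if (cap < 2 + body)
        return 0;

    buf[pos++] = 0x30;                   /* SEQUENCE */
    buf[pos++] = (unsigned char)body;
    buf[pos++] = 0x02;                   /* INTEGER: messageID */
    buf[pos++] = (unsigned char)idlen;
    pos = put_uint(buf, pos, msgid, idlen);
    buf[pos++] = 0x50;                   /* [APPLICATION 16]: AbandonRequest */
    buf[pos++] = (unsigned char)tlen;
    pos = put_uint(buf, pos, target, tlen);
    return pos;
}
```

For example, encode_abandon(buf, sizeof buf, 3, 2) produces the eight bytes 30 06 02 01 03 50 01 02. Writing such bytes to a raw socket for a target msgid that was never issued bypasses the SDK's check entirely.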
Created attachment 323816 [details]
cvs diff ldclt files

Made some minor changes to doAbandon in ldapfct.c. I'm attaching the new diffs.
Created attachment 323817 [details]
cvs commit messages

Thanks to Rich and Nathan for their reviews and comments. Checked into CVS HEAD.
There was no duration listed here, but I ran the following script for about 1 hour without errors.

ROOTDN="cn=Directory Manager"
ROOTDNPW="pw"
BASEDN="dc=example,dc=com"
HOST="epsilon.dsdev.sjc.redhat.com"
PORT=389
DAT=/export/data
ldclt -D "$ROOTDN" -w "$ROOTDNPW" -h "$HOST" -p "$PORT" \
    -v -e abandon -e random -b "$BASEDN" -f "uid=emailperson_XXXXX" \
    -q -n 8 -r 10000 -R 99999 -I 32 -e inetOrgPerson -e imagesdir=$DAT

Verified against 8.1.0-0.6.el5dsrv.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0455.html