Description of problem: Version-Release number of selected component (if applicable): fedora-ds-1.0.2-1.Linux How reproducible: create a suitable DIT and search it (see attachmend for ldif) Steps to Reproduce: 1. create dit as per attached ldif 2. perform a subtree search with base ou=Type3products,ou=LINKTEST,o=dmetests and filter e.g. (objectclass=*) 3. some items not found. Actual results: The search completes successfully but any object under the item with ou=#,+"\>:=<<>;/ is not found. This happens with any filter, and with the search started from any point above the (strangely named) item. Renaming the item with dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/,ou=Type3products,ou=LINKTEST,o=dmetests to something more sensible makes the search work, but the point of this is to make sure the directory copes with special characters -- we can't always make our customers name things sensibly. The equivalent test works with CA's eTrust Directory and OpenLDAP. Expected results: matching dit entries should always be found. Additional info:
Created attachment 132912 [details] ldif export of a directory subtree which illustrates the problem.
Did you create this DIT with a database import or with LDAP ADD?
Neither -- the data was created programatically using LDAP, so it shouldn't be broken.
Created attachment 138371 [details] output of "dbscan -f .../id2entry.db4" This is the dbscan output of ancestorid.db4. $ dbscan -r -f .../ancestorid.db4 =1 2 3 5 =2 5 The id 4 entry is not indexed by ancestorid. That's why the subtree search fails to return the entry. Keep investigating...
Created attachment 138375 [details] sample ldif Additional info: It should be related to the original problem. But if I try to import the attached ldif file, the entry under the ou with the escaped characters fails to be added: [12/Oct/2006:13:25:54 -0700] - import userRoot: WARNING: Skipping entry "uid=TVradmin0, ou=\#\,\+\"\\\>:\=\<\<\>\;/, dc=example,dc=com" which has no parent, ending at line 56 of file "/path/test0.ldif" So, I had to run ldapadd to add just the entry, where the ancestorid was not indexed.
Created attachment 139778 [details] sample ldif step to reproduce: 1) stop slapd 2) import the sample ldif file 3) run subtree search with the filter "(ou=*)" and the attribute list "dn" expected result: search result dn: ou=Accounting, dc=sfbay,dc=redhat,dc=com actual result: search result dn: ou=Accounting, dc=sfbay,dc=redhat,dc=com dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/, dc=sfbay,dc=redhat,dc=com dn: uid=TVradmin0, ou=\#\,\+\"\\\>:\=\<\<\>\;/, dc=sfbay,dc=redhat,dc=com
Created attachment 139786 [details] cvs diffs (ldapserver) Files: ldap/servers/slapd/back-ldbm/backentry.c ldap/servers/slapd/back-ldbm/import-threads.c \ ldap/servers/slapd/back-ldbm/ldbm_add.c \ ldap/servers/slapd/back-ldbm/proto-back-ldbm.h Bug description: The directory server internally normalizes the dn and uses the normalized dn to compare. In some cases, the server was normalizing the dn more than once. Normally, the normalized string is identical to the renormalized dn. { i.e., normalized_dn == normalize(normalize_dn) } The problem is, if the dn contains '\\' (not just the escape character, but it contains some real backslash characters: dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/, dc=example,dc=com), the string is transformed to the one which is not considered to be the same as the original. (the normalization is done in the function substr_dn_normalize in slapd/dn.c; if you are interested in, the function has a detailed comment). Changes: Avoid re-normalizing an already normalized dn. It should be better from the performance point of view, as well. Note: When running ldapsearch having such attribute values containing '"' in it, we should use single quote to quote the string. Otherwise, the command line interface consumes the backslash before '"' and mismatched string is passed to the server and returns "no such object". $ dapsearch -p <port> -D "cn=Directory Manager" -w <pw> -b 'ou=\#\,\+\"\\\>:\=\<\<\>\;/, dc=example,dc=com' '(ou=\#\,\+\"\\\>:\=\<\<\>\;/)' version: 1 dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/, dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: \#\,\+\"\\\>:\=\<\<\>\;/ ou: #,+"\>:=<<>;/ $ ldapsearch -p <port> -D "cn=Directory Manager" -w <pw> -b "ou=\#\,\+\"\\\>:\=\<\<\>\;/, dc=example,dc=com" "(ou=\#\,\+\"\\\>:\=\<\<\>\;/)" ldap_search: No such object
Ok. The code says: bv.bv_val = (void*)backentry_get_dn(ep); /* normalized later */ Where later is it normalized?
These changes make me a little uneasy. From a client of the api perspective I want to ask for a normalized dn when I need one, not rely on behaviour that happens before or after what I do. Can we not have some flag for whether the dn is normalized somewhere?
Actually, I had the same fear and tried to add a flag. And I looked this slapi_dn structure... When the dn is normalized, ndn has the value. If not, it's NULL. So, the knowledge is already there. struct slapi_dn { unsigned char flag; const char *dn; /* DN */ const char *ndn; /* Case Normalised DN */ int ndn_len; /* normalize dn len */ }; The problem is using the current APIs, we have to pass just the dn string like these and no way to tell if the string is already normalized or not. /* insert into the entrydn index */ - bv.bv_val = (void*)backentry_get_ndn(fi->entry); /* jcm - Had to cast away const */ + bv.bv_val = (void*)backentry_get_dn(fi->entry); /* jcm - Had to cast away const */ - addr.dn = (char*)slapi_sdn_get_ndn (&parentsdn); + addr.dn = (char*)slapi_sdn_get_dn (&parentsdn); If we pass Slapi_DN ("struct slapd_dn") as much as we can, it might have been easier to control it, but it'd require more API changes... Do we want to go that way? Please note that, with this change, I did not see any degradation in the acceptance test level.
That's what Slapi_DN is supposed to do - normalize once, and be able to get normalized and "raw" DN where needed. We should use Slapi_DN as much as possible. We might want to schedule some time to do some much needed API revisions to eliminate the use of char *dn in favor of Slapi_DN *dn. But not for 7.2.
Created attachment 139882 [details] stacktraces (In reply to comment #8) > Ok. The code says: > bv.bv_val = (void*)backentry_get_dn(ep); /* normalized later */ > > Where later is it normalized? Thanks, Rich. Please take a look at the stacktraces I attached. The first one is the location I put "normalized later", where attribute type entrydn and its value "ou=\\#\\,\\+\\\"\\\\\\>:\\=\\<\\<\\>\\;/, dc=example,dc=com" are stored in the entry. The second one is the place that entrydn value is normalized before adding to the entrydn index file. The third one is for the ancestorid index. If in the first stacktrace, add_update_entrydn_operational_attributes sets normalized dn as the entrydn value, it's normalized again at the second stacktrace, which stores the doubly-normalized dn in the entrydn index file. That makes the slapi_sdn_compare fail in the third stacktrace and the ancestorid index broken. That explains the ancestorid index problem I described in the Comment #4.
(In reply to comment #11) > That's what Slapi_DN is supposed to do - normalize once, and be able to get > normalized and "raw" DN where needed. We should use Slapi_DN as much as > possible. We might want to schedule some time to do some much needed API > revisions to eliminate the use of char *dn in favor of Slapi_DN *dn. But not > for 7.2. I opened a new bug: Bugzilla Bug 213301: RFE: eliminate the use of char *dn in favor of Slapi_DN *dn which blocks Bugzilla Bug 183294: Tracking bug for Directory Server 7.3.
Created attachment 139932 [details] cvs diff back-ldbm/ldbm_delete.c File: ldapserver/ldap/servers/slapd/back-ldbm/ldbm_delete.c Change: Avoid normalizing a dn twice.
Ok.
Created attachment 140068 [details] cvs diffs (ldapserver - revised) Files: servers/plugins/syntaxes/string.c servers/slapd/attrlist.c servers/slapd/entry.c servers/slapd/proto-slap.h servers/slapd/slap.h servers/slapd/slapi-plugin.h servers/slapd/value.c servers/slapd/valueset.c servers/slapd/back-ldbm/back-ldbm.h servers/slapd/back-ldbm/import-threads.c servers/slapd/back-ldbm/index.c servers/slapd/back-ldbm/ldbm_add.c servers/slapd/back-ldbm/ldbm_delete.c servers/slapd/back-ldbm/ldbm_modrdn.c Description: Definitely, this is a worm can... :( It turned out the previous try did not work for modrdn. With the minimum changes, I tried to make the prinary index and secondery indexes consistent, but it was not the right thing to do. This new proposal is somehow closer to what Pete suggested: setting a normalized flag if the value is already normalized. Since the value is stored in Slapi_Value, I had to add a flags field to it to pass the information around. So, you have to be responsible to set the flag if the value is normalized (mainly for syntax dn, that is, entrydn, only though). I'm going to add test cases to our test suite. Here's the list to be added... 1-1. import ldif files including escaped characters in the non-leaf node (e.g., ou) 1-2. search subtree search (from the top, from the non-leaf, and the leaf) 2-1. add an entry with escaped characters, and add additional one as a subordinate of the first entry 2-2. search subtree search (from the top, from the non-leaf, and the leaf) 3-1. delete the leaf and non-leaf node having the escaped characters 3-2. add them back 3-2. search subtree search (from the top, from the non-leaf, and the leaf) 4-1. run modrdn test program to do modrdn the leaf node. 4-2. search subtree search (from the top, from the non-leaf, and the leaf) Manual tests of these cases have passed. And the acceptance test suite showed the same pass rate.
Sorry, I have to ask for the review, again...
The changes look fine. You should open up a doc bug to get the new slapi_* functions documented in the plugin guide.
Created attachment 140515 [details] cvs commit message Reviewed by Nathan (Thank you!!) Checked in into HEAD.
(In reply to comment #18) > The changes look fine. You should open up a doc bug to get the new slapi_* > functions documented in the plugin guide. I opened a bug for doc: 214285: New slapi functions introduced by a subtree search problem
Created attachment 273041 [details] ldif containing '\' entry
(In reply to comment #22) > Created an attachment (id=273041) [edit] > ldif containing '\' entry > I imported this ldif file and verified the entry via the directory console. Changed status to VERIFIED
Created attachment 273271 [details] Correct ldif file which contains funny character entry
as of March 04, 2010, this is happening again. Noriko is looking into this one right now. Reopen it by yzhang
Created attachment 398612 [details] git diff File: ldap/servers/plugins/syntaxes/validate.c ldap/servers/slapd/dn.c Problem Description: A simple failed case observed before applying the patch: $ /usr/lib64/mozldap/ldapmodify -p 10389 -D 'cn=directory manager' -w pw << EOF dn: ou=\#\<,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: \#\< EOF ldap_add: Invalid DN syntax ldap_add: additional info: DN value invalid per syntax Fix Description: dn.c: Based upon RFC 4514, '#', '+', ';', '<', '>', and '=' need to be escaped in addition to '\\' and '"' if it appears in the DN string. validate.c: Using the above example, if an escaped character (\<) followed by an escaped character (\#), the pointer was moved twice skipping '\' before '<' and it makes the validation fail. ====================================================== Breakpoint 2, rdn_validate ( begin=0x7fd090001ed0 "ou=\\#\\<,dc=example,dc=com", end=0x7fd090001ee8 "m", last=0x7fd0a9bedac0) at ldap/servers/plugins/syntaxes/validate.c:430 430 int rc = 0; /* Assume RDN is valid */ (gdb) p p $35 = 0x7fd090001ed3 "\\#\\<,dc=example,dc=com" (gdb) p end $36 = 0x7fd090001ee8 "m" (gdb) p *p $37 = 92 '\\' (gdb) n 472 if (numericform) { (gdb) n 498 if (IS_UTF1(*p) && !IS_ESC(*p) && !IS_LUTF1(*p)) { (gdb) n 507 if (numericform) { (gdb) n 517 if (IS_UTF1(*p)) { (gdb) n 520 if ((p == end) && !IS_TUTF1(*p)) { (gdb) n 524 } else if (IS_ESC(*p)) { (gdb) n 528 p++; <== *p is '#' (gdb) n 529 if (!IS_ESC(*p) && !IS_SPECIAL(*p)) { (gdb) n 538 p++; <== move the pointer to the next char '\\' (gdb) p *p $40 = 92 '\\' (gdb) n 545 p++; <== another move to '<', which needs to be escaped (gdb) n 517 if (IS_UTF1(*p)) { (gdb) n 520 if ((p == end) && !IS_TUTF1(*p)) { (gdb) n 524 } else if (IS_ESC(*p)) { (gdb) n 540 } else if (!IS_SUTF1(*p)) { (gdb) n 541 rc = 1; <== failed.
Comment on attachment 398612 [details] git diff A '#' character should not always be escaped. It only needs to be escaped if it's at the beginning of the value portion of a RDN. The removal of the increment looks correct, but it'd be nice to see if any other tests show some case that we are overlooking before commiting the fix.
Created attachment 399189 [details] git patch file Files: ldap/servers/plugins/syntaxes/validate.c ldap/servers/slapd/back-ldbm/ldbm_add.c ldap/servers/slapd/dn.c Fix Description: dn.c: Based upon RFC 4514, the following characters in the RDN values need to be escaped: '+', ';', '<', '>', and '=' for the intermediate characters '+', ';', '<', '>', '=', '#' and ' ' for leading characters '+', ';', '<', '>', '=', and ' ' for trailing characters validate.c: If an escaped character followed by another escaped character, e.g., \#\<, the pointer was moved twice skipping '\' before '<' and it makes the validation fail. ldbm_add.c: a local variable addr was not initialized. Thanks to Nathan for his review. I revised dn.c based upon his review comments.
Created attachment 399190 [details] test ldifs Steps to verify: 1) add the entries in the testcases.ldif file: $ ldapmodify -D 'cn=directory manager' -w <pw> -a -f testcases.ldif 2) expected result: $ ldapsearch -b "dc=example,dc=com" "(objectclass=organizationalUnit)" dn: ou=\#,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: \# ou: # dn: ou=\#\#,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: \#\# ou: ## dn: ou=\#\<,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: \#\< ou: #< dn: ou=\;\,,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: \;\, ou: ;, dn: ou=\#\ ,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou:: XCNcIA== ou: # dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: \#\,\+\"\\\>:\=\<\<\>\;/ ou: #,+"\>:=<<>;/ dn: ou=\ \,\#\+\"\\\>:\=\<\<\>\;/\ ,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou:: XCBcLFwjXCtcIlxcXD46XD1cPFw8XD5cOy9cIA== ou:: ICwjKyJcPjo9PDw+Oy8=
(In reply to comment #29) > Created an attachment (id=399190) [details] > test ldifs > > Steps to verify: > 1) add the entries in the testcases.ldif file: > $ ldapmodify -D 'cn=directory manager' -w <pw> -a -f testcases.ldif > > 2) expected result: > $ ldapsearch -b "dc=example,dc=com" "(objectclass=organizationalUnit)" > dn: ou=\#,dc=example,dc=com > objectClass: organizationalUnit > objectClass: top > ou: \# > ou: # The escaped version of the "ou" attribute value should not be present in the above test. The same applies to the rest of the tests. > > dn: ou=\#\#,dc=example,dc=com > objectClass: organizationalUnit > objectClass: top > ou: \#\# > ou: ## > > dn: ou=\#\<,dc=example,dc=com > objectClass: organizationalUnit > objectClass: top > ou: \#\< > ou: #< > > dn: ou=\;\,,dc=example,dc=com > objectClass: organizationalUnit > objectClass: top > ou: \;\, > ou: ;, > > dn: ou=\#\ ,dc=example,dc=com > objectClass: organizationalUnit > objectClass: top > ou:: XCNcIA== > ou: # > > dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/,dc=example,dc=com > objectClass: organizationalUnit > objectClass: top > ou: \#\,\+\"\\\>:\=\<\<\>\;/ > ou: #,+"\>:=<<>;/ > > dn: ou=\ \,\#\+\"\\\>:\=\<\<\>\;/\ ,dc=example,dc=com > objectClass: organizationalUnit > objectClass: top > ou:: XCBcLFwjXCtcIlxcXD46XD1cPFw8XD5cOy9cIA== > ou:: ICwjKyJcPjo9PDw+Oy8=
(In reply to comment #30) > (In reply to comment #29) > > Created an attachment (id=399190) [details] [details] > > test ldifs > > > > Steps to verify: > > 1) add the entries in the testcases.ldif file: > > $ ldapmodify -D 'cn=directory manager' -w <pw> -a -f testcases.ldif > > > > 2) expected result: > > $ ldapsearch -b "dc=example,dc=com" "(objectclass=organizationalUnit)" > > dn: ou=\#,dc=example,dc=com > > objectClass: organizationalUnit > > objectClass: top > > ou: \# > > ou: # > > The escaped version of the "ou" attribute value should not be present in the > above test. The same applies to the rest of the tests. I'm a little concerned that if the entry looks like this, and if we search with the filter "(ou=\#)" (let's assume it's indexed on ou), would this entry be found? Shouldn't the search with both (ou=#) and (ou=\#) work? Or (ou=\#) does not matter? dn: ou=\#,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: #
The value of "ou" is simply "#". My understanding is that the escaped form is only necessary for displaying the value in the DN. Following this logic, the search filter of "(ou=\#)" should not return the entry.
(In reply to comment #32) > The value of "ou" is simply "#". My understanding is that the escaped form is > only necessary for displaying the value in the DN. Following this logic, the > search filter of "(ou=\#)" should not return the entry. I see. Now, this is the dump of id2entry.db4. The RDN value should not be escaped, either? id 8 rdn: ou=\# objectClass: organizationalUnit objectClass: top ou: \# ou: # Should it be A) id 8 rdn: ou=# objectClass: organizationalUnit objectClass: top ou: # or B) id 8 rdn: ou=\# objectClass: organizationalUnit objectClass: top ou: # ?
(In reply to comment #29) > Created an attachment (id=399190) [details] > test ldifs > Sorry. My test cases were silly. The attribute value was escaped in the test ldifs I attached to comment #29 as follows. dn: ou=\#,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: \# If I unescaped them like this: $ ldapsearch -b "dc=example,dc=com" "(objectclass=organizationalunit) dn: ou=\#,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: # then we got the expected result: dn: ou=\#,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: # dn: ou=\#\#,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: ## dn: ou=\#\<,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: #< dn: ou=\;\,,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: ;, dn: ou=\#\ ,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou:: IyA= dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: #,+"\>:=<<>;/ dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/\ ,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou:: IywrIlw+Oj08PD47LyAg
Thanks so much to Nathan for his reviews and advices. Pushed to master. $ git merge work Updating be17b93..f11afee Fast forward ldap/servers/plugins/syntaxes/validate.c | 1 - ldap/servers/slapd/back-ldbm/ldbm_add.c | 9 +- ldap/servers/slapd/dn.c | 149 +++++++++++++++++++----------- 3 files changed, 100 insertions(+), 59 deletions(-) $ git push Counting objects: 21, done. Delta compression using 4 threads. Compressing objects: 100% (11/11), done. Writing objects: 100% (11/11), 2.04 KiB, done. Total 11 (delta 9), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git be17b93..f11afee master -> master Pushed to Directory_Server_8_2_Branch. $ git cherry-pick f11afee0ca0c4039cebc0efe4388b95776b6da4b $ git push origin ds82-local:Directory_Server_8_2_Branch Counting objects: 21, done. Delta compression using 4 threads. Compressing objects: 100% (11/11), done. Writing objects: 100% (11/11), 2.05 KiB, done. Total 11 (delta 9), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 5962d96..b46d314 ds82-local -> Directory_Server_8_2_Branch
Not sure I got the expected results here? Added the following: version: 1 dn: ou=Type3products,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: Type3products dn: ou=prod 1,ou=Type3products,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: prod 1 dn: ou=prod 2,ou=Type3products,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: prod 2 dn: ou=\#\,\+\"\\\>:\=\<\<\>\;/,ou=Type3products,dc=example,dc=com objectClass: organizationalUnit objectClass: top ou: #,+"\>:=<<>;/ search on the subtree ... ldapsearch -x -h `hostname` -p 389 -D "cn=Directory Manager" -w mysecret -b "ou=Type3products,dc=example,dc=com" "(ou=*)" dn # extended LDIF # # LDAPv3 # base <ou=Type3products,dc=example,dc=com> with scope sub # filter: (ou=*) # requesting: dn # # Type3products, example.com dn: ou=Type3products,dc=example,dc=com # prod 1, Type3products, example.com dn: ou=prod 1,ou=Type3products,dc=example,dc=com # prod 2, Type3products, example.com dn: ou=prod 2,ou=Type3products,dc=example,dc=com # \23\2C\2B\22\5C\3E:\3D\3C\3C\3E\3B/, Type3products, example.com dn: ou=\23\2C\2B\22\5C\3E:\3D\3C\3C\3E\3B/,ou=Type3products,dc=example,dc=com # search result search: 2 result: 0 Success # numResponses: 5 # numEntries: 4
Yes, the DN string is correctly escaped. (See also RFC4514) Hex Char ---------- 23 # 2C , 2B + 22 " 5C \ ’\\’ 3E > 3D = 3C < 3B ; You may also want to run this command adding "ou" to the attribute type list. ldapsearch -x -h `hostname` -p 389 -D "cn=Directory Manager" -w mysecret -b "ou=Type3products,dc=example,dc=com" "(ou=*)" dn ou <===
verified - RHEL 4 version: redhat-ds-base-8.2.0-2010060304.el4dsrv # ldapsearch -x -h `hostname` -p 389 -D "cn=Directory Manager" -w Secret123 -b "ou=Type3products,dc=example,dc=com" "(ou=*)" dn ou # extended LDIF # # LDAPv3 # base <ou=Type3products,dc=example,dc=com> with scope sub # filter: (ou=*) # requesting: dn ou # # Type3products, example.com dn: ou=Type3products,dc=example,dc=com ou: Type3products # prod 1, Type3products, example.com dn: ou=prod 1,ou=Type3products,dc=example,dc=com ou: prod 1 # prod 2, Type3products, example.com dn: ou=prod 2,ou=Type3products,dc=example,dc=com ou: prod 2 # \23\2C\2B\22\5C\3E:\3D\3C\3C\3E\3B/, Type3products, example.com dn: ou=\23\2C\2B\22\5C\3E:\3D\3C\3C\3E\3B/,ou=Type3products,dc=example,dc=com ou: #,+"\>:=<<>;/ # search result search: 2 result: 0 Success # numResponses: 5 # numEntries: 4