I am running 389-ds-base-1.2.6.1-2 on RHEL 5.5:

$ rpm -qi 389-ds-base
Name        : 389-ds-base                  Relocations: (not relocatable)
Version     : 1.2.6.1                      Vendor: Fedora Project
Release     : 2.el5                        Build Date: Thu 30 Sep 2010 09:15:13 AM EST
Install Date: Mon 18 Oct 2010 03:45:22 PM EST   Build Host: x86-02.phx2.fedoraproject.org
Group       : System Environment/Daemons   Source RPM: 389-ds-base-1.2.6.1-2.el5.src.rpm
Size        : 5855143                      License: GPLv2 with exceptions
Signature   : DSA/SHA1, Fri 01 Oct 2010 01:56:46 AM EST, Key ID 119cc036217521f6
Packager    : Fedora Project
URL         : http://port389.org/
Summary     : 389 Directory Server (base)
Description :
389 Directory Server is an LDAPv3 compliant server. The base package includes
the LDAP server and command line utilities for server administration.

$ rpm -qa | grep 389
389-console-1.1.4-1.el5
389-dsgw-1.1.5-1.el5
389-admin-1.1.11-1.el5
389-ds-console-doc-1.2.3-1.el5
389-ds-base-1.2.6.1-2.el5
389-ds-console-1.2.3-1.el5
389-ds-1.2.1-1.el5
389-admin-console-1.1.5-1.el5
389-ds-base-debuginfo-1.2.6.1-2.el5
389-admin-console-doc-1.1.5-1.el5

We have been getting crashes every couple of weeks. See the output from the core file below (the core is 800 MB):

(gdb) where
#0 slapi_sdn_get_ndn (sdn=0x0) at ldap/servers/slapd/dn.c:1933
#1 0x000000000042113d in need_new_pw (pb=0x2aaab13f0f70, t=0x52b7ce78, e=0x0, pwresponse_req=0) at ldap/servers/slapd/pw_mgmt.c:71
#2 0x000000000040ecbf in do_bind (pb=0x2aaab13f0f70) at ldap/servers/slapd/bind.c:745
#3 0x000000000041336d in connection_threadmain () at ldap/servers/slapd/connection.c:553
#4 0x00000033c5a284ad in ?? () from /usr/lib64/libnspr4.so
#5 0x00000033c1a0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000033c12d3f6d in clone () from /lib64/libc.so.
(gdb) print *pb $1 = {pb_backend = 0x1996ec00, pb_conn = 0x2aaaab81f750, pb_op = 0x2aaab29ed580, pb_plugin = 0x1992fb10, pb_opreturn = 0, pb_object = 0x0, pb_destroy_fn = 0, pb_requestor_isroot = 0, pb_config_fname = 0x0, pb_config_lineno = 0, pb_config_argc = 0, pb_config_argv = 0x0, pb_target_entry = 0x0, pb_existing_dn_entry = 0x0, pb_existing_uniqueid_entry = 0x0, pb_parent_entry = 0x0, pb_newparent_entry = 0x0, pb_pre_op_entry = 0x0, pb_post_op_entry = 0x0, pb_seq_type = 0, pb_seq_attrname = 0x0, pb_seq_val = 0x0, pb_ldif_file = 0x0, pb_removedupvals = 0, pb_db2index_attrs = 0x0, pb_ldif2db_noattrindexes = 0, pb_ldif_printkey = 0, pb_instance_name = 0x0, pb_task = 0x0, pb_task_flags = 0, pb_mr_filter_match_fn = 0, pb_mr_filter_index_fn = 0, pb_mr_filter_reset_fn = 0, pb_mr_index_fn = 0, pb_mr_oid = 0x0, pb_mr_type = 0x0, pb_mr_value = 0x0, pb_mr_values = 0x0, pb_mr_keys = 0x0, pb_mr_filter_reusable = 0, pb_mr_query_operator = 0, pb_mr_usage = 0, pb_pwd_storage_scheme_user_passwd = 0x0, pb_pwd_storage_scheme_db_passwd = 0x0, pb_managedsait = 0, pb_internal_op_result = 0, pb_plugin_internal_search_op_entries = 0x0, pb_plugin_internal_search_op_referrals = 0x0, pb_plugin_identity = 0x0, pb_parent_txn = 0x0, pb_txn = 0x0, pb_dbsize = 0, pb_ldif_files = 0x0, pb_ldif_include = 0x0, pb_ldif_exclude = 0x0, pb_ldif_dump_replica = 0, pb_ldif_dump_uniqueid = 0, pb_ldif_generate_uniqueid = 0, pb_ldif_namespaceid = 0x0, pb_ldif_encrypt = 0, pb_operation_notes = 0, pb_slapd_argc = 0, pb_slapd_argv = 0x0, pb_slapd_configdir = 0x0, pb_ctrls_arg = 0x0, pb_dse_dont_add_write = 0, pb_dse_add_merge = 0, pb_dse_dont_check_dups = 0, pb_dse_is_primary_file = 0, pb_schema_flags = 0, pb_result_code = 0, pb_result_text = 0x0, pb_result_matched = 0x0, pb_nentries = 0, urls = 0x0, pb_import_entry = 0x0, pb_import_state = 0, pb_destroy_content = 0, pb_dse_reapply_mods = 0, pb_urp_naming_collision_dn = 0x0, pb_urp_tombstone_uniqueid = 0x0, pb_server_running = 0, pb_backend_count = 1, 
pb_pwpolicy_ctrl = 0, pb_vattr_context = 0x0, pb_substrlens = 0x0, pb_plugin_enabled = 0, pb_search_ctrls = 0x0, pb_mr_index_sv_fn = 0}

We are not sure how this is occurring, as we have turned off the ability for users to change their own passwords. If you need any extra info from the core file, let me know.
Created attachment 477520 [details] thread apply all bt output
The need_new_pw() function is called when a BIND operation is processed to see if a password is expired. This crash is not being triggered by a password change operation. This function does not expect to be passed a NULL pointer for the "e" variable, which should contain the entry that one is trying to bind as. In do_bind (frame 2 of thread 1), what do the "dn", "rawdn", and "sdn" variables contain? Do you use the chaining feature of 389? Do you do any binds that are something other than a simple bind, such as SASL or client certificate authentication? Are you using the LDAPI autobind feature?
(gdb) down
#2 0x000000000040ecbf in do_bind (pb=0x2aaab13f0f70) at ldap/servers/slapd/bind.c:745
(gdb) print dn
$1 = 0x2aaaae4aeba0 "uid=qx\t,ou=People,dc=deakin,dc=edu,dc=au"
(gdb) print rawdn
$2 = 0x2aaaae4aeba0 "uid=qx\t,ou=People,dc=deakin,dc=edu,dc=au"
(gdb) print sdn
$3 = {flag = 6 '\006', dn = 0x2aaaae4aeba0 "uid=qx\t,ou=People,dc=deakin,dc=edu,dc=au", ndn = 0x2aaaaebeb9f0 "uid=qx\t,ou=people,dc=deakin,dc=edu,dc=au", ndn_len = 40}
(gdb)

We have turned off all the password policies as we don't use them.

Is the \t in the dn normal?
We don't use chaining. LDAPI and autobind are off:

nsslapd-ldapifilepath: /var/run/slapd-auth-f.socket
nsslapd-ldapilisten: off
nsslapd-ldapiautobind: off

We use simple binds. However, could someone trying to connect using SASL cause an issue? Is there a way to turn it off on the server?
(In reply to comment #3)
> We have turned off all the password policies as we don't use them.

This code path will still be hit without password policies turned on.

> Is the \t in the dn normal?

No, this is not normal and may be a part of the problem. I will run some tests.
Do you have an entry in your database that looks similar to "uid=qx\t,ou=people,dc=deakin,dc=edu,dc=au"? Do you know which client application is attempting to bind as this DN? It is possible that the current versions of 389-ds-base do not have this problem, as there have been many changes around DN normalization. I would recommend that you use a more recent version of 389-ds-base. The latest version in EPEL5 is 389-ds-base-1.2.7.5-1. I will continue testing to see if I can reproduce the issue on current code.
I can reproduce this problem. It is triggered by a bind with a valid DN and password except that the RDN has a tab at the end of it when binding. For example, you can add a user like this:

dn: uid=foo,dc=example,dc=com
objectclass: posixaccount
cn: foo
userpassword: secret
uidnumber: 500
gidnumber: 500
homedirectory: /home/foo

If you then bind as that user like this, ns-slapd will crash:

ldapsearch -x -D "uid=foo\09,dc=example,dc=com" -w secret -b "dc=example,dc=com" "objectclass=*"

This appears to be related to the entryrdn changes, as it doesn't affect versions prior to supporting modrdn with new superior. The problem is that the call to get_entry() in do_bind() fails to find the bind target entry, but the call to the backend bind function succeeds, which then puts us on a path to call need_new_pw() with a NULL entry. Versions prior to the entryrdn changes do not have the problem since the call to the backend bind function fails since it says that the bind target doesn't exist. At this point, do_bind short circuits and returns an error to the client.
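To make the failing path concrete, here is a minimal toy model in C of the situation described above: the front end has no entry (NULL) while the backend reports a successful bind. This is not the 389-ds code and not the fix that was applied; the type toy_entry, the function toy_finish_bind(), and the constants are hypothetical names invented for illustration. It only shows that treating "bind succeeded but no entry was found" as "no such object" avoids dereferencing a NULL entry.

```c
#include <stddef.h>

/* Hypothetical stand-in for a directory entry; not the real Slapi_Entry. */
typedef struct toy_entry {
    const char *dn;
} toy_entry;

/* LDAP result codes: 0 is success, 32 is noSuchObject. */
enum { TOY_SUCCESS = 0, TOY_NO_SUCH_OBJECT = 32 };

/*
 * Hypothetical final step of a bind: bind_target is what the front-end
 * lookup returned (may be NULL), backend_bind_ok is what the backend said.
 * The entry is only used when both lookups agree that it exists.
 */
int toy_finish_bind(const toy_entry *bind_target, int backend_bind_ok)
{
    if (!backend_bind_ok) {
        return TOY_NO_SUCH_OBJECT;   /* backend rejected the bind */
    }
    if (bind_target == NULL) {
        /* The two lookups disagree: refuse rather than use a NULL entry. */
        return TOY_NO_SUCH_OBJECT;
    }
    /* A password-expiry check on bind_target would be safe from here on. */
    return TOY_SUCCESS;
}
```

With this guard, toy_finish_bind(NULL, 1) returns 32 instead of crashing, which matches the error an unknown bind DN would normally produce.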
The find_entry() function in the backend code does find the entry yet get_entry() in the frontend code fails to find the entry. These functions differ in the way they locate the bind target entry. The get_entry() function uses slapi_search_internal_get_entry() to locate the entry. The find_entry() function uses dn2entry(). This ends up using entryrdn_index_read() to find the entry by consulting the entryrdn index. This in turn calls slapi_rdn_init_all_sdn() to break the DN into RDNs so it can use the RDN to consult the index. This ends up removing the tab character, so the left-most RDN is "uid=foo" instead of "uid=foo\t". Since this truncated RDN does indeed exist, the index entry is found which results in fetching the entry from the database allowing the bind to succeed.
The root cause of the problem is that slapi_ldap_explode_dn() truncates the trailing tab character from the RDN. When ns-slapd is built against MozLDAP, we use the MozLDAP ldap_explode_dn() function, which trims the tab character. When ns-slapd is built against OpenLDAP, we use our own mozldap_ldap_explode_dn() function, which mimics the MozLDAP function. We do this because OpenLDAP has no equivalent function available. Both the ldap_explode_dn() and mozldap_ldap_explode_dn() functions use the ldap_utf8isspace() function to decide which trailing characters to trim. This function considers tabs (along with other whitespace characters) to be a space in both the MozLDAP and 389 versions of the function. This function is supposed to mimic the isspace() call, so I believe that it is behaving correctly. However, I think we should not be using this function when exploding a DN; only trailing space (0x20) characters should be trimmed. As a compatibility test with OpenLDAP server, a bind using a trailing tab on the left-most RDN fails as if the user does not exist. I believe that this is the correct behavior.
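The difference between the two trimming rules can be sketched in a few lines of C. This is only an illustration, not the attached patch: both function names are hypothetical, the standard isspace() stands in for ldap_utf8isspace(), and escaped trailing spaces in a DN are ignored for simplicity.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of rdn after dropping every trailing
 * character that isspace() accepts (space, tab, newline, and so on).
 * This mirrors the behavior that turns "uid=foo\t" into "uid=foo".
 */
size_t trimmed_len_any_whitespace(const char *rdn)
{
    size_t len = strlen(rdn);
    while (len > 0 && isspace((unsigned char)rdn[len - 1])) {
        len--;
    }
    return len;
}

/*
 * Hypothetical helper: length of rdn after dropping only trailing
 * space (0x20) characters. A trailing tab is kept, so "uid=foo\t"
 * stays distinct from "uid=foo".
 */
size_t trimmed_len_space_only(const char *rdn)
{
    size_t len = strlen(rdn);
    while (len > 0 && rdn[len - 1] == ' ') {
        len--;
    }
    return len;
}
```

For the RDN "uid=foo\t" (with a real tab), the first helper returns 7 and the second returns 8, so only the second keeps the tabbed RDN from matching the existing "uid=foo" index entry.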
Created attachment 477699 [details] Patch
Thanks Nathan. So I have been unable to track which application is sending the tab in the bind. Do you know if the src ip of the connection is stored anywhere that I can access via the core file? maybe in pb variable somewhere?
(In reply to comment #11)
> Thanks Nathan.
>
> So I have been unable to track which application is sending the tab in the
> bind. Do you know if the src ip of the connection is stored anywhere that I can
> access via the core file? maybe in pb variable somewhere?

The address is in pb->pb_conn->cin_addr, but it's not in a readable format. It's stored as an NSPR PRNetAddr. I'm not sure if there's an easy way to convert it within gdb, so it would require dumping memory from the core and trying to recreate the PRNetAddr in another test program. You could then have that test program call PR_NetAddrToString() to get back a human-readable address.
Created attachment 477718 [details] Address conversion program

This is a simple program which can be used to convert a dumped PRNetAddr to a human-readable network address string. The source (netaddr.c) needs to be modified to make the "addr" array contain the dumped PRNetAddr that you want to convert. To dump the PRNetAddr from your ns-slapd core, run the following commands in gdb inside of the do_bind stack frame (be sure to dump the number of bytes returned by the call to sizeof on your system):

(gdb) call sizeof(PRNetAddr)
$20 = 112
(gdb) x/112x pb->pb_conn->cin_addr

At this point you will get output like the following from gdb:

0x1d22070: 0x0a 0x00 0xce 0x52 0x00 0x00 0x00 0x00
0x1d22078: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x1d22080: 0x00 0x00 0xff 0xff 0x0a 0x0e 0x36 0x8c
0x1d22088: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x1d22090: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x1d22098: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x1d220a0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x1d220a8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x1d220b0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x1d220b8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x1d220c0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Eliminate the addresses (left-hand column) and copy the bytes into the initialization of the "addr" array in the netaddr.c source. The bytes will need to be separated by commas, as you will see in the example address in the source code. To build netaddr, run the build.sh script. You will need to have the nspr-devel package installed. Once the program is built, simply run netaddr and it will print out the address in a readable format like this:

[nkinder@localhost netaddr]$ ./netaddr
Address is ::ffff:10.14.54.140.
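If NSPR headers are not at hand, a rough alternative is to decode the dumped bytes with the POSIX inet_ntop() call. The sketch below is not the attached netaddr.c. It assumes the dump holds an IPv6-style address laid out like a Linux sockaddr_in6 (2-byte family, 2-byte port, 4-byte flow info, then the 16 address bytes at offset 8), which is consistent with the example dump above but is still an assumption; the function name dumped_addr_to_string() is hypothetical.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <sys/socket.h>

/* Assumed offset of the 16 address bytes inside the dumped structure. */
#define DUMP_IPV6_ADDR_OFFSET 8
#define DUMP_IPV6_ADDR_LEN    16

/*
 * Hypothetical helper: formats the IPv6 address found in a byte dump of
 * the connection address. Returns out on success, or NULL if the dump is
 * too short or inet_ntop() fails. out should hold INET6_ADDRSTRLEN bytes.
 */
const char *dumped_addr_to_string(const unsigned char *dump, size_t dump_len,
                                  char *out, size_t out_len)
{
    if (dump == NULL || out == NULL ||
        dump_len < DUMP_IPV6_ADDR_OFFSET + DUMP_IPV6_ADDR_LEN) {
        return NULL;
    }
    /* inet_ntop() reads the 16 raw address bytes in network order. */
    return inet_ntop(AF_INET6, dump + DUMP_IPV6_ADDR_OFFSET,
                     out, (socklen_t)out_len);
}
```

Called with the first 24 bytes of the example dump and a buffer of INET6_ADDRSTRLEN bytes, this yields the same IPv4-mapped address that netaddr prints.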
This is what I get from gdb:

(gdb) call sizeof(PRNetAddr)
$1 = 112
(gdb) x/112x pb->pb_conn->cin_addr
0x2aaabd9b8fb0: 0x1b6c000a 0x00000000 0x00000000 0x00000000
0x2aaabd9b8fc0: 0xffff0000 0xf0a7b880 0x00000000 0x00000000
0x2aaabd9b8fd0: 0x00000000 0x00000000 0x00000000 0x00000000
0x2aaabd9b8fe0: 0x00000000 0x00000000 0x00000000 0x00000000
0x2aaabd9b8ff0: 0x00000000 0x00000000 0x00000000 0x00000000
0x2aaabd9b9000: 0x00000000 0x00000000 0x00000000 0x00000000
0x2aaabd9b9010: 0x00000000 0x00000000 0x00000000 0x00000000
0x2aaabd9b9020: 0x00000080 0x00000000 0x00000035 0x00000000
0x2aaabd9b9030: 0x31313032 0x39313130 0x35313530 0x005a3235
0x2aaabd9b9040: 0xbc27e8c0 0x00002aaa 0xbc4a6360 0x00002aaa
0x2aaabd9b9050: 0x00000030 0x00000000 0x00000035 0x00000000
0x2aaabd9b9060: 0xbc76d5d0 0x00002aaa 0x00000004 0x00000000
0x2aaabd9b9070: 0xb43009c0 0x00002aaa 0xb5121490 0x00002aaa
0x2aaabd9b9080: 0x00000000 0x00000000 0x00000025 0x00000000
0x2aaabd9b9090: 0x72746e65 0x00646979 0x5470756f 0x6c706972
0x2aaabd9b90a0: 0x3d630065 0x00007561 0x00000045 0x00000000
0x2aaabd9b90b0: 0x31642d66 0x7374692d 0x6d73632d 0x3736312d
0x2aaabd9b90c0: 0x74616e2d 0x6c6f6f70 0x74656e2e 0x6165642e
0x2aaabd9b90d0: 0x2e6e696b 0x2e756465 0x00007561 0x00000016
0x2aaabd9b90e0: 0x00000017 0x00000018 0x00000025 0x00000000
0x2aaabd9b90f0: 0x706d6973 0x0000656c 0x00000000 0x00000000
0x2aaabd9b9100: 0x00000020 0x00000000 0x00000045 0x00000000
0x2aaabd9b9110: 0x3d646975 0x7365706d 0x6f2c616b 0x65503d75
0x2aaabd9b9120: 0x656c706f 0x3d63642c 0x6b616564 0x642c6e69
0x2aaabd9b9130: 0x64653d63 0x63642c75 0x0075613d 0x00000000
0x2aaabd9b9140: 0x00000000 0x00000000 0x00000055 0x00000000
0x2aaabd9b9150: 0x00000001 0x00000000 0xb09fd220 0x00002aaa
0x2aaabd9b9160: 0xaf4fc350 0x00002aaa 0x00005290 0x00000000

Placing the data into the program, I get:

$ ./netaddr
Address is ::.
(In reply to comment #14)
> This is what i get from gdb:
>
> (gdb) call sizeof(PRNetAddr)
> $1 = 112
> (gdb) x/112x pb->pb_conn->cin_addr
> 0x2aaabd9b8fb0: 0x1b6c000a 0x00000000 0x00000000 0x00000000

This data is in the wrong format. It turns out that gdb remembers the last size and format for the examine (x) command, and I had used "xb" previously. Try using x/112xb to get the data listed in single bytes.
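If re-running gdb is inconvenient, a word-format dump can also be unpacked into single bytes by hand. The following is a small sketch of that conversion, assuming the core came from a little-endian host (as on x86_64), so the least significant byte of each 32-bit word is the first byte in memory. The function name words_to_bytes_le() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: expands nwords 32-bit values, as printed by a
 * word-sized gdb "x" dump on a little-endian host, into the byte order
 * they occupy in memory. out must have room for 4 * nwords bytes.
 */
void words_to_bytes_le(const uint32_t *words, size_t nwords,
                       unsigned char *out)
{
    size_t i;

    for (i = 0; i < nwords; i++) {
        out[4 * i + 0] = (unsigned char)(words[i] & 0xff);          /* lowest byte first */
        out[4 * i + 1] = (unsigned char)((words[i] >> 8) & 0xff);
        out[4 * i + 2] = (unsigned char)((words[i] >> 16) & 0xff);
        out[4 * i + 3] = (unsigned char)((words[i] >> 24) & 0xff);  /* highest byte last */
    }
}
```

For example, the first word of the dump in comment 14, 0x1b6c000a, unpacks to the bytes 0x0a 0x00 0x6c 0x1b, which is the form the "addr" array in netaddr.c expects.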
Pushed patch to master. Thanks to Noriko for her review!

Counting objects: 11, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.18 KiB, done.
Total 6 (delta 4), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   10f6c0e..30cb812  master -> master
Tested using comment #7.

[root@testvm ~]# ldapadd -x -h localhost -p 389 -D "cn=Directory Manager" -w Secret123 << EOF
dn: uid=foo,ou=people,dc=testnew,dc=com
objectclass: posixaccount
cn: foo
userpassword: secret
uidnumber: 500
gidnumber: 500
homedirectory: /home/foo
EOF
adding new entry "uid=foo,ou=people,dc=testnew,dc=com"

[root@testvm ~]# service dirsrv status
dirsrv testvm (pid 3316) is running...
dirsrv testvm1 (pid 3394) is running...
[root@testvm ~]# ldapsearch -x -h localhost -p 389 -D "uid=foo\09,ou=people,dc=testnew,dc=com" -w secret -b "ou=people,dc=testnew,dc=com"
ldap_bind: No such object (32)
        matched DN: ou=people,dc=testnew,dc=com
[root@testvm ~]# service dirsrv status
dirsrv testvm (pid 3316) is running...
dirsrv testvm1 (pid 3394) is running...
[root@testvm ~]#