User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009060309 Ubuntu/9.10 (karmic) Firefox/3.0.11 1. The local machine can always connect via ldap or ldaps. 2. Remote machines an only connect when slapd is run with -d2, aka SLAPD_OPTIONS='-d2' /etc/init.d/ldap restart If TLSVerifyClient is set to "never" then ldaps works as expected. Reproducible: Always Steps to Reproduce: 1. set TLSVerifyClient allow in slapd.conf 2. try to do an ldap search from a remote host, aka ldapsearch -H ldaps://server -x -D "manager dn" -w password uid=abennett -b dc=clarku,dc=edu Actual Results: ldapsearch from the client hangs. The server logs nothing, even if run with -d1. Expected Results: Returned the search. The man page for slapd.conf says that if TLSVerifyClient is set to allow, The client certificate is requested. If no certificate is provided, the session proceeds normally. If a bad certificate is provided, it will be ignored and the session proceeds normally. I'm not seeing this result.
Issue was caused by unreasonably large file with accepted client certificate's CAs. This, combined with a fact that slapd didn't check success of SSL handshake, led to freeze before connection with client was even established. Patch has been backported from upstream and is now in CVS. Changing status to MODIFIED.
I can't reproduce this bug. * version: openldap-2.3.43-3.el5 (it should be fixed in 8) * slapd configured to use TLS * large CA certificate file (~5MB) * large client's certificate file (~500kB) * tried remote queries as well as local ones over a very slow proxy * many slapd restarts, many queries Aaron, Jan, could you please provide more specific reproducer (version of openldap, configuration, certificates, etc.)?
Your openldap version is correct, for CA cert file I used /etc/pki/tls/certs/ca-bundle.crt from Fedora, which has ~600k. No special configuration used, I just used the default one, just switched TLSVerifyClient to allow. Also no proxy used. Maybe you forgot to start slapd with ldaps:/// or you are connecting to ldap:///? One more detail to reproducing this bug - I never managed to get deterministic reproducer going, so I just had to test it by hand. You dan't have to use ldapsearch and similar tools, just simple openssl s_client -connect localhost:636 will do the trick. If it shows you all server certificate details, everything is going ok. If it freezes after a few lines (~5-10), that means you've got a hit on the issue. But either way you'll have to try by hand. In my case, sometimes the connection froze in ~20% of cases, sometimes in ~85% of cases. But as I already said, I never managed to reproduce this deterministically. Some more info is in upstream issue tracker: http://www.openldap.org/its/index.cgi/Software%20Bugs?id=6304;selectid=6304
Thanks a lot! Now I can easily reproduce the bug :-). RHTS test proposed, see QA whiteboard.
It's been quite complicated to reproduce this bug. However I've found a reproducer that works always: Two machines needed: SERVER: a) use default slapd.conf: i) uncomment the three TLS... lines ii) add TLSVerifyClient allow b) run slapd i) /usr/sbin/slapd -h ldaps:/// -u ldap c) compile and run attached slowing down proxy: i) gcc slowproxy.c -o slowproxy ii) ./slowproxy -l 6360 -h `hostname` -p 636 -d 100 CLIENT: d) list server's CA certs via proxy: i) openssl s_client -connect SERVER:6360 -showcerts -state -CAfile /etc/pki /tls/certs/ca-bundle.crt ii) it will succeed, kill it by C^ SERVER: e) kill proxy by C^ CLIENT: f) list server's CA certs directly: i) openssl s_client -connect SERVER:6360 -showcerts -state -CAfile /etc/pki /tls/certs/ca-bundle.crt ii) it will freeze See also RHTS test from QA whiteboard. Successfully verified via RHTS test. * see RHTS job 129141 [RHEL5-Server-U4 (x86_64)] * see RHTS job 129142 [RHEL5.5-Server-20100226.nightly (x86_64)]
Created attachment 396982 [details] slow proxy
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0198.html
I updated a server to openldap 2.3.43-12 (RHEL 5.5) and this patch broke TLS connections. I get a 3+ second delay (a simple search went from 0.03 seconds to 3.5+), and running slapd with -d2 gives over 37,000 lines like: tls_read: want=5 error=Resource temporarily unavailable slapd runs at 100% CPU during that time (it appears to be busy looping). I rebuilt 2.3.43-12 without the tls-connection patch and it works fine. It appears this patch is broken.
From your description it doesn't seem the patch is broken. The patch adds code which just sets a flag that SSL handshake should be done again if the initial connection fails. That seems to be your case - the initial connection in SSL handshake is probably failing repeatedly and as a result, slapd is still trying to reconnect. That could be the loop you are talking about. Please check is you have a client configured properly for SSL client verification, that would be my first guess where to look.
Well, the client is "ldapsearch -Z" with the following /etc/openldap/ldap.conf: URI ldap://fly2.hiwaay.net/ BASE dc=hiwaay,dc=net TLS_CACERTDIR /etc/openldap/cacerts TLS_REQCERT hard Tracing the client shows nothing happening for several seconds between a write to the server and a read back. The server is looping on reading with nothing to read. I also reproduced it with an ldapi:/// socket connection. Neither the client nor the server log anything about TLS failing. I also submitted a Red Hat support case for this; with my slapd.conf, the support tech was able to reproduce the problem and has passed it on to engineering. In any case, a misbehaving client should not be able to send the server to 100% CPU in a tight loop (that's a denial of service attack waiting to happen).
(In reply to comment #13) > I also submitted a Red Hat support case for this; with my slapd.conf, the > support tech was able to reproduce the problem and has passed it on to > engineering. In that case, I'll wait what our GSS dept. will say. If you have any more information, please fill it here. For example a wireshark trace of communication between client and server would be great - I'd like to know what data makes the server loop. Also please provide a link to this bug to your support request, it will help the information flow.
(In reply to comment #14) > If you have any more > information, please fill it here. For example a wireshark trace of > communication between client and server would be great - I'd like to know what > data makes the server loop. I ran tcpdump, and nothing happens during the pause. There's a packet from the server to the client, the client ACKs it, and then there's a 3+ second pause before the next packet (from the client to the server). This is over the loopback interface, so there's no external network issue involved. One odd looking thing: when I strace both the client (ldapsearch) and server (slapd), it looks like slapd writes something to the client socket that ldapsearch doesn't get right away (and that's when slapd loops). Here's are the read/write lines from the ldapsearch strace: 1271339298.342709 write(3, "0\35\2\1\1w\30\200\0261.3.6.1.4.1.1466.20037", 31) = 31 1271339298.344221 read(3, "0\f\2\1\1x\7\n", 8) = 8 1271339298.344293 read(3, "\1\0\4\0\4\0", 6) = 6 1271339298.374458 write(3, "\200w\1\3\1\0N\0\0\0 \0\0009\0\0008\0\0005\0\0\26\0\0\23\0\0\n\7\0\300\0\0003\0\0002\0\0/\3\0\200\0\0\5\0\0\4\1\0\200\0\0\25\0\0\22\0\0\t\6\0@\0\0\24\0\0\21\0\0\10\0\0\6\4\0\200\0\0\3\2\0\200\0\0\377\0248\311\346y#@z\26\374\341\241\30\204GH\244\376\333\f\243\302\10\323\245\332\335'\27\17\32\27", 121) = 121 1271339300.887399 read(3, "\26\3\1\0Q\2\0", 7) = 7 1271339300.887648 read(3, "\0M\3\1K\307\31\"d\232\342\n\\rjt\nGR\213\347\254\32u\227\331\255\213W\242\360\277o\326\312\264 \276\0304dd\213\32J\234\346\177\373\364\332\243*.]\352\334\344\226\221`_\321\17\310\35Hb\32\0005\0\0\5\377\1\0\1\0", 79) = 79 1271339300.887783 read(3, "\26\3\1\6\357", 5) = 5 1271339300.887851 read(3, "\v\0\6\353\0\6\350\0\2\2630\202\2\2570\202\2\30\2\1]0\r\6\t*\206H\206\367\r\1\1\5\5\0000\201\3001\v0\t\6\3U\4\6\23\2US1\0200\16\6\3U\4\10\23\7Alabama1\0230\21\6\3U\4\7\23\nHuntsville1$0\"\6\3U\4\n\23\33HiWAAY Information Services1%0#\6\3U\4\v\23\34HiWAAY Certificate Authority1\0310\27\6\3U\4\3\23\20HiWAAY Server CA1\"0 \6\t*\206H\206\367\r\1\t\1\26\23ssladmin\36\27\r100129141751Z\27\r120129"..., 1775) = 1775 1271339300.890843 read(3, "\26\3\1\0\322", 5) = 5 1271339300.890914 read(3, "\r\0\0\312\2\1\2\0\305\0\3030\201\3001\v0\t\6\3U\4\6\23\2US1\0200\16\6\3U\4\10\23\7Alabama1\0230\21\6\3U\4\7\23\nHuntsville1$0\"\6\3U\4\n\23\33HiWAAY Information Services1%0#\6\3U\4\v\23\34HiWAAY Certificate Authority1\0310\27\6\3U\4\3\23\20HiWAAY Server CA1\"0 \6\t*\206H\206\367\r\1\t\1\26\23ssladmin\16\0\0\0", 210) = 210 And here's the corresponding bit from the slapd trace: 1271339298.343338 read(18, "0\35\2\1\1w\30\200", 8) = 8 1271339298.343415 read(18, "\0261.3.6.1.4.1.1466.20037", 23) = 23 1271339298.343821 write(18, "0\f\2\1\1x\7\n\1\0\4\0\4\0", 14) = 14 1271339298.375256 read(18, "\200w\1\3\1\0N\0\0\0 ", 11) = 11 1271339298.375335 read(18, "\0\0009\0\0008\0\0005\0\0\26\0\0\23\0\0\n\7\0\300\0\0003\0\0002\0\0/\3\0\200\0\0\5\0\0\4\1\0\200\0\0\25\0\0\22\0\0\t\6\0@\0\0\24\0\0\21\0\0\10\0\0\6\4\0\200\0\0\3\2\0\200\0\0\377\0248\311\346y#@z\26\374\341\241\30\204GH\244\376\333\f\243\302\10\323\245\332\335'\27\17\32\27", 110) = 110 1271339298.375747 write(18, "\26\3\1\0Q\2\0\0M\3\1K\307\31\"d\232\342\n\\rjt\nGR\213\347\254\32u\227\331\255\213W\242\360\277o\326\312\264 \276\0304dd\213\32J\234\346\177\373\364\332\243*.]\352\334\344\226\221`_\321\17\310\35Hb\32\0005\0\0\5\377\1\0\1\0\26\3\1\6\357\v\0\6\353\0\6\350\0\2\2630\202\2\2570\202\2\30\2\1]0\r\6\t*\206H\206\367\r\1\1\5\5\0000\201\3001\v0\t\6\3U\4\6\23\2US1\0200\16\6\3U\4\10\23\7Alabama1\0230\21\6\3U\4\7\23\nHuntsville1$0\"\6\3U\4\n\23\33HiWAAY Information Services1%0#\6\3U\4\v\23\34HiWAAY Certificate Author"..., 2081) = 2081 1271339300.891576 read(18, "\26\3\1\0\7", 5) = 5 1271339300.891705 read(18, "\v\0\0\3\0\0\0", 7) = 7 1271339300.891818 read(18, "\26\3\1\0\206", 5) = 5 1271339300.891886 read(18, "\20\0\0\202\0\200\r\332\253\267\0342\334\244\210b\320Xqd\226\32\306\277\277\5\236\235`\274#'\0366 \206\342\203^\24\3\201(z\366J\203\210\313\255h\273\273\232?\255\256\261\255(\330<\352A\317\265B\320\226\265aS\306\271\221\320\266E\252]C\22\214\344\367\250f\201\242mW\2677\303\353`d.\323s\331\351\206\333\307\340\2612\324\302\316\236\247\273\355\261\204\320\335Jk%sO\306m\341\234\27\2056EKs", 134) = 134
My concern was more about what happens before the pause. Can you please send me the data flow from start of the client connection to the freeze?
Bug 591419 has been created. Please post any further comments there.