Bug 509230 - ldaps fails if TLSVerifyClient=allow unless slapd is run with -d2
Summary: ldaps fails if TLSVerifyClient=allow unless slapd is run with -d2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openldap
Version: 5.2
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: Jan Zeleny
QA Contact: Ondrej Moriš
URL:
Whiteboard:
Depends On:
Blocks: 591755
TreeView+ depends on / blocked
 
Reported: 2009-07-01 20:20 UTC by Aaron Bennett
Modified: 2010-08-17 23:30 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 591755 (view as bug list)
Environment:
Last Closed: 2010-03-30 08:05:31 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
slow proxy (5.32 KB, text/x-csrc)
2010-03-01 08:37 UTC, Ondrej Moriš
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0198 0 normal SHIPPED_LIVE Moderate: openldap security and bug fix update 2010-03-29 13:22:42 UTC

Description Aaron Bennett 2009-07-01 20:20:37 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009060309 Ubuntu/9.10 (karmic) Firefox/3.0.11

1.  The local machine can always connect via ldap or ldaps.
2.  Remote machines an only connect when slapd is run with -d2, aka  
    SLAPD_OPTIONS='-d2' /etc/init.d/ldap restart

If TLSVerifyClient is set to "never" then ldaps works as expected.

Reproducible: Always

Steps to Reproduce:
1. set TLSVerifyClient allow in slapd.conf
2. try to do an ldap search from a remote host, aka ldapsearch -H ldaps://server -x -D "manager dn" -w password uid=abennett -b dc=clarku,dc=edu

Actual Results:  
ldapsearch from the client hangs.  The server logs nothing, even if run with -d1.

Expected Results:  
Returned the search.

The man page for slapd.conf says that if TLSVerifyClient is set to allow, The client certificate is requested.  If  no  certificate is  provided,  the  session  proceeds normally.  If a bad certificate is provided,  it  will  be  ignored  and  the session proceeds normally.  I'm not seeing this result.

Comment 2 Jan Zeleny 2009-11-16 11:34:50 UTC
Issue was caused by unreasonably large file with accepted client certificate's CAs. This, combined with a fact that slapd didn't check success of SSL handshake, led to freeze before connection with client was even established. Patch has been backported from upstream and is now in CVS. Changing status to MODIFIED.

Comment 4 Ondrej Moriš 2010-02-25 23:12:08 UTC
I can't reproduce this bug.

* version: openldap-2.3.43-3.el5 (it should be fixed in 8)
* slapd configured to use TLS
* large CA certificate file (~5MB)
* large client's certificate file (~500kB)
* tried remote queries as well as local ones over a very slow proxy
* many slapd restarts, many queries

Aaron, Jan, could you please provide more specific reproducer (version of openldap, configuration, certificates, etc.)?

Comment 5 Jan Zeleny 2010-02-26 08:58:07 UTC
Your openldap version is correct, for CA cert file I used /etc/pki/tls/certs/ca-bundle.crt from Fedora, which has ~600k. No special configuration used, I just used the default one, just switched TLSVerifyClient to allow. Also no proxy used. Maybe you forgot to start slapd with ldaps:/// or you are connecting to ldap:///?

One more detail to reproducing this bug - I never managed to get deterministic reproducer going, so I just had to test it by hand. You dan't have to use ldapsearch and similar tools, just simple openssl s_client -connect localhost:636 will do the trick. If it shows you all server certificate details, everything is going ok. If it freezes after a few lines (~5-10), that means you've got a hit on the issue. But either way you'll have to try by hand. In my case, sometimes the connection froze in ~20% of cases, sometimes in ~85% of cases. But as I already said, I never managed to reproduce this deterministically.

Some more info is in upstream issue tracker:
http://www.openldap.org/its/index.cgi/Software%20Bugs?id=6304;selectid=6304

Comment 6 Ondrej Moriš 2010-02-26 10:58:31 UTC
Thanks a lot! Now I can easily reproduce the bug :-).

RHTS test proposed, see QA whiteboard.

Comment 7 Ondrej Moriš 2010-02-28 22:38:14 UTC
It's been quite complicated to reproduce this bug. However I've found a reproducer that works always:

Two machines needed:

SERVER:
a) use default slapd.conf:
   i)  uncomment the three TLS... lines 
   ii) add TLSVerifyClient allow
b) run slapd
   i)  /usr/sbin/slapd -h ldaps:/// -u ldap
c) compile and run attached slowing down proxy:
   i)  gcc slowproxy.c -o slowproxy 
   ii) ./slowproxy -l 6360 -h `hostname` -p 636 -d 100 

CLIENT:
d) list server's CA certs via proxy:
   i)  openssl s_client -connect SERVER:6360 -showcerts -state -CAfile /etc/pki
       /tls/certs/ca-bundle.crt
   ii) it will succeed, kill it by C^

SERVER:
e) kill proxy by C^

CLIENT:
f) list server's CA certs directly:
   i)  openssl s_client -connect SERVER:6360 -showcerts -state -CAfile /etc/pki
       /tls/certs/ca-bundle.crt
   ii) it will freeze
   
See also RHTS test from QA whiteboard.

Successfully verified via RHTS test.

* see RHTS job 129141 [RHEL5-Server-U4 (x86_64)]
* see RHTS job 129142 [RHEL5.5-Server-20100226.nightly (x86_64)]

Comment 8 Ondrej Moriš 2010-03-01 08:37:56 UTC
Created attachment 396982 [details]
slow proxy

Comment 10 errata-xmlrpc 2010-03-30 08:05:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0198.html

Comment 11 Chris Adams 2010-04-05 20:57:48 UTC
I updated a server to openldap 2.3.43-12 (RHEL 5.5) and this patch broke TLS connections.  I get a 3+ second delay (a simple search went from 0.03 seconds to 3.5+), and running slapd with -d2 gives over 37,000 lines like:

   tls_read: want=5 error=Resource temporarily unavailable

slapd runs at 100% CPU during that time (it appears to be busy looping).

I rebuilt 2.3.43-12 without the tls-connection patch and it works fine.  It appears this patch is broken.

Comment 12 Jan Zeleny 2010-04-12 14:36:34 UTC
From your description it doesn't seem the patch is broken. The patch adds code which just sets a flag that SSL handshake should be done again if the initial connection fails.

That seems to be your case - the initial connection in SSL handshake is probably failing repeatedly and as a result, slapd is still trying to reconnect. That could be the loop you are talking about. Please check is you have a client configured properly for SSL client verification, that would be my first guess where to look.

Comment 13 Chris Adams 2010-04-12 14:57:13 UTC
Well, the client is "ldapsearch -Z" with the following /etc/openldap/ldap.conf:

URI ldap://fly2.hiwaay.net/
BASE dc=hiwaay,dc=net
TLS_CACERTDIR /etc/openldap/cacerts
TLS_REQCERT hard

Tracing the client shows nothing happening for several seconds between a write to the server and a read back.  The server is looping on reading with nothing to read.  I also reproduced it with an ldapi:/// socket connection.  Neither the client nor the server log anything about TLS failing.

I also submitted a Red Hat support case for this; with my slapd.conf, the support tech was able to reproduce the problem and has passed it on to engineering.

In any case, a misbehaving client should not be able to send the server to 100% CPU in a tight loop (that's a denial of service attack waiting to happen).

Comment 14 Jan Zeleny 2010-04-13 06:26:17 UTC
(In reply to comment #13)
> I also submitted a Red Hat support case for this; with my slapd.conf, the
> support tech was able to reproduce the problem and has passed it on to
> engineering.

In that case, I'll wait what our GSS dept. will say. If you have any more information, please fill it here. For example a wireshark trace of communication between client and server would be great - I'd like to know what data makes the server loop. Also please provide a link to this bug to your support request, it will help the information flow.

Comment 15 Chris Adams 2010-04-15 14:02:01 UTC
(In reply to comment #14)
> If you have any more
> information, please fill it here. For example a wireshark trace of
> communication between client and server would be great - I'd like to know what
> data makes the server loop.   

I ran tcpdump, and nothing happens during the pause.  There's a packet from the server to the client, the client ACKs it, and then there's a 3+ second pause before the next packet (from the client to the server).  This is over the loopback interface, so there's no external network issue involved.

One odd looking thing: when I strace both the client (ldapsearch) and server (slapd), it looks like slapd writes something to the client socket that ldapsearch doesn't get right away (and that's when slapd loops).  Here's are the read/write lines from the ldapsearch strace:

1271339298.342709 write(3, "0\35\2\1\1w\30\200\0261.3.6.1.4.1.1466.20037", 31) = 31
1271339298.344221 read(3, "0\f\2\1\1x\7\n", 8) = 8
1271339298.344293 read(3, "\1\0\4\0\4\0", 6) = 6
1271339298.374458 write(3, "\200w\1\3\1\0N\0\0\0 \0\0009\0\0008\0\0005\0\0\26\0\0\23\0\0\n\7\0\300\0\0003\0\0002\0\0/\3\0\200\0\0\5\0\0\4\1\0\200\0\0\25\0\0\22\0\0\t\6\0@\0\0\24\0\0\21\0\0\10\0\0\6\4\0\200\0\0\3\2\0\200\0\0\377\0248\311\346y#@z\26\374\341\241\30\204GH\244\376\333\f\243\302\10\323\245\332\335'\27\17\32\27", 121) = 121
1271339300.887399 read(3, "\26\3\1\0Q\2\0", 7) = 7
1271339300.887648 read(3, "\0M\3\1K\307\31\"d\232\342\n\\rjt\nGR\213\347\254\32u\227\331\255\213W\242\360\277o\326\312\264 \276\0304dd\213\32J\234\346\177\373\364\332\243*.]\352\334\344\226\221`_\321\17\310\35Hb\32\0005\0\0\5\377\1\0\1\0", 79) = 79
1271339300.887783 read(3, "\26\3\1\6\357", 5) = 5
1271339300.887851 read(3, "\v\0\6\353\0\6\350\0\2\2630\202\2\2570\202\2\30\2\1]0\r\6\t*\206H\206\367\r\1\1\5\5\0000\201\3001\v0\t\6\3U\4\6\23\2US1\0200\16\6\3U\4\10\23\7Alabama1\0230\21\6\3U\4\7\23\nHuntsville1$0\"\6\3U\4\n\23\33HiWAAY Information Services1%0#\6\3U\4\v\23\34HiWAAY Certificate Authority1\0310\27\6\3U\4\3\23\20HiWAAY Server CA1\"0 \6\t*\206H\206\367\r\1\t\1\26\23ssladmin\36\27\r100129141751Z\27\r120129"..., 1775) = 1775
1271339300.890843 read(3, "\26\3\1\0\322", 5) = 5
1271339300.890914 read(3, "\r\0\0\312\2\1\2\0\305\0\3030\201\3001\v0\t\6\3U\4\6\23\2US1\0200\16\6\3U\4\10\23\7Alabama1\0230\21\6\3U\4\7\23\nHuntsville1$0\"\6\3U\4\n\23\33HiWAAY Information Services1%0#\6\3U\4\v\23\34HiWAAY Certificate Authority1\0310\27\6\3U\4\3\23\20HiWAAY Server CA1\"0 \6\t*\206H\206\367\r\1\t\1\26\23ssladmin\16\0\0\0", 210) = 210

And here's the corresponding bit from the slapd trace:

1271339298.343338 read(18, "0\35\2\1\1w\30\200", 8) = 8
1271339298.343415 read(18, "\0261.3.6.1.4.1.1466.20037", 23) = 23
1271339298.343821 write(18, "0\f\2\1\1x\7\n\1\0\4\0\4\0", 14)             = 14
1271339298.375256 read(18, "\200w\1\3\1\0N\0\0\0 ", 11) = 11
1271339298.375335 read(18, "\0\0009\0\0008\0\0005\0\0\26\0\0\23\0\0\n\7\0\300\0\0003\0\0002\0\0/\3\0\200\0\0\5\0\0\4\1\0\200\0\0\25\0\0\22\0\0\t\6\0@\0\0\24\0\0\21\0\0\10\0\0\6\4\0\200\0\0\3\2\0\200\0\0\377\0248\311\346y#@z\26\374\341\241\30\204GH\244\376\333\f\243\302\10\323\245\332\335'\27\17\32\27", 110) = 110
1271339298.375747 write(18, "\26\3\1\0Q\2\0\0M\3\1K\307\31\"d\232\342\n\\rjt\nGR\213\347\254\32u\227\331\255\213W\242\360\277o\326\312\264 \276\0304dd\213\32J\234\346\177\373\364\332\243*.]\352\334\344\226\221`_\321\17\310\35Hb\32\0005\0\0\5\377\1\0\1\0\26\3\1\6\357\v\0\6\353\0\6\350\0\2\2630\202\2\2570\202\2\30\2\1]0\r\6\t*\206H\206\367\r\1\1\5\5\0000\201\3001\v0\t\6\3U\4\6\23\2US1\0200\16\6\3U\4\10\23\7Alabama1\0230\21\6\3U\4\7\23\nHuntsville1$0\"\6\3U\4\n\23\33HiWAAY Information Services1%0#\6\3U\4\v\23\34HiWAAY Certificate Author"..., 2081) = 2081
1271339300.891576 read(18, "\26\3\1\0\7", 5) = 5
1271339300.891705 read(18, "\v\0\0\3\0\0\0", 7) = 7
1271339300.891818 read(18, "\26\3\1\0\206", 5) = 5
1271339300.891886 read(18, "\20\0\0\202\0\200\r\332\253\267\0342\334\244\210b\320Xqd\226\32\306\277\277\5\236\235`\274#'\0366 \206\342\203^\24\3\201(z\366J\203\210\313\255h\273\273\232?\255\256\261\255(\330<\352A\317\265B\320\226\265aS\306\271\221\320\266E\252]C\22\214\344\367\250f\201\242mW\2677\303\353`d.\323s\331\351\206\333\307\340\2612\324\302\316\236\247\273\355\261\204\320\335Jk%sO\306m\341\234\27\2056EKs", 134) = 134

Comment 16 Jan Zeleny 2010-04-16 08:03:03 UTC
My concern was more about what happens before the pause. Can you please send me the data flow from start of the client connection to the freeze?

Comment 20 Jan Zeleny 2010-05-12 08:31:22 UTC
Bug 591419 has been created. Please post any further comments there.


Note You need to log in before you can comment on or make changes to this bug.