Bug 1186562

Summary:	OpenLDAP does not time out correctly on stalled LDAPS connections
Product:	Red Hat Enterprise Linux 6	Reporter:	Paul Wayper <pwayper>
Component:	openldap	Assignee:	Matus Honek <mhonek>
Status:	CLOSED WONTFIX	QA Contact:	BaseOS QE Security Team <qe-baseos-security>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	6.5	CC:	ebenes, jsynacek, pwayper, qe-baseos-security
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	1186559	Environment:
Last Closed:	2015-10-13 10:27:11 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1265549

Description Paul Wayper 2015-01-28 01:22:16 UTC

+++ This bug was initially created as a clone of Bug #1186559 +++

Description of problem:

When a remote LDAP server process stalls, new TCP connections can still be made, but the SSL handshake never completes.  OpenLDAP does not detect this condition, so the timeouts set on the LDAP handle are never applied.

Version-Release number of selected component (if applicable):

openldap-2.3.43-26.el6_3.2

How reproducible:

Always.

Steps to Reproduce:

In order to reproduce this problem, the following setup is used:

Client running nslcd (ssh://root.211.131) using LDAPS URI for server.
Server running RHDS or IPA. (ssh://root.234.250)
Client uses the following timeouts:

bind_timelimit 2
timelimit 4
idle_timelimit 7
reconnect_retrytime 2

1. On client, request 'id' information for user in LDAP directory.  Verify that this works.
2. On server, issue 'kill -STOP $LDAP_PROCESS_ID'.
3. On client, request 'id' of same user.

Actual results:

The id process takes one minute to time out.  (This one-minute limit is hard-coded in nslcd as the timeout on communication between nslcd and PAM.)

Expected results:

id process times out after 2 seconds.

Additional info:

After 'kill -CONT $LDAP_PROCESS_ID' is issued on the server, connections work normally again.

What is happening here is:

A) The client sends SYN, receives SYN-ACK, and sends ACK.  The server's kernel TCP stack handles all of this without involving the LDAP process.
B) The client tries to initiate the SSL connection.
C) The server process never handles the SSL handshake (in this case because it has been stopped with 'kill -STOP', but this can also happen if the process has stalled for other reasons).
D) The client has not yet sent the BIND request, so the bind_timelimit period has not started and no timeout applies.

Therefore, we propose that the bind timeout should apply from the start of the connection process, before the SSL handshake process starts.
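
The sequence in steps A to D can be reproduced without an LDAP server at all. The following is a minimal sketch in Python (not OpenLDAP code), assuming a local listener that never calls accept() is a fair stand-in for the stopped server process; the function name and the one-second timeout are hypothetical. It shows that the TCP connect succeeds while the TLS handshake never completes, and that a timeout set on the socket before the handshake starts does bound the wait.

```python
import socket
import ssl
import time


def handshake_against_stalled_listener(timeout_s=1.0):
    """Return (tcp_connected, tls_outcome, elapsed_seconds)."""
    # Hypothetical stand-in for the stopped server: a listener that never
    # calls accept(). The kernel still completes the TCP handshake (step A).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    # Certificate checks are disabled only because no certificate is ever
    # sent in this demo; do not copy this into real client code.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    start = time.monotonic()
    sock = socket.create_connection(("127.0.0.1", port), timeout=timeout_s)
    tcp_connected = True  # reaching this line means connect() succeeded
    try:
        # Step B: the Client Hello is sent. Steps C and D: no reply arrives,
        # so the handshake blocks until the socket timeout expires.
        tls = ctx.wrap_socket(sock)
        tls.close()
        tls_outcome = "completed"
    except socket.timeout:
        tls_outcome = "timed out"
    finally:
        sock.close()
        listener.close()
    return tcp_connected, tls_outcome, time.monotonic() - start


if __name__ == "__main__":
    print(handshake_against_stalled_listener(1.0))
```

With a one-second timeout, this should report the TCP connection as established and the handshake as timed out after roughly one second, which is the behaviour the proposal asks for from the bind timeout.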

Comment 2 Jan Synacek 2015-01-28 07:48:23 UTC

Does this also happen if you use ldapsearch as the client and set TIMEOUT and NETWORK_TIMEOUT in /etc/openldap/ldap.conf?

Comment 3 Jan Synacek 2015-01-28 07:51:28 UTC

Also, I just noticed that the openldap version you mention is ancient. Please use the latest stable version: openldap-2.4.39-8.el6.

Comment 4 Paul Wayper 2015-01-29 01:49:38 UTC

Without SSL:

[root@vm131 ~]# grep TIMEOUT /etc/openldap/ldap.conf
TIMEOUT		11
NETWORK_TIMEOUT	13
[root@vm131 ~]# time ldapsearch -H ldap://openldap.example.com  -b "dc=example,dc=com" -x -LLL  "uid=user1" 
dn: uid=user1,ou=People,dc=example,dc=com
sn: Ldap
cn: User
objectClass: top
objectClass: person
objectClass: inetOrgPerson
objectClass: posixAccount
uid: user1
uidNumber: 32259
gidNumber: 24
loginShell: /bin/bash
homeDirectory: /home/user1
userPassword:: cmVkaGF0MTIz

real	0m0.008s
user	0m0.001s
sys	0m0.002s

[root@vm60 ~]# kill -STOP 79085

[root@vm131 ~]# time ldapsearch -H ldap://openldap.example.com  -b "dc=example,dc=com" -x -LLL  "uid=user1" 
ldap_result: Timed out (-5)

real	0m11.016s
user	0m0.000s
sys	0m0.004s

With SSL:

[root@vm60 ~]# kill -CONT 79085

[root@vm131 ~]# time ldapsearch -H ldaps://openldap.example.com  -b "dc=example,dc=com" -x -LLL  "uid=user1" 
dn: uid=user1,ou=People,dc=example,dc=com
sn: Ldap
cn: User
objectClass: top
objectClass: person
objectClass: inetOrgPerson
objectClass: posixAccount
uid: user1
uidNumber: 32259
gidNumber: 24
loginShell: /bin/bash
homeDirectory: /home/user1
userPassword:: cmVkaGF0MTIz

real	0m0.025s
user	0m0.009s
sys	0m0.005s

[root@vm60 ~]# kill -STOP 79085

[root@vm131 ~]# time ldapsearch -H ldaps://openldap.example.com  -b "dc=example,dc=com" -x -LLL  "uid=user1" 
^C

real	1m22.887s
user	0m0.003s
sys	0m0.005s

So yes, the problem also occurs with ldapsearch over ldaps://: TIMEOUT and NETWORK_TIMEOUT do not prevent it.

Packet capture on server of packets from client during SSL attempt:

1127.897383 10.65.211.131 -> 10.65.211.60 TCP 74 49738 > ldaps [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=746015472 TSecr=0 WS=64
1127.897446 10.65.211.60 -> 10.65.211.131 TCP 74 ldaps > 49738 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=1692125023 TSecr=746015472 WS=64
1127.897764 10.65.211.131 -> 10.65.211.60 TCP 66 49738 > ldaps [ACK] Seq=1 Ack=1 Win=14656 Len=0 TSval=746015473 TSecr=1692125023
1127.901492 10.65.211.131 -> 10.65.211.60 SSL 220 Client Hello
1127.901521 10.65.211.60 -> 10.65.211.131 TCP 66 ldaps > 49738 [ACK] Seq=1 Ack=155 Win=15552 Len=0 TSval=1692125027 TSecr=746015476
1210.781304 10.65.211.131 -> 10.65.211.60 TCP 66 49738 > ldaps [FIN, ACK] Seq=155 Ack=1 Win=14656 Len=0 TSval=746098356 TSecr=1692125027
1210.821152 10.65.211.60 -> 10.65.211.131 TCP 66 ldaps > 49738 [ACK] Seq=1 Ack=156 Win=15552 Len=0 TSval=1692207947 TSecr=746098356
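
As a quick check on the capture, the gap between the server kernel's ACK of the Client Hello and the client's FIN can be computed from the two timestamps above. This is only arithmetic on values copied from the capture; the variable names are illustrative.

```python
# Timestamps in seconds, copied from the packet capture above.
client_hello_ack = 1127.901521  # server kernel ACKs the Client Hello
client_fin = 1210.781304        # client is interrupted (^C) and sends FIN

# Time the client spent waiting for a handshake reply that never came.
stall = client_fin - client_hello_ack
print(f"client waited {stall:.3f} s with no handshake reply")
```

The result is about 82.9 seconds, in line with the 1m22.887s wall-clock time reported by `time`, so essentially the whole run was spent waiting on the handshake.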

Hope this helps,

Paul

Comment 5 Paul Wayper 2015-01-29 01:52:41 UTC

That's with the latest OpenLDAP:

[root@vm131 ~]# rpm -qa openldap\*
openldap-devel-2.4.39-8.el6.x86_64
openldap-2.4.39-8.el6.x86_64
openldap-debuginfo-2.4.23-26.el6_3.2.x86_64
openldap-clients-2.4.39-8.el6.x86_64

Hope this helps,

Paul

Comment 6 Jan Synacek 2015-01-29 10:43:36 UTC

OK, using ldaps:// was the important part. I reproduced this on the latest Fedora 21 (openldap-servers-2.4.40-2.fc21.x86_64) and I expect this to be a problem in the upstream version as well.

Thanks!

Comment 7 Matus Honek 2015-10-13 10:27:11 UTC

According to upstream (see ITS#8047, attached), this is a known issue:
> This is a known issue - we don't have async connect/handshake APIs for 
> these crypto libraries.
Therefore I am closing this as WONTFIX for now.
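
For readers unfamiliar with what an asynchronous handshake buys here, the following is a minimal sketch using Python's ssl module, not libldap or the crypto libraries upstream refers to. It assumes a non-blocking socket and a single monotonic deadline covering both the TCP connect and the TLS handshake; the function name and parameters are hypothetical.

```python
import select
import socket
import ssl
import time


def tls_connect_with_deadline(host, port, deadline_s, ctx=None):
    """Connect and complete the TLS handshake within one overall deadline.

    Raises TimeoutError if the handshake is not finished in time.
    """
    ctx = ctx or ssl.create_default_context()
    deadline = time.monotonic() + deadline_s

    # The TCP connect is bounded by the same budget.
    sock = socket.create_connection((host, port), timeout=deadline_s)
    sock.setblocking(False)
    tls = ctx.wrap_socket(sock, server_hostname=host,
                          do_handshake_on_connect=False)
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("TLS handshake exceeded deadline")
            try:
                tls.do_handshake()  # returns once the handshake is done
                tls.setblocking(True)
                return tls
            except ssl.SSLWantReadError:
                # Wait for the peer's reply, but never past the deadline.
                select.select([tls], [], [], remaining)
            except ssl.SSLWantWriteError:
                select.select([], [tls], [], remaining)
    except BaseException:
        tls.close()
        raise
```

Because the handshake is driven in small non-blocking steps, the caller regains control whenever the deadline passes, even if the server never answers the Client Hello.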