Bug 1186559

Summary: OpenLDAP does not time out correctly on stalled LDAPS connections
Product: Red Hat Enterprise Linux 5 Reporter: Paul Wayper <pwayper>
Component: openldap Assignee: Jan Synacek <jsynacek>
Status: CLOSED WONTFIX QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 5.8 CC: ebenes, jsynacek
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1186562 Environment:
Last Closed: 2015-05-28 12:59:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Paul Wayper 2015-01-28 01:18:08 UTC
Description of problem:

When a remote LDAP server stalls, new TCP connections can still be established, but the SSL handshake never completes.  OpenLDAP does not detect this condition and so ignores the timeouts set on the LDAP handle.

Version-Release number of selected component (if applicable):

openldap-2.3.43-25.el5

How reproducible:

Always.

Steps to Reproduce:

In order to reproduce this problem, the following setup is used:

Client running nslcd (ssh://root.211.131) using LDAPS URI for server.
Server running RHDS or IPA. (ssh://root.234.250)
Client uses the following timeouts:

bind_timelimit 2
timelimit 4
idle_timelimit 7
reconnect_retrytime 2

1. On client, request 'id' information for user in LDAP directory.  Verify that this works.
2. On server, issue 'kill -STOP $LDAP_PROCESS_ID'.
3. On client, request 'id' of same user.

Actual results:

The id process takes one minute to time out.  (This value is hard-coded in nslcd as the timeout on the communication between nslcd and PAM.)

Expected results:

The id process times out after 2 seconds (the configured bind_timelimit).

Additional info:

After 'kill -CONT $LDAP_PROCESS_ID' is issued on the server, the connection works as normal.

What is happening here is:

A) The client sends SYN, gets SYN ACK, and sends ACK.  The server's kernel TCP stack handles everything up to this point.
B) The client tries to initiate the SSL connection.
C) The server process never handles the SSL connection (in this case because it has been stopped with SIGSTOP, but this can also happen if the process has stalled for other reasons).
D) The client has not yet sent the BIND request, so the bind timeout (bind_timelimit) period has not started.
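
The sequence A) to D) can be imitated without an LDAP server.  The following is a minimal sketch in Python (not the C code of libldap or nslcd), assuming a Linux-like TCP stack in which the kernel completes the three-way handshake for a listening socket even though the process never calls accept().  The names stalled_listener and probe are hypothetical helpers introduced only for this illustration.

```python
import socket
import ssl
import time


def stalled_listener():
    """Listen on a loopback port but never accept() or speak TLS.

    The kernel still answers SYN with SYN ACK, which is the same situation
    as a server process that has been stopped with 'kill -STOP'.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    srv.listen(5)
    return srv


def probe(host, port, timeout):
    """Return (tcp_connected, tls_completed, elapsed_seconds)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # illustration only: no certificate checks
    ctx.verify_mode = ssl.CERT_NONE
    tcp_ok = False
    tls_ok = False
    start = time.monotonic()
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
        tcp_ok = True                # step A succeeded
        raw.settimeout(timeout)      # without this, step B could block forever
        try:
            tls = ctx.wrap_socket(raw)   # step B: handshake starts here
            tls_ok = True
            tls.close()
        except OSError:              # covers socket timeouts and ssl.SSLError
            raw.close()
    except OSError:
        pass                         # TCP connection itself failed
    return tcp_ok, tls_ok, time.monotonic() - start


if __name__ == "__main__":
    srv = stalled_listener()
    port = srv.getsockname()[1]
    print(probe("127.0.0.1", port, 2.0))
    srv.close()
```

Against such a listener the TCP connection is reported as established while the TLS handshake is not, which matches the state the client is left in after step C.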

Therefore, we propose that the bind timeout should apply from the start of the connection process, before the SSL handshake process starts.
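
One possible reading of this proposal, sketched in Python rather than as a patch to libldap: a single deadline is started before the TCP connect, and the SSL handshake is only given whatever time remains.  The function connect_with_deadline and its parameters are hypothetical and are not part of the OpenLDAP or nslcd APIs; the sketch also assumes that the socket timeout bounds the whole handshake call.

```python
import socket
import ssl
import time


def connect_with_deadline(host, port, bind_timelimit, ctx):
    """Open a TLS connection, charging connect and handshake to one budget.

    Returns (tls_socket, seconds_left).  Raises OSError (including
    socket.timeout) if the budget is exhausted at any stage.
    """
    deadline = time.monotonic() + bind_timelimit

    # Stage 1: TCP connect, limited by the full budget.
    raw = socket.create_connection((host, port), timeout=bind_timelimit)

    # Stage 2: TLS handshake, limited by what is left of the same budget.
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        raw.close()
        raise socket.timeout("bind time limit used up before the handshake")
    raw.settimeout(remaining)
    try:
        tls = ctx.wrap_socket(raw, server_hostname=host)
    except OSError:
        raw.close()  # harmless if wrap_socket already took over the descriptor
        raise

    # Stage 3 (not shown): the caller would send the BIND request and wait
    # for the reply using only the time that is still left.
    return tls, deadline - time.monotonic()
```

With bind_timelimit set to 2, a caller facing a stalled server would get an error after roughly two seconds instead of waiting for the one-minute nslcd/PAM timeout.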