488997 – nss_ldap won't failover to the next LDAP server on some network errors

Summary: nss_ldap won't failover to the next LDAP server on some network errors

Keywords:
Status:	CLOSED DUPLICATE of bug 499302
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	nss_ldap
Sub Component:
Version:	5.2
Hardware:	i686
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Nalin Dahyabhai
QA Contact:	BaseOS QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-03-06 17:10 UTC by Jan Kundrát
Modified:	2010-07-13 20:59 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-07-13 20:59:03 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
strace of `id kundratj` when the server responds correctly (ldap1.strace) (50.82 KB, text/plain) 2009-03-06 17:10 UTC, Jan Kundrát	no flags
strace of `id kundratj` when the server won't respond (ldap2.strace) (20.34 KB, text/plain) 2009-03-06 17:11 UTC, Jan Kundrát	no flags
backtrace from failed `id kundratj` (2.46 KB, text/plain) 2009-03-06 17:12 UTC, Jan Kundrát	no flags

Description Jan Kundrát 2009-03-06 17:10:03 UTC

Created attachment 334322
strace of `id kundratj` when the server responds correctly (ldap1.strace)

(This problem was encountered on Scientific Linux 5.2, a rebuild of RHEL. However, I believe it is not related to the packaging, and Red Hat customers would benefit from the added robustness if this bug is fixed.)

nss_ldap-253-13.el5_2.1 doesn't handle network errors well. Due to an error in the network configuration at our site, the initial write to a freshly opened socket would succeed, but subsequent reads would time out (poll() returning zero; see the "ldap2" strace that will be attached). The nss_ldap library reacts to this with an assert() instead of failing over to the next configured LDAP server.
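The failover behaviour the report asks for can be sketched as follows. This is a minimal Python illustration under stated assumptions, not nss_ldap's actual code: the function name `query_with_failover`, the 5-second default timeout and the opaque `request` bytes are all hypothetical. The point it shows is that a poll() returning no events after a successful write is treated as a per-server failure, and the client moves on to the next server instead of aborting.

```python
import select
import socket


def query_with_failover(servers, request, timeout_s=5.0):
    """Try each (host, port) in order; on a connect error, write error or
    read timeout, move on to the next server instead of aborting.

    Hypothetical helper for illustration only; not nss_ldap's real code.
    """
    errors = []
    for host, port in servers:
        try:
            sock = socket.create_connection((host, port), timeout=timeout_s)
        except OSError as exc:
            errors.append((host, port, exc))
            continue
        with sock:  # socket is closed whenever we move on to the next server
            try:
                sock.sendall(request)  # the initial write() may well succeed...
            except OSError as exc:
                errors.append((host, port, exc))
                continue
            poller = select.poll()
            poller.register(sock, select.POLLIN)
            # ...but poll() returning no events means the read timed out.
            if not poller.poll(int(timeout_s * 1000)):
                errors.append((host, port, TimeoutError("read timed out")))
                continue
            try:
                data = sock.recv(4096)
            except OSError as exc:
                errors.append((host, port, exc))
                continue
            if data:
                return (host, port), data
            errors.append((host, port, ConnectionError("connection closed")))
    # Only give up once every configured server has failed.
    raise ConnectionError(f"all servers failed: {errors}")
```

The key design choice is that a timeout is reported as an ordinary, recoverable error for that one server; the caller only sees a failure once the whole server list is exhausted.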

Note that simply rejecting or dropping the connection with a local instance of iptables is not enough to reproduce this bug; one has to allow the initial write() to the socket to succeed, but block any following read()s.
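One way to create that condition locally, without touching iptables, is a TCP listener that accepts connections and drains whatever the client sends but never replies. This is a minimal Python sketch; `start_silent_server` is a hypothetical test helper, and `port=0` simply lets the OS pick a free port. The client's write() succeeds because the server side keeps reading, while its subsequent poll()/read() time out.

```python
import socket
import threading


def start_silent_server(host="127.0.0.1", port=0):
    """Accept TCP connections and read whatever the client writes, but never
    send anything back. Returns (listening_socket, bound_port).

    Hypothetical test helper for reproducing a "write succeeds, read times
    out" server; not part of nss_ldap.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()

    def handle(conn):
        with conn:
            try:
                while conn.recv(4096):  # drain requests, never answer
                    pass
            except OSError:
                pass  # client reset the connection

    def accept_loop():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:  # listener was closed
                return
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener, listener.getsockname()[1]
```

Listing this address as the first server in the client's configuration, ahead of a working one, should then exercise the failover path.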

I'm attaching straces of `id kundratj` with nss_ldap configured to query a working LDAP server (ldap1.strace) and an LDAP server that is not responding (ldap2.strace), as well as a backtrace from when the query fails.

Comment 1 Jan Kundrát 2009-03-06 17:11:23 UTC

Created attachment 334323
strace of `id kundratj` when the server won't respond (ldap2.strace)

Comment 2 Jan Kundrát 2009-03-06 17:12:01 UTC

Created attachment 334324
backtrace from failed `id kundratj`

Comment 3 Nalin Dahyabhai 2010-07-01 15:40:55 UTC

If I'm reading this right, does this problem not happen if you turn off start_tls?  If so, then there's a good chance this was resolved in nss_ldap-253-22.el5 and later in the guise of bug #499302.

Comment 4 Nalin Dahyabhai 2010-07-13 20:59:03 UTC

Given that the strace output shows the client issuing a StartTLS exop before the assertion, I'm going to go ahead and mark this as a duplicate of bug #499302, which was the one we used to track the fix in 5.5.  Please detach and reopen this bug report if you find that this is still happening with 5.5's nss_ldap-253-25.el5 or a later version.  Thanks!

*** This bug has been marked as a duplicate of bug 499302 ***
