Bug 488997 - nss_ldap won't failover to the next LDAP server on some network errors
Summary: nss_ldap won't failover to the next LDAP server on some network errors
Keywords:
Status: CLOSED DUPLICATE of bug 499302
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: nss_ldap
Version: 5.2
Hardware: i686
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Nalin Dahyabhai
QA Contact: BaseOS QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-03-06 17:10 UTC by Jan Kundrát
Modified: 2010-07-13 20:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-13 20:59:03 UTC
Target Upstream Version:
Embargoed:


Attachments
strace of `id kundratj` when the server responds correctly (ldap1.strace) (50.82 KB, text/plain)
2009-03-06 17:10 UTC, Jan Kundrát
strace of `id kundratj` when the server won't respond (ldap2.strace) (20.34 KB, text/plain)
2009-03-06 17:11 UTC, Jan Kundrát
backtrace from failed `id kundratj` (2.46 KB, text/plain)
2009-03-06 17:12 UTC, Jan Kundrát

Description Jan Kundrát 2009-03-06 17:10:03 UTC
Created attachment 334322
strace of `id kundratj` when the server responds correctly (ldap1.strace)

(This problem was encountered on Scientific Linux 5.2, a rebuild of RHEL. However, I believe it is not related to the packaging, and that Red Hat customers will benefit from the added robustness if this bug is fixed.)

nss_ldap-253-13.el5_2.1 doesn't handle network errors well. Due to an error in the network configuration at our site, after the initial write to the freshly opened socket, subsequent reads would time out (poll() returning zero; see the "ldap2" strace that will be attached). The nss_ldap library reacts to this by failing an assert() instead of failing over to the next LDAP server, as configured.
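
For illustration only, the behaviour I would expect can be sketched in Python as follows. This is not the nss_ldap code: the function name `query_with_failover`, the raw-bytes request, and the timeout value are all assumptions made for the sketch. It treats a poll() that returns no events as "this server is unusable" and moves on to the next configured server instead of aborting.

```python
import select
import socket


def query_with_failover(servers, request, timeout_ms=1000):
    """Return ((host, port), reply) from the first server that answers in time.

    Hypothetical helper: `servers` is a list of (host, port) pairs tried in order.
    """
    for host, port in servers:
        try:
            with socket.create_connection((host, port), timeout=timeout_ms / 1000) as sock:
                sock.sendall(request)  # the initial write may well succeed
                poller = select.poll()
                poller.register(sock, select.POLLIN)
                if not poller.poll(timeout_ms):
                    # poll() returned no events: the read timed out,
                    # so give up on this server and try the next one.
                    continue
                reply = sock.recv(4096)
                if reply:
                    return (host, port), reply
        except OSError:
            # Connection refused, reset, unreachable, etc.: also fail over.
            continue
    raise ConnectionError("no configured server responded")
```

The point of the sketch is only that a read timeout is handled on the same path as a refused connection, so the remaining servers still get a chance.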

Note that simply rejecting or dropping the connection using a local instance of iptables is not enough to reproduce this bug; one has to allow the initial write() to the socket to succeed, but block any following read()s.
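
One possible way to approximate this condition without touching the network configuration is a TCP listener that accepts connections and reads from them but never answers. The sketch below is an assumption on my part, not what happened at our site (where the fault was in the network), and `start_silent_server` is a made-up name. From the client's point of view the effect should be similar: connect() and the first write() succeed, and every later poll() times out.

```python
import socket
import threading


def start_silent_server(host="127.0.0.1", port=0):
    """Accept TCP connections and read from them, but never send a reply.

    Hypothetical helper; returns (listening socket, port actually bound).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)

    def drain(conn):
        # Consume whatever the client sends and answer nothing.
        with conn:
            try:
                while conn.recv(4096):
                    pass
            except OSError:
                pass  # client went away

    def accept_loop():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            threading.Thread(target=drain, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener, listener.getsockname()[1]
```

Pointing a test client at the returned port (instead of a real LDAP server) should then show the write succeeding and the subsequent read timing out.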

I'm attaching strace of `id kundratj` when nss_ldap is configured to query a working LDAP server (ldap1.strace) and an LDAP server that is not responding (ldap2.strace), as well as a backtrace from when the query fails.

Comment 1 Jan Kundrát 2009-03-06 17:11:23 UTC
Created attachment 334323
strace of `id kundratj` when the server won't respond (ldap2.strace)

Comment 2 Jan Kundrát 2009-03-06 17:12:01 UTC
Created attachment 334324
backtrace from failed `id kundratj`

Comment 3 Nalin Dahyabhai 2010-07-01 15:40:55 UTC
If I'm reading this right, does this problem not happen if you turn off start_tls?  If so, then there's a good chance this was resolved in nss_ldap-253-22.el5 and later in the guise of bug #499302.

Comment 4 Nalin Dahyabhai 2010-07-13 20:59:03 UTC
Given that the strace output shows the client issuing a StartTLS exop before the assertion, I'm going to go ahead and mark this as a duplicate of bug #499302, which was the one we used to track the fix in 5.5.  Please detach and reopen this bug report if you find that this is still happening with 5.5's nss_ldap-253-25.el5 or a later version.  Thanks!

*** This bug has been marked as a duplicate of bug 499302 ***

