Bug 488997 - nss_ldap won't failover to the next LDAP server on some network errors
Status: CLOSED DUPLICATE of bug 499302
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: nss_ldap
Version: 5.2
Hardware: i686 Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Nalin Dahyabhai
QA Contact: BaseOS QE
Reported: 2009-03-06 12:10 EST by Jan Kundrát
Modified: 2010-07-13 16:59 EDT
Doc Type: Bug Fix
Last Closed: 2010-07-13 16:59:03 EDT

Attachments
strace of `id kundratj` when the server responds correctly (ldap1.strace) (50.82 KB, text/plain)
2009-03-06 12:10 EST, Jan Kundrát
strace of `id kundratj` when the server won't respond (ldap2.strace) (20.34 KB, text/plain)
2009-03-06 12:11 EST, Jan Kundrát
backtrace from failed `id kundratj` (2.46 KB, text/plain)
2009-03-06 12:12 EST, Jan Kundrát

Description Jan Kundrát 2009-03-06 12:10:03 EST
Created attachment 334322 [details]
strace of `id kundratj` when the server responds correctly (ldap1.strace)

(This problem was encountered on Scientific Linux 5.2, a rebuild of RHEL. However, I believe it is not related to the packaging, and that Red Hat customers will benefit from the added robustness if this bug is fixed.)

nss_ldap-253-13.el5_2.1 doesn't handle network errors well. Due to an error in the network configuration at our site, the initial write to the freshly opened socket succeeded, but subsequent reads timed out (poll() returning zero; see the "ldap2" strace that will be attached). The nss_ldap library reacts to this by failing an assert() instead of failing over to the next LDAP server, as configured.
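
To illustrate the behaviour I would expect, here is a minimal Python sketch (not nss_ldap code, which is written in C). The helper name `query_with_failover`, the payload, and the timeout value are all made up for illustration. The only point is that a timed-out wait for the reply is treated as one more reason to try the next configured server.

```python
import select
import socket


def query_with_failover(servers, payload, timeout=1.0):
    """Try each (host, port) in order; return (server, reply) from the first that answers.

    Hypothetical helper, not nss_ldap code: a read timeout moves on to the
    next server instead of being treated as a fatal condition.
    """
    for server in servers:
        try:
            with socket.create_connection(server, timeout=timeout) as sock:
                sock.sendall(payload)  # the initial write may well succeed
                # Wait for the reply. An empty result means the wait timed out,
                # which corresponds to the poll()-returns-zero case above.
                readable, _, _ = select.select([sock], [], [], timeout)
                if not readable:
                    continue  # silent server: fail over
                reply = sock.recv(4096)
                if reply:
                    return server, reply
                # Empty read: the peer closed without answering; fail over too.
        except OSError:
            continue  # refused, unreachable, reset, timed out: fail over
    raise ConnectionError("no configured server answered")
```

Called with a list such as `[("ldap2.example.org", 389), ("ldap1.example.org", 389)]` (placeholder host names), the function should return the reply from the first server that answers within the timeout, and raise only after every server in the list has been tried.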

Note that simply rejecting or dropping the connection using a local instance of iptables is not enough to reproduce this bug; one has to allow the initial write() to the socket to succeed, but block any following read()s.
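
One way to approximate that condition without touching the network configuration might be a local listener that accepts the connection, swallows whatever the client writes, and never sends anything back. The Python sketch below is an assumption on my part, not the setup that produced the attached straces. `start_silent_server` is a hypothetical helper name, and whether pointing nss_ldap at such a listener triggers the same assertion is untested here.

```python
import socket
import threading


def start_silent_server(host="127.0.0.1", port=0):
    """Accept connections, read and discard input, never reply.

    Hypothetical test helper; returns (listening_socket, (host, port)).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)

    def swallow(conn):
        with conn:
            try:
                # Keep reading until the client gives up; nothing is sent back.
                while conn.recv(4096):
                    pass
            except OSError:
                pass  # client reset the connection

    def accept_loop():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            threading.Thread(target=swallow, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener, listener.getsockname()
```

With `port=0` the operating system picks a free port, which is returned as part of the address. Binding to the standard LDAP port 389 instead would normally require root privileges.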

I'm attaching strace of `id kundratj` when nss_ldap is configured to query a working LDAP server (ldap1.strace) and an LDAP server that is not responding (ldap2.strace), as well as a backtrace from when the query fails.
Comment 1 Jan Kundrát 2009-03-06 12:11:23 EST
Created attachment 334323 [details]
strace of `id kundratj` when the server won't respond (ldap2.strace)
Comment 2 Jan Kundrát 2009-03-06 12:12:01 EST
Created attachment 334324 [details]
backtrace from failed `id kundratj`
Comment 3 Nalin Dahyabhai 2010-07-01 11:40:55 EDT
If I'm reading this right, does this problem not happen if you turn off start_tls?  If so, then there's a good chance this was resolved in nss_ldap-253-22.el5 and later in the guise of bug #499302.
Comment 4 Nalin Dahyabhai 2010-07-13 16:59:03 EDT
Given that the strace output shows the client issuing a StartTLS exop before the assertion, I'm going to go ahead and mark this as a duplicate of bug #499302, which was the one we used to track the fix in 5.5.  Please detach and reopen this bug report if you find that this is still happening with 5.5's nss_ldap-253-25.el5 or a later version.  Thanks!

*** This bug has been marked as a duplicate of bug 499302 ***
