Created attachment 334322: strace of `id kundratj` when the server responds correctly (ldap1.strace)

(This problem has been encountered on a rebuild of RHEL, Scientific Linux 5.2. However, I believe it is not related to the packaging, and that Red Hat customers will benefit from added robustness if this bug is fixed.)

nss_ldap-253-13.el5_2.1 doesn't handle network errors well. Due to an error in the network configuration at our site, reads that followed the initial write to the freshly opened socket would time out (poll() returning zero; see the "ldap2" strace that will be attached). The nss_ldap library reacts to this with an assert() instead of failing over to the next configured LDAP server.

Note that simply rejecting or dropping the connection using a local instance of iptables is not enough to reproduce this bug. One has to allow the initial write() to the socket to succeed, but block any following read()s.

I'm attaching straces of `id kundratj` when nss_ldap is configured to query a working LDAP server (ldap1.strace) and an LDAP server that is not responding (ldap2.strace), as well as a backtrace from when the query fails.
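To illustrate the difference between the observed behaviour (an assert() after poll() returns zero) and the expected failover, here is a minimal C sketch. It is not nss_ldap's code: wait_readable() and first_responsive() are hypothetical names, and, as a simplification, it assumes the caller has already connected and sent a request on each server's descriptor. The point is that a poll() timeout is an ordinary runtime condition that should be reported so the next server can be tried, while assert() is reserved for programming errors such as a NULL array.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or hangup.
 * A timeout is reported to the caller rather than treated as fatal. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (rc < 0)
        return -1;                         /* poll() itself failed */
    if (rc == 0)
        return 0;                          /* timed out: server did not answer */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Hypothetical failover loop: return the index of the first server whose
 * descriptor becomes readable within timeout_ms, or -1 if none does. */
static int first_responsive(const int *fds, size_t n, int timeout_ms)
{
    /* A NULL array with a nonzero count is a programming error,
     * so an assertion is appropriate here; a silent server is not. */
    assert(fds != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (wait_readable(fds[i], timeout_ms) == 1)
            return (int)i;   /* this server answered; use it */
        /* timeout or error: fall through to the next configured server */
    }
    return -1;               /* no server answered: let the caller report it */
}
```

A pipe with its write end held open but no data written behaves like the unresponsive server in ldap2.strace: poll() simply returns zero after the timeout, and first_responsive() moves on instead of aborting.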
Created attachment 334323: strace of `id kundratj` when the server won't respond (ldap2.strace)
Created attachment 334324: backtrace from failed `id kundratj`
If I'm reading this right, does this problem go away if you turn off start_tls? If so, then there's a good chance this was resolved in nss_ldap-253-22.el5 and later, under bug #499302.
Given that the strace output shows the client issuing a StartTLS exop before the assertion, I'm going to go ahead and mark this as a duplicate of bug #499302, which was the one we used to track the fix in 5.5. Please detach and reopen this bug report if you find that this is still happening with 5.5's nss_ldap-253-25.el5 or a later version. Thanks! *** This bug has been marked as a duplicate of bug 499302 ***