Description of problem:
If the LDAP server connection is lost, the next getpwnam() call reports an error: it returns NULL and sets errno=ENOTCONN. A second getpwnam() call restores the server connection and succeeds, though. This is wrong. Applications shouldn't need to call getpwnam() twice; managing the LDAP server connection should be internal to the nss_ldap module. Also, errno shouldn't be set to ENOTCONN: it is not a documented errno value for getpwnam(), so applications might not be prepared for it.

Version-Release number of selected component (if applicable):
nss_ldap-253-13.el5_2.1
Created attachment 322527
getpwnam-twice.c - test case

Reproduce by starting the test case and then restarting the LDAP server while getpwnam-twice is sleeping. The next getpwnam() call will fail.
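For readers without access to the attachment, the check it performs can be sketched roughly as below. This is not the attached getpwnam-twice.c; the helper names lookup_ok() and fails_then_succeeds() are hypothetical and exist only for illustration. The sketch calls getpwnam() twice in a row and reports whether the first call failed while the second succeeded, which is the pattern described in this bug.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: do one getpwnam() lookup.
 * Returns 1 if an entry was found, 0 if getpwnam() returned NULL.
 * If err is non-NULL, it receives the errno value left by a failed
 * call, or 0 on success. */
int lookup_ok(const char *name, int *err)
{
    struct passwd *pw;

    assert(name != NULL);
    errno = 0;                      /* clear any stale value before the call */
    pw = getpwnam(name);
    if (err != NULL)
        *err = (pw == NULL) ? errno : 0;
    return pw != NULL;
}

/* Hypothetical helper: look the same user up twice and print both results.
 * Returns 1 only when the first lookup failed and the second succeeded. */
int fails_then_succeeds(const char *name)
{
    int err1 = 0, err2 = 0;
    int ok1 = lookup_ok(name, &err1);
    int ok2 = lookup_ok(name, &err2);

    printf("first:  %s (errno=%d, %s)\n",
           ok1 ? "found" : "NULL", err1, strerror(err1));
    printf("second: %s (errno=%d, %s)\n",
           ok2 ? "found" : "NULL", err2, strerror(err2));
    return !ok1 && ok2;
}
```

To use it along the lines of the reproduction steps, call fails_then_succeeds() for an LDAP-backed user from a loop in main() with a sleep between iterations, and restart the LDAP server during one of the sleeps. On an affected system the first line would be expected to show NULL with ENOTCONN and the second line a successful lookup.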
Been digging a bit more into this and it turns out that nss_ldap isn't buggy per se, just horribly user-unfriendly in its current form.

The key culprit here is the "bind_policy" setting. It basically has two values, hard and soft, and both of them cause serious problems:

- "hard" means that nss_ldap will block any operation until it can reach a working LDAP server. This makes the machine hang at boot when an NSS call is made before the network is up. The common workaround is nss_initgroups_ignoreusers, but that's a bit error-prone.
- "soft" means that nss_ldap will return failure to the calling application if the LDAP server cannot be reached. Apparently this also means that it won't retry a dropped connection, which is what this bug is all about.

So currently we're left with two bad choices here. The setup that works best is "hard" with a perfectly configured nss_initgroups_ignoreusers. Such a setup is very distribution- and system-specific, though.

It would be nice if we could make nss_ldap have a "bind_policy" that better matches real-world scenarios. Just being able to differentiate behaviour for new connections and existing ones that have gone down would be a big improvement.
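For reference, the "hard" setup described above would look roughly like the following fragment of nss_ldap's configuration file. This is an illustrative sketch, not a tested configuration: the file path assumes a RHEL 5 style layout, the option is spelled nss_initgroups_ignoreusers in ldap.conf, and the account list is only an example that would have to be adapted per system.

```
# /etc/ldap.conf (nss_ldap) - illustrative fragment only

# hard: block the operation until a working LDAP server can be reached
# soft: return failure to the calling application if the server is unreachable
bind_policy hard

# Skip the LDAP lookup in initgroups() for these local accounts, so that
# services started before the network is up do not hang.
# Example list only; the right set of accounts is system specific.
nss_initgroups_ignoreusers root,ldap
```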
SSSD (http://fedorahosted.org/sssd/) is coming soon to RHEL 5.x, but it is also available from EPEL (SSSD 1.2 is about to land in EPEL any day). It solves a lot of the problems nss_ldap has. Would you consider trying it?
This issue is addressed by the SSSD component. See https://bugzilla.redhat.com/show_bug.cgi?id=636656
*** This bug has been marked as a duplicate of bug 636656 ***