I have a RHEL 5 (fully up-to-date as of 2010-12-10) server that uses OpenLDAP for the user database. Authentication/user lookup works fine most of the time, but periodically (maybe once or twice per day), a daemon will log a burst of messages like (server name changed):
Dec 9 22:18:10 fly restorecond: nss_ldap: failed to bind to LDAP server ldapi:///: Invalid credentials
Dec 9 22:18:10 fly restorecond: nss_ldap: failed to bind to LDAP server ldaps://<backupserver>.hiwaay.net/: Invalid credentials
Sometimes the program logging the problem is dovecot-auth (version 1.2.11 compiled locally from Fedora updates), but since restorecond logged it as well, it appears to be an nss_ldap problem.
When this happens, the calling program thinks the users it is looking up don't exist (so for example, dovecot's deliver bounces emails as "unknown user", which is a major problem).
The LDAP server doesn't log any errors when this is happening. I don't know what triggers the problem or what makes it go away after a few seconds to a few minutes. It might happen more often when the server is busy (for example, I see more instances during nightly backups).
One other thing: I said multiple programs log this, but only one logs it at a time. For example, I got a burst of errors from dovecot-auth yesterday at 16:15, 17:31, and 23:41-23:42. I got errors from restorecond at 22:15-22:18.
Is it possible that nss_ldap has some internal resource leak (but eventually resets itself)?
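To line the bursts up against other activity (backups, cron jobs), the failures can be counted per program and per minute. The following is only a minimal sketch: the regular expression is inferred from the two sample lines above, and the names LINE_RE and summarize are made up for illustration. It assumes the standard syslog layout and will silently skip lines in any other format.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample log lines in this report (an assumption,
# not a documented nss_ldap format), e.g.:
#   Dec 9 22:18:10 fly restorecond: nss_ldap: failed to bind to LDAP server ldapi:///: Invalid credentials
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<minute>\d{2}:\d{2}):\d{2}\s+"
    r"(?P<host>\S+)\s+(?P<prog>[^:\[\s]+)(?:\[\d+\])?:\s+"
    r"nss_ldap: failed to bind to LDAP server (?P<uri>\S+): (?P<reason>.+)$"
)


def summarize(lines):
    """Count nss_ldap bind failures per (program, minute).

    Returns a dict mapping (program, "Mon D HH:MM") to the number of
    matching lines seen. Lines that do not match are ignored.
    """
    counts = defaultdict(int)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        stamp = "%s %s %s" % (m.group("month"), m.group("day"), m.group("minute"))
        counts[(m.group("prog"), stamp)] += 1
    return dict(counts)
```

Feeding it the system log (for example an open file object for /var/log/messages, which is an assumed path) gives one count per program per minute, which makes it easier to see whether the bursts coincide with the backup window.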
It seems that you have an intermittent failure with your LDAP connection. We suggest that you consider taking a look at SSSD.
Based on the information in the ticket, it is hard to identify what is going wrong. It might be caused by intermittent network outages or by issues in the underlying LDAP library.
Since the issue cannot be reproduced, we will not address it. Please reopen this bug if you have additional information that would allow us to reproduce it. However, SSSD is a much better solution when intermittent network failures are frequent; please consider it.
Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.
The problem is certainly not a network outage, since the primary OpenLDAP server is on the same host and is accessed via the Unix domain socket (ldapi:///). There is no indication of any network outage (the secondary server is connected to the same switch, both are on the same subnet/VLAN, and there are no errors on any of the interfaces).
SSSD is not a solution, given the performance problems I saw when trying it on RHEL 6 (BZ 664071, "hopefully" fixed in a new version). SSSD also lacks tools to manage the cache (such as invalidating an entry, like "nscd -i passwd <deleted-user>").
Also, if the problem is in the underlying LDAP library, switching to SSSD wouldn't help (since it still uses the same OpenLDAP client library).
Please open a bug against the LDAP library. The problem does not appear to be in nss_ldap.