470003 – The nss_ldap module fails directly after server connection has been lost

Keywords:
Status:	CLOSED DUPLICATE of bug 636656
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	nss_ldap
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Nalin Dahyabhai
QA Contact:	BaseOS QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	636656

Reported:	2008-11-05 07:40 UTC by Peter Åstrand
Modified:	2011-01-04 12:52 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	636656
Environment:
Last Closed:	2010-09-22 20:08:47 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
getpwnam-twice.c - test case (880 bytes, text/plain) 2008-11-05 07:42 UTC, Peter Åstrand

Description Peter Åstrand 2008-11-05 07:40:44 UTC

Description of problem:
If the connection to the LDAP server is lost, the next getpwnam() call fails: it returns NULL and sets errno to ENOTCONN. A second getpwnam() call re-establishes the server connection and succeeds.

This is wrong. Applications shouldn't need to call getpwnam() twice; managing the LDAP server connection should be internal to the nss_ldap module. Also, errno shouldn't be set to ENOTCONN: it is not a documented errno for getpwnam(), so applications might not be prepared for it.
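
For illustration, here is a minimal sketch of how a caller usually tells "no such user" apart from a lookup error. The helper name lookup_user and its return codes are assumptions made for this example, not part of nss_ldap or the attached test case. With the behaviour described above, the first call after the connection drops would take the error branch and report ENOTCONN.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the attachment).
 * Returns 0 if the user was found, 1 if there is no such user,
 * and -1 if getpwnam() reported an error through errno. */
static int lookup_user(const char *name)
{
    struct passwd *pw;
    int saved_errno;

    errno = 0;               /* clear first: NULL with errno == 0 means "not found" */
    pw = getpwnam(name);
    saved_errno = errno;     /* save before any other library call can change it */

    if (pw != NULL) {
        printf("%s: uid=%ld\n", name, (long)pw->pw_uid);
        return 0;
    }
    if (saved_errno == 0)
        return 1;

    fprintf(stderr, "getpwnam(%s) failed: %s (errno=%d)\n",
            name, strerror(saved_errno), saved_errno);
    return -1;
}
```

An application that treats every -1 as fatal would reject a valid LDAP user here, even though an immediate second call would succeed.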


Version-Release number of selected component (if applicable):
nss_ldap-253-13.el5_2.1

Comment 1 Peter Åstrand 2008-11-05 07:42:02 UTC

Created attachment 322527
getpwnam-twice.c - test case

Reproduce by starting the test case and then restarting the LDAP server while getpwnam-twice is sleeping. The next getpwnam() call will fail.
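
The attachment itself is not reproduced on this page. A rough sketch of a test along the same lines might look like the following. The function name probe_lookups, its parameters and its output format are assumptions for this example and may differ from the real getpwnam-twice.c.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical reproduction helper (not the actual attachment).
 * Looks up `name` `attempts` times, sleeping `delay` seconds between
 * lookups, and returns how many lookups returned NULL. */
static int probe_lookups(const char *name, int attempts, unsigned int delay)
{
    int failures = 0;

    for (int i = 0; i < attempts; i++) {
        if (i > 0 && delay > 0)
            sleep(delay);    /* window in which to restart the LDAP server */

        errno = 0;
        struct passwd *pw = getpwnam(name);
        int saved_errno = errno;

        if (pw == NULL) {
            failures++;
            printf("attempt %d: NULL, errno=%d (%s)\n", i + 1, saved_errno,
                   saved_errno ? strerror(saved_errno) : "no error reported");
        } else {
            printf("attempt %d: ok, uid=%ld\n", i + 1, (long)pw->pw_uid);
        }
    }
    return failures;
}
```

Calling, say, probe_lookups("someldapuser", 3, 30) from main() with a user that exists only in LDAP, and restarting the server during the first sleep, should show one failed attempt followed by a successful one if the behaviour described in this bug is present.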

Comment 2 Pierre Ossman 2010-06-03 14:45:52 UTC

I've been digging a bit more into this, and it turns out that nss_ldap isn't
buggy per se, just horribly user-unfriendly in its current form.

The key culprit here is the "bind_policy" setting. It basically has two values,
hard and soft, and both of them cause serious problems:

 - "hard" means that nss_ldap will block any operation until it can reach a
working LDAP server. This makes the machine hang at boot when an NSS call is
made before the network is up. The common workaround is
nss_initgroups_ignoreusers, but that's a bit error-prone.

 - "soft" means that nss_ldap will return failure to the calling application if
the LDAP server cannot be reached. Apparently this also means that it won't
retry a dropped connection, which is what this bug is all about.

So currently we're left with two bad choices here. The setup that works best is
"hard" with a perfectly configured nss_initgroups_ignoreusers. Such a setup is
very distribution- and system-specific, though.
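
As a rough illustration of that "hard" setup, the nss_ldap client configuration (assumed here to be /etc/ldap.conf, as on RHEL 5) might contain something like the fragment below. The option is spelled nss_initgroups_ignoreusers in the nss_ldap documentation. The user list is only an example and would need to match the local system accounts that may be looked up before the network is up.

```
# Illustrative fragment only; the user list is an assumption and is system specific.
bind_policy hard
nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus
```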

It would be nice if nss_ldap had a "bind_policy" that better matches real-world scenarios. Just being able to differentiate behaviour between new connections and existing ones that have gone down would be a big improvement.

Comment 3 Dmitri Pal 2010-06-04 17:37:15 UTC

SSSD (http://fedorahosted.org/sssd/) is coming soon to RHEL 5.x, and it is also available from EPEL (SSSD 1.2 is about to land in EPEL any day). It solves a lot of the problems nss_ldap has. Would you consider trying it?

Comment 6 Dmitri Pal 2010-09-22 20:08:47 UTC

This issue is addressed by the SSSD component.
See https://bugzilla.redhat.com/show_bug.cgi?id=636656

Comment 7 Dmitri Pal 2010-09-22 21:00:31 UTC


*** This bug has been marked as a duplicate of bug 636656 ***
