Bug 180295

Summary: OpenLDAP script makes LDAP request
Product: [Fedora] Fedora
Reporter: W. Michael Petullo <redhat>
Component: openldap
Assignee: Jay Fenlason <fenlason>
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Version: rawhide
CC: jfeeney
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-02-15 20:13:47 UTC
Bug Depends On: 180657    
Bug Blocks:    

Description W. Michael Petullo 2006-02-06 22:43:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.7.12) Gecko/20060205 Fedora/1.7.12-4

Description of problem:
Recent versions of /etc/init.d/ldap seem to contain an odd catch-22.  The script executes the following line when starting LDAP:

/sbin/runuser -f -m -s /bin/sh -c "test -r /etc/pki/tls/certs/ca-bundle.crt" -- ldap

It seems that if this is run on a system configured to use LDAP for NSS, the above line queries LDAP.  Because the LDAP server has not started yet, the execution of test hangs for 128 seconds until the LDAP query times out.

Version-Release number of selected component (if applicable):
openldap-servers-2.3.19-2

How reproducible:
Always

Steps to Reproduce:
1.  Add "group: files ldap" to /etc/nsswitch.conf.

2.  /etc/init.d/ldap start.

3.  Remove ldap from /etc/nsswitch.conf.

4.  /etc/init.d/ldap start.
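
Steps 1 and 3 toggle whether the group database consults LDAP. A minimal sketch for checking that state before each start follows; the function name nss_uses_ldap and the default path argument are illustrative assumptions, not part of the init script or of any package.

```shell
#!/bin/sh
# Succeed (exit status 0) if an nsswitch.conf-style file lists "ldap"
# as a source for the given database, for example "group" or "passwd".
# nss_uses_ldap is a hypothetical helper name.
nss_uses_ldap() {
    db="$1"
    file="${2:-/etc/nsswitch.conf}"
    awk -F: -v db="$db" '
        {
            sub(/#.*/, "")             # drop comments; awk re-splits the line
            key = $1
            gsub(/[ \t]/, "", key)     # database name without whitespace
        }
        key == db {
            n = split($2, src, " ")    # sources, split on whitespace
            for (i = 1; i <= n; i++)
                if (src[i] == "ldap")
                    found = 1
        }
        END { exit !found }
    ' "$file"
}

# Example:
#   nss_uses_ldap group && echo "group lookups go to LDAP"
```

The check only reads the file, so it can be run before and after editing /etc/nsswitch.conf to confirm which of the two cases is being tested.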

Actual Results:  After step 2, the ldap script hangs and I see the following in the logs:

Feb  6 16:45:53 golem runuser: nss_ldap: failed to bind to LDAP server ldaps://golem.flyn.org: Can't contact LDAP server
Feb  6 16:45:53 golem runuser: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)...

Also, "id ldap" hangs.

After 128 seconds, the hang times out and slapd starts.

After step 4, the ldap script completes and slapd starts.

Expected Results:  The ldap service should start quickly.
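
The delay can also be measured without going through the init script, since "id ldap" shows the same hang. The following is a rough sketch only; elapsed_seconds is a hypothetical helper name, and it assumes a date command that supports +%s, as GNU date does.

```shell
#!/bin/sh
# Print how many whole seconds a command takes to finish.
# elapsed_seconds is a hypothetical helper, not part of any package.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1      # output of the timed command is discarded
    end=$(date +%s)
    echo $((end - start))
}

# Example (on the affected host, while slapd is stopped):
#   elapsed_seconds id ldap
```

Comparing the value with and without ldap in /etc/nsswitch.conf should show whether the group lookup is what stalls.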

Additional info:

Comment 1 Jay Fenlason 2006-02-07 16:53:00 UTC
If you don't have "files" in your /etc/nsswitch.conf for passwd, shadow, and 
group, you'll encounter this problem well before init gets around to trying to 
start the ldap server.  You need at least the root, ldap, and nobody users in 
your /etc/passwd, and the root, ldap, utmp, tty, disk, lp, nobody, floppy and 
uucp groups in your /etc/group.  To minimize the timeout problem, you can put 
a shorter bind_timelimit in your /etc/ldap.conf, and "bind_policy soft", to 
have nss_ldap fail immediately when it can't bind, instead of retrying for 
not-quite-forever. 
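
To see which of these settings a system currently has, something like the sketch below could be used. It is only an illustration: ldap_conf_value is a hypothetical helper name, and it assumes each directive sits on its own line as "name value" with the name spelled exactly as given.

```shell
#!/bin/sh
# Print the value of the first matching directive in an
# ldap.conf-style file, or nothing if the directive is absent.
# ldap_conf_value is a hypothetical helper name.
ldap_conf_value() {
    key="$1"
    file="${2:-/etc/ldap.conf}"
    awk -v key="$key" '
        $1 == key {
            $1 = ""                 # drop the directive name
            sub(/^[ \t]+/, "")      # and the whitespace that followed it
            print
            exit
        }
    ' "$file"
}

# Example:
#   ldap_conf_value bind_policy
#   ldap_conf_value bind_timelimit
```

Commented-out lines are skipped because their first field starts with "#" and so never equals the directive name.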
 
If runuser tries to enumerate all users instead of just the one listed, that 
could be a bug in runuser. 
 
What's in your /etc/ldap.conf, /etc/nsswitch.conf, /etc/passwd and /etc/group?

Comment 2 W. Michael Petullo 2006-02-07 22:54:57 UTC
/etc/nsswitch.conf contains:

...
group:      files ldap
...

The requisite users and groups are all in /etc/passwd and /etc/group.

/etc/ldap.conf contains:

base dc=flyn,dc=org
uri ldaps://golem.flyn.org
ssl start_tls
ssl on

I suspect that the problem is that runuser uses a sub-optimal call to look up
name service information.  I have a feeling it is the use of the initgroups
function in su.c.

Specifying "bind_policy soft" does get rid of the hang.  However, this is not
really a solution.  Certainly, a running LDAP server should not be required to
start the LDAP server!

Comment 3 Jay Fenlason 2006-02-09 19:28:30 UTC
After some discussion here, it looks like the best solution is to add  
nss_initgroups_ignoreusers ldap  
to /etc/ldap.conf, so nss_ldap won't attempt to query the LDAP server for  
group membership for the ldap user.  I've opened a bug against nss_ldap  
requesting that this be done in the default /etc/ldap.conf.  
  
Note that you need a recent nss_ldap to have this flag, but Nalin assures me  
it's already in rawhide.  
  
Can you try this in your configuration and confirm that it works? 
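
For anyone trying this, the line can be added without duplicating it on repeated runs. The sketch below is an assumption-laden illustration: ensure_line is a hypothetical helper name, and it only checks for an exact copy of the line, so it will not merge with an existing nss_initgroups_ignoreusers entry that names other users.

```shell
#!/bin/sh
# Append a line to a file unless an identical line is already present.
# ensure_line is a hypothetical helper; the file must be given explicitly.
ensure_line() {
    line="$1"
    file="$2"
    # -x: match whole lines, -F: treat the text literally, -q: no output
    grep -qxF -- "$line" "$file" 2>/dev/null ||
        printf '%s\n' "$line" >> "$file"
}

# Example (as root, with an nss_ldap recent enough to know the option):
#   ensure_line 'nss_initgroups_ignoreusers ldap' /etc/ldap.conf
```

Keeping the file as a required argument avoids touching /etc/ldap.conf by accident when the function is tried out on a copy first.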

Comment 4 W. Michael Petullo 2006-02-09 23:21:34 UTC
"nss_initgroups_ignoreusers ldap" seems to work around this problem.