Bug 206399 - messagebus fails to start if the system is configured with ldap authentication
Summary: messagebus fails to start if the system is configured with ldap authentication
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: nss_ldap
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Nalin Dahyabhai
QA Contact:
URL:
Whiteboard: bzcl34nup
Depends On:
Blocks: 484489
 
Reported: 2006-09-14 04:38 UTC by Demosthenes T. Mateo Jr.
Modified: 2009-02-07 09:42 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 484489 (view as bug list)
Environment:
Last Closed: 2008-05-07 00:50:42 UTC
Type: ---
Embargoed:



Description Demosthenes T. Mateo Jr. 2006-09-14 04:38:26 UTC
Description of problem:
The script /etc/init.d/messagebus hangs during bootup if the system is
configured for LDAP authentication. Disabling LDAP authentication solves the
problem.

Version-Release number of selected component (if applicable):
dbus-0.61-3.fc5.1

How reproducible:
Always.

Steps to Reproduce:
1. Configure an LDAP server
2. Run authconfig and enable LDAP authentication against the local LDAP server
3. Reboot the system

  
Actual results:
System hangs during messagebus startup.

Expected results:
System should not hang.

Additional info:

Comment 1 Jerry James 2007-01-23 16:18:09 UTC
I'm seeing something similar in FC6, except that the system doesn't actually
hang.  If you let it run long enough, the messagebus eventually starts.  The
problem is that the /etc/rc.d/init.d/ldap script has priority 27 and
/etc/rc.d/init.d/messagebus has priority 22.  Therefore, when the messagebus
starts up, it repeatedly tries to contact the not-yet-running LDAP server,
continuing on only after all attempts have timed out.  Looking in
/var/log/messages shows lots of lines like this:

Jan 22 12:47:24 abbott rpc.statd[2802]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Jan 22 12:47:25 abbott dbus-daemon: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Jan 22 12:47:32 abbott rpc.statd[2802]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
Jan 22 12:47:33 abbott dbus-daemon: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...

with the number of seconds slept doubling each time until it reaches 64.  The
script priorities need to be adjusted to fix this.
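
The ordering described above can be checked by reading the start and stop priorities from the "# chkconfig:" header of each init script. This is only a minimal sketch: it assumes the usual header form "# chkconfig: <runlevels> <start> <stop>", and the function name init_priorities is hypothetical.

```shell
#!/bin/sh
# Print "<start> <stop>" from the "# chkconfig:" header of an init script.
# Assumes the header form "# chkconfig: <runlevels> <start> <stop>".
init_priorities() {
    awk '/^# *chkconfig:/ { print $(NF-1), $NF; exit }' "$1"
}

# Compare the two scripts named in this comment, if they exist on this host.
for s in /etc/rc.d/init.d/messagebus /etc/rc.d/init.d/ldap; do
    if [ -r "$s" ]; then
        echo "$s: $(init_priorities "$s")"
    fi
done
```

A script with a lower start number runs earlier, so messagebus at 22 would run before ldap at 27.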

Comment 2 Jerry James 2007-01-26 16:31:54 UTC
I examined all scripts with priorities from 21 to 27, inclusive, and concluded
that the ldap server depends on none of them.  Therefore, I changed the priority
of /etc/rc.d/init.d/ldap to 21.  My system booted up quickly and with no failures.
 I recommend this change.

The version on this bug should be changed to fc6, but I can't do it.


Comment 3 Carwyn Edwards 2007-01-28 13:56:13 UTC
I can confirm that this bug persists in FC6. Dbus is still triggering lookups to ldap and waiting for a 
timeout. My ldap.conf contains:

nss_initgroups_ignoreusers root, ldap, named, avahi, haldaemon

.. but it is still looking for something else in LDAP.

Note that bug #186527 also refers to this problem.

Comment 4 Jarod Wilson 2007-02-08 14:56:32 UTC
My vote is to set openldap to start/stop at 21/79 instead of 27/73...
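
One way such a priority change could be applied locally is to rewrite the "# chkconfig:" header of the init script and then re-register the service. The sketch below assumes GNU sed (for -i) and the header form "# chkconfig: <runlevels> <start> <stop>"; the function name set_init_priorities is hypothetical, and the usage lines are left as comments because they modify the system.

```shell
#!/bin/sh
# Rewrite the start/stop priorities in an init script's "# chkconfig:" header.
# Usage: set_init_priorities <script> <start> <stop>
# Assumes GNU sed and the header form "# chkconfig: <runlevels> <start> <stop>".
set_init_priorities() {
    sed -i "s/^\(# *chkconfig: *[^ ]*\) *[0-9][0-9]* *[0-9][0-9]*/\1 $2 $3/" "$1"
}

# Hypothetical usage on the LDAP server script, followed by re-registering it
# so that the rc?.d symlinks are regenerated with the new numbers:
#   set_init_priorities /etc/rc.d/init.d/ldap 21 79
#   chkconfig --del ldap && chkconfig --add ldap
```

A package update may restore the original header, so a local edit like this is a workaround rather than a fix.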

Comment 5 Jan Safranek 2007-05-22 13:06:40 UTC
Please add "dbus" to "nss_initgroups_ignoreusers" in /etc/ldap.conf:

nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus
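
To see whether a given service account is already covered by that line, the list can be parsed directly. This is a rough sketch only: the function name is_ignored_user is hypothetical, and it assumes the list sits on a single nss_initgroups_ignoreusers line, split here on commas and/or spaces.

```shell
#!/bin/sh
# Succeed if <user> appears in the nss_initgroups_ignoreusers list of <conf>.
# Usage: is_ignored_user <user> [conf]   (conf defaults to /etc/ldap.conf)
is_ignored_user() {
    awk -v user="$1" '
        $1 == "nss_initgroups_ignoreusers" {
            $1 = ""                        # drop the keyword, keep the list
            n = split($0, names, /[ ,]+/)  # split on commas and/or spaces
            for (i = 1; i <= n; i++)
                if (names[i] == user) found = 1
        }
        END { exit !found }
    ' "${2:-/etc/ldap.conf}"
}

# Example: is_ignored_user dbus || echo "dbus is not in the ignore list"
```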


Comment 6 Alan 2007-08-14 22:34:36 UTC
Confirmed that this bug exists in RHEL5.

I saw this bug when trying to connect to a remote LDAP server when my network
was not available on boot.  The boot just hung there, far longer than the
2 minute bind_timelimit in /etc/ldap.conf.

It should also be noted that this might be more serious than "medium", as it can
prevent a production server from reaching a remote-accessible state.
Maybe a new version of nss_ldap needs to be released that actually adds dbus to
the ignoreusers line by default so that people don't have this problem at all.


Comment 7 Anthony Messina 2007-08-23 23:14:39 UTC
If a server *is* the LDAP server for the LAN and you have set the host or uri to
localhost or the local host name, there are other services besides dbus that
will hold up the boot such as nfslock and rpcbind.

The question is, do any and all services which have a username other than those
already defined in nsswitch.conf need to have those usernames added there?

Also, what is the purpose of "Local authorization is sufficient for local
users"? Isn't that supposed to say not to look past the local files if that user
is present?

I ask these questions and report that I have the same issue on Fedora Core 7
updated fully as of the date of this post.

Comment 8 Nalin Dahyabhai 2007-09-13 15:18:53 UTC
(In reply to comment #5)
> Please add "dbus" to "nss_initgroups_ignoreusers" in /etc/ldap.conf:
> 
> nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus

I'm pushing an update for Fedora 6 and 7 which sets that default list to
 root,ldap,named,avahi,haldaemon,dbus,radvd,tomcat,radiusd,news,mailman,nscd
which I think will improve things somewhat.

(In reply to comment #7)
> Also, what is the purpose of "Local authorization is sufficient for local
> users"? Isn't that supposed to say not to look past the local files if that user
> is present?

The setting only affects the PAM configuration, so it can't help us with this
problem.  The phrasing is perhaps less than perfect, but we're limited
by screen real estate in text mode there.

Comment 9 Anthony Messina 2007-09-29 04:26:32 UTC
Does "rpcuser" need to be added to the list?  Or I guess the big question is, 
does any user for any service that starts after ldap need to be defined in the 
nss_initgroups_ignoreusers?

Comment 10 Nalin Dahyabhai 2007-10-01 15:19:09 UTC
It shouldn't need to be, if I'm understanding the question right.  A service
which starts after the directory server should be able to contact it without any
difficulty.
It may not be needed even if a service starts before the directory server -- the
connection won't be attempted unless the daemon uses initgroups() to initialize
its supplemental groups list.  Others may simply call setgroups() to use an
empty supplemental groups list, and that operation doesn't require that any
information be looked up.

Comment 11 Richard Bullington-McGuire 2007-10-01 15:26:28 UTC
Nalin Dahyabhai (nalin) wrote:

> It shouldn't need to be, if I'm understanding the question right.  A service
> which starts after the directory server should be able to contact it without any
> difficulty.

This assumes that 1) the directory server that the other services depend on
resides on the same machine, and that 2) the directory server starts correctly
and cleanly. If either of these assumptions is not true, then you could see a
cascade of failures.

A good example is a computer that uses LDAP for authentication and name service
lookups that is a client only. If the client can not see the LDAP server on
boot, its boot will stall. This can happen if the client is at a remote site and
the network has a temporary interruption, if all the LDAP servers are down, or
for any number of other reasons.

Please fix this so that all services start up correctly even if the LDAP server
referenced in /etc/ldap.conf is unavailable.



Comment 12 Nalin Dahyabhai 2007-10-01 15:44:38 UTC
Well, yes, that is a problem, but the server-is-its-own-client case is the one
I'm really concerned about, because that's not a transient failure you can fix
by fixing a network connection or bringing the directory server back up.

Attempting to add every service user to that list (by name, because that's what
the calling application passes to initgroups()) can't scale to the entire
universe of possible packages.  I don't think we can win with that as a
long-term plan.

Comment 13 Anthony Messina 2007-10-12 16:52:25 UTC
Even with rpcuser defined in the list of nss_initgroups_ignoreusers, I get this
at startup:

Oct 11 07:20:04 chicago rpc.statd[2237]: Version 1.1.0 Starting
Oct 11 07:20:05 chicago rpc.statd[2237]: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server
Oct 11 07:20:05 chicago rpc.statd[2237]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 11 07:20:05 chicago sm-notify[2239]: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server
Oct 11 07:20:05 chicago sm-notify[2239]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 11 07:20:05 chicago sm-notify[2239]: sm-notify running as root. chown /var/lib/nfs/sm to choose different user
Oct 11 07:20:06 chicago Backgrounding to notify hosts...
Oct 11 07:20:06 chicago rpc.statd[2237]: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server
Oct 11 07:20:06 chicago rpc.statd[2237]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 11 07:20:06 chicago rpc.statd[2237]: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server
Oct 11 07:20:06 chicago rpc.statd[2237]: nss_ldap: could not search LDAP server - Server is unavailable


Comment 14 Nalin Dahyabhai 2007-10-12 21:42:17 UTC
rpc.statd's a different program, and as it happens, it's probably a different
bug.  In the case of the nfs services, the daemons are getting stuck resolving
service entries for "rquotad" and "mountd", and removing "ldap" from the list of
sources consulted for "services" in /etc/nsswitch.conf keeps it from happening.
 (That change has been made in authconfig-5.3.15-1 and newer.)
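
Whether a system is still affected by this can be checked by looking at the sources listed for "services" in /etc/nsswitch.conf. The sketch below assumes the standard "database: source source ..." line format; the function name db_uses_ldap is hypothetical.

```shell
#!/bin/sh
# Succeed if the nsswitch database <db> lists "ldap" as a source in <conf>.
# Usage: db_uses_ldap <db> [conf]   (conf defaults to /etc/nsswitch.conf)
db_uses_ldap() {
    awk -v db="$1:" '
        $1 == db {
            for (i = 2; i <= NF; i++)
                if ($i == "ldap") found = 1
        }
        END { exit !found }
    ' "${2:-/etc/nsswitch.conf}"
}

# Example: db_uses_ldap services && echo "services lookups still consult ldap"
```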

Comment 15 Richard Bullington-McGuire 2007-10-18 15:18:46 UTC
Nalin wrote:

> Attempting to add every service user to that list (by name, because that's what
> the calling application passes to initgroups()) can't scale to the entire
> universe of possible packages.  I don't think we can win with that as a
> long-term plan.

You can't scale this to the entire universe of available packages, but you can
certainly scale this to the set of packages that are part of a single
distribution. That set is finite. You can even have a real test plan for making
sure that the bug does not regress:

* Install the distribution with everything
* Enable all services
* Set ldap auth to pull from an LDAP server that is not going to be there
* Reboot

Consider this code sketch to be in the public domain:

#!/bin/sh
# Test that LDAP authentication does not make the system hang when an LDAP
# server is unavailable
# Use runlevel 4 for "run everything"
SERVICES=`chkconfig --list | awk '/0:/{print $1}'`
# Enable every listed service in runlevel 4
for x in $SERVICES; do chkconfig --level 4 "$x" on; done
# Boot into runlevel 4 instead of 3
sed -i 's/id:3:initdefault:/id:4:initdefault:/' /etc/inittab
# Point LDAP authentication at a server that does not exist
authconfig --kickstart --enablecache --enableshadow --enablemd5 --enableldap \
    --enableldapauth --enableldapssl --ldapserver bogus-ldap.example.com \
    --ldapbasedn ou=fixmeplease,dc=redhat,dc=com
# Reboot
/sbin/init 6


Comment 16 Daniel Qarras 2007-12-27 16:10:21 UTC
FWIW, nss_ldap has been forked and the new incarnation, nss-ldapd, promises to
fix many of the issues found with nss_ldap. Please see

http://ch.tudelft.nl/~arthur/nss-ldapd/
http://ch.tudelft.nl/~arthur/nss-ldapd/design.html

Comment 17 Nalin Dahyabhai 2008-01-02 16:36:29 UTC
FWIW, I like the idea and design of nss-ldapd (*really* like the design, because
it's the best way to solve the problems the author describes).  But any removal
of features (whether I personally care for them or not) means it's not going to
fly as a direct replacement, at least not in the short-term.

Comment 18 Bug Zapper 2008-04-03 18:13:02 UTC
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 19 Bug Zapper 2008-05-07 00:50:40 UTC
This bug has been in NEEDINFO for more than 30 days since feedback was
first requested. As a result we are closing it.

If you can reproduce this bug in the future against a maintained Fedora
version please feel free to reopen it against that version.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

Comment 20 Daniel Qarras 2008-05-07 17:21:14 UTC
FWIW, I tried this with Fedora 9 and the problem seems to be fixed.

Comment 21 Daniel Qarras 2008-05-07 17:36:01 UTC
Hmm, well, actually I tried only with messagebus and other "basic" services; no
rpc* services were started up. So what was initially reported now seems to be
working, but other scenarios are mentioned too, and I can't comment on them.
Perhaps a new bug might be appropriate for them if the problems exist.

Thanks.

Comment 22 Zenon Panoussis 2008-09-06 20:14:49 UTC
On an RHEL 4.4 box with LDAP authentication enabled, ldap takes ages (~10 minutes) to start because the ldap init script calls nss_ldap, which keeps trying to connect to the to-be-started slapd. Perfectly circular, in other words. 

Setting "bind_policy soft" in /etc/ldap.conf solves this and all the other related problems.
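
As a sketch of how that setting could be applied without duplicating an existing line, the helper below replaces a key if it is already present and appends it otherwise. It assumes GNU sed (for -i) and a simple "key value" line format; the function name set_ldap_option is hypothetical.

```shell
#!/bin/sh
# Set "<key> <value>" in a config file: replace an existing line for <key>,
# or append one if the key is not present.
# Usage: set_ldap_option <conf> <key> <value>
set_ldap_option() {
    conf=$1 key=$2 value=$3
    if grep -q "^$key[[:space:]]" "$conf"; then
        sed -i "s/^$key[[:space:]].*/$key $value/" "$conf"
    else
        printf '%s %s\n' "$key" "$value" >> "$conf"
    fi
}

# Hypothetical usage (back up the file first):
#   cp -p /etc/ldap.conf /etc/ldap.conf.bak
#   set_ldap_option /etc/ldap.conf bind_policy soft
```

As far as I understand it, with the soft policy nss_ldap gives up on an unreachable server instead of retrying with increasing sleeps, so lookups fail fast rather than stalling the boot.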

