Description of problem:
The script /etc/init.d/messagebus hangs during bootup if the system is configured for LDAP authentication. Disabling LDAP authentication solves the problem.
Version-Release number of selected component (if applicable):
dbus-0.61-3.fc5.1
How reproducible:
Always.
Steps to Reproduce:
1. configured LDAP server
2. authconfig and enable LDAP authentication to local LDAP server
3. reboot the system
Actual results:
System hangs during messagebus startup.
Expected results:
System should not hang.
Additional info:
I'm seeing something similar in FC6, except that the system doesn't actually hang. If you let it run long enough, the messagebus eventually starts. The problem is that the /etc/rc.d/init.d/ldap script has priority 27 and /etc/rc.d/init.d/messagebus has priority 22. Therefore, when the messagebus starts up, it repeatedly tries to contact the not-yet-running LDAP server, continuing on only after all attempts have timed out. Looking in /var/log/messages shows lots of lines like this:
Jan 22 12:47:24 abbott rpc.statd[2802]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Jan 22 12:47:25 abbott dbus-daemon: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Jan 22 12:47:32 abbott rpc.statd[2802]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
Jan 22 12:47:33 abbott dbus-daemon: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
with the number of seconds slept doubling each time until it reaches 64. The script priorities need to be adjusted to fix this.
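To compare start priorities across init scripts, the "# chkconfig: <runlevels> <start> <stop>" header that chkconfig reads can be printed for each script. This is only a sketch; the function name list_priorities is made up for illustration, and it assumes every script carries that header line.

```shell
#!/bin/sh
# list_priorities (hypothetical helper): print "<start> <stop> <script>"
# for each init script given, lowest start priority first.
# Assumes each script has a "# chkconfig: <runlevels> <start> <stop>" line.
list_priorities() {
    for script in "$@"; do
        awk -v name="$(basename "$script")" '
            /^#[ \t]*chkconfig:/ {
                # Strip the prefix so the fields are: runlevels, start, stop
                sub(/^#[ \t]*chkconfig:[ \t]*/, "")
                print $2, $3, name
                exit
            }
        ' "$script"
    done | sort -n
}
```

Running `list_priorities /etc/rc.d/init.d/*` should then list every service that starts ahead of ldap, which is the set that cannot rely on the local directory server being up.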
I examined all scripts with priorities from 21 to 27, inclusive, and concluded that the ldap server depends on none of them. Therefore, I changed the priority of /etc/rc.d/init.d/ldap to 21. My system booted up quickly and with no failures. I recommend this change. The version on this bug should be changed to fc6, but I can't do it.
I can confirm that this bug persists in FC6. Dbus is still triggering lookups to ldap and waiting for a timeout. My ldap.conf contains: nss_initgroups_ignoreusers root, ldap, named, avahi, haldaemon .. but it is still looking for something else in LDAP. Note that bug #186527 also refers to this problem.
My vote is to set openldap to start/stop at 21/79 instead of 27/73...
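For anyone who wants to try the 21/79 change locally, here is a rough sketch that rewrites the priorities in an init script's chkconfig header. The name set_priority is invented for this example, and it assumes the header has the usual "# chkconfig: <runlevels> <start> <stop>" form.

```shell
#!/bin/sh
# set_priority (hypothetical helper): rewrite the start/stop priorities in
# the "# chkconfig:" header of an init script, leaving the runlevels alone.
# usage: set_priority <script> <start> <stop>
set_priority() {
    script=$1
    start=$2
    stop=$3
    tmp=$(mktemp) || return 1
    if sed "s/^\(#[ ]*chkconfig:[ ]*[^ ]*\)[ ][ ]*[0-9][0-9]*[ ][ ]*[0-9][0-9]*/\1 $start $stop/" \
            "$script" > "$tmp"; then
        # Write back through the existing file so its mode and owner are kept
        cat "$tmp" > "$script"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example (as root), then re-register so the rc?.d symlinks are regenerated:
#   set_priority /etc/rc.d/init.d/ldap 21 79
#   chkconfig --del ldap && chkconfig --add ldap
```

Note that re-adding the service resets it to the header's default runlevels, so it may need to be switched on again with `chkconfig ldap on`. A package update will also overwrite the edited script.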
Please add "dbus" to "nss_initgroups_ignoreusers" in /etc/ldap.conf: nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus
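A small sketch of making that edit from a script, so that it can be repeated safely. The function name add_ignored_user is hypothetical, and the sketch assumes the option appears at most once in the file.

```shell
#!/bin/sh
# add_ignored_user (hypothetical helper): append a user to the
# nss_initgroups_ignoreusers list in an ldap.conf-style file,
# unless that user is already listed.
# usage: add_ignored_user <conf-file> <user>
add_ignored_user() {
    conf=$1
    user=$2
    key=nss_initgroups_ignoreusers
    # No such option yet: add one.
    if ! grep -q "^$key[[:space:]]" "$conf"; then
        printf '%s %s\n' "$key" "$user" >> "$conf"
        return
    fi
    # Already listed (entries may be separated by commas and/or spaces).
    if grep "^$key[[:space:]]" "$conf" | tr ' \t,' '\n\n\n' | grep -Fqx "$user"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    if sed "/^$key[[:space:]]/s/[[:space:]]*\$/,$user/" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example (as root):
#   add_ignored_user /etc/ldap.conf dbus
```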
Confirmed that this bug exists in RHEL5. I saw this bug when trying to connect to a remote LDAP server when my network was not available on boot. The boot just hung there, far longer than the 2 minute bind_timelimit in /etc/ldap.conf. It should also be noted that this might be more serious than "medium", as it can prevent a production server from reaching a remotely accessible state. Maybe a new version of nss_ldap needs to be released that actually adds dbus to the ignoreusers line by default so that people don't have this problem at all.
If a server *is* the LDAP server for the LAN and you have set the host or uri to localhost or the local host name, there are other services besides dbus that will hold up the boot, such as nfslock and rpcbind. The question is, do any and all services which have a username other than those already defined in nsswitch.conf need to have those usernames added there? Also, what is the purpose of "Local authorization is sufficient for local users"? Isn't that supposed to say not to look past the local files if that user is present? I ask these questions and report that I have the same issue on Fedora Core 7, fully updated as of the date of this post.
(In reply to comment #5)
> Please add "dbus" to "nss_initgroups_ignoreusers" in /etc/ldap.conf:
>
> nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus
I'm pushing an update for Fedora 6 and 7 which sets that default list to root,ldap,named,avahi,haldaemon,dbus,radvd,tomcat,radiusd,news,mailman,nscd which I think will improve things somewhat.
(In reply to comment #7)
> Also, what is the purpose of "Local authorization is sufficient for local
> users"? Isn't that supposed to say not to look past the local files if that user
> is present?
The setting only affects the PAM configuration, so it can't help us with this problem. The phrasing is perhaps less than perfect, but we're limited by screen real estate in text mode there.
Does "rpcuser" need to be added to the list? Or I guess the big question is, does any user for any service that starts after ldap need to be defined in the nss_initgroups_ignoreusers?
It shouldn't need to be, if I'm understanding the question right. A service which starts after the directory server should be able to contact it without any difficulty. It may not be needed even if a service starts before the directory server -- the connection won't be attempted unless the daemon uses initgroups() to initialize its supplemental groups list. Others may simply call setgroups() to use an empty supplemental groups list, and that operation doesn't require that any information be looked up.
Nalin Dahyabhai (nalin) wrote:
> It shouldn't need to be, if I'm understanding the question right. A service which starts after the directory server should be able to contact it without any difficulty.
This assumes that 1) the directory server that the other services depend on resides on the same machine, and that 2) the directory server starts correctly and cleanly. If either of these assumptions is not true, then you could see a cascade of failures. A good example is a computer that uses LDAP for authentication and name service lookups and is a client only. If the client cannot see the LDAP server on boot, its boot will stall. This can happen if the client is at a remote site and the network has a temporary interruption, if all the LDAP servers are down, or for any number of other reasons. Please fix this so that all services start up correctly even if the LDAP server referenced in /etc/ldap.conf is unavailable.
Well, yes, that is a problem, but the server-is-its-own-client case is the one I'm really concerned about, because that's not a transient failure you can fix by fixing a network connection or bringing the directory server back up. Attempting to add every service user to that list (by name, because that's what the calling application passes to initgroups()) can't scale to the entire universe of possible packages. I don't think we can win with that as a long-term plan.
Even with rpcuser defined in the list of nss_initgroups_ignoreusers, I get at startup:
Oct 11 07:20:04 chicago rpc.statd[2237]: Version 1.1.0 Starting
Oct 11 07:20:05 chicago rpc.statd[2237]: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server
Oct 11 07:20:05 chicago rpc.statd[2237]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 11 07:20:05 chicago sm-notify[2239]: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server
Oct 11 07:20:05 chicago sm-notify[2239]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 11 07:20:05 chicago sm-notify[2239]: sm-notify running as root. chown /var/lib/nfs/sm to choose different user
Oct 11 07:20:06 chicago Backgrounding to notify hosts...
Oct 11 07:20:06 chicago rpc.statd[2237]: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server
Oct 11 07:20:06 chicago rpc.statd[2237]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 11 07:20:06 chicago rpc.statd[2237]: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server
Oct 11 07:20:06 chicago rpc.statd[2237]: nss_ldap: could not search LDAP server - Server is unavailable
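When it is unclear which daemons are waiting on LDAP during boot, syslog output like the above can be tallied per program. A minimal sketch follows, assuming the traditional syslog layout where the fifth field is "program[pid]:"; the function name nss_ldap_offenders is invented for this example.

```shell
#!/bin/sh
# nss_ldap_offenders (hypothetical helper): count nss_ldap messages per
# program in syslog-format input (files given as arguments, or stdin).
# Assumes the fifth whitespace-separated field is "program[pid]:" or "program:".
nss_ldap_offenders() {
    awk '
        /nss_ldap:/ {
            prog = $5
            sub(/(\[[0-9]+\])?:$/, "", prog)   # drop "[pid]" and trailing ":"
            count[prog]++
        }
        END { for (p in count) print count[p], p }
    ' "$@" | sort -k1,1nr -k2,2
}

# Example:
#   nss_ldap_offenders /var/log/messages
```

Each output line is a message count followed by the program name, busiest first.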
rpc.statd's a different program, and as it happens, it's probably a different bug. In the case of the nfs services, the daemons are getting stuck resolving service entries for "rquotad" and "mountd", and removing "ldap" from the list of sources consulted for "services" in /etc/nsswitch.conf keeps it from happening. (That change has been made in authconfig-5.3.15-1 and newer.)
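On systems with an older authconfig, the same nsswitch.conf change can be made by hand. The following is a sketch only; the function name drop_ldap_from_services is made up, and it normalises the spacing on the "services:" line as a side effect.

```shell
#!/bin/sh
# drop_ldap_from_services (hypothetical helper): remove "ldap" from the
# sources on the "services:" line of an nsswitch.conf-style file.
# Other databases (passwd, group, ...) are left untouched.
# usage: drop_ldap_from_services <conf-file>
drop_ldap_from_services() {
    conf=$1
    tmp=$(mktemp) || return 1
    if awk '
        $1 == "services:" {
            out = $1
            for (i = 2; i <= NF; i++)
                if ($i != "ldap") out = out " " $i
            print out
            next
        }
        { print }
    ' "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example (as root):
#   drop_ldap_from_services /etc/nsswitch.conf
```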
Nalin wrote:
> Attempting to add every service user to that list (by name, because that's what the calling application passes to initgroups()) can't scale to the entire universe of possible packages. I don't think we can win with that as a long-term plan.
You can't scale this to the entire universe of available packages, but you can certainly scale this to the set of packages that are part of a single distribution. That set is finite. You can even have a real test plan for making sure that the bug does not regress:
* Install the distribution with everything
* Enable all services
* Set ldap auth to pull from an LDAP server that is not going to be there
* Reboot
Consider this code sketch to be in the public domain:
#!/bin/sh
# Test that LDAP authentication does not make the system hang when an
# LDAP server is unavailable.
# Use runlevel 4 for "run everything".
SERVICES=`chkconfig --list | awk '/0:/{print $1}'`
for x in $SERVICES; do
    chkconfig --level 4 "$x" on
done
# Boot into runlevel 4 by default from now on.
sed -i 's/id:3:initdefault:/id:4:initdefault:/' /etc/inittab
# Point LDAP authentication at a server that does not exist.
authconfig --kickstart --enablecache --enableshadow --enablemd5 \
    --enableldap --enableldapauth --enableldapssl \
    --ldapserver bogus-ldap.example.com \
    --ldapbasedn ou=fixmeplease,dc=redhat,dc=com
# Reboot.
/sbin/init 6
FWIW, nss_ldap has been forked and the new incarnation, nss-ldapd, promises to fix many of the issues found with nss_ldap. Please see http://ch.tudelft.nl/~arthur/nss-ldapd/ http://ch.tudelft.nl/~arthur/nss-ldapd/design.html
FWIW, I like the idea and design of nss-ldapd (*really* like the design, because it's the best way to solve the problems the author describes). But any removal of features (whether I personally care for them or not) means it's not going to fly as a direct replacement, at least not in the short-term.
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
This bug has been in NEEDINFO for more than 30 days since feedback was first requested. As a result we are closing it. If you can reproduce this bug in the future against a maintained Fedora version please feel free to reopen it against that version. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp
FWIW, I tried this with Fedora 9 and the problem seems to be fixed.
Hmm, well, actually I tried only with messagebus and other "basic" services; no rpc* services were started. So what was initially reported now seems to be working, but other scenarios are mentioned here too, and I can't comment on them. Perhaps a new bug would be appropriate for them if the problems still exist. Thanks.
On an RHEL 4.4 box with LDAP authentication enabled, ldap takes ages (~10 minutes) to start because the ldap init script calls nss_ldap, which keeps trying to connect to the to-be-started slapd. Perfectly circular, in other words. Setting "bind_policy soft" in /etc/ldap.conf solves this and all the other related problems.
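A sketch of applying that setting from a script. As far as I understand the nss_ldap documentation, "soft" makes lookups fail immediately when the server cannot be reached instead of retrying with backoff, which is an assumption worth checking against the man page for your version. The function name set_bind_policy_soft is hypothetical.

```shell
#!/bin/sh
# set_bind_policy_soft (hypothetical helper): make sure an ldap.conf-style
# file contains "bind_policy soft", replacing an existing uncommented
# bind_policy line or appending one if there is none.
# usage: set_bind_policy_soft <conf-file>
set_bind_policy_soft() {
    conf=$1
    if ! grep -q '^[[:space:]]*bind_policy[[:space:]]' "$conf"; then
        printf 'bind_policy soft\n' >> "$conf"
        return
    fi
    tmp=$(mktemp) || return 1
    if sed 's/^[[:space:]]*bind_policy[[:space:]].*/bind_policy soft/' "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example (as root):
#   set_bind_policy_soft /etc/ldap.conf
```

The trade-off is that name lookups against LDAP will fail outright during a short outage rather than waiting for the server to come back.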