Description of problem: The messagebus service hangs on boot on a system with LDAP auth configured.
Version-Release number of selected component (if applicable): dbus-0.60-7.2, kernel-2.6.15-1.1969_FC5
How reproducible: Always
Steps to Reproduce:
1. Boot.
Actual results: Hangs until bored.
Expected results: Fedora niceness.
Additional info: The workaround is to remove the ldap entry from the group line in /etc/nsswitch.conf, which is not an acceptable long-term solution.
A bug where D-Bus is put in an infinite loop because of missing group information might be what you are seeing. I am doing a new upstream release and will package it up for Fedora.
I was just informed by our LDAP guru here that the ldap module does not like threaded apps. D-Bus uses a thread to listen for SELinux AVC denial messages over netlink. If it is important, you can work around the issue by rebuilding the package with SELinux disabled. Fixes to the ldap module are being looked at, and I will look at moving the SELinux code to use the main loop instead of a thread, but I am not sure how long it will take to resolve these issues.
This is an nss_ldap bug, which should be fixed in 248-3. Please reopen this bug if you find that this is not the case.
*** Bug 181305 has been marked as a duplicate of this bug. ***
This bug still remains for me when using udev-084-13 and nss_ldap-249-1. The udev service hangs unless my LDAP server is running. Alastair, do you still have this problem?
W., does your /etc/ldap.conf include a "nss_initgroups_ignoreusers root,ldap" setting? Without it, you'd at least hit long delays as nss_ldap timed out attempting to contact a directory server. Until 248-3, that would have deadlocked apps which linked against libpthread (which includes the D-BUS daemon).
No I do not have the problem any more. For me this has not been an issue for the last couple of weeks. I have the same udev and nss_ldap as Michael and I'm running 2.6.15-1.2054_FC5.
This still occurs with FC5. In particular, I have a machine that was originally FC4 with current patches. It uses LDAP for users/groups/automount and is itself the (only) LDAP server. After upgrading from FC4 to FC5 via the FC5 DVD, it hung on the "starting system message bus" line. It responded to Ctrl-Alt-Delete, so I rebooted into single-user mode and examined /var/log/messages, which contained many entries like this: nss_ldap: failed to bind to LDAP server ldap://127.0.0.1: Can't contact LDAP server. This caused me to look, of course, at LDAP, whereupon I merged the *.rpmnew changes into /etc/ldap.conf, including the nss_initgroups_ignoreusers line, to no effect (on reboot). After finding this bug report, I commented out 'ldap' on the 'group' line in nsswitch.conf, which let me bring the system up past single-user mode. Obviously this is not an acceptable workaround for production, though. I'm currently merging in all the other *.rpmnew files created during the FC4->FC5 upgrade, but the list doesn't contain any other suspect candidates. Versions: kernel-2.6.15-1.2054_FC5, udev-084-13, nss_ldap-249-1.
Does adding "dbus" to the nss_initgroups_ignoreusers list solve this? I suspect that it will, because it's reasonable for the message bus daemon to set up its supplemental groups list before dropping privileges to run as that user. If it does, this is going to need a better long-term solution (one where we don't have to eventually add all system users to this line, which would suck).
No, adding dbus to nss_initgroups_ignoreusers had no effect. I did some spelunking with strace, followed by code inspection of libnss_ldap. It turns out that the information referenced by nss_initgroups_ignoreusers is only used _after_ the library attempts to connect to the LDAP server. As a temporary workaround, I found that setting 'bind_policy soft' in /etc/ldap.conf was sufficient to get the machine fully running with 'group: files ldap' in nsswitch.conf. However, I'd prefer not to be using 'soft' once the machine is running, so another solution is preferred. I would contend that with a 'files ldap' ordering in nsswitch.conf and a match on nss_initgroups_ignoreusers, the LDAP connection should not occur at all.
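The ordering argued for in this comment can be sketched in C. This is a minimal illustration under stated assumptions, not nss_ldap's code: `in_ignore_list`, `initgroups_sketch`, and the `connect_and_query` callback are hypothetical names, and the list is assumed to be comma-separated with no spaces, as in `root,ldap`. The only point is that the ignore list is consulted before any connection is attempted, so an ignored user never causes network traffic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `user` is an entry in a comma-separated list
 * such as the value of nss_initgroups_ignoreusers (assumed: no spaces). */
static bool in_ignore_list(const char *list, const char *user)
{
    size_t ulen = strlen(user);
    const char *p = list;
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len == ulen && strncmp(p, user, len) == 0)
            return true;               /* exact entry match, not a prefix */
        if (comma == NULL)
            break;
        p = comma + 1;                 /* move past the comma */
    }
    return false;
}

enum lookup_result { SKIPPED_IGNORED_USER, QUERIED_DIRECTORY };

/* Sketch of the proposed ordering: consult the ignore list first and only
 * contact the directory for users not on it. `connect_and_query` stands in
 * for the real LDAP work (which may block while the server is down); it may
 * be NULL when only the decision is of interest. */
static enum lookup_result initgroups_sketch(const char *ignore_list,
                                            const char *user,
                                            void (*connect_and_query)(const char *))
{
    if (in_ignore_list(ignore_list, user))
        return SKIPPED_IGNORED_USER;   /* no network traffic at all */
    if (connect_and_query != NULL)
        connect_and_query(user);
    return QUERIED_DIRECTORY;
}
```

With `"root,ldap"` as the list, a lookup for root returns `SKIPPED_IGNORED_USER` without ever calling the callback, while a lookup for dbus falls through to the (possibly blocking) directory query.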
Created attachment 126602: strace output of dbus-daemon, obtained by hacking /etc/init.d/messagebus. Notice the connect(2) calls to 127.0.0.1.
Another way of resolving the problem, temporarily or permanently, without changing how your box is configured, is to ensure that the LDAP server is already running when messagebus launches. In /etc/rc3.d/ and /etc/rc5.d/ we can see: S22messagebus, S27ldap. Renaming messagebus to S28messagebus solves the problem.
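Renaming the link works because the rc script starts the S* links in the order a shell glob such as /etc/rc3.d/S* expands them, which is lexical. A small C sketch of that ordering, assuming the C locale (where glob order matches strcmp order); `sort_rc_links` and `start_position` are hypothetical helper names:

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort rc link names into the lexical order in which they would be started
 * (C locale assumed). */
static void sort_rc_links(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], cmp_names);
}

/* Position of `name` in the sorted list, or -1 if it is absent. */
static int start_position(const char **names, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return (int)i;
    return -1;
}
```

With the links S27ldap, S22messagebus and S10network, sorting puts S10network first and S22messagebus before S27ldap; renaming messagebus to S28messagebus moves it after S27ldap. Because the priorities are two-digit, zero-padded numbers, lexical order and numeric order agree.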
D-BUS needs to start early in the boot process, as future components may rely on the system bus. This is not a fix we can use as the default in the distro, unless of course ldap can start earlier. BTW, we do not key off the dbus user but off the special user 81, which may change names in the future but will always be uid 81.
OK, so let messagebus stay at S22 and try S21 for ldap. Between S22 and S27 there are only bluetooth, netfs, and hidd (which depends on bluetooth), so nothing seems to prevent ldap from being launched just before messagebus.
I guess as long as nobody decides to add messagebus support into slapd ... FWIW I also noticed that rpc.statd (which gets started before messagebus) also reports problems in /var/log/messages, however _it_ is able to not hang but rather retry later. So how is rpc.statd using libnss_ldap differently from dbus-daemon such that the former doesn't hang but the latter does? (rpc.statd is started via S14nfslock)
Starting ldap before messagebus is IMHO the right choice for now. Even after fixing the other upgrade issues for ldap (like path changes), my server still won't start on its own if I have LDAP auth enabled during boot. Lots of things just hang. It would be nice if we could find a solution quickly and push an update so that other users who upgrade/install don't run into this wall.
REOPENED status has been deprecated. ASSIGNED with keyword of Reopened is preferred.
Fedora Core 5 and Fedora Core 6 are, as we're sure you've noticed, no longer test releases. We're cleaning up the bug database and making sure important bug reports filed against these test releases don't get lost. It would be helpful if you could test this issue with a released version of Fedora or with the latest development / test release. Thanks for your help and for your patience. [This is a bulk message for all open FC5/FC6 test release bugs. I'm adding myself to the CC list for each bug, so I'll see any comments you make after this and do my best to make sure every issue gets proper attention.]
Why was this marked as closed without an explanation? Does that mean it's fixed in FC7? What action was taken? I don't have a virgin FC6 handy to look at, but it seems to me that neither the nsswitch.conf workaround nor the change in the messagebus vs. slapd startup sequence has been applied.
Not sure. The original bug reporter closed the issue. (Presumably because it works for him, as per #7.) I'm going to reopen, put to "devel", and someone familiar with this particular issue can then decide where to go from there.
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
I'm seeing this in rawhide. I just ran into it today. Running `service messagebus start` hangs on boot. By booting into single-user mode, I found that I can get messagebus to start if and only if I replace `group: files ldap` with `group: files`. When I ran `strace -f service messagebus start`, I noticed that it would do a bunch of stuff including network activity, call nanosleep and wait a while, do the same network stuff, call nanosleep and wait, and so on. By the way, I did not see this problem in Fedora 8, even though I had the exact same configuration.
By the way, it looks like this bug and 221199 are the same bug. However, root and nscd both already appear in nss_initgroups_ignoreusers in /etc/ldap.conf, so that didn't really help.
It also looks like 186527 might be the same.
Nalin, is comment #24 correct?
(In reply to comment #25) > Nalin, is comment #24 correct? It sure looks that way. The distinction between LDAP and LDAPS turned out to not make a difference in #186527, so I'm left concluding it's getting stuck enumerating the supplemental groups for a user listed in one of the files it's reading at startup, just as it does here.
(In reply to comment #26) > (In reply to comment #25) > > Nalin, is comment #24 correct? > > It sure looks that way. The distinction between LDAP and LDAPS turned out to > not make a difference in #186527, so I'm left concluding it's getting stuck > enumerating the supplemental groups for a user listed in one of the files it's > reading at startup, just as it does here. Nalin, close one as a duplicate or keep them both open?
Changing to ASSIGNED for now, until a conclusion is reached or a duplicate is identified. This will keep it from being closed by the bug zapper program.
I must be doing something wrong because for me boot does not hang. I configured nsswitch.conf and pam.d/system-auth to use LDAP and in /etc/ldap.conf I use localhost as my server. When rebooting either with or without ldap service, boot always proceeds normally. How do you reproduce this on pristine Fedora 9?
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This also occurs on F10 with all updates as of yesterday (20081217). I have the following line in /etc/ldap.conf: `nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus`. I am using nss_ldap to retrieve user information from an Active Directory domain, with an ldaps:// URI and ssl set to yes in /etc/ldap.conf. This problem (at least for me) is network-related: the network hasn't come up yet, because in F10 it is controlled by NetworkManager, and NetworkManager depends on messagebus being started. I chkconfig'd network on, since it was off; its start priority is 10, so it starts before messagebus, and this change resolves the messagebus hang at boot time for me.
I have had this problem with multiple Fedora releases, including 10. A reasonable fix that has worked for Me so far is to modify /etc/ldap.conf and add `bind_policy soft` to the end of the file. After adding that line, the system no longer goes into long delay loops while booting, and I am still able to use all functionality. My question, to add to this bug, is: why does this relatively minor policy change seem to fix it for Me?
Seems to exist here on every Fedora 10 system with LDAP too!
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
The version for this bug needs to be updated to 11.
Present in rawhide. Found the source of the other user lookups by dbus-daemon - the "user=" lines in all of the /etc/dbus-1/system.d/*.conf files. In my case I needed to add nm-openconnect, rtkit, and pulse to nss_initgroups_ignoreusers. But this is clearly not a maintainable solution. Perhaps ignoring (optionally but by default) uids under 500 would be at least a little better.
(In reply to comment #32)
> A reasonable fix that has worked for Me so far it to modify /etc/ldap.conf add
> `bind_policy soft' to the end of the file.
>
> after adding that line, the system no longer goes into long delay loops while
> booting, and I am still able to use all functionality.
>
> My question, to add to this bug, is, Why does this relatively minor policy
> change seem to fix it for Me??
It's not a minor change:
# Reconnect policy: hard (default) will retry connecting to
# the software with exponential backoff, soft will fail
# immediately.
To some extent it comes down to whether LDAP is critical or not.
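To see why the policy matters so much at boot, here is a rough model of the time spent sleeping before giving up on an unreachable server. The parameters (first sleep of 1 second, cap of 64 seconds, 8 attempts) are assumptions for illustration, not nss_ldap's documented defaults; only the shape, where hard doubles its sleep between retries and soft fails at once, comes from the ldap.conf comment quoted in this reply.

```c
#include <stdbool.h>

/* Assumed parameters for illustration; nss_ldap's real values may differ. */
struct bind_policy {
    bool hard;               /* true: "bind_policy hard", false: "soft" */
    unsigned initial_sleep;  /* seconds slept before the first retry */
    unsigned max_sleep;      /* cap on any single sleep */
    unsigned max_attempts;   /* connection attempts before giving up */
};

/* Total seconds spent sleeping between connection attempts when the
 * server never answers. */
static unsigned total_wait_seconds(const struct bind_policy *p)
{
    if (!p->hard)
        return 0;                      /* soft: fail on the first error */
    unsigned total = 0;
    unsigned sleep_s = p->initial_sleep;
    for (unsigned attempt = 1; attempt < p->max_attempts; attempt++) {
        total += sleep_s;              /* sleep, then try again */
        sleep_s = (sleep_s * 2 > p->max_sleep) ? p->max_sleep : sleep_s * 2;
    }
    return total;
}
```

Under these assumed parameters one failed lookup costs 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds with hard and 0 with soft. Since dbus-daemon may perform many such lookups while parsing its configuration, the delays add up, which is consistent with the repeated network activity and nanosleep calls seen under strace.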
Since 2006 there's been a lot of talk regarding startup sequences and config changes. In comment 10 I described (based on code inspection at the time) what appeared to be an inverted program flow with respect to checking ignoreusers and connecting to LDAP. There was no indication in this report that this aspect was ever considered (not even someone saying, "no, you read it wrong"). I admit that I haven't looked at the source since then, so I don't know whether it still exists, but perhaps one of the current developers would be interested in examining that possibility?
Comments #12 through #16 suggest changing the startup order, changing from S27ldap to S21ldap to put it before S22messagebus. I haven't seen any arguments against this. It seems like this suggestion and Comment #36 would both be better than what we're having now. This bug has been open for 3 and a half years without any visible progress. It's frustrating that nothing has happened yet. By the way, bug #186527, which seems to be about the same issue, is still open, too.
As I mentioned in the other report referenced here, bug #186527, I think SSSD might well fix this when enabled. It sounds like a better approach to provide a mechanism that lets the OS work properly even if external databases are offline, rather than making exceptions for a large number of system-level users and services. https://fedorahosted.org/sssd/ https://fedoraproject.org/wiki/Features/SSSD I'll try this after F12 Beta is out; that's probably the best way to see whether these speculations are true and whether there has been any progress on other related fronts. Thanks.
Unfortunately, the SSSD feature page seems to say that SSSD won't be a default feature (i.e., you have to manually install and configure it). If that's true, then SSSD will be a nice workaround, but it won't really fix the bug, since anyone without SSSD installed will still have this problem.
I've installed sssd and it appears to have no effect on this startup issue, nor do I see any evidence as to why it would.
> I've installed sssd and it appears to have no effect on this startup issue, no > do I see any evidence as to why it would. If you use "files sss" in /etc/nsswitch.conf and "pam_sss.so" in /etc/pam.d/system-auth (and no ldap / pam_ldap.so at all) then all userinfo/authentication attempts go through SSSD which should be able to handle offline situations. You probably need a recent version like 0.6 or so.
I seem to have stumbled on this one after successfully running F10 with LDAP for over a year. I run KDE4 (this is relevant, bear with me). I installed NetworkManager-pptp in an attempt to get a PPTP VPN connection working, as opposed to KNetworkManager, which failed to work for the VPN. I'm fairly sure I installed NetworkManager itself too, though it could have silently been sitting there already. After successfully doing so, I rebooted the following day and had stumbled on this issue: long delays at message bus, with /var/log/messages screaming about LDAP issues. There seems to be some significance to NetworkManager taking over from the normal network service. I was surprised to find /etc/sysconfig/network-scripts/ifcfg-eth0 set to ONBOOT=no. Anyway, I by no means pretend to be an expert on Fedora's network stack or its management. I just thought the fact that I had managed to induce the problem after such a long, previously stable time might help you figure it out. I can provide any further details you think would help.
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
The root cause apparently has not been investigated yet. Reading the source code of dbus-daemon has revealed the following: dbus-daemon reads all the groups of the user root when it parses the user="root" attributes in the configuration file. This triggers many LDAP lookups, which trigger the exponential backoff of the bind_policy hard setting in /etc/ldap.conf. So parsing the config file takes a long time, and dbus-daemon forks only after parsing the config. Only at that point does the boot continue. The point is that dbus-daemon has a logical error in it: it never needs to read a user's full list of groups. Such a list is dynamic; it changes when naming services become available, or when the LDAP contents change. So dbus-daemon should instead check group memberships when it needs to, i.e. when it has to authorize a request. This could be done much more efficiently using the getgrent family of calls instead of the getgrouplist call dbus-daemon currently uses. So I propose that the upstream providers of dbus-daemon be contacted to get dbus-daemon fixed. Possible fixes:
1. Quick and dirty: add an option to stop dbus-daemon from expanding group lists.
2. Fix the logical error: don't use getgrouplist, check group membership late, and rely on nscd's caching mechanism for performance.
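The difference between the two strategies in this analysis can be sketched with real glibc calls. This is an illustration, not dbus-daemon's code: the function names are hypothetical, and the lazy variant uses getgrgid(), which belongs to the same group-database family as getgrent(). The eager version asks NSS for every group of the user up front (with nss_ldap, potentially a directory search); the lazy version asks about a single group only when a request actually needs authorizing.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes getgrouplist() in glibc's <grp.h> */
#endif
#include <grp.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Eager approach (roughly what the analysis describes dbus-daemon doing at
 * config-parse time): expand every supplementary group of `user` up front.
 * Returns the number of groups, or -1 if `capacity` was too small. */
static int expand_groups_eagerly(const char *user, gid_t primary,
                                 gid_t *groups, int capacity)
{
    int n = capacity;
    if (getgrouplist(user, primary, groups, &n) == -1)
        return -1;
    return n;
}

/* Lazy approach proposed above: answer one "is user in group X?" question,
 * only at authorization time, and only for that one group. */
static bool user_in_group_lazily(const char *user, gid_t primary, gid_t wanted)
{
    if (primary == wanted)
        return true;                       /* primary group needs no lookup */
    const struct group *gr = getgrgid(wanted);
    if (gr == NULL)
        return false;
    /* gr_mem lists explicit members only, hence the primary-gid check. */
    for (char **m = gr->gr_mem; *m != NULL; m++)
        if (strcmp(*m, user) == 0)
            return true;
    return false;
}
```

With the lazy variant, nothing group-related is looked up while parsing the configuration, so an unreachable LDAP server cannot delay dbus-daemon's startup; the cost is moved to the first request that needs the check, where nscd caching can absorb repeats.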
Andreas - Sounds like a great analysis. Upstream is here: http://www.freedesktop.org/wiki/Software/dbus Would you be willing to file a bug there?
I also ran into this issue and reported it upstream: https://bugs.freedesktop.org/show_bug.cgi?id=28355
Sorry for not following up more quickly. I've added the analysis above as additional information to freedesktop.org bug 28355, created by Bin Li.
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.