Description of problem: Starting with systemd 9-2, the network service isn't started on boot. There is a long pause waiting to bring up lo that ends after very roughly a minute. I have nerwork enabled and NetworkManager disabled. It happens on two machines, one i686 the other x86_64. The i686 machine has also been tested with systemd 9-3 and it still has this issue. Version-Release number of selected component (if applicable): systemd-9-3.fc14.i686 initscripts-9.18-1.fc14.i686 How reproducible: Seems to happen every boot, but I haven't tried many times. Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
I just installed updates which included systemd-9-3.fc14.x86_64 and I see the same thing. I also have NetworkManager disabled and network enabled and when it gets to the end of the boot process where I normally get a KDM login prompt, it just sits like a lump for a long time then finally prints messages about abrtd and network failing to start (although I have abrtd disabled, so I have no idea why it is trying to start). I do eventually get a login prompt and then can see only the loopback device in ifcfg output. Doing "ifup br0" and "ifup eth0" gets the network functioning again. (Which enabled me to search bugzilla and find this entry and add to it :-).
I finally found the relevant entries in /var/log/messages that go with the above boot attempt in comment 1 Sep 4 14:37:16 zooty kernel: rc.sysinit used greatest stack depth: 3536 bytes left Sep 4 14:37:16 zooty kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team Sep 4 14:37:16 zooty kernel: init[1]: abrtd.service operation timed out. Terminating. Sep 4 14:37:16 zooty kernel: init[1]: network.service operation timed out. Terminating. Sep 4 14:37:16 zooty kernel: init[1]: Unit network.service entered failed state. Despite the timestamp shown in the log, there was a very long pause on the console between the ip6_tables message and the abrtd.service message.
Are you using NSS LDAP or something similar?
Created attachment 443112 [details] /var/log/messages I added the systemd.log_level=debug boot parameter and booted my system again. I'm attaching the /var/log/messages from that boot (which seems to have lots of info - I don't know how informative it is). Despite the time stamps being the same on all the lines, one the console there was a huge pause right after it printed the line about "dbus.socket changed listening -> running" After it finally finished booting and let me login, a "service network start" managed to work fine to get the network up. (Which is probably the source of the last set of messages about the network service in the log). P.S. I am certainly not using ldap or nis or anything like that - local password file only.
On a laptop where I use NetworkManager instead of network, I didn't have a problem after going to systemd 9-3. I am not using LDAP.
Could you please get me the contents of your nsswitch.conf from the PC where this happens?
Created attachment 443201 [details] nsswitch.conf Here's my nsswitch.conf. I haven't modified it manually at all, so it should be exactly the version created by the fedora install or any rpms in updates that might have modified it.
Created attachment 443302 [details] nsswitch.conf from one affected machine
I just noticed another thing that seemed to start when I got systemd 9-3. Every time I reboot the system, I get a message about recovering journal on the partition where I have f14 installed. It seems to be unable to shutdown cleanly.
Hmm, could you please get me the "dmesg" output from after the incomplete boot when booting with "systemd.log_level=debug" and "systemd.log_target=kmsg"?
I'll have a chance to do this in 2 - 3 hours.
Created attachment 443512 [details] dmesg with systemd debugging info [root@games1 bruno]# service network status Configured devices: lo eth0 Currently active devices: lo [root@games1 bruno]# service network start Starting network (via systemctl): [ OK ] It turns out I had a bit quicker way to test it than I originally thought, so I did it sooner rather than later. I have attached dmesg from the boot and my check that the network wasn't up until after starting the network service. This didn't happen on version 8, but does happen with both 9-2 and 9-3, though I haven't tested if the recent dbus update has an effect on this yet. [root@games1 bruno]# service network status Configured devices: lo eth0 Currently active devices: lo eth0 [root@games1 bruno]#
The comment was supposed to be inserted before the copied text, not in the middle.
Created attachment 445786 [details] dmesg from boot followed by reboot Here's my /var/log/dmesg file copied from my f14 partition after I booted with the debug settings, then rebooted back to f13, so this has messages from both starup and shutdown. Once again I hung in the console with the last message displayed being about dbus going into a listening state (but I don't see that message anywhere in the dmesg file).
Created attachment 445795 [details] dmesg with systemd 9-3 and initscripts 19.1-1 I tried using the updated initscripts on a second machine and I am still not seeing the network service successfully started at boot.
Some information gathering, for Tom and Bruno: - I'm assuming you have NM installed, but disabled How did you disable it? - I'm also assuming that if you uninstall NM it works fine?
So, AFAICT, what's happening here is: - you've enabled the network service, or run 'ifup lo' - it runs 'nmcli nm status' to see what NM's doing - this *tries to enable the NM service via dbus* - when done before dbus is running, weird stuff happens This can be easily reproduced post-boot with NM inactive; just run 'nmcli nm status', and NM will be activated. Whether either nmcli is changed to check NM's status differently, or how NM is dbus-activated is changed, checking the status of NM with nmcli obviously can't spontaneously start it. Moving to NetworkManager.
I used chkconfig to disable NetworkManager. The package is still installed. When I was running systemd 8, I didn't see this problem, but it is possible I had a NetworkManager update at the same time I went from 8 to 9-2.
I tested both downgrading NetworkManager and systemd and found that systemd 8-3 worked with the current NetworkManager and that the downgraded version of NetworkManager (8.1-5) didn't work with systemd 9-3. So even if this is a problem with NetworkManager, pushing the systemd update out before the NetworkManager fix is in place, is going to break things for some people.
I just tried this and if I yum -C erase NetworkManager rather than merely turning it off with chkconfig, there is no delay on boot and the network is started correctly. So the comments above definitely appear to explain the problem.
I have similar (maybe the same) problem. As soon as 3 services (network, NetworkManager, auditd) are switch on for automatic start: 1. chkconfig network on 2. chkconfig NetworkManager on 3. chkconfig auditd on I have big timeout on system boot...
*** Bug 632929 has been marked as a duplicate of this bug. ***
systemd-10-1.fc14,initscripts-9.20-1.fc14,sysvinit-2.87-5.dsf.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/systemd-10-1.fc14,initscripts-9.20-1.fc14,sysvinit-2.87-5.dsf.fc14
systemd-10-1.fc14, initscripts-9.20-1.fc14, sysvinit-2.87-5.dsf.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update systemd initscripts sysvinit'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/systemd-10-1.fc14,initscripts-9.20-1.fc14,sysvinit-2.87-5.dsf.fc14
Do you have "NM_CONTROLLED=yes" in your /etc/sysconfig/network-scripts/ifcfg-eth0? Does the problem go away if you change it to "no"?
I can reproduce this and I see a problem. The network service hangs inside the is_nm_running() function in initscripts' /etc/sysconfig/network-scripts/network-functions. It calls nmcli to detect if NM is running. nmcli hangs while waiting for an answer from D-Bus. After a timeout it is killed by systemd. Here's why the answer never arrives: The start of dbus.service is indirectly ordered after network.service: dbus.service After syslog.target After rsyslog.service After network.service Or another chain: dbus.service After mdmonitor.service After network.service Possibly via more chains. When the network service is starting, dbus.socket is listening and communication to it is buffered inside the kernel. But dbus.service cannot reply to it, because it cannot start yet. This problem does not happen with upstart, because there the dbus socket is not listening yet, so nmcli in is_nm_running() fails immediately.
(In reply to comment #25) > Do you have "NM_CONTROLLED=yes" in your > /etc/sysconfig/network-scripts/ifcfg-eth0? > Does the problem go away if you change it to "no"? jik2:~!1005$ grep NM_CONTROLLED /etc/sysconfig/network-scripts/ifcfg-eth0 NM_CONTROLLED=no jik2:~!1006$
[root@games1 livegames]# grep NM_CONTROLLED /etc/sysconfig/network-scripts/ifcfg-eth0 NM_CONTROLLED=no
Bruno, looking at your attached logs, they all seem incomplete to me. There are always several services, which, though added like this,: Added job network.service/start to transaction ... are never mentioned again in the log. And it's not only network.service. So I realized it must be that you are always attaching /var/log/dmesg, but this file is merely a snapshot of dmesg taken at the end of /etc/rc.sysinit. Please get the complete log with 'dmesg > dmesg.txt'.
I got that output by copying over /var/log/dmesg. Does the dmesg command actually produce more output? Getting a new one now that I am using systemd 10 is probably a good idea in any case. I have one in a little while.
(In reply to comment #26) > When the network service is starting, dbus.socket is listening and > communication to it is buffered inside the kernel. But dbus.service cannot > reply to it, because it cannot start yet. Right, we were discussing this yesterday. Arguably the dbus dependencies are incorrect.
Created attachment 447252 [details] Updated dmesg for systemd 10. (Using the dmesg command rather than /var/log/dmesg.)
Yup, it's a deadlock. Something (an attempted restart of qmail?) runs from the network service (an ifup hook?) that wants to talk to D-Bus, but dbus-daemon is started much later, after the deadlock is released by timing out network.service. init[1]: About to execute: /etc/rc.d/init.d/network start init[1]: Forked /etc/rc.d/init.d/network as 1470 init[1]: network.service changed dead -> start init[1]: Accepted connection on private bus. init[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent init[1]: ip6tables.service: cgroup is empty init[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local type=1400 audit(1284478504.444:5): avc: denied { write } for pid=1448 comm="qmail-send" path="pipe:[10334]" dev=pipefs ino=10334 scontext=system_u:system_r:qmail_send_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=fifo_file init[1]: Incoming traffic on dbus.socket init[1]: Trying to enqueue job dbus.service/start/replace ... init[1]: network.service operation timed out. Terminating. init[1]: network.service changed start -> final-sigterm init[1]: Received SIGCHLD init[1]: Got SIGCHLD for process 1470 (network) init[1]: Child 1470 died (code=killed, status=15/TERM) init[1]: Child 1470 belongs to network.service init[1]: network.service: control process exited, code=killed status=15 init[1]: network.service got final SIGCHLD for state final-sigterm init[1]: network.service changed final-sigterm -> failed init[1]: Job network.service/start finished, success=no init[1]: Unit network.service entered failed state. init[1]: Got SIGCHLD for process 1507 (ifup) init[1]: Child 1507 died (code=killed, status=15/TERM) init[1]: network.target changed dead -> active ... init[1]: Job mdmonitor.service/start finished, success=yes init[1]: About to execute: /bin/dbus-uuidgen --ensure init[1]: Forked /bin/dbus-uuidgen as 1594 init[1]: dbus.service changed dead -> start-pre ...
Created attachment 447261 [details] dmesg from a machine that isn't using daemontools Yes the first machine starts up svscanboot to support daemontools, during the boot process. However I see the network startup problem on machines that don't do that. I am attaching an example that shouldn't have that distraction.
Unfortunately this run had so many events that the beginning of the log is missing.
The many strange "After=" dependencies of dbus.service are due to systemd combining the configuration from /lib/systemd/system/dbus.service with the dependencies /etc/init.d/messagebus has as a SysV service.
Does it help if you remove /etc/init.d/messagebus ? (It works here. The file is only needed for booting with upstart and it only causes trouble for systemd.)
Lennart, have you made progress on this? It's not a beta blocker but we really should fix it as it's a regression from -8.
(In reply to comment #36) > The many strange "After=" dependencies of dbus.service are due to systemd > combining the configuration from /lib/systemd/system/dbus.service with the > dependencies /etc/init.d/messagebus has as a SysV service. Right, I believe it's doing so incorrectly, as it's converting the SysVStart priority into dependencies. That shouldn't happen for a native service that lists its dependencies explicitly.
initscripts-9.20-1.fc14, sysvinit-2.87-5.dsf.fc14, systemd-10-2.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.
not fixed yet, reopening
*** Bug 634337 has been marked as a duplicate of this bug. ***
*** Bug 633285 has been marked as a duplicate of this bug. ***
systemd-10-3.fc15 is as broken as systemd-10-2.fc15. That means that it takes a very very long time to boot to a machine without networking. As systemd-sysvinit-10-3.fc15 obsolets upstart that means that a recovery is not that simple.
*** Bug 637101 has been marked as a duplicate of this bug. ***
*** Bug 637792 has been marked as a duplicate of this bug. ***
*** Bug 638105 has been marked as a duplicate of this bug. ***
I have similar problem in rawhide I usually got a machine without networking and without login prompt (in state 3) The workaround is to boot into state 1 and then cd /etc/rc.d/rc3.d for i in S*;do ./$i start;done It brings fully networking machine with one shell working. I'm using systemd 10-5.fc15
At least a subset of problems like this could be solved by upstream commit http://cgit.freedesktop.org/systemd/commit/?id=9f151f29fd37d9fdd3aad303cf0489482522918f
Moving systemd bugs to f15, since the systemd feature got delayed.
(In reply to comment #50) > Moving systemd bugs to f15, since the systemd feature got delayed. So we won't get it fixed in f14? It still happening here.
I'm seeing it on F14 and it is fixed by removing /etc/init.d/messagebus. Is there a way to make this script a nop for systemd? If so, then perhaps this should be reassigned to the dbus package?
(In reply to comment #52) > I'm seeing it on F14 and it is fixed by removing /etc/init.d/messagebus. AFAICT the problem is gone for the current rawhide versions of systemd and with /etc/init.d/messagebus still in place. > Is there a way to make this script a nop for systemd? Your /sbin/init is likely a symlink so you can read its value in /etc/init.d/messagebus and act accordingly; or you can check /proc/cmdline if you are switching in a boot command. I have some doubts if it is really advisable to do that.
I am pretty sure this problem doesn't exist on F15. Closing. If you manage to reproduce this, please reopen.
It's already fixed in systemd-10-2.fc14.1.x86_64 (at least here).