Bug 630225 - Service network isn't started on boot
Summary: Service network isn't started on boot
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Lennart Poettering
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 632929 633285 634337 637101 637792 638105 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-09-04 06:03 UTC by Bruno Wolff III
Modified: 2011-02-17 08:32 UTC (History)
24 users (show)

Fixed In Version: initscripts-9.20-1.fc14
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-16 23:05:58 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
/var/log/messages (122.27 KB, text/plain)
2010-09-04 23:21 UTC, Tom Horsley
no flags Details
nsswitch.conf (1.65 KB, text/plain)
2010-09-06 00:10 UTC, Tom Horsley
no flags Details
nsswitch.conf from one affected machine (1.69 KB, text/plain)
2010-09-06 13:49 UTC, Bruno Wolff III
no flags Details
dmesg with systemd debugging info (64.49 KB, text/plain)
2010-09-07 14:04 UTC, Bruno Wolff III
no flags Details
dmesg from boot followed by reboot (79.15 KB, text/plain)
2010-09-07 21:02 UTC, Tom Horsley
no flags Details
dmesg with systemd 9-3 and initscripts 19.1-1 (93.96 KB, text/plain)
2010-09-07 22:27 UTC, Bruno Wolff III
no flags Details
Updated dmesg for systemd 10. (Using the dmesg command rather than /var/log/dmesg.) (121.88 KB, text/plain)
2010-09-14 15:42 UTC, Bruno Wolff III
no flags Details
dmesg from a machine that isn't using daemontools (122.43 KB, text/plain)
2010-09-14 16:24 UTC, Bruno Wolff III
no flags Details

Description Bruno Wolff III 2010-09-04 06:03:39 UTC
Description of problem:
Starting with systemd 9-2, the network service isn't started on boot. There is a long pause waiting to bring up lo that ends after very roughly a minute.
I have nerwork enabled and NetworkManager disabled.
It happens on two machines, one i686 the other x86_64. The i686 machine has also been tested with systemd 9-3 and it still has this issue.

Version-Release number of selected component (if applicable):
systemd-9-3.fc14.i686
initscripts-9.18-1.fc14.i686


How reproducible:
Seems to happen every boot, but I haven't tried many times.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Tom Horsley 2010-09-04 18:48:58 UTC
I just installed updates which included systemd-9-3.fc14.x86_64
and I see the same thing. I also have NetworkManager disabled and network
enabled and when it gets to the end of the boot process where I normally
get a KDM login prompt, it just sits like a lump for a long time then finally
prints messages about abrtd and network failing to start (although
I have abrtd disabled, so I have no idea why it is trying to start).

I do eventually get a login prompt and then can see only the loopback
device in ifcfg output. Doing "ifup br0" and "ifup eth0" gets the network
functioning again. (Which enabled me to search bugzilla and find this
entry and add to it :-).

Comment 2 Tom Horsley 2010-09-04 18:57:47 UTC
I finally found the relevant entries in /var/log/messages that go with the
above boot attempt in comment 1

Sep  4 14:37:16 zooty kernel: rc.sysinit used greatest stack depth: 3536 bytes left
Sep  4 14:37:16 zooty kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Sep  4 14:37:16 zooty kernel: init[1]: abrtd.service operation timed out. Terminating.
Sep  4 14:37:16 zooty kernel: init[1]: network.service operation timed out. Terminating.
Sep  4 14:37:16 zooty kernel: init[1]: Unit network.service entered failed state.

Despite the timestamp shown in the log, there was a very long pause
on the console between the ip6_tables message and the abrtd.service message.

Comment 3 Lennart Poettering 2010-09-04 20:55:29 UTC
Are you using NSS LDAP or something similar?

Comment 4 Tom Horsley 2010-09-04 23:21:24 UTC
Created attachment 443112 [details]
/var/log/messages

I added the systemd.log_level=debug boot parameter and booted my system again.
I'm attaching the /var/log/messages from that boot (which seems to have lots
of info - I don't know how informative it is).

Despite the time stamps being the same on all the lines, one the console
there was a huge pause right after it printed the line about
"dbus.socket changed listening -> running"

After it finally finished booting and let me login, a "service network start"
managed to work fine to get the network up. (Which is probably the source of
the last set of messages about the network service in the log).

P.S. I am certainly not using ldap or nis or anything like that - local
password file only.

Comment 5 Bruno Wolff III 2010-09-05 16:23:47 UTC
On a laptop where I use NetworkManager instead of network, I didn't have a problem after going to systemd 9-3.
I am not using LDAP.

Comment 6 Lennart Poettering 2010-09-05 23:50:13 UTC
Could you please get me the contents of your nsswitch.conf from the PC where this happens?

Comment 7 Tom Horsley 2010-09-06 00:10:12 UTC
Created attachment 443201 [details]
nsswitch.conf

Here's my nsswitch.conf. I haven't modified it manually at all, so it should
be exactly the version created by the fedora install or any rpms in updates
that might have modified it.

Comment 8 Bruno Wolff III 2010-09-06 13:49:30 UTC
Created attachment 443302 [details]
nsswitch.conf from one affected machine

Comment 9 Tom Horsley 2010-09-06 22:21:12 UTC
I just noticed another thing that seemed to start when I got systemd 9-3.
Every time I reboot the system, I get a message about recovering journal
on the partition where I have f14 installed. It seems to be unable to
shutdown cleanly.

Comment 10 Lennart Poettering 2010-09-07 10:43:08 UTC
Hmm, could you please get me the "dmesg" output from after the incomplete boot when booting with "systemd.log_level=debug" and "systemd.log_target=kmsg"?

Comment 11 Bruno Wolff III 2010-09-07 13:26:37 UTC
I'll have a chance to do this in 2 - 3 hours.

Comment 12 Bruno Wolff III 2010-09-07 14:04:04 UTC
Created attachment 443512 [details]
dmesg with systemd debugging info

[root@games1 bruno]# service network status
Configured devices:
lo eth0
Currently active devices:
lo
[root@games1 bruno]# service network start
Starting network (via systemctl):                          [  OK  ]
It turns out I had a bit quicker way to test it than I originally thought, so I did it sooner rather than later. I have attached dmesg from the boot and my check that the network wasn't up until after starting the network service.
This didn't happen on version 8, but does happen with both 9-2 and 9-3, though I haven't tested if the recent dbus update has an effect on this yet.

[root@games1 bruno]# service network status
Configured devices:
lo eth0
Currently active devices:
lo eth0
[root@games1 bruno]#

Comment 13 Bruno Wolff III 2010-09-07 14:07:19 UTC
The comment was supposed to be inserted before the copied text, not in the middle.

Comment 14 Tom Horsley 2010-09-07 21:02:38 UTC
Created attachment 445786 [details]
dmesg from boot followed by reboot

Here's my /var/log/dmesg file copied from my f14 partition after I booted
with the debug settings, then rebooted back to f13, so this has messages
from both starup and shutdown.

Once again I hung in the console with the last message displayed being
about dbus going into a listening state (but I don't see that message
anywhere in the dmesg file).

Comment 15 Bruno Wolff III 2010-09-07 22:27:09 UTC
Created attachment 445795 [details]
dmesg with systemd 9-3 and initscripts 19.1-1

I tried using the updated initscripts on a second machine and I am still not seeing the network service successfully started at boot.

Comment 16 Bill Nottingham 2010-09-08 01:33:07 UTC
Some information gathering, for Tom and Bruno:
- I'm assuming you have NM installed, but disabled

How did you disable it?

- I'm also assuming that if you uninstall NM it works fine?

Comment 17 Bill Nottingham 2010-09-08 01:47:42 UTC
So, AFAICT, what's happening here is:
- you've enabled the network service, or run 'ifup lo'
- it runs 'nmcli nm status' to see what NM's doing
- this *tries to enable the NM service via dbus*
- when done before dbus is running, weird stuff happens

This can be easily reproduced post-boot with NM inactive; just run 'nmcli nm status', and NM will be activated.

Whether either nmcli is changed to check NM's status differently, or how NM is dbus-activated is changed, checking the status of NM with nmcli obviously can't spontaneously start it. Moving to NetworkManager.

Comment 18 Bruno Wolff III 2010-09-08 02:01:58 UTC
I used chkconfig to disable NetworkManager. The package is still installed.

When I was running systemd 8, I didn't see this problem, but it is possible I had a NetworkManager update at the same time I went from 8 to 9-2.

Comment 19 Bruno Wolff III 2010-09-08 17:59:50 UTC
I tested both downgrading NetworkManager and systemd and found that systemd 8-3 worked with the current NetworkManager and that the downgraded version of NetworkManager (8.1-5) didn't work with systemd 9-3.
So even if this is a problem with NetworkManager, pushing the systemd update out before the NetworkManager fix is in place, is going to break things for some people.

Comment 20 Tom Horsley 2010-09-08 23:51:27 UTC
I just tried this and if I yum -C erase NetworkManager rather than
merely turning it off with chkconfig, there is no delay on boot and
the network is started correctly. So the comments above definitely
appear to explain the problem.

Comment 21 Sergei LITVINENKO 2010-09-11 10:45:17 UTC
I have similar (maybe the same) problem.

As soon as 3 services (network, NetworkManager, auditd) are switch on for automatic start:
1. chkconfig network on
2. chkconfig NetworkManager on
3. chkconfig auditd on

I have big timeout on system boot...

Comment 22 Lennart Poettering 2010-09-12 18:26:38 UTC
*** Bug 632929 has been marked as a duplicate of this bug. ***

Comment 23 Fedora Update System 2010-09-14 01:04:20 UTC
systemd-10-1.fc14,initscripts-9.20-1.fc14,sysvinit-2.87-5.dsf.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/systemd-10-1.fc14,initscripts-9.20-1.fc14,sysvinit-2.87-5.dsf.fc14

Comment 24 Fedora Update System 2010-09-14 04:28:07 UTC
systemd-10-1.fc14, initscripts-9.20-1.fc14, sysvinit-2.87-5.dsf.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update systemd initscripts sysvinit'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/systemd-10-1.fc14,initscripts-9.20-1.fc14,sysvinit-2.87-5.dsf.fc14

Comment 25 Michal Schmidt 2010-09-14 08:23:20 UTC
Do you have "NM_CONTROLLED=yes" in your /etc/sysconfig/network-scripts/ifcfg-eth0?
Does the problem go away if you change it to "no"?

Comment 26 Michal Schmidt 2010-09-14 10:21:18 UTC
I can reproduce this and I see a problem.

The network service hangs inside the is_nm_running() function in initscripts' /etc/sysconfig/network-scripts/network-functions. It calls nmcli to detect if NM is running. nmcli hangs while waiting for an answer from D-Bus. After a timeout it is killed by systemd. Here's why the answer never arrives:

The start of dbus.service is indirectly ordered after network.service:

dbus.service After syslog.target After rsyslog.service After network.service
Or another chain:
dbus.service After mdmonitor.service After network.service
Possibly via more chains.

When the network service is starting, dbus.socket is listening and communication to it is buffered inside the kernel. But dbus.service cannot reply to it, because it cannot start yet.

This problem does not happen with upstart, because there the dbus socket is not listening yet, so nmcli in is_nm_running() fails immediately.

Comment 27 Jonathan Kamens 2010-09-14 10:39:01 UTC
(In reply to comment #25)
> Do you have "NM_CONTROLLED=yes" in your
> /etc/sysconfig/network-scripts/ifcfg-eth0?
> Does the problem go away if you change it to "no"?

jik2:~!1005$ grep NM_CONTROLLED /etc/sysconfig/network-scripts/ifcfg-eth0
NM_CONTROLLED=no
jik2:~!1006$

Comment 28 Bruno Wolff III 2010-09-14 12:11:35 UTC
[root@games1 livegames]# grep NM_CONTROLLED /etc/sysconfig/network-scripts/ifcfg-eth0
NM_CONTROLLED=no

Comment 29 Michal Schmidt 2010-09-14 14:37:36 UTC
Bruno,
looking at your attached logs, they all seem incomplete to me. There are always several services, which, though added like this,:
  Added job network.service/start to transaction
... are never mentioned again in the log. And it's not only network.service.
So I realized it must be that you are always attaching /var/log/dmesg, but this file is merely a snapshot of dmesg taken at the end of /etc/rc.sysinit. Please get the complete log with 'dmesg > dmesg.txt'.

Comment 30 Bruno Wolff III 2010-09-14 14:49:11 UTC
I got that output by copying over /var/log/dmesg. Does the dmesg command actually produce more output?
Getting a new one now that I am using systemd 10 is probably a good idea in any case. I have one in a little while.

Comment 31 Bill Nottingham 2010-09-14 15:32:37 UTC
(In reply to comment #26)
> When the network service is starting, dbus.socket is listening and
> communication to it is buffered inside the kernel. But dbus.service cannot
> reply to it, because it cannot start yet.

Right, we were discussing this yesterday. Arguably the dbus dependencies are incorrect.

Comment 32 Bruno Wolff III 2010-09-14 15:42:55 UTC
Created attachment 447252 [details]
Updated dmesg for systemd 10. (Using the dmesg command rather than /var/log/dmesg.)

Comment 33 Michal Schmidt 2010-09-14 16:01:12 UTC
Yup, it's a deadlock. Something (an attempted restart of qmail?) runs from the network service (an ifup hook?) that wants to talk to D-Bus, but dbus-daemon is started much later, after the deadlock is released by timing out network.service.

init[1]: About to execute: /etc/rc.d/init.d/network start
init[1]: Forked /etc/rc.d/init.d/network as 1470
init[1]: network.service changed dead -> start
init[1]: Accepted connection on private bus.
init[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
init[1]: ip6tables.service: cgroup is empty
init[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
type=1400 audit(1284478504.444:5): avc:  denied  { write } for  pid=1448 comm="qmail-send" path="pipe:[10334]" dev=pipefs ino=10334 scontext=system_u:system_r:qmail_send_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=fifo_file
init[1]: Incoming traffic on dbus.socket
init[1]: Trying to enqueue job dbus.service/start/replace
...
init[1]: network.service operation timed out. Terminating.
init[1]: network.service changed start -> final-sigterm
init[1]: Received SIGCHLD
init[1]: Got SIGCHLD for process 1470 (network)
init[1]: Child 1470 died (code=killed, status=15/TERM)
init[1]: Child 1470 belongs to network.service
init[1]: network.service: control process exited, code=killed status=15
init[1]: network.service got final SIGCHLD for state final-sigterm
init[1]: network.service changed final-sigterm -> failed
init[1]: Job network.service/start finished, success=no
init[1]: Unit network.service entered failed state.
init[1]: Got SIGCHLD for process 1507 (ifup)
init[1]: Child 1507 died (code=killed, status=15/TERM)
init[1]: network.target changed dead -> active
...
init[1]: Job mdmonitor.service/start finished, success=yes
init[1]: About to execute: /bin/dbus-uuidgen --ensure
init[1]: Forked /bin/dbus-uuidgen as 1594
init[1]: dbus.service changed dead -> start-pre
...

Comment 34 Bruno Wolff III 2010-09-14 16:24:42 UTC
Created attachment 447261 [details]
dmesg from a machine that isn't using daemontools

Yes the first machine starts up svscanboot to support daemontools, during the boot process. However I see the network startup problem on machines that don't do that. I am attaching an example that shouldn't have that distraction.

Comment 35 Michal Schmidt 2010-09-14 16:41:11 UTC
Unfortunately this run had so many events that the beginning of the log is missing.

Comment 36 Michal Schmidt 2010-09-14 16:45:33 UTC
The many strange "After=" dependencies of dbus.service are due to systemd combining the configuration from /lib/systemd/system/dbus.service with the dependencies /etc/init.d/messagebus has as a SysV service.

Comment 37 Michal Schmidt 2010-09-14 20:25:28 UTC
Does it help if you remove /etc/init.d/messagebus ?
(It works here. The file is only needed for booting with upstart and it only causes trouble for systemd.)

Comment 38 Adam Williamson 2010-09-14 20:35:53 UTC
Lennart, have you made progress on this? It's not a beta blocker but we really should fix it as it's a regression from -8.

Comment 39 Bill Nottingham 2010-09-14 21:05:01 UTC
(In reply to comment #36)
> The many strange "After=" dependencies of dbus.service are due to systemd
> combining the configuration from /lib/systemd/system/dbus.service with the
> dependencies /etc/init.d/messagebus has as a SysV service.

Right, I believe it's doing so incorrectly, as it's converting the SysVStart priority into dependencies. That shouldn't happen for a native service that lists its dependencies explicitly.

Comment 40 Fedora Update System 2010-09-15 07:11:43 UTC
initscripts-9.20-1.fc14, sysvinit-2.87-5.dsf.fc14, systemd-10-2.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 41 Michal Schmidt 2010-09-15 07:20:33 UTC
not fixed yet, reopening

Comment 42 Bill Nottingham 2010-09-15 20:13:35 UTC
*** Bug 634337 has been marked as a duplicate of this bug. ***

Comment 43 Michal Schmidt 2010-09-15 20:49:46 UTC
*** Bug 633285 has been marked as a duplicate of this bug. ***

Comment 44 Michal Jaegermann 2010-09-19 00:36:15 UTC
systemd-10-3.fc15 is as broken as systemd-10-2.fc15.  That means that it takes a very very long time to boot to a machine without networking. As systemd-sysvinit-10-3.fc15 obsolets upstart that means that a recovery is not that simple.

Comment 45 Bill Nottingham 2010-09-24 16:26:47 UTC
*** Bug 637101 has been marked as a duplicate of this bug. ***

Comment 46 Bill Nottingham 2010-09-27 17:02:00 UTC
*** Bug 637792 has been marked as a duplicate of this bug. ***

Comment 47 Yanko Kaneti 2010-09-28 09:37:05 UTC
*** Bug 638105 has been marked as a duplicate of this bug. ***

Comment 48 Jan F. Chadima 2010-09-29 06:17:27 UTC
I have similar problem in rawhide I usually got a machine without networking and without login prompt (in state 3)
The workaround is to boot into state 1 and then
cd /etc/rc.d/rc3.d
for i in S*;do ./$i start;done
It brings fully networking machine with one shell working.
I'm using systemd 10-5.fc15

Comment 49 Michal Schmidt 2010-09-29 09:59:01 UTC
At least a subset of problems like this could be solved by upstream commit http://cgit.freedesktop.org/systemd/commit/?id=9f151f29fd37d9fdd3aad303cf0489482522918f

Comment 50 Matthias Clasen 2010-10-08 22:46:33 UTC
Moving systemd bugs to f15, since the systemd feature got delayed.

Comment 51 Tomasz Torcz 2010-10-31 16:15:31 UTC
(In reply to comment #50)
> Moving systemd bugs to f15, since the systemd feature got delayed.

So we won't get it fixed in f14? It still happening here.

Comment 52 Paolo Bonzini 2010-11-19 12:13:31 UTC
I'm seeing it on F14 and it is fixed by removing /etc/init.d/messagebus.

Is there a way to make this script a nop for systemd?  If so, then perhaps this should be reassigned to the dbus package?

Comment 53 Michal Jaegermann 2010-11-19 21:32:11 UTC
(In reply to comment #52)
> I'm seeing it on F14 and it is fixed by removing /etc/init.d/messagebus.

AFAICT the problem is gone for the current rawhide versions of systemd and with /etc/init.d/messagebus still in place.

> Is there a way to make this script a nop for systemd?

Your /sbin/init is likely a symlink so you can read its value in /etc/init.d/messagebus and act accordingly; or you can check /proc/cmdline
if you are switching in a boot command.  I have some doubts if it is really advisable to do that.

Comment 54 Lennart Poettering 2011-02-16 23:05:58 UTC
I am pretty sure this problem doesn't exist on F15. Closing. If you manage to reproduce this, please reopen.

Comment 55 Paolo Bonzini 2011-02-17 08:32:11 UTC
It's already fixed in systemd-10-2.fc14.1.x86_64 (at least here).


Note You need to log in before you can comment on or make changes to this bug.