744523 – no network after boot

Bug 744523 - no network after boot

Summary: no network after boot

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	sysklogd
Sub Component:
Version:	15
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Jeroen van Meeuwen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2011-10-09 08:01 UTC by udo
Modified:	2012-08-07 16:42 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-08-07 16:42:36 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
dmesg log (203.85 KB, text/plain) 2011-10-10 14:14 UTC, udo	no flags	Details
3.0.9 dmesg with debug (206.35 KB, text/plain) 2011-11-14 15:11 UTC, udo	no flags	Details
View All

Description udo 2011-10-09 08:01:05 UTC

Description of problem:
No network after boot

Version-Release number of selected component (if applicable):
systemd-26-10.fc15.x86_64

How reproducible:
boot into 3.0.4 (?)

Steps to Reproduce:
1. reboot
2. wait for reboot
3. `ifconfig -a` after boot and login
  
Actual results:
see below

Expected results:
network OK

Additional info: (straight after boot finished)
# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:1F:D0:B1:F8:36  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:43 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:195 errors:0 dropped:0 overruns:0 frame:0
          TX packets:195 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:64753 (63.2 KiB)  TX bytes:64753 (63.2 KiB)

usb0      Link encap:Ethernet  HWaddr 86:9C:83:40:38:57  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

# service network restart
Restarting network (via systemctl):                        [  OK  ]
# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:1F:D0:B1:F8:36  
          inet addr:192.168.10.70  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: 2001:980:3480:0:21f:d0ff:feb1:f836/64 Scope:Global
          inet6 addr: fe80::21f:d0ff:feb1:f836/64 Scope:Link
          inet6 addr: fd00:c0a8:a00:1::70/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:7200  Metric:1
          RX packets:61 errors:0 dropped:0 overruns:0 frame:0
          TX packets:89 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5299 (5.1 KiB)  TX bytes:18397 (17.9 KiB)
          Interrupt:43 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:227 errors:0 dropped:0 overruns:0 frame:0
          TX packets:227 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:70583 (68.9 KiB)  TX bytes:70583 (68.9 KiB)

usb0      Link encap:Ethernet  HWaddr 86:9C:83:40:38:57  
          inet addr:192.168.0.200  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::849c:83ff:fe40:3857/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:7265 (7.0 KiB)

# chkconfig --list NetworkManager

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

# systemctl|grep Network
ntpd.service              loaded active running       Network Time Service
ups.service               loaded active running       LSB: Starts the Network UPS tools
network.target            loaded active active        Network

# service NetworkManager status
Redirecting to /bin/systemctl  status NetworkManager.service
NetworkManager.service - Network Manager
	  Loaded: loaded (/lib/systemd/system/NetworkManager.service)
	  Active: inactive (dead)
	  CGroup: name=systemd:/system/NetworkManager.service


During startup systemd sat for a few minutes (!) with the message:
Bringing up loopback interface: _

As I show with the network restart this should not happen as the configuration is 100%.
NetworkManager is disabled as it cannot handle the usb0 interface (see the bugz I created for NM) and will confuse usb0 and eth0.
I do think that I should find a well configured network after boot.
Also:

NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
r8169 0000:03:00.0: eth0: link down
r8169 0000:03:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready
r8169 0000:03:00.0: eth0: link up

eth0 was never down (link-wise). So why did that happen? eth0 is connected to a switch which is on 24/7.

Comment 1 Michal Schmidt 2011-10-09 23:51:19 UTC

(In reply to comment #0)
> Version-Release number of selected component (if applicable):
> systemd-26-10.fc15.x86_64

Is this bug a regression from a previous systemd version?

> How reproducible:
> boot into 3.0.4 (?)

Is it reproducible with a Fedora kernel (kernel-2.6.40.6-0.fc15 or similar)?

> During startup systemd sat for a few minutes (!) with the message:
> Bringing up loopback interface: _

It would be nice to see some logs. Please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg", save the log using "dmesg > dmesg.txt" and attach the resulting file here.

Comment 2 udo 2011-10-10 14:14:38 UTC

Created attachment 527243 [details]
dmesg log

Comment 3 udo 2011-10-10 14:14:55 UTC

Might be a regression as this did not happen before.
Did not test a Fedora kernel.
Logs attached.

Probable core problem:
systemd[1]: network.service operation timed out. Terminating.

`service network restart` works fine though.

Comment 4 udo 2011-10-10 14:17:13 UTC

can I lower the dmesg verbosity without rebooting?

Comment 5 Michal Schmidt 2011-10-10 15:19:13 UTC

(In reply to comment #2)
> dmesg log

Timestamps are missing in the dmesg log. This makes it difficult to see where the delays occurred. Fedora kernels use CONFIG_PRINTK_TIME=y for a reason.

(In reply to comment #3)
> Might be a regression as this did not happen before.

It would be helpful to know that for sure. Try downgrading to an older version of systemd. You can download old builds from http://koji.fedoraproject.org/koji/packageinfo?packageID=10477

> Did not test a Fedora kernel.

Could you test if it behaves in the same way with the Fedora kernel?

> Probable core problem:
> systemd[1]: network.service operation timed out. Terminating.

Indeed. Let's try to get some information out of network.service.
Please follow the advice given in this comment of another bug:
https://bugzilla.redhat.com/show_bug.cgi?id=692008#c33
(you don't have to read the whole discussion there, only the comment with the attachment "network.service unit for debugging")

(In reply to comment #4)
> can I lower the dmesg verbosity without rebooting?

You can send PID 1 the signal SIGRTMIN+23 to switch the log level to info.

Comment 6 udo 2011-10-10 15:34:03 UTC

Thanks for the SIGRTMIN tip.

W.r.t. network I see:

# grep network ~/noip.txt 
systemd[1]: Installed new job network.service/start as 157
systemd[1]: Installed new job network.target/start as 158
systemd[1]: About to execute: /etc/rc.d/init.d/network start
systemd[1]: Forked /etc/rc.d/init.d/network as 3113
systemd[1]: network.service changed dead -> start
systemd[1]: network.service operation timed out. Terminating.
systemd[1]: network.service changed start -> final-sigterm
systemd[1]: Received SIGCHLD from PID 3113 (network).
systemd[1]: Got SIGCHLD for process 3113 (network)
systemd[1]: Child 3113 belongs to network.service
systemd[1]: network.service: control process exited, code=killed status=15
systemd[1]: network.service got final SIGCHLD for state final-sigterm
systemd[1]: network.service changed final-sigterm -> failed
systemd[1]: Job network.service/start finished, result=failed
systemd[1]: Unit network.service entered failed state.
systemd[1]: network.target changed dead -> active
systemd[1]: Job network.target/start finished, result=done
systemd[1]: network.service: cgroup is empty

So no details about why the network timed out.
Is the network service the same as my `service network restart` thing?
If so, what could I do to shine more light on this failing?
sh -x in someplace? 
Oh, I see that https://bugzilla.redhat.com/show_bug.cgi?id=692008#c33 basically does such a thing.; thanks, will try that.

Comment 7 udo 2011-10-12 15:41:48 UTC

Tried the advice from https://bugzilla.redhat.com/show_bug.cgi?id=692008#c33.
Upon bootup this way the PC sits and sits and does not continue.
The screensaver darkened the screen at least twice before I gave up. (had my evening meal, went for a walk to the local ATM and no progress)
So how long is this supposed to take?

Comment 8 Michal Schmidt 2011-10-12 15:48:03 UTC

Sorry, change the TimeoutSec=0 in network.service to TimeoutSec=90 or so. In the other bug it made sense to disable the timeout, but not here.
Well, at least now we know that the network script gets truly locked up on startup.

Comment 9 Michal Schmidt 2011-10-12 22:36:34 UTC

I suspect the only point where the network script could possibly hang while bringing up loopback is when it tests whether NM is running. It tries to talk to D-Bus there.
With this suspicion in mind I took another look at the dmesg from comment #2. 
The interesting events are:
  2402	systemd[1]: dbus.socket changed dead -> listening
  2482	systemd[1]: network.service changed dead -> start
  2504	systemd[1]: Incoming traffic on dbus.socket
  2507	systemd[1]: dbus.socket changed listening -> running
  2543	systemd[1]: network.service operation timed out. Terminating
  2635	systemd[1]: dbus.service changed dead -> start-pre

The network script talked to dbus, but dbus.service started only after network.service timed out. dbus.service is not supposed to have such an ordering dependency on network.service!

Looking further I found where the dependency comes from. You are using sysklogd instead of the default rsyslog. sysklogd in F15 is an LSB service (unlike rsyslog, which is native). It has "Provides: $syslog" in its header. The ordering dependencies are thus:
dbus.service
 -> syslog.target  (explicitly stated in dbus.service)
  -> sysklogd.service (inferred from the LSB header)
   -> auditd.service (due to SysV priorities)
    -> network.service (due to SysV priorities)
Since network.service wants to talk to dbus.service, the deadlock is now obvious. The easiest solution is just to switch to rsyslog.
Reassigning to sysklogd for comments.

Comment 10 udo 2011-10-13 03:09:53 UTC

Thanks for your analysis.
NM is not running due to https://bugzilla.redhat.com/show_bug.cgi?id=691327.
Does this influence the situation?

Comment 11 Michal Schmidt 2011-10-13 06:59:00 UTC

To some extent yes. Enabling NM would allow you to disable network.service and avoid this particular deadlock loop.

But the main problem is really sysklogd.
sysklogd must be modified to the rules listed in http://www.freedesktop.org/wiki/Software/systemd/syslog or be removed from Fedora.

Comment 12 Michal Schmidt 2011-10-24 15:52:03 UTC

sysklogd is going away. See bug 748495.

Comment 13 udo 2011-11-13 17:22:49 UTC

Booted into 3.0.9 with just rsyslog present. Same deal.
No network.
'service network restart`
Presto.
Network.

# rpm -qa|grep log
fedora-logos-15.0.1-1.fc15.noarch
krb5-auth-dialog-3.0.1-1.fc15.x86_64
abrt-plugin-logger-2.0.3-4.fc15.x86_64
logwatch-7.3.6-70.20110203svn25.fc15.noarch
shared-desktop-ontologies-0.6.0-1.fc15.noarch
xorg-x11-drv-elographics-1.2.99-3.20101206git6fd22a9d6.fc15.x86_64
rsyslog-5.8.5-1.fc15.x86_64
telepathy-logger-0.2.8-1.fc15.x86_64
logrotate-3.7.9-14.fc15.x86_64
dialog-1.1-12.20100428.fc15.x86_64
mcelog-1.0-0.3.pre3.fc15.x86_64
# uname -a
Linux surfplank2.hierzo 3.0.9 #3 SMP PREEMPT Sat Nov 12 12:06:40 CET 2011 x86_64 x86_64 x86_64 GNU/Linux

Comment 14 Michal Schmidt 2011-11-14 10:05:59 UTC

In that case please attach a new dmesg log using the boot parameters in comment #1.

Comment 15 udo 2011-11-14 15:11:10 UTC

Created attachment 533550 [details]
3.0.9 dmesg with debug

Comment 16 udo 2011-11-14 15:11:37 UTC

I do not see so much network happening in this dmesg.

Comment 17 Michal Schmidt 2011-11-14 15:25:59 UTC

network.service is not mentioned at all in the dmesg. Is it actually enabled?
systemctl enable network.service

Comment 18 udo 2011-11-14 15:55:58 UTC

Weird if it became disabled.
systemctl status showed the result of the manual correction (OK...).
So I will schedule a new boot to test behaviour again.

Comment 19 Fedora End Of Life 2012-08-07 16:42:38 UTC

This message is a notice that Fedora 15 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 15. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we were unable to fix it before Fedora 15 reached end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" (top right of this page) and open it against that
version of Fedora.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Note You need to log in before you can comment on or make changes to this bug.