Bug 708207 - very long boot time in fedora 15
Summary: very long boot time in fedora 15
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 15
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 709009 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-05-26 22:19 UTC by Ian Malone
Modified: 2013-01-13 11:46 UTC (History)
22 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-07 15:59:46 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
dmesg output with systemd logging (221.52 KB, text/plain)
2011-05-27 08:02 UTC, Ian Malone
no flags Details
slow startup /var/log/messages (478.26 KB, text/plain)
2011-05-31 19:48 UTC, Ian Malone
no flags Details
slightly faster /var/log/messages (443.75 KB, text/plain)
2011-05-31 19:56 UTC, Ian Malone
no flags Details

Description Ian Malone 2011-05-26 22:19:10 UTC
Description of problem:
Fedora 15 takes an incredibly long time to start on this machine (Intel Core 2 Duo T5500, 1GB RAM laptop), booting takes several minutes as opposed to F13. No custom modules, just stock Fedora, intel wireless and graphics, used preupgrade to go F13 to F15. I can try and provide more information, but I'm not sure where to start with this, any ideas? Some things that might be relevant below.

dmesg, at the 'gap':
[   43.006805] sky2 0000:02:00.0: eth0: enabling interface
[   43.010915] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   43.044360] iwl3945 0000:06:00.0: loaded firmware version 15.32.2.9
[   43.111086] iwl3945 0000:06:00.0: Error setting Tx power (-5).
[   43.132498] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  344.645061] RPC: Registered udp transport module.
[  344.645066] RPC: Registered tcp transport module.
[  344.645069] RPC: Registered tcp NFSv4.1 backchannel transport module.

The point in /var/log/boot.log where it stops:
Starting LSB: Port reservation utility...
Started LSB: Port reservation utility.
Started LSB: processor frequency scaling support.
**** <- several minutes spent here
Starting LSB: Bring up/down networking ESC[1;31mfailedESC[0m, see 'systemctl status network.service' for details.
Starting SYSV: The rpcbind utility is a server that converts RPC program numbers into universal addresses. It must be running on the host to be able to make RPC calls on a server on that machine....

Unfortunately those instructions aren't very informative:
# systemctl status network.service
network.service - LSB: Bring up/down networking
         Loaded: loaded (/etc/rc.d/init.d/network)
         Active: failed since Thu, 26 May 2011 21:29:35 +0100; 7min ago
        Process: 882 ExecStart=/etc/rc.d/init.d/network start (code=killed, signal=TERM)
         CGroup: name=systemd:/system/network.service

Doesn't explain why it waits for networking at this point, nor why it attempts (or seems to attempt) to start several services after this, which seems contrary to the way systemd is supposed to work:
Starting LSB: Starts the NFSv4 id mapping daemon...
Starting LSB: Start up the NFS file locking sevice...
Starting LSB: The CUPS scheduler...
Starting LSB: Start and stop the MD software RAID monitor...
Starting LSB: Mount and unmount network filesystems....
Starting LSB: Starts the RPCSEC GSS client daemon...
Started LSB: Start and stop the MD software RAID monitor.
Started LSB: Starts the RPCSEC GSS client daemon.
Started LSB: Mount and unmount network filesystems..

etc., of course this might just be that it's not reporting these until after the network.service fail.

Top ten in systemd-analyze, network doesn't appear at all:
$ systemd-analyze blame
 15464ms fedora-sysinit-hack.service
 14464ms var-lock.mount
 14449ms var-run.mount
 14047ms systemd-vconsole-setup.service
 13680ms systemd-binfmt.service
 13608ms media.mount
 13149ms remount-rootfs.service
 13142ms systemd-remount-api-vfs.service
 13134ms hwclock-load.service
 9852ms proc-sys-fs-binfmt_misc.mount

How reproducible:
100%

Steps to Reproduce:
1. Turn computer on.
2. Make cup of tea.
3.
  
Actual results:
Tea gets cold before login appears.

Expected results:
Tea gets cold after login appears.

Comment 1 Michal Schmidt 2011-05-26 23:04:27 UTC
(In reply to comment #0)
> dmesg, at the 'gap':
> [   43.006805] sky2 0000:02:00.0: eth0: enabling interface
> [   43.010915] ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   43.044360] iwl3945 0000:06:00.0: loaded firmware version 15.32.2.9
> [   43.111086] iwl3945 0000:06:00.0: Error setting Tx power (-5).
> [   43.132498] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [  344.645061] RPC: Registered udp transport module.
> [  344.645066] RPC: Registered tcp transport module.
> [  344.645069] RPC: Registered tcp NFSv4.1 backchannel transport module.

300 s = 5 minutes = the default timeout for starting SysV services.
The network service is not starting right. What is your network configuration? /etc/sysconfig/network-scripts/ifcfg-*
Do any of the network interfaces work after the boot finishes?

Please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" and attach the complete dmesg.

Another useful option is "systemd.default_standard_output=syslog". It will make the output of the network initscript (and others) go to /var/log/messages.

Comment 2 Ian Malone 2011-05-27 08:02:52 UTC
Created attachment 501265 [details]
dmesg output with systemd logging

The dmesg output is attached.
/etc/sysconfig/network-scripts/ is full of -Auto_ configurations (generated by NM in the past). I moved those out (in fact moved everything ifcfg-* out to start with, but am now putting -lo -eth0 and the ones I normally use back) and the long hang time disappeared, the login screen now appears at about the time disk activity stops.

So my problem is solved, however it seems that the hang occurs during start as every ifcfg is being tested (had a look at /var/log/messages). It seems wrong that this should be able to affect boot times in this way as:
1. You can quite easily accumulate lots of ifcfg-Auto_* over time.
2. Having the network up isn't really necessary for login and, if you have NM set up as I did coming from F13, then it picks the correct one after login and you can see how long it takes at that point.
Happy to test attempts to improve this.

Thanks for your help.

Comment 3 Bill Nottingham 2011-05-27 18:29:41 UTC
One possible change would be to have NM modify these connections it writes such that ONBOOT=no, but that may affect NM's behavior.

Comment 4 Jirka Klimes 2011-05-31 10:48:24 UTC
There should not be any (big) delay even in case of many connection files.
Ian, could you post /var/log/messages so that we can see the logs?

Comment 5 Bill Nottingham 2011-05-31 16:58:31 UTC
My experience was that it would serially go through the wireless interfaces and try to connect to each one, scanning for the AP each time until it times out.

Comment 6 Bill Nottingham 2011-05-31 17:12:56 UTC
*** Bug 709009 has been marked as a duplicate of this bug. ***

Comment 7 Ian Malone 2011-05-31 19:48:00 UTC
Created attachment 502081 [details]
slow startup /var/log/messages

Comment 8 Ian Malone 2011-05-31 19:56:23 UTC
Created attachment 502090 [details]
slightly faster /var/log/messages

The slower one of these two is booting with ONBOOT=ON in two ifgcfg- files for wireless networks with a certificate login. Not quite as slow as having all the -auto enabled, but it makes a real difference.

Comment 9 Jirka Klimes 2011-06-02 11:13:33 UTC
I see a few issues here.

First, you have both NetworkManager.service and network.service active, which is not necessary and causes problems here, in fact. 'network' service will try to activate interfaces (with ONBOOT=yes) via nmcli if NM is running and actually waits for the connections to complete.
If you disable the network.service the delays should go away:
# systemctl disable network.service

Second, there is probably an issue with nmcli/libnm-glib or something, because nmcli didn't receive the signal that the connection failed and it timed out only after the default timeout (Error: Timeout 90 sec expired.). We will have to look into that.
Nonetheless, even without this issue, the network service waits until the connection is finished (either successfully or not). So, there's always some delay.
The timeout or the possibility not to wait at all can be adjust with [--nowait] and [--timeout <timeout>] options to nmcli, if network service wants to do that (probably not).

Third, there is NetworkManager-wait-online.service if we want to wait for NetworkManager to connect before continuing in boot.

Fourth, not related to this issue; I've noticed 

/etc/init.d/functions: line 58: /dev/stderr: No such device or address

many times in the logs. What is your version of initscripts?

Comment 10 Michal Schmidt 2011-06-02 11:59:37 UTC
(In reply to comment #9)
> Fourth, not related to this issue; I've noticed 
> 
> /etc/init.d/functions: line 58: /dev/stderr: No such device or address
> 
> many times in the logs. What is your version of initscripts?

This message appears always when the stdout redirection "systemd.default_standard_output=syslog" is used (it also redirects stderr). The initscripts try to detect what type of terminal /dev/stderr is. This detection fails for the syslog-redirected stderr. But the only consequence is that the initscripts put colorizing ANSI sequences into their output. The error can be safely ignored.

Comment 11 Ian Malone 2011-06-02 19:32:30 UTC
In reply to Jirka Klimes:
> If you disable the network.service the delays should go away:
> # systemctl disable network.service

Yes, that prevents it happening, network was probably still enabled from an earlier install (this machine has been upgraded a few times). With respect to your second and third points, it doesn't seem right to me that it's the default for the system to wait NetworkManager, though the particular hold-up I ran into is probably rare. Anyway, thanks for your help.

Ian

Comment 12 solanum 2011-06-28 20:04:39 UTC
I just updated my Fedora 15 install, which I believe I did last week or the week before. Firefox didn't stop correctly, and the GUI drop down for shutdown didn't work. 

I had an issue with the yum upgrade where it left a whole bunch of duplicate packages on the the system or in the database at least. I ran package-cleanup and package-cleanup --removedupes. (or whatever that flag is..)

(I ran into a similar issue with a different portable on saturday, but there the kernel didn't work correctly, after i fixed all the issues and I interupted the update. I didnt interupt this update unless the network flaked and I didn't notice it.. The only thing I noticed was a warning about glibc update script didnt complete)



It is hanging at:


Starting Display Manager:
Started Display Manager. 
Started LSB: start and stop ksm.
Started LSB: tune the spped of ksm.
Started LSB: Starts the NFSv4 id mapping daemon. 

it is hung here, i let it sit for 30+ minutes..
-- 
I think it is a intel T2300 core duo processor. Thinkpad z71t..

Wireless is disabled with the hardware switch.

I had an issue with the yum upgrade where it left a whole bunch of duplicate packages on the the system or in the database at least. I ran package-cleanup and package-cleanup --removedupes. (or whatever that flag is..)

I looked at the logs in /var/messages
and it said roughly, wireless radio disabled by switch. Wireless enabled via state file. 

---
I removed the wireless ifcfg-wlan type configurations.

It got a little further. to 
Started LSB: processor frequench scaling support. 
[  922.557950] tg3 000:02:00:0: em1: Link is up at 100 Mbps, full duplex
[  922.558125] tg3 000:02:00:0: em1: Flow control is on for TX and on for RX
[  922.558632] ADDRCONF(NETDEV_CHANGE): em1: link becomes ready

its hung again. (it sat for about 20 minutes.) I tried to disable networking via
systemctl disable network.system 
that didn't work..

I unplugged the ethernet cables :) 
 used linux single

It is now hung at:
[  573.9084371] ADDRCONF(NETDEV_CHANGE): em1: link is not ready
Starting auditd:  [OK]
Starting rpcbind: [OK]
Starting ksmtuned: [OK]
Starting ksm: [OK]
Starting RPC idmapd: [ 574.864208] RPC Registered udp transport module.
[574.864352] RPC: Registered tcp transport module. 
[ 574.864490] RPC: Registered tcp NFSv4.1 backchannel transport module. 
                                                                 [OK]

for about 15 minutes. 
I am probably going to have to reinstall. 


I only autoconnect to wireless, and the hardwired port em1 is only enabled via network manager. And the radio was disabled via the hardware switch.

Im actually wondering if this is reverting on the filesystem level. During the force reboot, I noticed a LOT of policty kit/ audit errors.  I decide to change to an older kernel, since the grub single flag is on the newest kernel, and now it is hanging in the same spot where it hung intially.

Started LSB: Starts the NFSv4 id mapping daemon. 

--hang--


I hope this helps, i probably need to start a reinstall. I don't have time to have this offline for very long.

Comment 13 solanum 2011-06-30 15:24:38 UTC
(I should have written this down yesterday.) 

Reinstalled using the F15 beta dvd and did a minimal install. 
 
If I use NM_CONTROLLED="no" and ONBOOT="yes" in ifcfg-em1 even after I reboot it doesn't configure the nic. (However, i think this worked before i installed either the "Graphical Desktop" or X Window Server groups. 

The network start script also isn't putting the DNS1 and DNS2 entries into the resolv.conf file if NM is disabled. 

---

The hang doesn't occur in the multi-user runlevel, it only occurs if you go to graphical mode where it hangs in the same spot:
It is hanging at:


Starting Display Manager:
Started Display Manager. 
Started LSB: start and stop ksm.
Started LSB: tune the speed of ksm.
Started LSB: Starts the NFSv4 id mapping daemon. 

it is hung here, i let it sit for 10+ minutes. until I installed X Window Server. (im still getting some complaints during shutdown about a missing firmware directory.. I might still be missing some pieces.)


I am switching runlevels using the systemd links, and possibly I am doing that incorrectly. ?

There is a dbus.NetworkManager in the systemd startup scripts. I moved it and it booted up into GUI, but with no login appearing or menus.  (If you hit ctrl-alt-backspace you get a segfault and it restarts.) If you run startx from the command line in multi-user target, im getting a blank screen with no menus, But I do get a logout when i hit ctrl-alt delete, which then drops me back to the command line. 

(When I did a groupinstall "Graphical Desktop" it didn't install "X Window Servers") 
 
Also even though I am in multi-user mode, I get the F splashscreen so I miss all the shutdown messages. (although it isn't disabled during boot either, but the escape key isnt working either.)

With the dbus-org.freedesktop.NetworkManager.service (located in /etc/systemd/system) disabled(moved elsewhere), I am now getting a graphical version of the disk encryption key entry window during boot for /home (even though I hit esc to read the boot messages, after entry you have to hit esc again to see the messages.) Before I was just getting a command line version. (/ is still getting a command line version.. (I moved it back and im not getting the graphical cryptokey entry for /home anymore, however it returned after enabling the graphical user interface.)

Comment 14 Paul Lambert 2011-07-24 16:38:42 UTC
I was/am experiencing the same issue.  I did turn off the network service (chkconfig network off) and that has made booting acceptable.

There is still a longer lag from when the Fedora icon fill is complete until the login screen appears.  More lag than in F14.

The fact that too many and conflicting services are launching begs the question why is this the default configuration?  I have found this link 

http://www.fedoraforum.org/forum/showthread.php?t=215900

that claims there are at least 15 non-essential services launching when Fedora boots.  Why?

Comment 15 solanum 2011-08-01 13:27:47 UTC
If you disable avahi, mdns(?), cups, with systemctl it speeds it up the whole experience considerably. It gets rids of most of the network timeouts at least on my Toshiba celeron-M I think model a105 s1014. 

If you disable acpi, you get rid of some other gui timeouts, especially the black screen trying to wake up the machine after sleep mode and a lockup which i think is caused by an idle (ie you are waiting for a page to load and walk away for a few minutes.)

Also using btrfs for the whole disk (except /boot) sped up the machine and network experience. 

The weird thing that made me jump to btrfs, was that after getting the "black Screen of death" after sleep mode, was successive reboots kept getting me a little further to where I wanted to get to which was a web-based flash game. 
(ie the machine went to sleep, possibly because of heat, you rebooted, and it never got to the login screen. You rebooted again, it got to the login screen, you logged in, the desktop didnt come up, you rebooted and got to where you could launch a web browser it timed out. You rebooted again, it actually loaded a page, you had previously loaded, then finally. I could get to a page. type of scenario.) which makes you think it is a file caching issue somewhere. 

Part of this is where Firefox isn't giving up the cursor, so you cannot move it when it is timing out on a network connection. It is giving the appearance of a locked machine. Other times, when gnome doesnt load all the way, you get none of the menus and you are a sitting duck because you can move the mouse, but there is nothing to click on and you don't have the right-click menu. 

I -think- the iPads, Windows7, iPhones, etc that I was sharing the network with was in part caching stale dns information. I think one of the windows7 machines was infected with a virus. 

On the bright side btrfs is blazing fast relative to ext4. It was very nice to have that option (yes i know it is a lil dangerous).

Comment 16 Anil Seth 2011-10-29 14:30:27 UTC
Reference comment 9 - I had the problem of system.service being enabled when it was not needed. However, it showed up only after an upgrade to kernel 2.6.38.8-35 and later - and I filed a bug report for the kernel upgrade:
https://bugzilla.redhat.com/show_bug.cgi?id=723923

Just as in comment 11, network.service was enabled from, may be, fedora 9 or 10 and working fine; hence, it was hard to realise the conflict between this and NetworkManager.

Only recently I found the cause because of the following line in error log:
Oct 25 15:22:28 amd avahi-daemon[1792]: Registering new address record for fe80::217:31ff:fe93:e65a on eth0.*.
Oct 25 15:27:26 amd systemd[1]: network.service operation timed out. Terminating.

My request would be that if NetworkManager could log a warning that network.service may not be needed - it would help.

Thanks

Comment 17 Carlo de Wolf 2011-11-25 08:55:47 UTC
For the moment I need network service as a work-around for https://bugzilla.redhat.com/show_bug.cgi?id=756937.

I think after that one is fixed, this one be closed by stating that the network service needs to be disabled when using NetworkManager.

Comment 18 Vu Minh Khue 2011-12-12 05:33:55 UTC
I have the same problem. I am using fedora 15 x86 (kernel 2.6.41)
The boot process takes very long time (around 10 - 15min):

Started Avahi mDNS/DNS-SD Stack.
Started ABRT Automated Bug Reporting Tool.
Started Bluetooth Manager.
Stopping Syslog Kernel Log Buffer Bridge...
Stopped Syslog Kernel Log Buffer Bridge.
Started System Logging Service.
Started SYSV: sandbox, xguest and other apps that want to use pam_namespace require this script be run at boot.  This service script does not actually run any service but sets up: /var/tmp, /tmp and home directories to be used by these tools. If you do not use sandbox, xguest or pam_namespace you can turn this service off..
Starting LSB: Port reservation utility...
Starting LSB: start and stop ip6tables firewall...
Starting LSB: start and stop iptables firewall...
Starting LSB: Starts and stops login iSCSI daemon....
Starting LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Started LSB: Port reservation utility.
Started LSB: start and stop iptables firewall.
Started Network Manager.
Started LSB: Starts and stops login iSCSI daemon..
Started LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Started LSB: start and stop ip6tables firewall.
Starting LSB: Bring up/down networking...

It seems to take very long after the previous line.

Comment 19 Kaare Fiedler Christiansen 2012-04-01 09:56:12 UTC
What worked for me was looking in /etc/rc5.d and looking at what services were started with sysv init. There were three suspicious ones:
  iscsi
  iscsid
  network
I disabled them all with
  sudo chkconfig iscsi off
  sudo chkconfig iscsid off
  sudo chkconfig network off
After this, my system boots very fast.
I don't know if there is a better alternative to iscsi and iscsid if you actually need them - I didn't.

Comment 20 Fedora End Of Life 2012-08-07 15:59:49 UTC
This message is a notice that Fedora 15 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 15. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we were unable to fix it before Fedora 15 reached end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" (top right of this page) and open it against that
version of Fedora.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping


Note You need to log in before you can comment on or make changes to this bug.