Bug 1093942

Summary: NetworkManager-wait-online.service takes too long
Product: [Fedora] Fedora Reporter: Chris Murphy <bugzilla>
Component: NetworkManagerAssignee: Lubomir Rintel <lkundrak>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 22CC: bugzilla, dcbw, rkhan
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-19 11:26:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 963210    
Attachments:
Description Flags
dmesg
none
journal
none
journal with systemd.log_level=debug
none
journalctl with NM debug
none
journalctl with NM debug and ignore-carrier none

Description Chris Murphy 2014-05-03 18:20:38 UTC
Description of problem: Nearly 6 seconds for NetworkManager-wait-online.service seems excessive, especially for a VM.


Version-Release number of selected component (if applicable):
NetworkManager-0.9.9.1-5.git20140319.fc21.x86_64

How reproducible:
Every boot


Steps to Reproduce:
1. Boot


Actual results:
# systemd-analyze blame
          5.764s NetworkManager-wait-online.service
          1.198s firewalld.service
           692ms vboxadd.service
           528ms ModemManager.service
           524ms vboxadd-x11.service
           486ms systemd-logind.service
           484ms abrt-ccpp.service
           482ms lvm2-monitor.service
           469ms chronyd.service
           432ms mcelog.service
           424ms nfs-config.service
           421ms rsyslog.service
           387ms avahi-daemon.service
           340ms libvirtd.service
           254ms plymouth-start.service
           213ms NetworkManager.service
           120ms vboxadd-service.service
           106ms systemd-fsck@dev-disk-by\x2duuid-D4A4\x2dEE15.service
           103ms systemd-fsck@dev-disk-by\x2duuid-ccfb8f09\x2df0e2\x2d40b1\x2da899\x2d4e758d58b590.service
            96ms systemd-udev-trigger.service
            81ms systemd-sysctl.service
            69ms sys-kernel-debug.mount
            67ms systemd-vconsole-setup.service
            60ms kmod-static-nodes.service
            58ms systemd-tmpfiles-setup.service
            57ms plymouth-quit-wait.service
            55ms systemd-tmpfiles-setup-dev.service
            53ms var-lib-nfs-rpc_pipefs.mount
            50ms polkit.service
            50ms plymouth-quit.service
            48ms fedora-import-state.service
            48ms dev-mqueue.mount
            44ms boot-efi.mount
            39ms dev-hugepages.mount
            38ms fedora-readonly.service
            36ms sshd.service
            36ms user
            32ms home.mount
            31ms var.mount
            30ms plymouth-read-write.service
            26ms auditd.service
            26ms systemd-remount-fs.service
            21ms tmp.mount
            20ms systemd-journal-flush.service
            19ms boot.mount
            18ms systemd-random-seed.service
            17ms systemd-udevd.service
            15ms systemd-update-utmp.service


Expected results:

Certainly much less than 5 seconds.


Additional info:
VirtualBox 4.3.10

Comment 1 Chris Murphy 2014-05-03 18:24:03 UTC
Created attachment 892173 [details]
dmesg

3.15.0-0.rc3.git0.1.fc21.x86_64

Comment 2 Chris Murphy 2014-05-03 18:35:15 UTC
Created attachment 892174 [details]
journal

kernel-3.15.0-0.rc3.git0.1.fc21.x86_64

These things seem suspicious, but maybe aren't at all related to the problem.

1.
Why is NetworkManager seemingly causing the kernel to load cfg80211? This is a VM and doesn't have wireless hardware.

[    6.034200] rawhide.localdomain systemd[1]: Starting Network Manager...
[    6.111313] rawhide.localdomain NetworkManager[559]: <info> NetworkManager (version 0.9.9.1-5.git20140319.fc21) is starting...
[    6.111672] rawhide.localdomain NetworkManager[559]: <info> Read config: /etc/NetworkManager/NetworkManager.conf
[    6.112205] rawhide.localdomain NetworkManager[559]: <info> WEXT support is enabled
[    6.139399] rawhide.localdomain kernel: cfg80211: Calling CRDA to update world regulatory domain


2.
There's a large time gap between NetworkManager startup complete, and the start of NetworkManager-wait-online.

[    8.708717] rawhide.localdomain avahi-daemon[426]: Registering new address record for fe80::a00:27ff:fee3:26e3 on enp0s3.*.
[   12.005997] rawhide.localdomain NetworkManager[559]: <info> startup complete
[   12.017118] rawhide.localdomain systemd[1]: Started Network Manager Wait Online.

Comment 3 Chris Murphy 2014-05-03 18:44:43 UTC
Created attachment 892175 [details]
journal with systemd.log_level=debug

Nearly one second gap:
[   18.626906] rawhide.localdomain NetworkManager[557]: <info> Activation (enp0s3) Stage 2 of 5 (Device Configure) complete.
[   18.633364] rawhide.localdomain systemd[625]: Executing: /usr/bin/nm-online -q --timeout=30
[   19.418640] rawhide.localdomain NetworkManager[557]: <info> Activation (enp0s3) Stage 3 of 5 (IP Configure Start) scheduled.

Nearly 4 second gap:
[   19.518454] rawhide.localdomain dbus-daemon[442]: dbus[442]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
[   19.523738] rawhide.localdomain systemd[1]: Started Network Manager Script Dispatcher Service.
[   20.003527] rawhide.localdomain ModemManager[429]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:03.0': not supported by any plugin
[   20.445155] rawhide.localdomain avahi-daemon[425]: Registering new address record for fe80::a00:27ff:fee3:26e3 on enp0s3.*.
[   24.006218] rawhide.localdomain NetworkManager[557]: <info> startup complete
[   24.023384] rawhide.localdomain systemd[1]: Started Network Manager Wait Online.
[   24.024888] rawhide.localdomain systemd[1]: Starting Network is Online.
[   24.025352] rawhide.localdomain systemd[1]: Reached target Network is Online.


Is NetworkManager-wait-online.service somehow getting hit with an Avahi startup delay? systemd-analyze blame puts it at
315ms avahi-daemon.service


By far the longest pause in console during startup is the last line in this sequence:

[  OK  ] Started Network Manager.
         Starting Network Manager Wait Online...
         Starting Network Manager Script Dispatcher Service...
[  OK  ] Started Network Manager Script Dispatcher Service.

There is about a 1 second pause at "Started Network Manager" and fully 8 seconds of pause at "Started Network Manager Script Dispatcher Service." And I cannot login locally or remotely until this clears.

Comment 4 Jirka Klimes 2014-05-05 13:22:02 UTC
I don't seem to be able to reproduce the problem on a physical machine.

The 4 second delay before "<info> startup complete" looks suspicious to me. It could be a timeout inside NM waiting for some event. It might be related to carried detection and the virtual driver. You may try setting
ignore-carrier=* in NetworkManager.conf to see whether it helps.

In any case it would be helpful to see debugging logs from NM. Can you put these lines to NetworkManager.conf and reboot:
[logging]
level=DEBUG
domains=ALL

Comment 5 Chris Murphy 2014-05-06 03:26:55 UTC
Created attachment 892752 [details]
journalctl with NM debug

Comment 6 Chris Murphy 2014-05-06 03:34:25 UTC
Created attachment 892755 [details]
journalctl with NM debug and ignore-carrier

NetworkManager.conf currently reads:

[main]
plugins=ifcfg-rh
ignore-carrier=*
[logging]
level=DEBUG
domains=ALL

Comment 7 Chris Murphy 2014-05-13 04:10:39 UTC
In the same VM, but a new virtual disk, I've clean installed Rawhide minimal system from netinstall media. The problem is not reproducible, NetworkManager-wait-online.service isn't even a listed service.

Switching to the VDI with the older Rawhide, which is nevertheless as up to date as this one, still has NetworkManager-wait-online.service slowness. It has the same kernel and version of Network Manager as the new install. The two differences between them, oldslow is a Gnome packageset install on Btrfs; the newfast is a Minimal packageset install on ext4 yet NetworkManager completes 18x faster.

So something in the older install with Gnome packageset is pulling in NetworkManager-wait-online.service and also causing a delay. Is it possible this is virbr0-nic or TUN/TAP related? Since Gnome includes Boxes, it pulls in libvirt. And that's sorta pointless to have in a VM. So maybe that's why I have this problem in a VM, while it doesn't happen on baremetal?

# systemd-analyze blame
           725ms firewalld.service
           558ms sshd.service
           275ms NetworkManager.service
           144ms fedora-readonly.service
           126ms systemd-logind.service
           126ms plymouth-read-write.service
           125ms plymouth-start.service
           109ms systemd-udev-trigger.service
            99ms systemd-journal-flush.service
            99ms dev-mqueue.mount
            95ms dev-hugepages.mount
            95ms systemd-fsck-root.service
            91ms auditd.service
            83ms systemd-vconsole-setup.service
            72ms systemd-tmpfiles-setup-dev.service
            63ms polkit.service
            55ms systemd-user-sessions.service
            53ms plymouth-quit-wait.service
            53ms plymouth-quit.service
            47ms kmod-static-nodes.service
            39ms systemd-random-seed.service
            35ms user
            32ms systemd-update-utmp.service
            30ms systemd-remount-fs.service
            26ms systemd-tmpfiles-setup.service
            21ms systemd-sysctl.service
            19ms sys-kernel-debug.mount
            16ms systemd-udevd.service
            14ms tmp.mount
             4ms systemd-update-utmp-runlevel.service
             2ms sys-kernel-config.mount

Comment 8 Jaroslav Reznik 2015-03-03 15:45:30 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 9 Fedora Admin XMLRPC Client 2015-08-18 14:59:42 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 10 Fedora End Of Life 2016-07-19 11:26:54 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.