Description of problem: Booting my Fedora 19 takes a lot of time. It's probably 500% slower than Fedora 18. I used the btrfs installation option (over LVM). Version-Release number of selected component (if applicable): Fedora 19 RC3 (GOLD) How reproducible: Always Steps to Reproduce: 1. Boot 2. Wait 3. Wait even more Actual results: Slow boot Expected results: Fast boot Additional info: I installed the Fedora 19 RC3 (which was later decalred GOLD). I used btrfs over LVM partitoning.
What is the output of: systemctl status NetworkManager-wait-online.service ?
NetworkManager-wait-online.service - Network Manager Wait Online Loaded: loaded (/usr/lib/systemd/system/NetworkManager-wait-online.service; disabled) Active: inactive (dead)
OK. Can you attach the output of 'systemd-analyze blame'?
Also provide the information if this is an fresh install or an upgrade. Thanks
This is a clean install. 1min 16.366s firewalld.service 1min 16.364s avahi-daemon.service 1min 16.304s gdm.service 1min 15.741s bluetooth.service 21.448s udisks2.service 20.070s dkms_autoinstaller.service 20.002s plymouth-quit-wait.service 8.222s systemd-cryptsetup@luks\x2d374f532a\x2d2922\x2d45b6\x2d9420\x2dc49eb4c7d942.service 7.827s abrt-ccpp.service 7.640s bumblebee-nvidia.service 6.958s mcelog.service 4.573s systemd-cryptsetup@luks\x2d13f01f31\x2d0a40\x2d4bda\x2dbe29\x2de75833bb67f7.service 2.901s dmraid-activation.service 1.687s systemd-fsck@dev-disk-by\x2duuid-7527a9c0\x2d08f3\x2d407f\x2d9c8a\x2d065738ae9511.service 1.491s systemd-udev-settle.service 1.462s lm_sensors.service 1.346s fedora-loadmodules.service 1.319s systemd-binfmt.service 1.310s boot.mount 1.089s systemd-tmpfiles-setup.service 1.058s proc-sys-fs-binfmt_misc.mount 1.034s plymouth-read-write.service 973ms lvm2-monitor.service 934ms systemd-logind.service 929ms rtkit-daemon.service 924ms plymouth-start.service 858ms NetworkManager.service 850ms sys-kernel-debug.mount 850ms dev-mqueue.mount 849ms dev-hugepages.mount 807ms fedora-readonly.service 710ms tmp.mount 653ms systemd-sysctl.service 624ms boot-efi.mount 620ms sys-kernel-config.mount 583ms home.mount 569ms systemd-udev-trigger.service 434ms systemd-remount-fs.service 377ms sendmail.service 365ms accounts-daemon.service 347ms colord.service 287ms systemd-fsck-root.service 260ms lvm2-lvmetad.service 257ms systemd-tmpfiles-clean.service 242ms systemd-udevd.service 188ms chronyd.service 183ms systemd-readahead-replay.service 183ms systemd-readahead-collect.service 181ms systemd-journal-flush.service 113ms systemd-fsck@dev-disk-by\x2duuid-4aa2b89d\x2dce0d\x2d41b4\x2db23f\x2dccf3db91a2f9.service 110ms rpcbind.service 87ms sm-client.service 48ms dev-mapper-luks\x2d13f01f31\x2d0a40\x2d4bda\x2dbe29\x2de75833bb67f7.swap 36ms polkit.service 19ms systemd-readahead-done.service 14ms livesys.service 13ms upower.service 9ms wpa_supplicant.service 9ms auditd.service 8ms systemd-user-sessions.service 7ms systemd-ask-password-wall.service 7ms systemd-tmpfiles-setup-dev.service 6ms livesys-late.service 5ms systemd-update-utmp-runlevel.service 3ms systemd-vconsole-setup.service 3ms sys-fs-fuse-connections.mount 2ms systemd-random-seed-load.service
I am experiencing the same problem with an upgraded fedora 18 -> 19. This is the output of systemd-analyze blame: 1min 24.583s firewalld.service 1min 24.354s avahi-daemon.service 1min 2.235s lightdm.service 22.109s akmods.service 21.929s akmods-shutdown.service 12.300s systemd-cryptsetup@luks\x2d4b479d38\x2d5462\x2d4ad9\x2d9c25\x2d784f8fb8b1d3.service 8.745s NetworkManager-wait-online.service 2.981s rtkit-daemon.service 2.980s systemd-logind.service 2.424s systemd-tmpfiles-clean.service 1.785s sysstat.service 1.498s lm_sensors.service 1.279s systemd-tmpfiles-setup.service 1.183s chronyd.service 1.128s fedora-loadmodules.service 1.022s udisks2.service 959ms systemd-binfmt.service 938ms abrt-ccpp.service 877ms colord.service 870ms proc-sys-fs-binfmt_misc.mount 857ms console-kit-log-system-start.service 768ms NetworkManager.service 762ms sys-kernel-debug.mount 762ms dev-mqueue.mount 761ms dev-hugepages.mount 736ms plymouth-read-write.service 690ms auditd.service 672ms systemd-cryptsetup@luks\x2daf0ce3bf\x2d8fdf\x2d413f\x2dab89\x2dd43efba3883f.service 658ms systemd-modules-load.service 636ms systemd-tmpfiles-setup-dev.service 636ms fedora-readonly.service 589ms dev-disk-by\x2duuid-86723b87\x2d4884\x2d4079\x2d8575\x2d4ab319ac83ae.swap 555ms sendmail.service 528ms plymouth-start.service 417ms systemd-fsck@dev-disk-by\x2duuid-ba53da2a\x2d20b7\x2d410b\x2d9f2b\x2d5b1518568eda.service 416ms systemd-udev-trigger.service 394ms mcelog.service 381ms systemd-remount-fs.service 331ms systemd-readahead-done.service 286ms plymouth-quit.service 286ms plymouth-quit-wait.service 253ms mnt-h320.mount 247ms systemd-user-sessions.service 188ms dev-disk-by\x2duuid-3f8a1a86\x2dbbdb\x2d48de\x2d9ef0\x2de88cca658e63.swap 150ms boot.mount 126ms systemd-readahead-replay.service 86ms systemd-fsck-root.service 82ms sm-client.service 62ms polkit.service 39ms systemd-readahead-collect.service 38ms systemd-udevd.service 37ms systemd-random-seed-load.service 37ms upower.service 36ms bluetooth.service 26ms console-kit-daemon.service 25ms sshd.service 22ms accounts-daemon.service 17ms rpcbind.service 12ms systemd-update-utmp-runlevel.service 12ms systemd-journal-flush.service 8ms sys-fs-fuse-connections.mount 5ms systemd-vconsole-setup.service 4ms systemd-sysctl.service 4ms boot-efi.mount 2ms home-horazont-tmp.mount 1ms sys-kernel-config.mount 1ms tmp.mount Additionally, I plotted the startup with systemd-analyze plot and notice that avahi-daemon, firewalld and lightdm finished at the very same instant, as if they blocked for something very long (each other possibly?).
If you disable firewalld.service and avahi-daemon.service, does the rest of the system then boot fast? You both use some 3rd party kernel modules (using dkms and akmods, respectively). Is the problem reproducible without them? There may be some hints in the logs. Please attach the output of "journalctl -b".
Turns out, disabling avahi-daemon and firewalld was not a good idea. Cannot get a login screen right now, I'll try harder. No ttys either, systemd-logind complained about some timeout with a timestamp of approximately 60 in front of it. Grub doesn't allow me to edit commands (it freezes after I press `e') and grml isn't able to mount my btrfs… I guess I'll be poking at that one for a longer time…
Update: Booting the shiny new fedora 19 rescue boot entry got me a graphical login. It also booted much faster than usual. lightdm.service only takes 65ms to initialize, the usual times for the other jobs. I reenabled avahi-daemon and firewalld, but disabled akmods and akmods-shutdown for the next test. Here, the boot was also nice and fast, with firewalld taking 17 seconds to initialize (systemd-analyze blame) and avahi-daemon taking approx. 8 seconds. Note that the modules usually built by the akmod are nevertheless loaded, as no kernel version change happened.
(In reply to Jonas Wielicki from comment #9) > Update: Booting the shiny new fedora 19 rescue boot entry got me a graphical > login. Then there seems to be something wrong with the (non-rescue) initramfs. Back it up and recreate it using dracut.
Good to know it's not just me that suspects a big boot slowdown. I'm not using dkms or any other unusual stuff on startup. This was a fresh install from F19 Beta. $ systemd-analyze blame 1min 25.678s firewalld.service 1min 25.572s avahi-daemon.service 1min 25.320s chronyd.service 20.008s plymouth-quit-wait.service 7.760s dnf-makecache.service 7.704s systemd-suspend.service 7.658s systemd-udev-settle.service 6.434s bluetooth.service 6.326s systemd-logind.service 6.307s rtkit-daemon.service 5.237s gdm.service 3.568s iprupdate.service 3.565s iprinit.service 2.819s NetworkManager-wait-online.service 2.268s dev-mqueue.mount 2.110s systemd-binfmt.service 1.984s tmp.mount 1.969s fedora-loadmodules.service 1.813s sys-kernel-config.mount 1.716s proc-sys-fs-binfmt_misc.mount 1.634s sys-kernel-debug.mount 1.632s dev-hugepages.mount 1.543s abrt-ccpp.service 1.537s mcelog.service 1.447s systemd-fsck@dev-disk-by\x2duuid-d1d4a8e3\x2d702b\x2d4e5f\x2d901d\x2dec5e7550f6f5.service 1.395s colord.service 1.361s systemd-remount-fs.service 1.165s systemd-tmpfiles-setup-dev.service 1.136s iprdump.service 1.043s lvm2-monitor.service 889ms systemd-user-sessions.service 879ms NetworkManager.service 878ms plymouth-start.service 827ms systemd-udev-trigger.service 753ms systemd-tmpfiles-clean.service 732ms dmraid-activation.service 720ms systemd-tmpfiles-setup.service 681ms dev-disk-by\x2duuid-f7d6841c\x2d9522\x2d437c\x2d8414\x2dc7940b3b4e3a.swap 641ms plymouth-read-write.service 556ms auditd.service 508ms fedora-readonly.service 488ms systemd-fsck-root.service 477ms systemd-journal-flush.service 446ms systemd-sysctl.service 382ms udisks2.service 341ms systemd-readahead-replay.service 340ms systemd-readahead-collect.service 260ms boot.mount 248ms sendmail.service 216ms accounts-daemon.service 215ms polkit.service 208ms sm-client.service 197ms systemd-udevd.service 187ms lvm2-lvmetad.service 178ms upower.service 143ms home.mount 127ms livesys.service 116ms workarounds.service 29ms systemd-readahead-done.service 10ms sshd.service 5ms systemd-vconsole-setup.service 5ms systemd-fsck@dev-disk-by\x2duuid-198daf9e\x2d6267\x2d43f7\x2db579\x2d0205d6790489.service 5ms livesys-late.service 5ms rpcbind.service 4ms wpa_supplicant.service 4ms systemd-update-utmp-runlevel.service 3ms sys-fs-fuse-connections.mount 2ms systemd-random-seed-load.service
James, Do you use btrfs as well? It's curious that you have systemd-suspend.service in the list. Have you suspended the machine while booting? I would still like to see the "journalctl -b" output.
(In reply to Michal Schmidt from comment #12) > James, > Do you use btrfs as well? Yes. > It's curious that you have systemd-suspend.service in the list. Have you > suspended the machine while booting? Not while booting, but the machine has seen a couple of suspend/resume cycles. > I would still like to see the "journalctl -b" output. I'll post it below shortly.
Created attachment 776001 [details] journalctl -b output
Happening here (although in smaller measure): 21.343s accounts-daemon.service 21.328s firewalld.service 8.115s systemd-cryptsetup 5.001s dnf-makecache.service 1.132s fedora-loadmodules.service 1.070s plymouth-start.service 985ms boot-EFI.mount 949ms systemd-tmpfiles-clean.service 918ms systemd-binfmt.service 819ms NetworkManager.service 614ms fedora-readonly.service 555ms home.mount 550ms systemd-udev-trigger.service 548ms dev-mqueue.mount 548ms sys-kernel-debug.mount 547ms dev-hugepages.mount 546ms tmp.mount 545ms systemd-fsck@dev-disk-by\x2duuid-8D12\x2d7405.service 513ms proc-sys-fs-binfmt_misc.mount 480ms bluetooth.service 467ms systemd-logind.service 466ms abrt-ccpp.service 462ms lm_sensors.service 336ms rtkit-daemon.service 334ms systemd-fsck 305ms systemd-tmpfiles-setup.service 295ms colord.service 293ms systemd-remount-fs.service 249ms systemd-sysctl.service 245ms plymouth-read-write.service 205ms systemd-fsck-root.service 175ms systemd-journal-flush.service 168ms systemd-readahead-replay.service 127ms multi.mount 101ms upower.service 95ms chronyd.service 55ms udisks2.service 55ms plymouth-quit-wait.service 53ms plymouth-quit.service 34ms systemd-readahead-collect.service 33ms systemd-udevd.service 30ms sys-kernel-config.mount 26ms polkit.service 24ms sshd.service 16ms teamviewerd.service 14ms rpcbind.service 8ms gdm.service 8ms avahi-daemon.service 4ms wpa_supplicant.service 3ms auditd.service 3ms systemd-tmpfiles-setup-dev.service 2ms systemd-user-sessions.service 1ms systemd-update-utmp-runlevel.service 1ms systemd-random-seed-load.service 1ms systemd-readahead-done.service 979us systemd-vconsole-setup.service On my other machine, also core i7, but running arch I get: 6.499s firewalld.service 3.133s avahi-daemon.service 3.119s systemd-logind.service 2.985s gdm.service 1.526s accounts-daemon.service 1.176s systemd-binfmt.service 1.158s NetworkManager.service 1.130s sys-kernel-debug.mount 1.125s systemd-fsck@dev-disk-by\x2duuid-0a83d5ea\x2deceb\x2d4fb9\x2db231\x2d36d7993d8e49.service 1.118s dev-hugepages.mount 1.116s dev-mqueue.mount 1.107s systemd-udev-trigger.service 1.094s polkit.service 1.042s systemd-remount-fs.service 1.038s tmp.mount 1.009s systemd-vconsole-setup.service 933ms proc-sys-fs-binfmt_misc.mount 923ms systemd-modules-load.service 828ms systemd-tmpfiles-setup-dev.service 494ms alsa-restore.service 446ms systemd-sysctl.service 393ms colord.service 318ms systemd-udevd.service 305ms boot.mount 286ms systemd-user-sessions.service 267ms systemd-update-utmp.service 166ms sys-kernel-config.mount 146ms rpi.mount 145ms udisks2.service 126ms systemd-fsck@dev-disk-by\x2duuid-5dd4a246\x2dc65c\x2d49c8\x2d9fd2\x2dff17e72a7036.service 104ms systemd-random-seed-load.service 62ms systemd-journal-flush.service 40ms nmbd.service 39ms rtkit-daemon.service 39ms upower.service 30ms systemd-tmpfiles-setup.service 22ms home.mount 17ms smbd.service 3ms home-celebdor-games.mount 2ms systemd-tmpfiles-clean.service 837us sys-fs-fuse-connections.mount The 3x boot time on firewalld (without having touched any rules on the first machine (F19) seems excessive.
Since removing /var/log/journal to disable the persistent journal, boot times have gone down somewhat: 20.005s plymouth-quit-wait.service 19.555s firewalld.service 9.437s NetworkManager-wait-online.service 8.869s accounts-daemon.service 6.150s systemd-udev-settle.service 4.778s gdm.service 3.596s avahi-daemon.service although they still seem rather long. Takes X/gdm quite a while to start up, too. I don't have a machine on ext4 at the moment to compare. firewalld does seem slow to start, particularly if I drop the file caches beforehand: # echo 3 > /proc/sys/vm/drop_caches # time service firewalld restart Redirecting to /bin/systemctl restart firewalld.service real 0m11.483s user 0m0.012s sys 0m0.024s Once it's all in memory, restarting takes under a second.
Same issue over the last few days wrt a Fresh Fedora 19 install on a Samsung laptop having Intel i5-3210M. Initial few days it used to boot slow. However now sometime it boots slow and some times it just hangs after boot with out showing the GDB login screen. Also I am not able to switch to a text based console by pressing Ctl+Alt+F2 or so ... when the Boot hangs. NOTE: I also think the systemd configuration should be changed by default to allow the text console to start early atleast, I don't see any reason to block text console start also. So that atleast one can debug these issues. Removing rhgb and quiet from kernel command line does seem to allow the system to boot always (i.e avoid the hang I mentioned above), even thou it also blocks for the firewall, avahi and plymouth-quit-wait services to complete and thus takes time to boot into the GUI. This is the output of one of the slow boots. Startup finished in 3.766s (firmware) + 4.846s (loader) + 1.115s (kernel) + 3.356s (initrd) + 1min 5.017s (userspace) = 1min 18.102s The output of systemd-analyse blame is 49.419s firewalld.service 49.406s avahi-daemon.service 20.006s plymouth-quit-wait.service 3.890s gdm.service 3.773s bluetooth.service 3.762s systemd-logind.service 3.755s rtkit-daemon.service 2.244s chronyd.service 2.103s lvm2-monitor.service 2.083s systemd-fsck@dev-disk-by\x2duuid-5aac0cb1\x2d8708\x2d4b95\x2d9cc7\x2df43922e95244.service 2.021s abrt-ccpp.service 1.999s systemd-udev-settle.service 1.896s mcelog.service 1.804s sys-kernel-debug.mount 1.804s dev-mqueue.mount 1.785s dev-hugepages.mount 1.770s tmp.mount 1.163s plymouth-start.service 1.121s fedora-loadmodules.service 1.115s sys-kernel-config.mount 846ms systemd-tmpfiles-setup.service 815ms systemd-udev-trigger.service 779ms systemd-remount-fs.service 684ms fedora-readonly.service 655ms dev-disk-by\x2duuid-61e2cb71\x2d74a1\x2d4f63\x2d9400\x2d001f17ec3c83.swap 495ms plymouth-read-write.service 469ms colord.service 417ms lvm2-lvmetad.service 386ms systemd-readahead-replay.service 365ms systemd-user-sessions.service 356ms udisks2.service 344ms systemd-sysctl.service 340ms systemd-fsck-root.service 338ms livesys-late.service 309ms home.mount 303ms NetworkManager.service 264ms auditd.service 192ms sendmail.service 175ms boot.mount 161ms systemd-localed.service 158ms upower.service 111ms systemd-readahead-collect.service 80ms accounts-daemon.service 52ms boot-efi.mount 25ms polkit.service 21ms sm-client.service 17ms systemd-udevd.service 17ms dmraid-activation.service 13ms rpcbind.service 13ms livesys.service 8ms systemd-tmpfiles-setup-dev.service 7ms systemd-journal-flush.service 6ms systemd-readahead-done.service 5ms systemd-vconsole-setup.service 5ms systemd-fsck@dev-disk-by\x2duuid-6fcf2ebf\x2dd6bb\x2d40f1\x2d81ca\x2d23771e21f21c.service 4ms systemd-random-seed-load.service 4ms systemd-update-utmp-runlevel.service 3ms sys-fs-fuse-connections.mount
Created attachment 782433 [details] Another journalctl -b --until after boot finished For comparision, another journalctl -b output, until after login. After login, it's dominated by munin-related setroubleshootd output. Is there any progress on the investigation, anything we can do to help with this?
I have another log of a machine taking 40s to initialize gdm and five other services, would that be helpful?
It appears that removing the content of /var/log and masking systemd-journal.service greatly improves the boot time.
(In reply to lysbleu from comment #20) > It appears that removing the content of /var/log and masking > systemd-journal.service greatly improves the boot time. You must not mask systemd-journal.service! Otherwise you won't get any log messages from your system!
(In reply to Harald Hoyer from comment #21) > (In reply to lysbleu from comment #20) > > It appears that removing the content of /var/log and masking > > systemd-journal.service greatly improves the boot time. > > You must not mask systemd-journal.service! Otherwise you won't get any log > messages from your system! Perhaps it's best for us to prevent users from being able to do that in the first place ;)
(In reply to Jóhann B. Guðmundsson from comment #22) > (In reply to Harald Hoyer from comment #21) > > (In reply to lysbleu from comment #20) > > > It appears that removing the content of /var/log and masking > > > systemd-journal.service greatly improves the boot time. > > > > You must not mask systemd-journal.service! Otherwise you won't get any log > > messages from your system! > > Perhaps it's best for us to prevent users from being able to do that in the > first place ;) Or maybe prevent users from wanting to do that instead :P
(In reply to lysbleu from comment #20) > It appears that removing the content of /var/log and masking > systemd-journal.service greatly improves the boot time. James Ettle and others (in some other BZs) already said that removing /var/log/journal improves the boot speed for them. Are you saying that masking systemd-journald.service brings an additional speedup on top of that?
I'd like to ask you folks affected by the bug for: 1. some more of "journalctl -b" logs, but with chronyd.service (and any other NTP clients) disabled - to make sure we are not getting confused by possible jumps of the realtime clock. 2. SVG charts generated by systemd-bootchart (see its manpage for what to add to the kernel command line) and by "systemd-analyze plot > sa.svg"
Will do upon next reboot (likely tomorrow). The manpage says it will collect for 30 seconds by default. How can I increase that time, because it will most likely not be sufficient to capture the whole boot (the current one having taken 1min 30s). Is it sufficient to edit /etc/systemd/bootchart.conf or do I have to change something in the initramfs? I'll also disable chronyd.service in that run, I hope that's okay.
I tried to append init=/usr/lib/systemd/systemd-bootchart to my linuxefi command line. If I do that, the boot freezes (for at least 3 minutes) after [ OK ] Recreate Volatile Files and Directories... I found no way around that, is there anything obvious I could have done wrong?
(In reply to Michal Schmidt from comment #24) > (In reply to lysbleu from comment #20) > > It appears that removing the content of /var/log and masking > > systemd-journal.service greatly improves the boot time. > > James Ettle and others (in some other BZs) already said that removing > /var/log/journal improves the boot speed for them. Are you saying that > masking systemd-journald.service brings an additional speedup on top of that? Actually no, it was a mistake and unmasking it did not add any boot time. Removing /var/log/journal is enough.
(In reply to Michal Schmidt from comment #25) > 2. SVG charts generated by systemd-bootchart (see its manpage for what to > add to the kernel command line) and by "systemd-analyze plot > sa.svg" systemd-bootchart stalls everything for me too. If i understand the man page correctly i set it to 15000 samples at 25Hz which is 10 mins. I gave it 15 mins to move but nothing. I don't know if it is related or not but it apparently cannot open "/proc/schedstat". Will post the other items when i next reboot.
Just to clarify how does this bug slow down the other services, it seems the processes are blocked in the sendto() system call when they are sending messages to the system logger. Would a larger buffer on the receiving side help? From a chronyd strace output: 10:33:16.982187 socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 0 10:33:16.982228 connect(0, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0 10:33:16.982267 sendto(0, "<30>Sep 02 10:33:16 chronyd[373]"..., 63, MSG_NOSIGNAL, NULL, 0) = 63 10:33:33.642469 open("/etc/chrony.conf", O_RDONLY) = 1 It would be nice if the systemd-journald service was marked as started only when it's fully initialized and ready to accept the log messages. This would make the bootchart logs more clear.
I have another data point, although I'm not entirely sure it's related. Booting my machine today, systemd-logind failed to start due to timeouts (I am currently booting my system mit verbose=1): Sep 05 12:11:59 zion systemd-logind[769]: New seat seat0. Sep 05 12:11:59 zion systemd-logind[769]: Watching system buttons on /dev/input/event1 (Power Button) Sep 05 12:11:59 zion systemd-logind[769]: Watching system buttons on /dev/input/event0 (Power Button) Sep 05 12:11:59 zion systemd-logind[695]: Failed to get system D-Bus connection: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. Sep 05 12:11:59 zion systemd-logind[695]: Failed to fully start up daemon: Connection refused Sep 05 12:11:59 zion systemd[1]: Unit systemd-logind.service entered failed state. Sep 05 12:11:59 zion systemd[1]: systemd-logind.service holdoff time over, scheduling restart. Similar timeout messages come up for different services: Sep 05 12:11:59 zion dbus-daemon[721]: dbus[721]: [system] Activating systemd to hand-off: service name='org.freedesktop.ColorManager' unit='colord.service' Sep 05 12:11:59 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.ColorManager': timed out […] Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.ColorManager': timed out Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.Accounts': timed out At this point, I reset the machine. The next boot is different and surprisingly fast. My suspicion is that it correlates with the truncated journal. Oddly, I cannot find the message from journald about renaming and recreating the journal file after truncation (due to reset) in the journalctl logs. The boot time was way better than anything in the past: Startup finished in 2.288s (kernel) + 15.886s (initrd) + 44.452s (userspace) = 1min 2.627s Another datapoint: When searching for the logs for this comment, journalctl blocked nearly one minute until showing output for journalctl --since today --full. During that time, it did some I/O on the hard drive, mainly reading, but incredibly slow (only 600kByte to 1 MByte/s; measured with conky). The drive is able to do up to 140 MByte/s. So either the files accessed by journalctl are horribly fragmented in the filesystem (btrfs), journalctl accesses lots of different files or the file format is fragmented itself, causing lots of expensive seeks. For consecutive reads, the effective read rate is easily going up to 60 or even over 100 MByte/s, sometimes more. journalctl says it will keep up to 4 GBytes of journal in one file: Sep 05 12:16:25 zion systemd-journal[424]: Allowing system journal files to grow to 4.0G. Is it possible to lower that number? I could imagine that it's just too large to be performant. Does journald load the whole syslog into memory upon boot or iterate over it for any other reason?
The same here. I have a fast system with a SSD-drive. The system could easily boot under 20 secs. But fedora 19 takes sometimes 30 secs, sometimes 40 secs and sometimes it doesn't run to login screen (then I could only switch to another tty and login in textmode). I have installed smartmontools and lm-sensors (with all sensors are checked). Maybe "smartd" takes a long time (see log). But there must other problems exist. The plymouth booting sequence (the little fedora logo in the middle) doesn't appear in 95% of booting sessions. Only a black screen appears and after a long time the Login manager is seen. And sometimes It remains black (I waited sometimes 5 min.). I attached the logs for 5 booting sessions: session 1: ca. 48 secs (graphical login was possible) session 2: ca. 52 secs (no graphical login screen) session 3: ca. 37 secs (graphical login was possible) session 4: ca. 45 secs (no graphical login screen) session 5: ca. 37 secs (graphical login was possible) and the lsmod-output. If there is no login screen visible, the "gdm.service" takes long. If graphical login is possible the "gdm.service" takes under 1 secs. So I see the following problematic things: - plymouth-quit-wait.service - accounts-daemon.service - chronyd.service - firewalld.service - smartd - lm-sensors - gnome window manager (-> in the logs is the german word "Fensterverwalter" Maybe one of these are problematic.
Created attachment 794734 [details] Rolle: system-analyze blame Session1
Created attachment 794735 [details] Rolle: system-analyze blame Session2
Created attachment 794736 [details] Rolle: system-analyze blame Session3
Created attachment 794737 [details] Rolle: system-analyze blame Session4
Created attachment 794738 [details] Rolle: system-analyze blame Session5
Created attachment 794740 [details] Rolle: system-analyze Session2
Created attachment 794741 [details] Rolle: system-analyze Session3
Created attachment 794742 [details] Rolle: system-analyze Session4
Created attachment 794743 [details] Rolle: system-analyze Session5
Created attachment 794744 [details] Rolle: journalctl -b Session1
Created attachment 794745 [details] Rolle: journalctl -b Session2
Created attachment 794746 [details] Rolle: journalctl -b Session3
Created attachment 794747 [details] Rolle: journalctl -b Session4
Created attachment 794748 [details] Rolle: journalctl -b Session5
Created attachment 794750 [details] Rolle: lsmod
I'd really like to know what happens inside the dbus daemon when the boot is that slow. Anybody can get me an strace -tt of dbus-daemon when this happens?
*** Bug 981669 has been marked as a duplicate of this bug. ***
*** Bug 985076 has been marked as a duplicate of this bug. ***
bug 1006386 and bug 1005399 are likely the same problem. Something appears to be stalling polkit/accounts-daemon/dbus/journald for a long time, I wonder though what it might be. I'd be thankful for any reports what these processes might be working on when the system stalls. I'd be particularly interested in strace -tt of these processes while they block and until thinks continue again. Also good would be the gdb full bt of those processes then. Are you guys using anything weird NSS-related like IPA/sssd on the systems where this happens?
Also: https://bugs.freedesktop.org/show_bug.cgi?id=68217 Can this be an interaction between systemd-logind providing service only after subscribing to "org.freedesktop.systemd1.Manager" but "org.freedesktop.systemd1.Manager" not available for some time?
I tried running strace on dbus-daemon by adding '/bin/strace ...' to the dbus.service file. This causes system to fail to boot, as dbus-daemon dies. Any hints on how to do this?
Actual 'ExecStart' line I used: ExecStart=/bin/strace -tt -o /var/tmp/dbus-strace /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
So, here's my wild but educated guess what is going on: journald is written carefully to never require any other service to be running. This is because most services connect to it to log something, and if journald would ever block on some IPC call to another service and that service would then log something back to journald and block on that to finish we'd run into a deadlock. Now, even though journald is carefully written that way there's one place where an IPC call can sneak in: we try to resolve the "systemd-journal" group name to a GID (with getgrid()) to that we can chgrp() all journal files to it. In most setups that is without risk, since getgrid() will load the "files" NSS module which will just go to /etc/group and return the requested data. However, in some setups the NSS lookup might actually involve IPC, and that's cases where sssd/IPA or suchlike are used, or where nscd is enable. Now, this will not necessarily always result in a deadlock, since logging to journald is asynchronous which means as long as there is room in the log socket buffer no deadlock will happen. But if a lot of things happen at the same time the log socket buffer is easily full (since on Linux it will only take 15 datagrams or so by default) and there you go. Now, the question is what to do about this. We could just hardcode the resolved GID into journald. This would work fine on Fedora since the GID is assigned fixed anyway, but would still be ugly. Or we could require that /var/log/journal is owned by the group, and then simply read the GID from the dir before applying it to the journal files... Neither of of those solutions I particularly like I must say. Anyway, I figure if you guys remove nscd/sssd/IPA/... from your NSS setup things will not stall anymore. Could you test this?
Created attachment 797390 [details] Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and with nscd disabled (debug_level=debug) Not sure if this is exactly what you requested, but I disabled nscd.service and rebooted (sssd.service was already disabled, and I don't believe I have any IPA stuff other than sssd-ipa on my system). The result seems similar to other "20 second slower boots".
Lennart, any idea, why the timestamps are so mixed? [ 60.521021] tlondon.localhost.org colord[516]: Executing: /usr/libexec/colord [ 21.854473] tlondon.localhost.org systemd[1]: Started Security Auditing Service. [ 21.856712] tlondon.localhost.org systemd[1]: Accepted connection on private bus. [ 21.856967] tlondon.localhost.org systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent [ 21.857056] tlondon.localhost.org systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local [ 22.172371] tlondon.localhost.org systemd[1]: Received SIGCHLD from PID 415 (systemd-tmpfile). [ 22.174503] tlondon.localhost.org systemd[1]: Got SIGCHLD for process 415 (systemd-tmpfile) [ 60.703582] tlondon.localhost.org sshd[531]: Server listening on 0.0.0.0 port 22. [ 60.703941] tlondon.localhost.org sshd[531]: Server listening on :: port 22. [ 22.176395] tlondon.localhost.org systemd[1]: Child 415 died (code=exited, status=0/SUCCESS)
Hmm, so guys, can you attach the full output of "strace -o getent-group getent group systemd-journal", please, on a system where the delay happens at boot? This should tell us if the NSS lookup results in IPC!
(In reply to Tom London from comment #56) > Not sure if this is exactly what you requested, but I disabled nscd.service > and rebooted (sssd.service was already disabled, and I don't believe I have > any IPA stuff other than sssd-ipa on my system). Please also disable nscd.socket, or de-install the entire package, and try again. I guess nscd cannot be socket activated, without gross hacks that would be needed in systemd code. The nscd.socket probably needs to be entirely removed from the nscd package.
(In reply to Kay Sievers from comment #59) > I guess nscd cannot be socket activated, without gross hacks that would > be needed in systemd code. The nscd.socket probably needs to be entirely > removed from the nscd package. Nah, I think systemd must be capable of supporting nscd being socket activatable. That's something to fix on our side. No finger pointing here please except on ourselves.
(In reply to Harald Hoyer from comment #57) > Lennart, any idea, why the timestamps are so mixed? Possibly because messages have been for quite some time in the queue. To make more sense of this it would be interesting to see "-o verbose" output of that stream fragment which will show timestamps when systemd got the messages and when they were enqueued.
(In reply to Tom London from comment #53) > I tried running strace on dbus-daemon by adding '/bin/strace ...' to the > dbus.service file. > > This causes system to fail to boot, as dbus-daemon dies. > > Any hints on how to do this? See https://bugzilla.redhat.com/show_bug.cgi?id=1006386#c18
I posted output of 'strace dbus-daemon', 'systemd-analyze plot', and 'journalctl -o short-monotonic -ab' to https://bugzilla.redhat.com/show_bug.cgi?id=1006386
(In reply to Lennart Poettering from comment #61) > (In reply to Harald Hoyer from comment #57) > > Lennart, any idea, why the timestamps are so mixed? > > Possibly because messages have been for quite some time in the queue. To > make more sense of this it would be interesting to see "-o verbose" output > of that stream fragment which will show timestamps when systemd got the > messages and when they were enqueued. Attaching below output of 'journalctl -o verbose -ab' when 'journalctl -o short-monotonic -ab' showed miss sorted entries.
Created attachment 797658 [details] Output of "journalctl -verbose -ab"; booted with journal, removed nscd, debug_level=debug
Big improvement in bootspeed since last software update. from 35sec tot 20sec. No big stalls anymore. Added bootplot
Created attachment 801323 [details] Boot plot systemd-analyze plot
*** Bug 1005399 has been marked as a duplicate of this bug. ***
With kdm, my computers would just hang, no graphical, sometimes no ability to switch to vt2, even. I had to reboot endlessly to get to the kdm screen on the desktop and on the laptop, the only solution was the hack of removing /var/log/journal and booting into multi-user.target and running startx :-( Since my switch to gdm and the latest systemd update, both my laptop (i686) and my desktop (x86_64) computers boot to graphical on the first try! I still have the subjective impression that boot is slow and the drives are working a lot more than they used to, but I can use my computers normally without resorting to hacks :-)
I am looking at nearly a 2 minute startup on ext4. Startup finished in 830ms (kernel) + 10.623s (initrd) + 1min 31.282s (userspace) = 1min 42.736s 20.002s plymouth-quit-wait.service 17.646s NetworkManager.service 17.130s accounts-daemon.service 16.981s haveged.service 16.948s rtkit-daemon.service 14.319s gdm.service 2.916s akmods.service 2.900s systemd-cryptsetup@luks\x2dbc1e37d2\x2d243b\x2d4f9b\x2d91d0\x2df8fa0632a0ca.service 2.758s akmods-shutdown.service 1.589s systemd-fsck-root.service 1.405s fedora-loadmodules.service 1.303s lvm2-monitor.service 1.152s systemd-fsck@dev-disk-by\x2duuid-965cbc1b\x2d83ec\x2d4ca5\x2d99a6\x2d65d03c9649bb.service 1.052s chronyd.service 976ms dev-disk-by\x2duuid-106ad9a0\x2d39bf\x2d44eb\x2da085\x2dd324f3e151be.swap 719ms systemd-binfmt.service 652ms psacct.service 629ms systemd-udev-trigger.service 619ms sys-kernel-debug.mount 619ms dev-hugepages.mount 619ms dev-mqueue.mount 581ms proc-sys-fs-binfmt_misc.mount 505ms tmp.mount 470ms sys-kernel-config.mount 458ms lm_sensors.service 405ms plymouth-start.service 356ms systemd-ask-password-wall.service 331ms systemd-udevd.service 325ms boot.mount 323ms systemd-random-seed-load.service 295ms systemd-user-sessions.service 295ms ip6tables.service 295ms iptables.service 295ms atieventsd.service 294ms mcelog.service 288ms abrt-ccpp.service 287ms systemd-logind.service 232ms systemd-readahead-replay.service 232ms systemd-readahead-collect.service 196ms systemd-remount-fs.service 186ms systemd-sysctl.service 181ms polkit.service 120ms systemd-fsck@dev-disk-by\x2duuid-ef7e08a0\x2d9ca2\x2d43e2\x2da275\x2dece3eddff8b9.service 105ms auditd.service 102ms boot-efi.mount 89ms colord.service 81ms systemd-tmpfiles-setup.service 67ms udisks2.service 63ms plymouth-read-write.service 26ms home.mount 11ms sshd.service 8ms fedora-readonly.service 7ms rpcbind.service 6ms systemd-readahead-done.service 6ms NetworkManager-dispatcher.service 5ms lvm2-lvmetad.service 5ms upower.service 4ms systemd-tmpfiles-setup-dev.service 4ms systemd-update-utmp-runlevel.service 3ms systemd-cryptsetup@luks\x2d1d184e06\x2dc1c6\x2d4bc0\x2db980\x2dd862a37cbcfc.service 3ms systemd-localed.service 3ms systemd-journal-flush.service 3ms systemd-vconsole-setup.service 2ms sys-fs-fuse-connections.mount
Created attachment 803028 [details] josephwagner-boot.xml
Created attachment 803029 [details] josephwagner-journal-boutput.txt
I regret to report that my previous relief is short-lived. I have been unable to boot my desktop computer just now. I tried 2-3 times and no gdm ever appeared. I had no choice but to resort to the hack of removing /var/log/journal.
*** Bug 1011895 has been marked as a duplicate of this bug. ***
I had an error in booting on F19 today and slow boot as well. Due to other issues I was directed to delete /var/log/journal and try. It fixed most issues and boot time seems much faster. May be worth a try.
I installed the ssd and kernel updates from this morning on both computers. I have had /var/log/journal in place on both computers for a number of days. Things are really looking good: I hope I haven't spoken too soon ;-). Desktop booted to gdm right away on my f19 kde system; laptop booted normally on my f19 kde system on the first try.
Attaching my own data point, with 2 SVG boot plots booting the same 3.12.0-rc4 kernel on ext4, before and after cp'ing /dev/null on /var/log/journal/system.journal and erasing old user journal files. Difference: 28"4 vs 65"9 boot time. All of that time difference is spent with the disk LED blinking frantically. Before cleanup: [root@xbox ~]# ls -l /var/log/journal.bak/697c9225e38c4cc3b5666e109076911d/ total 140968 -rw-r-----+ 1 root systemd-journal 4005888 Sep 16 17:45 system~ -rw-r-----+ 1 root systemd-journal 118161408 Oct 8 15:34 system.journal -rw-r-----+ 1 root systemd-journal 3796992 Aug 26 22:32 user-1000.journal -rw-r-----+ 1 root systemd-journal 3735552 Sep 16 17:46 user-41~ -rw-r-----+ 1 root systemd-journal 6971392 Oct 8 15:30 user-41.journal -rw-r-----+ 1 root systemd-journal 3747840 Aug 26 22:32 user-42.journal -rw-r-----+ 1 root systemd-journal 3756032 Aug 26 19:02 user-992.journal After cleanup, one reboot (testing without avahi-daemon/bluetooth services) + 28"4 boot (reinstating avahi-daemon/bluetooth): [root@xbox ~]# ls -l /var/log/journal/697c9225e38c4cc3b5666e109076911d/ total 11548 -rw-r-----+ 1 root systemd-journal 4825088 Oct 8 15:50 system.journal -rw-r-----+ 1 root systemd-journal 6983680 Oct 8 15:48 user-41.journal
Created attachment 809392 [details] ASuardi: SVG before truncating journal
Created attachment 809393 [details] ASuardi: SVG after truncating journal
Clearing the journal resolved the issue for me. # rm -fr -- /var/log/journal/* Before fix: Startup finished in 830ms (kernel) + 10.623s (initrd) + 1min 31.282s (userspace) = 1min 42.736s After fix: Startup finished in 816ms (kernel) + 11.194s (initrd) + 22.596s (userspace) = 34.606s
just to say that I've been experiencing this problem after a clean install of Fedora 19. I've not tried # rm -fr -- /var/log/journal/* it would be good to know if we have figured out what the issue is and if removing the journal logs works why this is the case and if it is a real fix or just helps mask the underlying problem.
The problem was entirely resolved for me, ie., normal boot without removing the journal, but, today, after updating to systemd-204-17, the problem has returned, insofar as it was necessary to boot twice to get to gdm.
I believe I'm seeing the problem as described in comment #55. The top few results from systemd-analyze blame are below. The requested strace will be attached after this comment. I'm running the pre-release of Fedora 20, but I was seeing the same thing under 19, only worse. Clearing /var/log/journal made the boot time bearable (~30 sec as compared to 2+ min), but it's still not what it should be. I do run a FreeIPA server and the affected machine is a client to it. NSCD is not installed, but ipa through sssd is. I don't have the time to remove IPA and test right now, but I can later if it is still needed. 16.755s firewalld.service 9.955s sssd.service 9.673s accounts-daemon.service 7.281s rtkit-daemon.service 5.781s systemd-logind.service 2.524s rsyslog.service 2.305s NetworkManager.service 1.880s chronyd.service 1.581s home.mount 1.518s abrt-ccpp.service 1.442s certmonger.service 1.402s polkit.service 1.207s systemd-tmpfiles-setup.service 1.154s systemd-tmpfiles-setup-dev.service 1.128s systemd-update-utmp.service 1.100s mcelog.service
Created attachment 835580 [details] strace of getent group
Is it possible that this is, at least partially, a filesystem fragmentation issue? I am seeing over 20000 fragments for one journal file on my btrfs.
This bug looks like a duplicate of F20 bug #1006386 and its F19 clone in bug #1039200.
*** This bug has been marked as a duplicate of bug 1039200 ***