Bug 983688
|
Description
alexhultman
2013-07-11 17:48:58 UTC
What is the output of: systemctl status NetworkManager-wait-online.service ?

NetworkManager-wait-online.service - Network Manager Wait Online
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager-wait-online.service; disabled)
   Active: inactive (dead)

OK. Can you attach the output of 'systemd-analyze blame'? Also, please say whether this is a fresh install or an upgrade. Thanks.

This is a clean install.
1min 16.366s firewalld.service
1min 16.364s avahi-daemon.service
1min 16.304s gdm.service
1min 15.741s bluetooth.service
21.448s udisks2.service
20.070s dkms_autoinstaller.service
20.002s plymouth-quit-wait.service
8.222s systemd-cryptsetup@luks\x2d374f532a\x2d2922\x2d45b6\x2d9420\x2dc49eb4c7d942.service
7.827s abrt-ccpp.service
7.640s bumblebee-nvidia.service
6.958s mcelog.service
4.573s systemd-cryptsetup@luks\x2d13f01f31\x2d0a40\x2d4bda\x2dbe29\x2de75833bb67f7.service
2.901s dmraid-activation.service
1.687s systemd-fsck@dev-disk-by\x2duuid-7527a9c0\x2d08f3\x2d407f\x2d9c8a\x2d065738ae9511.service
1.491s systemd-udev-settle.service
1.462s lm_sensors.service
1.346s fedora-loadmodules.service
1.319s systemd-binfmt.service
1.310s boot.mount
1.089s systemd-tmpfiles-setup.service
1.058s proc-sys-fs-binfmt_misc.mount
1.034s plymouth-read-write.service
973ms lvm2-monitor.service
934ms systemd-logind.service
929ms rtkit-daemon.service
924ms plymouth-start.service
858ms NetworkManager.service
850ms sys-kernel-debug.mount
850ms dev-mqueue.mount
849ms dev-hugepages.mount
807ms fedora-readonly.service
710ms tmp.mount
653ms systemd-sysctl.service
624ms boot-efi.mount
620ms sys-kernel-config.mount
583ms home.mount
569ms systemd-udev-trigger.service
434ms systemd-remount-fs.service
377ms sendmail.service
365ms accounts-daemon.service
347ms colord.service
287ms systemd-fsck-root.service
260ms lvm2-lvmetad.service
257ms systemd-tmpfiles-clean.service
242ms systemd-udevd.service
188ms chronyd.service
183ms systemd-readahead-replay.service
183ms systemd-readahead-collect.service
181ms systemd-journal-flush.service
113ms systemd-fsck@dev-disk-by\x2duuid-4aa2b89d\x2dce0d\x2d41b4\x2db23f\x2dccf3db91a2f9.service
110ms rpcbind.service
87ms sm-client.service
48ms dev-mapper-luks\x2d13f01f31\x2d0a40\x2d4bda\x2dbe29\x2de75833bb67f7.swap
36ms polkit.service
19ms systemd-readahead-done.service
14ms livesys.service
13ms upower.service
9ms wpa_supplicant.service
9ms auditd.service
8ms systemd-user-sessions.service
7ms systemd-ask-password-wall.service
7ms systemd-tmpfiles-setup-dev.service
6ms livesys-late.service
5ms systemd-update-utmp-runlevel.service
3ms systemd-vconsole-setup.service
3ms sys-fs-fuse-connections.mount
2ms systemd-random-seed-load.service
I am experiencing the same problem after an upgrade from Fedora 18 to 19. This is the output of systemd-analyze blame:
1min 24.583s firewalld.service
1min 24.354s avahi-daemon.service
1min 2.235s lightdm.service
22.109s akmods.service
21.929s akmods-shutdown.service
12.300s systemd-cryptsetup@luks\x2d4b479d38\x2d5462\x2d4ad9\x2d9c25\x2d784f8fb8b1d3.service
8.745s NetworkManager-wait-online.service
2.981s rtkit-daemon.service
2.980s systemd-logind.service
2.424s systemd-tmpfiles-clean.service
1.785s sysstat.service
1.498s lm_sensors.service
1.279s systemd-tmpfiles-setup.service
1.183s chronyd.service
1.128s fedora-loadmodules.service
1.022s udisks2.service
959ms systemd-binfmt.service
938ms abrt-ccpp.service
877ms colord.service
870ms proc-sys-fs-binfmt_misc.mount
857ms console-kit-log-system-start.service
768ms NetworkManager.service
762ms sys-kernel-debug.mount
762ms dev-mqueue.mount
761ms dev-hugepages.mount
736ms plymouth-read-write.service
690ms auditd.service
672ms systemd-cryptsetup@luks\x2daf0ce3bf\x2d8fdf\x2d413f\x2dab89\x2dd43efba3883f.service
658ms systemd-modules-load.service
636ms systemd-tmpfiles-setup-dev.service
636ms fedora-readonly.service
589ms dev-disk-by\x2duuid-86723b87\x2d4884\x2d4079\x2d8575\x2d4ab319ac83ae.swap
555ms sendmail.service
528ms plymouth-start.service
417ms systemd-fsck@dev-disk-by\x2duuid-ba53da2a\x2d20b7\x2d410b\x2d9f2b\x2d5b1518568eda.service
416ms systemd-udev-trigger.service
394ms mcelog.service
381ms systemd-remount-fs.service
331ms systemd-readahead-done.service
286ms plymouth-quit.service
286ms plymouth-quit-wait.service
253ms mnt-h320.mount
247ms systemd-user-sessions.service
188ms dev-disk-by\x2duuid-3f8a1a86\x2dbbdb\x2d48de\x2d9ef0\x2de88cca658e63.swap
150ms boot.mount
126ms systemd-readahead-replay.service
86ms systemd-fsck-root.service
82ms sm-client.service
62ms polkit.service
39ms systemd-readahead-collect.service
38ms systemd-udevd.service
37ms systemd-random-seed-load.service
37ms upower.service
36ms bluetooth.service
26ms console-kit-daemon.service
25ms sshd.service
22ms accounts-daemon.service
17ms rpcbind.service
12ms systemd-update-utmp-runlevel.service
12ms systemd-journal-flush.service
8ms sys-fs-fuse-connections.mount
5ms systemd-vconsole-setup.service
4ms systemd-sysctl.service
4ms boot-efi.mount
2ms home-horazont-tmp.mount
1ms sys-kernel-config.mount
1ms tmp.mount
Additionally, I plotted the startup with systemd-analyze plot and noticed that avahi-daemon, firewalld and lightdm finished at the very same instant, as if they had been blocking on something for a long time (each other, possibly?).
If you disable firewalld.service and avahi-daemon.service, does the rest of the system then boot fast? You both use third-party kernel modules (via dkms and akmods, respectively). Is the problem reproducible without them? There may be some hints in the logs. Please attach the output of "journalctl -b".

Turns out, disabling avahi-daemon and firewalld was not a good idea. I cannot get a login screen right now; I'll try harder. No ttys either; systemd-logind complained about some timeout, with a timestamp of approximately 60 in front of it. Grub doesn't allow me to edit commands (it freezes after I press `e') and grml isn't able to mount my btrfs… I guess I'll be poking at that one for a while…

Update: Booting the shiny new Fedora 19 rescue boot entry got me a graphical login. It also booted much faster than usual: lightdm.service only takes 65ms to initialize, with the usual times for the other jobs. I re-enabled avahi-daemon and firewalld, but disabled akmods and akmods-shutdown for the next test. Here the boot was also nice and fast, with firewalld taking 17 seconds to initialize (systemd-analyze blame) and avahi-daemon taking approx. 8 seconds. Note that the modules usually built by the akmod are nevertheless loaded, as no kernel version change happened.

(In reply to Jonas Wielicki from comment #9)
> Update: Booting the shiny new fedora 19 rescue boot entry got me a graphical
> login.

Then there seems to be something wrong with the (non-rescue) initramfs. Back it up and recreate it using dracut.

Good to know it's not just me that suspects a big boot slowdown. I'm not using dkms or any other unusual stuff on startup. This was a fresh install from F19 Beta.
$ systemd-analyze blame
1min 25.678s firewalld.service
1min 25.572s avahi-daemon.service
1min 25.320s chronyd.service
20.008s plymouth-quit-wait.service
7.760s dnf-makecache.service
7.704s systemd-suspend.service
7.658s systemd-udev-settle.service
6.434s bluetooth.service
6.326s systemd-logind.service
6.307s rtkit-daemon.service
5.237s gdm.service
3.568s iprupdate.service
3.565s iprinit.service
2.819s NetworkManager-wait-online.service
2.268s dev-mqueue.mount
2.110s systemd-binfmt.service
1.984s tmp.mount
1.969s fedora-loadmodules.service
1.813s sys-kernel-config.mount
1.716s proc-sys-fs-binfmt_misc.mount
1.634s sys-kernel-debug.mount
1.632s dev-hugepages.mount
1.543s abrt-ccpp.service
1.537s mcelog.service
1.447s systemd-fsck@dev-disk-by\x2duuid-d1d4a8e3\x2d702b\x2d4e5f\x2d901d\x2dec5e7550f6f5.service
1.395s colord.service
1.361s systemd-remount-fs.service
1.165s systemd-tmpfiles-setup-dev.service
1.136s iprdump.service
1.043s lvm2-monitor.service
889ms systemd-user-sessions.service
879ms NetworkManager.service
878ms plymouth-start.service
827ms systemd-udev-trigger.service
753ms systemd-tmpfiles-clean.service
732ms dmraid-activation.service
720ms systemd-tmpfiles-setup.service
681ms dev-disk-by\x2duuid-f7d6841c\x2d9522\x2d437c\x2d8414\x2dc7940b3b4e3a.swap
641ms plymouth-read-write.service
556ms auditd.service
508ms fedora-readonly.service
488ms systemd-fsck-root.service
477ms systemd-journal-flush.service
446ms systemd-sysctl.service
382ms udisks2.service
341ms systemd-readahead-replay.service
340ms systemd-readahead-collect.service
260ms boot.mount
248ms sendmail.service
216ms accounts-daemon.service
215ms polkit.service
208ms sm-client.service
197ms systemd-udevd.service
187ms lvm2-lvmetad.service
178ms upower.service
143ms home.mount
127ms livesys.service
116ms workarounds.service
29ms systemd-readahead-done.service
10ms sshd.service
5ms systemd-vconsole-setup.service
5ms systemd-fsck@dev-disk-by\x2duuid-198daf9e\x2d6267\x2d43f7\x2db579\x2d0205d6790489.service
5ms livesys-late.service
5ms rpcbind.service
4ms wpa_supplicant.service
4ms systemd-update-utmp-runlevel.service
3ms sys-fs-fuse-connections.mount
2ms systemd-random-seed-load.service
James,
Do you use btrfs as well? It's curious that you have systemd-suspend.service in the list. Have you suspended the machine while booting? I would still like to see the "journalctl -b" output.

(In reply to Michal Schmidt from comment #12)
> James,
> Do you use btrfs as well?

Yes.

> It's curious that you have systemd-suspend.service in the list. Have you
> suspended the machine while booting?

Not while booting, but the machine has seen a couple of suspend/resume cycles.

> I would still like to see the "journalctl -b" output.

I'll post it below shortly.

Created attachment 776001 [details]
journalctl -b output
Happening here too (although to a lesser degree):
21.343s accounts-daemon.service
21.328s firewalld.service
8.115s systemd-cryptsetup
5.001s dnf-makecache.service
1.132s fedora-loadmodules.service
1.070s plymouth-start.service
985ms boot-EFI.mount
949ms systemd-tmpfiles-clean.service
918ms systemd-binfmt.service
819ms NetworkManager.service
614ms fedora-readonly.service
555ms home.mount
550ms systemd-udev-trigger.service
548ms dev-mqueue.mount
548ms sys-kernel-debug.mount
547ms dev-hugepages.mount
546ms tmp.mount
545ms systemd-fsck@dev-disk-by\x2duuid-8D12\x2d7405.service
513ms proc-sys-fs-binfmt_misc.mount
480ms bluetooth.service
467ms systemd-logind.service
466ms abrt-ccpp.service
462ms lm_sensors.service
336ms rtkit-daemon.service
334ms systemd-fsck
305ms systemd-tmpfiles-setup.service
295ms colord.service
293ms systemd-remount-fs.service
249ms systemd-sysctl.service
245ms plymouth-read-write.service
205ms systemd-fsck-root.service
175ms systemd-journal-flush.service
168ms systemd-readahead-replay.service
127ms multi.mount
101ms upower.service
95ms chronyd.service
55ms udisks2.service
55ms plymouth-quit-wait.service
53ms plymouth-quit.service
34ms systemd-readahead-collect.service
33ms systemd-udevd.service
30ms sys-kernel-config.mount
26ms polkit.service
24ms sshd.service
16ms teamviewerd.service
14ms rpcbind.service
8ms gdm.service
8ms avahi-daemon.service
4ms wpa_supplicant.service
3ms auditd.service
3ms systemd-tmpfiles-setup-dev.service
2ms systemd-user-sessions.service
1ms systemd-update-utmp-runlevel.service
1ms systemd-random-seed-load.service
1ms systemd-readahead-done.service
979us systemd-vconsole-setup.service
On my other machine, also a Core i7 but running Arch, I get:
6.499s firewalld.service
3.133s avahi-daemon.service
3.119s systemd-logind.service
2.985s gdm.service
1.526s accounts-daemon.service
1.176s systemd-binfmt.service
1.158s NetworkManager.service
1.130s sys-kernel-debug.mount
1.125s systemd-fsck@dev-disk-by\x2duuid-0a83d5ea\x2deceb\x2d4fb9\x2db231\x2d36d7993d8e49.service
1.118s dev-hugepages.mount
1.116s dev-mqueue.mount
1.107s systemd-udev-trigger.service
1.094s polkit.service
1.042s systemd-remount-fs.service
1.038s tmp.mount
1.009s systemd-vconsole-setup.service
933ms proc-sys-fs-binfmt_misc.mount
923ms systemd-modules-load.service
828ms systemd-tmpfiles-setup-dev.service
494ms alsa-restore.service
446ms systemd-sysctl.service
393ms colord.service
318ms systemd-udevd.service
305ms boot.mount
286ms systemd-user-sessions.service
267ms systemd-update-utmp.service
166ms sys-kernel-config.mount
146ms rpi.mount
145ms udisks2.service
126ms systemd-fsck@dev-disk-by\x2duuid-5dd4a246\x2dc65c\x2d49c8\x2d9fd2\x2dff17e72a7036.service
104ms systemd-random-seed-load.service
62ms systemd-journal-flush.service
40ms nmbd.service
39ms rtkit-daemon.service
39ms upower.service
30ms systemd-tmpfiles-setup.service
22ms home.mount
17ms smbd.service
3ms home-celebdor-games.mount
2ms systemd-tmpfiles-clean.service
837us sys-fs-fuse-connections.mount
The 3x boot time of firewalld on the first machine (F19), without my having touched any rules, seems excessive.
Since removing /var/log/journal to disable the persistent journal, boot times have gone down somewhat:
20.005s plymouth-quit-wait.service
19.555s firewalld.service
9.437s NetworkManager-wait-online.service
8.869s accounts-daemon.service
6.150s systemd-udev-settle.service
4.778s gdm.service
3.596s avahi-daemon.service
although they still seem rather long. X/gdm also takes quite a while to start up. I don't have a machine on ext4 at the moment to compare against.
firewalld does seem slow to start, particularly if I drop the file caches beforehand:
# echo 3 > /proc/sys/vm/drop_caches
# time service firewalld restart
Redirecting to /bin/systemctl restart firewalld.service
real 0m11.483s
user 0m0.012s
sys 0m0.024s
Once it's all in memory, restarting takes under a second.
Same issue over the last few days with a fresh Fedora 19 install on a Samsung laptop with an Intel i5-3210M. For the initial few days it consistently booted slowly. Now it sometimes boots slowly and sometimes just hangs after boot without showing the GDM login screen. When the boot hangs, I am also not able to switch to a text console by pressing Ctrl+Alt+F2 or so.
NOTE: I also think the systemd configuration should be changed by default to start the text consoles early; I don't see any reason to block text console startup as well. That way one could at least debug these issues. Removing rhgb and quiet from the kernel command line does seem to make the system boot reliably (i.e. it avoids the hang I mentioned above), even though it still waits for the firewalld, avahi and plymouth-quit-wait services to complete and thus takes a long time to boot into the GUI.
This is the output of one of the slow boots.
Startup finished in 3.766s (firmware) + 4.846s (loader) + 1.115s (kernel) + 3.356s (initrd) + 1min 5.017s (userspace) = 1min 18.102s
The output of systemd-analyze blame is
49.419s firewalld.service
49.406s avahi-daemon.service
20.006s plymouth-quit-wait.service
3.890s gdm.service
3.773s bluetooth.service
3.762s systemd-logind.service
3.755s rtkit-daemon.service
2.244s chronyd.service
2.103s lvm2-monitor.service
2.083s systemd-fsck@dev-disk-by\x2duuid-5aac0cb1\x2d8708\x2d4b95\x2d9cc7\x2df43922e95244.service
2.021s abrt-ccpp.service
1.999s systemd-udev-settle.service
1.896s mcelog.service
1.804s sys-kernel-debug.mount
1.804s dev-mqueue.mount
1.785s dev-hugepages.mount
1.770s tmp.mount
1.163s plymouth-start.service
1.121s fedora-loadmodules.service
1.115s sys-kernel-config.mount
846ms systemd-tmpfiles-setup.service
815ms systemd-udev-trigger.service
779ms systemd-remount-fs.service
684ms fedora-readonly.service
655ms dev-disk-by\x2duuid-61e2cb71\x2d74a1\x2d4f63\x2d9400\x2d001f17ec3c83.swap
495ms plymouth-read-write.service
469ms colord.service
417ms lvm2-lvmetad.service
386ms systemd-readahead-replay.service
365ms systemd-user-sessions.service
356ms udisks2.service
344ms systemd-sysctl.service
340ms systemd-fsck-root.service
338ms livesys-late.service
309ms home.mount
303ms NetworkManager.service
264ms auditd.service
192ms sendmail.service
175ms boot.mount
161ms systemd-localed.service
158ms upower.service
111ms systemd-readahead-collect.service
80ms accounts-daemon.service
52ms boot-efi.mount
25ms polkit.service
21ms sm-client.service
17ms systemd-udevd.service
17ms dmraid-activation.service
13ms rpcbind.service
13ms livesys.service
8ms systemd-tmpfiles-setup-dev.service
7ms systemd-journal-flush.service
6ms systemd-readahead-done.service
5ms systemd-vconsole-setup.service
5ms systemd-fsck@dev-disk-by\x2duuid-6fcf2ebf\x2dd6bb\x2d40f1\x2d81ca\x2d23771e21f21c.service
4ms systemd-random-seed-load.service
4ms systemd-update-utmp-runlevel.service
3ms sys-fs-fuse-connections.mount
Created attachment 782433 [details]
Another journalctl -b --until after boot finished
For comparison, another journalctl -b output, until after login. After login, it is dominated by munin-related setroubleshootd output.
Is there any progress on the investigation? Is there anything we can do to help with this?
I have another log of a machine taking 40s to initialize gdm and five other services; would that be helpful?

It appears that removing the content of /var/log and masking systemd-journal.service greatly improves the boot time.

(In reply to lysbleu from comment #20)
> It appears that removing the content of /var/log and masking
> systemd-journal.service greatly improves the boot time.

You must not mask systemd-journal.service! Otherwise you won't get any log messages from your system!

(In reply to Harald Hoyer from comment #21)
> (In reply to lysbleu from comment #20)
> > It appears that removing the content of /var/log and masking
> > systemd-journal.service greatly improves the boot time.
>
> You must not mask systemd-journal.service! Otherwise you won't get any log
> messages from your system!

Perhaps it's best for us to prevent users from being able to do that in the first place ;)

(In reply to Jóhann B. Guðmundsson from comment #22)
> Perhaps it's best for us to prevent users from being able to do that in the
> first place ;)

Or maybe prevent users from wanting to do that instead :P

(In reply to lysbleu from comment #20)
> It appears that removing the content of /var/log and masking
> systemd-journal.service greatly improves the boot time.

James Ettle and others (in some other BZs) already said that removing /var/log/journal improves the boot speed for them. Are you saying that masking systemd-journald.service brings an additional speedup on top of that?

I'd like to ask you folks affected by the bug for:
1. some more "journalctl -b" logs, but with chronyd.service (and any other NTP clients) disabled - to make sure we are not getting confused by possible jumps of the realtime clock.
2. SVG charts generated by systemd-bootchart (see its man page for what to add to the kernel command line) and by "systemd-analyze plot > sa.svg".

Will do upon next reboot (likely tomorrow). The man page says it will collect for 30 seconds by default. How can I increase that time? It will most likely not be sufficient to capture the whole boot (the current one having taken 1min 30s). Is it sufficient to edit /etc/systemd/bootchart.conf or do I have to change something in the initramfs? I'll also disable chronyd.service in that run, I hope that's okay.

I tried to append init=/usr/lib/systemd/systemd-bootchart to my linuxefi command line. If I do that, the boot freezes (for at least 3 minutes) after [ OK ] Recreate Volatile Files and Directories... I found no way around that; is there anything obvious I could have done wrong?

(In reply to Michal Schmidt from comment #24)
> James Ettle and others (in some other BZs) already said that removing
> /var/log/journal improves the boot speed for them. Are you saying that
> masking systemd-journald.service brings an additional speedup on top of that?

Actually no, that was a mistake; unmasking it did not add any boot time. Removing /var/log/journal is enough.

(In reply to Michal Schmidt from comment #25)
> 2. SVG charts generated by systemd-bootchart (see its manpage for what to
> add to the kernel command line) and by "systemd-analyze plot > sa.svg"

systemd-bootchart stalls everything for me too. If I understand the man page correctly, I set it to 15000 samples at 25Hz, which is 10 mins. I gave it 15 mins to move, but nothing.
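On the bootchart.conf question: the sampling window can indeed be extended there instead of touching the initramfs. A sketch, assuming the [Bootchart] section with the Samples= and Frequency= keys described in the systemd-bootchart man page of this era (the values below are illustrative, not defaults):

```ini
# /etc/systemd/bootchart.conf — assumed keys, verify against bootchart.conf(5)
[Bootchart]
Frequency=25     # samples per second
Samples=4500     # 4500 samples / 25 Hz = 180 s of capture, instead of ~30 s
```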
I don't know if it is related or not, but it apparently cannot open "/proc/schedstat". Will post the other items when I next reboot.

To clarify how this bug slows down the other services: the processes seem to be blocked in the sendto() system call when they are sending messages to the system logger. Would a larger buffer on the receiving side help?
From a chronyd strace output:
10:33:16.982187 socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 0
10:33:16.982228 connect(0, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0
10:33:16.982267 sendto(0, "<30>Sep 02 10:33:16 chronyd[373]"..., 63, MSG_NOSIGNAL, NULL, 0) = 63
10:33:33.642469 open("/etc/chrony.conf", O_RDONLY) = 1
It would be nice if the systemd-journald service were marked as started only when it's fully initialized and ready to accept log messages. This would make the bootchart logs clearer.
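A larger receive buffer would only postpone the blocking, not remove it: /dev/log is an AF_UNIX datagram socket with a bounded queue (net.unix.max_dgram_qlen), and once journald stops draining it, every logging client blocks in sendto(), exactly the pattern the chronyd strace above shows. A minimal self-contained sketch of that failure mode (this is an illustrative socket pair, not the actual journald socket):

```python
import errno
import socket

# A connected AF_UNIX datagram pair: rx plays a stalled journald that
# never drains its queue, tx plays a logging client like chronyd.
rx, tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
tx.setblocking(False)  # non-blocking so we observe EAGAIN instead of hanging

queued = 0
try:
    while True:
        tx.send(b"<30>chronyd[373]: chronyd started")
        queued += 1
except OSError as e:
    # With a blocking socket, this is the point where a client would
    # sit in sendto() for seconds, as seen in the strace above.
    assert e.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS)

print("queue filled after", queued, "datagrams")
rx.close()
tx.close()
```

The exact number of datagrams depends on the kernel's unix-socket queue limits, but it is always finite, which is why a busy boot can fill it faster than a stalled journald drains it.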
I have another data point, although I'm not entirely sure it's related. Booting my machine today, systemd-logind failed to start due to timeouts (I am currently booting my system with verbose=1):

Sep 05 12:11:59 zion systemd-logind[769]: New seat seat0.
Sep 05 12:11:59 zion systemd-logind[769]: Watching system buttons on /dev/input/event1 (Power Button)
Sep 05 12:11:59 zion systemd-logind[769]: Watching system buttons on /dev/input/event0 (Power Button)
Sep 05 12:11:59 zion systemd-logind[695]: Failed to get system D-Bus connection: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Sep 05 12:11:59 zion systemd-logind[695]: Failed to fully start up daemon: Connection refused
Sep 05 12:11:59 zion systemd[1]: Unit systemd-logind.service entered failed state.
Sep 05 12:11:59 zion systemd[1]: systemd-logind.service holdoff time over, scheduling restart.

Similar timeout messages come up for different services:

Sep 05 12:11:59 zion dbus-daemon[721]: dbus[721]: [system] Activating systemd to hand-off: service name='org.freedesktop.ColorManager' unit='colord.service'
Sep 05 12:11:59 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.ColorManager': timed out
[…]
Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.ColorManager': timed out
Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.Accounts': timed out

At this point, I reset the machine. The next boot was different and surprisingly fast. My suspicion is that it correlates with the truncated journal.
Oddly, I cannot find the message from journald about renaming and recreating the journal file after truncation (due to the reset) in the journalctl logs. The boot time was way better than anything in the past:

Startup finished in 2.288s (kernel) + 15.886s (initrd) + 44.452s (userspace) = 1min 2.627s

Another data point: when searching for the logs for this comment, journalctl blocked for nearly one minute before showing output for journalctl --since today --full. During that time it did some I/O on the hard drive, mainly reading, but incredibly slowly (only 600 kByte to 1 MByte/s; measured with conky). The drive is able to do up to 140 MByte/s. So either the files accessed by journalctl are horribly fragmented in the filesystem (btrfs), or journalctl accesses lots of different files, or the file format itself is fragmented, causing lots of expensive seeks. For consecutive reads, the effective read rate easily goes up to 60 or even over 100 MByte/s, sometimes more.

journalctl says it will keep up to 4 GBytes of journal in one file:

Sep 05 12:16:25 zion systemd-journal[424]: Allowing system journal files to grow to 4.0G.

Is it possible to lower that number? I could imagine that it's just too large to be performant. Does journald load the whole syslog into memory upon boot, or iterate over it for any other reason?

Same here. I have a fast system with an SSD drive. The system could easily boot in under 20 secs, but Fedora 19 takes sometimes 30 secs, sometimes 40 secs, and sometimes it doesn't get to the login screen at all (then I can only switch to another tty and log in in text mode). I have installed smartmontools and lm-sensors (with all sensors checked). Maybe "smartd" takes a long time (see log), but there must be other problems as well. The plymouth boot sequence (the little Fedora logo in the middle) doesn't appear in 95% of boot sessions. Only a black screen appears, and after a long time the login manager is shown. And sometimes it remains black (I sometimes waited 5 min.).
I attached the logs for 5 boot sessions:
session 1: ca. 48 secs (graphical login was possible)
session 2: ca. 52 secs (no graphical login screen)
session 3: ca. 37 secs (graphical login was possible)
session 4: ca. 45 secs (no graphical login screen)
session 5: ca. 37 secs (graphical login was possible)
and the lsmod output.

If there is no login screen visible, gdm.service takes long; if graphical login is possible, gdm.service takes under 1 sec. So I see the following problematic things:
- plymouth-quit-wait.service
- accounts-daemon.service
- chronyd.service
- firewalld.service
- smartd
- lm-sensors
- gnome window manager (the logs contain the German word "Fensterverwalter", i.e. window manager)
Maybe one of these is problematic.

Created attachment 794734 [details]
Rolle: system-analyze blame Session1
Created attachment 794735 [details]
Rolle: system-analyze blame Session2
Created attachment 794736 [details]
Rolle: system-analyze blame Session3
Created attachment 794737 [details]
Rolle: system-analyze blame Session4
Created attachment 794738 [details]
Rolle: system-analyze blame Session5
Created attachment 794740 [details]
Rolle: system-analyze Session2
Created attachment 794741 [details]
Rolle: system-analyze Session3
Created attachment 794742 [details]
Rolle: system-analyze Session4
Created attachment 794743 [details]
Rolle: system-analyze Session5
Created attachment 794744 [details]
Rolle: journalctl -b Session1
Created attachment 794745 [details]
Rolle: journalctl -b Session2
Created attachment 794746 [details]
Rolle: journalctl -b Session3
Created attachment 794747 [details]
Rolle: journalctl -b Session4
Created attachment 794748 [details]
Rolle: journalctl -b Session5
Created attachment 794750 [details]
Rolle: lsmod
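Earlier in the thread a commenter asked whether journald's "Allowing system journal files to grow to 4.0G" cap can be lowered. It can, via /etc/systemd/journald.conf; a sketch assuming the journald.conf(5) keys of this systemd generation (the values below are illustrative, not recommendations):

```ini
# /etc/systemd/journald.conf — assumed keys, see journald.conf(5)
[Journal]
SystemMaxUse=200M       # total disk space the persistent journal may use
SystemMaxFileSize=20M   # rotate individual journal files far below 4G
```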
I'd really like to know what happens inside the dbus daemon when the boot is that slow. Can anybody get me an strace -tt of dbus-daemon when this happens?

*** Bug 981669 has been marked as a duplicate of this bug. ***

*** Bug 985076 has been marked as a duplicate of this bug. ***

bug 1006386 and bug 1005399 are likely the same problem.

Something appears to be stalling polkit/accounts-daemon/dbus/journald for a long time; I wonder, though, what it might be. I'd be thankful for any reports of what these processes might be working on when the system stalls. I'd be particularly interested in strace -tt of these processes while they block and until things continue again. Also good would be the full gdb bt of those processes at that point. Are you guys using anything weird NSS-related like IPA/sssd on the systems where this happens?

Also: https://bugs.freedesktop.org/show_bug.cgi?id=68217

Can this be an interaction between systemd-logind providing service only after subscribing to "org.freedesktop.systemd1.Manager", but "org.freedesktop.systemd1.Manager" not being available for some time?

I tried running strace on dbus-daemon by adding '/bin/strace ...' to the dbus.service file. This causes the system to fail to boot, as dbus-daemon dies. Any hints on how to do this? Actual 'ExecStart' line I used:

ExecStart=/bin/strace -tt -o /var/tmp/dbus-strace /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

So, here's my wild but educated guess at what is going on: journald is written carefully to never require any other service to be running. This is because most services connect to it to log something, and if journald ever blocked on some IPC call to another service, and that service then logged something back to journald and blocked on that to finish, we'd run into a deadlock.

Now, even though journald is carefully written that way, there's one place where an IPC call can sneak in: we try to resolve the "systemd-journal" group name to a GID (with getgrnam()) so that we can chgrp() all journal files to it. In most setups that is without risk, since getgrnam() will load the "files" NSS module, which will just go to /etc/group and return the requested data. However, in some setups the NSS lookup might actually involve IPC, and those are the cases where sssd/IPA or suchlike are used, or where nscd is enabled.

Now, this will not necessarily always result in a deadlock, since logging to journald is asynchronous, which means that as long as there is room in the log socket buffer no deadlock will happen. But if a lot of things happen at the same time, the log socket buffer is easily full (since on Linux it will only take 15 datagrams or so by default), and there you go.

Now, the question is what to do about this. We could just hardcode the resolved GID into journald. This would work fine on Fedora since the GID is assigned fixed anyway, but it would still be ugly. Or we could require that /var/log/journal is owned by the group, and then simply read the GID from the dir before applying it to the journal files... Neither of those solutions do I particularly like, I must say.

Anyway, I figure if you guys remove nscd/sssd/IPA/... from your NSS setup, things will not stall anymore. Could you test this?

Created attachment 797390 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and with nscd disabled (debug_level=debug)
Not sure if this is exactly what you requested, but I disabled nscd.service and rebooted (sssd.service was already disabled, and I don't believe I have any IPA stuff other than sssd-ipa on my system).
The result seems similar to other "20 second slower boots".
Lennart, any idea why the timestamps are so mixed?

[   60.521021] tlondon.localhost.org colord[516]: Executing: /usr/libexec/colord
[   21.854473] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   21.856712] tlondon.localhost.org systemd[1]: Accepted connection on private bus.
[   21.856967] tlondon.localhost.org systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
[   21.857056] tlondon.localhost.org systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[   22.172371] tlondon.localhost.org systemd[1]: Received SIGCHLD from PID 415 (systemd-tmpfile).
[   22.174503] tlondon.localhost.org systemd[1]: Got SIGCHLD for process 415 (systemd-tmpfile)
[   60.703582] tlondon.localhost.org sshd[531]: Server listening on 0.0.0.0 port 22.
[   60.703941] tlondon.localhost.org sshd[531]: Server listening on :: port 22.
[   22.176395] tlondon.localhost.org systemd[1]: Child 415 died (code=exited, status=0/SUCCESS)

Hmm, so guys, can you attach the full output of "strace -o getent-group getent group systemd-journal", please, on a system where the delay happens at boot? This should tell us if the NSS lookup results in IPC!

(In reply to Tom London from comment #56)
> Not sure if this is exactly what you requested, but I disabled nscd.service
> and rebooted (sssd.service was already disabled, and I don't believe I have
> any IPA stuff other than sssd-ipa on my system).

Please also disable nscd.socket, or de-install the entire package, and try again.

I guess nscd cannot be socket activated without gross hacks that would be needed in systemd code. The nscd.socket probably needs to be entirely removed from the nscd package.

(In reply to Kay Sievers from comment #59)
> I guess nscd cannot be socket activated, without gross hacks that would
> be needed in systemd code. The nscd.socket probably needs to be entirely
> removed from the nscd package.
Nah, I think systemd must be capable of supporting nscd being socket activatable. That's something to fix on our side. No finger pointing here please, except at ourselves.

(In reply to Harald Hoyer from comment #57)
> Lennart, any idea, why the timestamps are so mixed?

Possibly because messages have been for quite some time in the queue. To make more sense of this it would be interesting to see "-o verbose" output of that stream fragment, which will show timestamps for when systemd got the messages and when they were enqueued.

(In reply to Tom London from comment #53)
> I tried running strace on dbus-daemon by adding '/bin/strace ...' to the
> dbus.service file.
>
> This causes the system to fail to boot, as dbus-daemon dies.
>
> Any hints on how to do this?

See https://bugzilla.redhat.com/show_bug.cgi?id=1006386#c18

I posted the output of 'strace dbus-daemon', 'systemd-analyze plot', and 'journalctl -o short-monotonic -ab' to https://bugzilla.redhat.com/show_bug.cgi?id=1006386

(In reply to Lennart Poettering from comment #61)
> (In reply to Harald Hoyer from comment #57)
> > Lennart, any idea, why the timestamps are so mixed?
>
> Possibly because messages have been for quite some time in the queue. To
> make more sense of this it would be interesting to see "-o verbose" output
> of that stream fragment which will show timestamps when systemd got the
> messages and when they were enqueued.

Attaching below the output of 'journalctl -o verbose -ab', taken when 'journalctl -o short-monotonic -ab' showed mis-sorted entries.

Created attachment 797658 [details]
Output of "journalctl -o verbose -ab"; booted with journal, removed nscd, debug_level=debug
Big improvement in boot speed since the last software update: from 35 s to 20 s. No big stalls anymore. Boot plot added.

Created attachment 801323 [details]
Boot plot (systemd-analyze plot)
*** Bug 1005399 has been marked as a duplicate of this bug. ***

With kdm, my computers would just hang: no graphical session, sometimes not even the ability to switch to vt2. I had to reboot endlessly to get to the kdm screen on the desktop, and on the laptop the only solution was the hack of removing /var/log/journal, booting into multi-user.target, and running startx :-(

Since my switch to gdm and the latest systemd update, both my laptop (i686) and my desktop (x86_64) boot to graphical on the first try! I still have the subjective impression that boot is slow and that the drives are working a lot more than they used to, but I can use my computers normally without resorting to hacks :-)

I am looking at nearly a 2 minute startup on ext4.
Startup finished in 830ms (kernel) + 10.623s (initrd) + 1min 31.282s (userspace) = 1min 42.736s
20.002s plymouth-quit-wait.service
17.646s NetworkManager.service
17.130s accounts-daemon.service
16.981s haveged.service
16.948s rtkit-daemon.service
14.319s gdm.service
2.916s akmods.service
2.900s systemd-cryptsetup@luks\x2dbc1e37d2\x2d243b\x2d4f9b\x2d91d0\x2df8fa0632a0ca.service
2.758s akmods-shutdown.service
1.589s systemd-fsck-root.service
1.405s fedora-loadmodules.service
1.303s lvm2-monitor.service
1.152s systemd-fsck@dev-disk-by\x2duuid-965cbc1b\x2d83ec\x2d4ca5\x2d99a6\x2d65d03c9649bb.service
1.052s chronyd.service
976ms dev-disk-by\x2duuid-106ad9a0\x2d39bf\x2d44eb\x2da085\x2dd324f3e151be.swap
719ms systemd-binfmt.service
652ms psacct.service
629ms systemd-udev-trigger.service
619ms sys-kernel-debug.mount
619ms dev-hugepages.mount
619ms dev-mqueue.mount
581ms proc-sys-fs-binfmt_misc.mount
505ms tmp.mount
470ms sys-kernel-config.mount
458ms lm_sensors.service
405ms plymouth-start.service
356ms systemd-ask-password-wall.service
331ms systemd-udevd.service
325ms boot.mount
323ms systemd-random-seed-load.service
295ms systemd-user-sessions.service
295ms ip6tables.service
295ms iptables.service
295ms atieventsd.service
294ms mcelog.service
288ms abrt-ccpp.service
287ms systemd-logind.service
232ms systemd-readahead-replay.service
232ms systemd-readahead-collect.service
196ms systemd-remount-fs.service
186ms systemd-sysctl.service
181ms polkit.service
120ms systemd-fsck@dev-disk-by\x2duuid-ef7e08a0\x2d9ca2\x2d43e2\x2da275\x2dece3eddff8b9.service
105ms auditd.service
102ms boot-efi.mount
89ms colord.service
81ms systemd-tmpfiles-setup.service
67ms udisks2.service
63ms plymouth-read-write.service
26ms home.mount
11ms sshd.service
8ms fedora-readonly.service
7ms rpcbind.service
6ms systemd-readahead-done.service
6ms NetworkManager-dispatcher.service
5ms lvm2-lvmetad.service
5ms upower.service
4ms systemd-tmpfiles-setup-dev.service
4ms systemd-update-utmp-runlevel.service
3ms systemd-cryptsetup@luks\x2d1d184e06\x2dc1c6\x2d4bc0\x2db980\x2dd862a37cbcfc.service
3ms systemd-localed.service
3ms systemd-journal-flush.service
3ms systemd-vconsole-setup.service
2ms sys-fs-fuse-connections.mount
Created attachment 803028 [details]
josephwagner-boot.xml
Created attachment 803029 [details]
josephwagner-journal-boutput.txt
I regret to report that my previous relief was short-lived. I have been unable to boot my desktop computer just now. I tried 2-3 times and no gdm ever appeared. I had no choice but to resort to the hack of removing /var/log/journal.

*** Bug 1011895 has been marked as a duplicate of this bug. ***

I had an error in booting on F19 today, and slow boot as well. Due to other issues I was directed to delete /var/log/journal and try again. It fixed most issues and boot time seems much faster. May be worth a try.

I installed the ssd and kernel updates from this morning on both computers. I have had /var/log/journal in place on both computers for a number of days. Things are really looking good; I hope I haven't spoken too soon ;-). The desktop booted to gdm right away on my F19 KDE system; the laptop booted normally on its F19 KDE system on the first try.

Attaching my own data point, with 2 SVG boot plots booting the same 3.12.0-rc4 kernel on ext4, before and after cp'ing /dev/null onto /var/log/journal/system.journal and erasing old user journal files. Difference: 28.4 s vs 65.9 s boot time. All of that time difference is spent with the disk LED blinking frantically.
Before cleanup:

[root@xbox ~]# ls -l /var/log/journal.bak/697c9225e38c4cc3b5666e109076911d/
total 140968
-rw-r-----+ 1 root systemd-journal   4005888 Sep 16 17:45 system~
-rw-r-----+ 1 root systemd-journal 118161408 Oct  8 15:34 system.journal
-rw-r-----+ 1 root systemd-journal   3796992 Aug 26 22:32 user-1000.journal
-rw-r-----+ 1 root systemd-journal   3735552 Sep 16 17:46 user-41~
-rw-r-----+ 1 root systemd-journal   6971392 Oct  8 15:30 user-41.journal
-rw-r-----+ 1 root systemd-journal   3747840 Aug 26 22:32 user-42.journal
-rw-r-----+ 1 root systemd-journal   3756032 Aug 26 19:02 user-992.journal

After cleanup, one reboot (testing without the avahi-daemon/bluetooth services) plus the 28.4 s boot (reinstating avahi-daemon/bluetooth):

[root@xbox ~]# ls -l /var/log/journal/697c9225e38c4cc3b5666e109076911d/
total 11548
-rw-r-----+ 1 root systemd-journal 4825088 Oct  8 15:50 system.journal
-rw-r-----+ 1 root systemd-journal 6983680 Oct  8 15:48 user-41.journal

Created attachment 809392 [details]
ASuardi: SVG before truncating journal
Created attachment 809393 [details]
ASuardi: SVG after truncating journal
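The workaround used in the comments above (removing or truncating files under /var/log/journal) throws the logs away. A minimal sketch of a gentler variant, assuming the default persistent-journal layout: move the directory aside so the old, possibly huge journal files remain available for later inspection. The script rehearses the steps on a throwaway directory; on a real system JOURNAL_DIR would be /var/log/journal, the commands would need root, and systemd-journald would need a restart afterwards.

```shell
# Rehearse the "move the journal aside" workaround on a scratch directory.
# On a real system:   JOURNAL_DIR=/var/log/journal   (as root), then
#                     systemctl restart systemd-journald
JOURNAL_DIR="$(mktemp -d)/journal"
mkdir -p "$JOURNAL_DIR"
: > "$JOURNAL_DIR/system.journal"    # stand-in for the real journal file

BACKUP_DIR="${JOURNAL_DIR}.bak"
mv "$JOURNAL_DIR" "$BACKUP_DIR"      # old logs stay readable via journalctl -D
mkdir "$JOURNAL_DIR"                 # journald starts a fresh journal here

ls "$BACKUP_DIR"
```

This keeps the oversized system.journal around as evidence for the bug instead of deleting it outright; `journalctl -D <dir>` can still read the moved files.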
Clearing the journal resolved the issue for me.

# rm -fr -- /var/log/journal/*

Before fix:
Startup finished in 830ms (kernel) + 10.623s (initrd) + 1min 31.282s (userspace) = 1min 42.736s

After fix:
Startup finished in 816ms (kernel) + 11.194s (initrd) + 22.596s (userspace) = 34.606s

Just to say that I've been experiencing this problem after a clean install of Fedora 19. I've not tried

# rm -fr -- /var/log/journal/*

It would be good to know whether we have figured out what the issue is and, if removing the journal logs works, why that is the case, and whether it is a real fix or just masks the underlying problem.

The problem was entirely resolved for me, i.e., normal boot without removing the journal, but today, after updating to systemd-204-17, the problem has returned, insofar as it was necessary to boot twice to get to gdm.

I believe I'm seeing the problem as described in comment #55. The top few results from systemd-analyze blame are below. The requested strace will be attached after this comment. I'm running the pre-release of Fedora 20, but I was seeing the same thing under 19, only worse. Clearing /var/log/journal made the boot time bearable (~30 sec as compared to 2+ min), but it's still not what it should be. I do run a FreeIPA server and the affected machine is a client of it. nscd is not installed, but IPA through sssd is. I don't have the time to remove IPA and test right now, but I can later if it is still needed.

16.755s firewalld.service
9.955s sssd.service
9.673s accounts-daemon.service
7.281s rtkit-daemon.service
5.781s systemd-logind.service
2.524s rsyslog.service
2.305s NetworkManager.service
1.880s chronyd.service
1.581s home.mount
1.518s abrt-ccpp.service
1.442s certmonger.service
1.402s polkit.service
1.207s systemd-tmpfiles-setup.service
1.154s systemd-tmpfiles-setup-dev.service
1.128s systemd-update-utmp.service
1.100s mcelog.service

Created attachment 835580 [details]
strace of getent group
Is it possible that this is, at least partially, a filesystem fragmentation issue? I am seeing over 20000 fragments for one journal file on my btrfs.

This bug looks like a duplicate of the F20 bug #1006386 and its F19 clone, bug #1039200.

*** This bug has been marked as a duplicate of bug 1039200 ***
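The fragmentation observation above can be reproduced with filefrag from e2fsprogs, which reports the number of extents a file occupies; a journal file in tens of thousands of extents forces constant seeking on rotating disks, matching the "disk LED blinking frantically" symptom. The snippet below is a sketch: it demonstrates the command on a small scratch file (the commenter's exact measurement tool is not stated in the thread), and on an affected system the real check would target the journal files themselves. Note that filefrag needs a filesystem supporting the FIEMAP/FIBMAP ioctls.

```shell
# Demonstrate an extent-count check on a scratch file.
# On an affected system, run instead:
#   filefrag /var/log/journal/*/*.journal
f="$(mktemp ./fragcheck.XXXXXX)"
dd if=/dev/zero of="$f" bs=4096 count=64 2>/dev/null
report="$(filefrag "$f" 2>&1 || echo "filefrag unavailable or unsupported here")"
echo "$report"
rm -f "$f"
```

A freshly written small file should show only a handful of extents, so the contrast with a 20000-fragment journal file would be immediate.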