Bug 983688
|
Description
alexhultman
2013-07-11 17:48:58 UTC
What is the output of: systemctl status NetworkManager-wait-online.service ?

NetworkManager-wait-online.service - Network Manager Wait Online
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager-wait-online.service; disabled)
   Active: inactive (dead)

OK. Can you attach the output of 'systemd-analyze blame'? Also, please say whether this is a fresh install or an upgrade. Thanks.

This is a clean install.
1min 16.366s firewalld.service
1min 16.364s avahi-daemon.service
1min 16.304s gdm.service
1min 15.741s bluetooth.service
21.448s udisks2.service
20.070s dkms_autoinstaller.service
20.002s plymouth-quit-wait.service
8.222s systemd-cryptsetup@luks\x2d374f532a\x2d2922\x2d45b6\x2d9420\x2dc49eb4c7d942.service
7.827s abrt-ccpp.service
7.640s bumblebee-nvidia.service
6.958s mcelog.service
4.573s systemd-cryptsetup@luks\x2d13f01f31\x2d0a40\x2d4bda\x2dbe29\x2de75833bb67f7.service
2.901s dmraid-activation.service
1.687s systemd-fsck@dev-disk-by\x2duuid-7527a9c0\x2d08f3\x2d407f\x2d9c8a\x2d065738ae9511.service
1.491s systemd-udev-settle.service
1.462s lm_sensors.service
1.346s fedora-loadmodules.service
1.319s systemd-binfmt.service
1.310s boot.mount
1.089s systemd-tmpfiles-setup.service
1.058s proc-sys-fs-binfmt_misc.mount
1.034s plymouth-read-write.service
973ms lvm2-monitor.service
934ms systemd-logind.service
929ms rtkit-daemon.service
924ms plymouth-start.service
858ms NetworkManager.service
850ms sys-kernel-debug.mount
850ms dev-mqueue.mount
849ms dev-hugepages.mount
807ms fedora-readonly.service
710ms tmp.mount
653ms systemd-sysctl.service
624ms boot-efi.mount
620ms sys-kernel-config.mount
583ms home.mount
569ms systemd-udev-trigger.service
434ms systemd-remount-fs.service
377ms sendmail.service
365ms accounts-daemon.service
347ms colord.service
287ms systemd-fsck-root.service
260ms lvm2-lvmetad.service
257ms systemd-tmpfiles-clean.service
242ms systemd-udevd.service
188ms chronyd.service
183ms systemd-readahead-replay.service
183ms systemd-readahead-collect.service
181ms systemd-journal-flush.service
113ms systemd-fsck@dev-disk-by\x2duuid-4aa2b89d\x2dce0d\x2d41b4\x2db23f\x2dccf3db91a2f9.service
110ms rpcbind.service
87ms sm-client.service
48ms dev-mapper-luks\x2d13f01f31\x2d0a40\x2d4bda\x2dbe29\x2de75833bb67f7.swap
36ms polkit.service
19ms systemd-readahead-done.service
14ms livesys.service
13ms upower.service
9ms wpa_supplicant.service
9ms auditd.service
8ms systemd-user-sessions.service
7ms systemd-ask-password-wall.service
7ms systemd-tmpfiles-setup-dev.service
6ms livesys-late.service
5ms systemd-update-utmp-runlevel.service
3ms systemd-vconsole-setup.service
3ms sys-fs-fuse-connections.mount
2ms systemd-random-seed-load.service
I am experiencing the same problem after an upgrade from Fedora 18 to 19. This is the output of systemd-analyze blame:
1min 24.583s firewalld.service
1min 24.354s avahi-daemon.service
1min 2.235s lightdm.service
22.109s akmods.service
21.929s akmods-shutdown.service
12.300s systemd-cryptsetup@luks\x2d4b479d38\x2d5462\x2d4ad9\x2d9c25\x2d784f8fb8b1d3.service
8.745s NetworkManager-wait-online.service
2.981s rtkit-daemon.service
2.980s systemd-logind.service
2.424s systemd-tmpfiles-clean.service
1.785s sysstat.service
1.498s lm_sensors.service
1.279s systemd-tmpfiles-setup.service
1.183s chronyd.service
1.128s fedora-loadmodules.service
1.022s udisks2.service
959ms systemd-binfmt.service
938ms abrt-ccpp.service
877ms colord.service
870ms proc-sys-fs-binfmt_misc.mount
857ms console-kit-log-system-start.service
768ms NetworkManager.service
762ms sys-kernel-debug.mount
762ms dev-mqueue.mount
761ms dev-hugepages.mount
736ms plymouth-read-write.service
690ms auditd.service
672ms systemd-cryptsetup@luks\x2daf0ce3bf\x2d8fdf\x2d413f\x2dab89\x2dd43efba3883f.service
658ms systemd-modules-load.service
636ms systemd-tmpfiles-setup-dev.service
636ms fedora-readonly.service
589ms dev-disk-by\x2duuid-86723b87\x2d4884\x2d4079\x2d8575\x2d4ab319ac83ae.swap
555ms sendmail.service
528ms plymouth-start.service
417ms systemd-fsck@dev-disk-by\x2duuid-ba53da2a\x2d20b7\x2d410b\x2d9f2b\x2d5b1518568eda.service
416ms systemd-udev-trigger.service
394ms mcelog.service
381ms systemd-remount-fs.service
331ms systemd-readahead-done.service
286ms plymouth-quit.service
286ms plymouth-quit-wait.service
253ms mnt-h320.mount
247ms systemd-user-sessions.service
188ms dev-disk-by\x2duuid-3f8a1a86\x2dbbdb\x2d48de\x2d9ef0\x2de88cca658e63.swap
150ms boot.mount
126ms systemd-readahead-replay.service
86ms systemd-fsck-root.service
82ms sm-client.service
62ms polkit.service
39ms systemd-readahead-collect.service
38ms systemd-udevd.service
37ms systemd-random-seed-load.service
37ms upower.service
36ms bluetooth.service
26ms console-kit-daemon.service
25ms sshd.service
22ms accounts-daemon.service
17ms rpcbind.service
12ms systemd-update-utmp-runlevel.service
12ms systemd-journal-flush.service
8ms sys-fs-fuse-connections.mount
5ms systemd-vconsole-setup.service
4ms systemd-sysctl.service
4ms boot-efi.mount
2ms home-horazont-tmp.mount
1ms sys-kernel-config.mount
1ms tmp.mount
Additionally, I plotted the startup with systemd-analyze plot and noticed that avahi-daemon, firewalld and lightdm finished at the very same instant, as if they had been blocking on something for a long time (each other, possibly?).
If you disable firewalld.service and avahi-daemon.service, does the rest of the system then boot fast? You both use third-party kernel modules (via dkms and akmods, respectively). Is the problem reproducible without them? There may be some hints in the logs. Please attach the output of "journalctl -b".

Turns out, disabling avahi-daemon and firewalld was not a good idea. I cannot get a login screen right now; I'll try harder. No ttys either; systemd-logind complained about some timeout, with a timestamp of approximately 60 in front of it. Grub doesn't allow me to edit commands (it freezes after I press `e') and grml isn't able to mount my btrfs… I guess I'll be poking at that one for a while…

Update: Booting the shiny new Fedora 19 rescue boot entry got me a graphical login. It also booted much faster than usual: lightdm.service only takes 65ms to initialize, with the usual times for the other jobs. I re-enabled avahi-daemon and firewalld, but disabled akmods and akmods-shutdown for the next test. Here the boot was also nice and fast, with firewalld taking 17 seconds to initialize (systemd-analyze blame) and avahi-daemon taking approx. 8 seconds. Note that the modules usually built by the akmod are nevertheless loaded, as no kernel version change happened.

(In reply to Jonas Wielicki from comment #9)
> Update: Booting the shiny new fedora 19 rescue boot entry got me a graphical
> login.

Then there seems to be something wrong with the (non-rescue) initramfs. Back it up and recreate it using dracut.

Good to know it's not just me that suspects a big boot slowdown. I'm not using dkms or any other unusual stuff on startup. This was a fresh install from F19 Beta.
$ systemd-analyze blame
1min 25.678s firewalld.service
1min 25.572s avahi-daemon.service
1min 25.320s chronyd.service
20.008s plymouth-quit-wait.service
7.760s dnf-makecache.service
7.704s systemd-suspend.service
7.658s systemd-udev-settle.service
6.434s bluetooth.service
6.326s systemd-logind.service
6.307s rtkit-daemon.service
5.237s gdm.service
3.568s iprupdate.service
3.565s iprinit.service
2.819s NetworkManager-wait-online.service
2.268s dev-mqueue.mount
2.110s systemd-binfmt.service
1.984s tmp.mount
1.969s fedora-loadmodules.service
1.813s sys-kernel-config.mount
1.716s proc-sys-fs-binfmt_misc.mount
1.634s sys-kernel-debug.mount
1.632s dev-hugepages.mount
1.543s abrt-ccpp.service
1.537s mcelog.service
1.447s systemd-fsck@dev-disk-by\x2duuid-d1d4a8e3\x2d702b\x2d4e5f\x2d901d\x2dec5e7550f6f5.service
1.395s colord.service
1.361s systemd-remount-fs.service
1.165s systemd-tmpfiles-setup-dev.service
1.136s iprdump.service
1.043s lvm2-monitor.service
889ms systemd-user-sessions.service
879ms NetworkManager.service
878ms plymouth-start.service
827ms systemd-udev-trigger.service
753ms systemd-tmpfiles-clean.service
732ms dmraid-activation.service
720ms systemd-tmpfiles-setup.service
681ms dev-disk-by\x2duuid-f7d6841c\x2d9522\x2d437c\x2d8414\x2dc7940b3b4e3a.swap
641ms plymouth-read-write.service
556ms auditd.service
508ms fedora-readonly.service
488ms systemd-fsck-root.service
477ms systemd-journal-flush.service
446ms systemd-sysctl.service
382ms udisks2.service
341ms systemd-readahead-replay.service
340ms systemd-readahead-collect.service
260ms boot.mount
248ms sendmail.service
216ms accounts-daemon.service
215ms polkit.service
208ms sm-client.service
197ms systemd-udevd.service
187ms lvm2-lvmetad.service
178ms upower.service
143ms home.mount
127ms livesys.service
116ms workarounds.service
29ms systemd-readahead-done.service
10ms sshd.service
5ms systemd-vconsole-setup.service
5ms systemd-fsck@dev-disk-by\x2duuid-198daf9e\x2d6267\x2d43f7\x2db579\x2d0205d6790489.service
5ms livesys-late.service
5ms rpcbind.service
4ms wpa_supplicant.service
4ms systemd-update-utmp-runlevel.service
3ms sys-fs-fuse-connections.mount
2ms systemd-random-seed-load.service
James,
Do you use btrfs as well? It's curious that you have systemd-suspend.service in the list. Have you suspended the machine while booting? I would still like to see the "journalctl -b" output.

(In reply to Michal Schmidt from comment #12)
> James,
> Do you use btrfs as well?

Yes.

> It's curious that you have systemd-suspend.service in the list. Have you
> suspended the machine while booting?

Not while booting, but the machine has seen a couple of suspend/resume cycles.

> I would still like to see the "journalctl -b" output.

I'll post it below shortly.

Created attachment 776001 [details]
journalctl -b output
Happening here too (although to a lesser degree):
21.343s accounts-daemon.service
21.328s firewalld.service
8.115s systemd-cryptsetup
5.001s dnf-makecache.service
1.132s fedora-loadmodules.service
1.070s plymouth-start.service
985ms boot-EFI.mount
949ms systemd-tmpfiles-clean.service
918ms systemd-binfmt.service
819ms NetworkManager.service
614ms fedora-readonly.service
555ms home.mount
550ms systemd-udev-trigger.service
548ms dev-mqueue.mount
548ms sys-kernel-debug.mount
547ms dev-hugepages.mount
546ms tmp.mount
545ms systemd-fsck@dev-disk-by\x2duuid-8D12\x2d7405.service
513ms proc-sys-fs-binfmt_misc.mount
480ms bluetooth.service
467ms systemd-logind.service
466ms abrt-ccpp.service
462ms lm_sensors.service
336ms rtkit-daemon.service
334ms systemd-fsck
305ms systemd-tmpfiles-setup.service
295ms colord.service
293ms systemd-remount-fs.service
249ms systemd-sysctl.service
245ms plymouth-read-write.service
205ms systemd-fsck-root.service
175ms systemd-journal-flush.service
168ms systemd-readahead-replay.service
127ms multi.mount
101ms upower.service
95ms chronyd.service
55ms udisks2.service
55ms plymouth-quit-wait.service
53ms plymouth-quit.service
34ms systemd-readahead-collect.service
33ms systemd-udevd.service
30ms sys-kernel-config.mount
26ms polkit.service
24ms sshd.service
16ms teamviewerd.service
14ms rpcbind.service
8ms gdm.service
8ms avahi-daemon.service
4ms wpa_supplicant.service
3ms auditd.service
3ms systemd-tmpfiles-setup-dev.service
2ms systemd-user-sessions.service
1ms systemd-update-utmp-runlevel.service
1ms systemd-random-seed-load.service
1ms systemd-readahead-done.service
979us systemd-vconsole-setup.service
On my other machine, also a Core i7 but running Arch, I get:
6.499s firewalld.service
3.133s avahi-daemon.service
3.119s systemd-logind.service
2.985s gdm.service
1.526s accounts-daemon.service
1.176s systemd-binfmt.service
1.158s NetworkManager.service
1.130s sys-kernel-debug.mount
1.125s systemd-fsck@dev-disk-by\x2duuid-0a83d5ea\x2deceb\x2d4fb9\x2db231\x2d36d7993d8e49.service
1.118s dev-hugepages.mount
1.116s dev-mqueue.mount
1.107s systemd-udev-trigger.service
1.094s polkit.service
1.042s systemd-remount-fs.service
1.038s tmp.mount
1.009s systemd-vconsole-setup.service
933ms proc-sys-fs-binfmt_misc.mount
923ms systemd-modules-load.service
828ms systemd-tmpfiles-setup-dev.service
494ms alsa-restore.service
446ms systemd-sysctl.service
393ms colord.service
318ms systemd-udevd.service
305ms boot.mount
286ms systemd-user-sessions.service
267ms systemd-update-utmp.service
166ms sys-kernel-config.mount
146ms rpi.mount
145ms udisks2.service
126ms systemd-fsck@dev-disk-by\x2duuid-5dd4a246\x2dc65c\x2d49c8\x2d9fd2\x2dff17e72a7036.service
104ms systemd-random-seed-load.service
62ms systemd-journal-flush.service
40ms nmbd.service
39ms rtkit-daemon.service
39ms upower.service
30ms systemd-tmpfiles-setup.service
22ms home.mount
17ms smbd.service
3ms home-celebdor-games.mount
2ms systemd-tmpfiles-clean.service
837us sys-fs-fuse-connections.mount
The 3x boot time of firewalld on the first machine (F19), without my having touched any rules, seems excessive.
Since removing /var/log/journal to disable the persistent journal, boot times have gone down somewhat:
20.005s plymouth-quit-wait.service
19.555s firewalld.service
9.437s NetworkManager-wait-online.service
8.869s accounts-daemon.service
6.150s systemd-udev-settle.service
4.778s gdm.service
3.596s avahi-daemon.service
although they still seem rather long. X/gdm also takes quite a while to start up. I don't have a machine on ext4 at the moment to compare against.
firewalld does seem slow to start, particularly if I drop the file caches beforehand:
# echo 3 > /proc/sys/vm/drop_caches
# time service firewalld restart
Redirecting to /bin/systemctl restart firewalld.service
real 0m11.483s
user 0m0.012s
sys 0m0.024s
Once it's all in memory, restarting takes under a second.
Same issue over the last few days with a fresh Fedora 19 install on a Samsung laptop with an Intel i5-3210M. For the initial few days it consistently booted slowly. Now it sometimes boots slowly and sometimes just hangs after boot without showing the GDM login screen. When the boot hangs, I am also not able to switch to a text console by pressing Ctrl+Alt+F2 or so.
NOTE: I also think the systemd configuration should be changed by default to start the text consoles early; I don't see any reason to block text console startup as well. That way one could at least debug these issues. Removing rhgb and quiet from the kernel command line does seem to make the system boot reliably (i.e. it avoids the hang I mentioned above), even though it still waits for the firewalld, avahi and plymouth-quit-wait services to complete and thus takes a long time to boot into the GUI.
This is the output of one of the slow boots.
Startup finished in 3.766s (firmware) + 4.846s (loader) + 1.115s (kernel) + 3.356s (initrd) + 1min 5.017s (userspace) = 1min 18.102s
The output of systemd-analyze blame is
49.419s firewalld.service
49.406s avahi-daemon.service
20.006s plymouth-quit-wait.service
3.890s gdm.service
3.773s bluetooth.service
3.762s systemd-logind.service
3.755s rtkit-daemon.service
2.244s chronyd.service
2.103s lvm2-monitor.service
2.083s systemd-fsck@dev-disk-by\x2duuid-5aac0cb1\x2d8708\x2d4b95\x2d9cc7\x2df43922e95244.service
2.021s abrt-ccpp.service
1.999s systemd-udev-settle.service
1.896s mcelog.service
1.804s sys-kernel-debug.mount
1.804s dev-mqueue.mount
1.785s dev-hugepages.mount
1.770s tmp.mount
1.163s plymouth-start.service
1.121s fedora-loadmodules.service
1.115s sys-kernel-config.mount
846ms systemd-tmpfiles-setup.service
815ms systemd-udev-trigger.service
779ms systemd-remount-fs.service
684ms fedora-readonly.service
655ms dev-disk-by\x2duuid-61e2cb71\x2d74a1\x2d4f63\x2d9400\x2d001f17ec3c83.swap
495ms plymouth-read-write.service
469ms colord.service
417ms lvm2-lvmetad.service
386ms systemd-readahead-replay.service
365ms systemd-user-sessions.service
356ms udisks2.service
344ms systemd-sysctl.service
340ms systemd-fsck-root.service
338ms livesys-late.service
309ms home.mount
303ms NetworkManager.service
264ms auditd.service
192ms sendmail.service
175ms boot.mount
161ms systemd-localed.service
158ms upower.service
111ms systemd-readahead-collect.service
80ms accounts-daemon.service
52ms boot-efi.mount
25ms polkit.service
21ms sm-client.service
17ms systemd-udevd.service
17ms dmraid-activation.service
13ms rpcbind.service
13ms livesys.service
8ms systemd-tmpfiles-setup-dev.service
7ms systemd-journal-flush.service
6ms systemd-readahead-done.service
5ms systemd-vconsole-setup.service
5ms systemd-fsck@dev-disk-by\x2duuid-6fcf2ebf\x2dd6bb\x2d40f1\x2d81ca\x2d23771e21f21c.service
4ms systemd-random-seed-load.service
4ms systemd-update-utmp-runlevel.service
3ms sys-fs-fuse-connections.mount
Created attachment 782433 [details]
Another journalctl -b --until after boot finished
For comparison, another journalctl -b output, until after login. After login, it is dominated by munin-related setroubleshootd output.
Is there any progress on the investigation? Is there anything we can do to help with this?
I have another log of a machine taking 40s to initialize gdm and five other services; would that be helpful?

It appears that removing the content of /var/log and masking systemd-journal.service greatly improves the boot time.

(In reply to lysbleu from comment #20)
> It appears that removing the content of /var/log and masking
> systemd-journal.service greatly improves the boot time.

You must not mask systemd-journal.service! Otherwise you won't get any log messages from your system!

(In reply to Harald Hoyer from comment #21)
> (In reply to lysbleu from comment #20)
> > It appears that removing the content of /var/log and masking
> > systemd-journal.service greatly improves the boot time.
>
> You must not mask systemd-journal.service! Otherwise you won't get any log
> messages from your system!

Perhaps it's best for us to prevent users from being able to do that in the first place ;)

(In reply to Jóhann B. Guðmundsson from comment #22)
> Perhaps it's best for us to prevent users from being able to do that in the
> first place ;)

Or maybe prevent users from wanting to do that instead :P

(In reply to lysbleu from comment #20)
> It appears that removing the content of /var/log and masking
> systemd-journal.service greatly improves the boot time.

James Ettle and others (in some other BZs) already said that removing /var/log/journal improves the boot speed for them. Are you saying that masking systemd-journald.service brings an additional speedup on top of that?

I'd like to ask you folks affected by the bug for:
1. some more "journalctl -b" logs, but with chronyd.service (and any other NTP clients) disabled - to make sure we are not getting confused by possible jumps of the realtime clock.
2. SVG charts generated by systemd-bootchart (see its man page for what to add to the kernel command line) and by "systemd-analyze plot > sa.svg".

Will do upon next reboot (likely tomorrow). The man page says it will collect for 30 seconds by default. How can I increase that time? It will most likely not be sufficient to capture the whole boot (the current one having taken 1min 30s). Is it sufficient to edit /etc/systemd/bootchart.conf or do I have to change something in the initramfs? I'll also disable chronyd.service in that run, I hope that's okay.

I tried to append init=/usr/lib/systemd/systemd-bootchart to my linuxefi command line. If I do that, the boot freezes (for at least 3 minutes) after [ OK ] Recreate Volatile Files and Directories... I found no way around that; is there anything obvious I could have done wrong?

(In reply to Michal Schmidt from comment #24)
> James Ettle and others (in some other BZs) already said that removing
> /var/log/journal improves the boot speed for them. Are you saying that
> masking systemd-journald.service brings an additional speedup on top of that?

Actually no, that was a mistake; unmasking it did not add any boot time. Removing /var/log/journal is enough.

(In reply to Michal Schmidt from comment #25)
> 2. SVG charts generated by systemd-bootchart (see its manpage for what to
> add to the kernel command line) and by "systemd-analyze plot > sa.svg"

systemd-bootchart stalls everything for me too. If I understand the man page correctly, I set it to 15000 samples at 25Hz, which is 10 mins. I gave it 15 mins to move, but nothing.
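On the bootchart.conf question: the sampling window can indeed be extended there instead of touching the initramfs. A sketch, assuming the [Bootchart] section with the Samples= and Frequency= keys described in the systemd-bootchart man page of this era (the values below are illustrative, not defaults):

```ini
# /etc/systemd/bootchart.conf — assumed keys, verify against bootchart.conf(5)
[Bootchart]
Frequency=25     # samples per second
Samples=4500     # 4500 samples / 25 Hz = 180 s of capture, instead of ~30 s
```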
I don't know if it is related or not, but it apparently cannot open "/proc/schedstat". Will post the other items when I next reboot.

To clarify how this bug slows down the other services: the processes seem to be blocked in the sendto() system call when they are sending messages to the system logger. Would a larger buffer on the receiving side help?
From a chronyd strace output:
10:33:16.982187 socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 0
10:33:16.982228 connect(0, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0
10:33:16.982267 sendto(0, "<30>Sep 02 10:33:16 chronyd[373]"..., 63, MSG_NOSIGNAL, NULL, 0) = 63
10:33:33.642469 open("/etc/chrony.conf", O_RDONLY) = 1
It would be nice if the systemd-journald service were marked as started only when it's fully initialized and ready to accept log messages. This would make the bootchart logs clearer.
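A larger receive buffer would only postpone the blocking, not remove it: /dev/log is an AF_UNIX datagram socket with a bounded queue (net.unix.max_dgram_qlen), and once journald stops draining it, every logging client blocks in sendto(), exactly the pattern the chronyd strace above shows. A minimal self-contained sketch of that failure mode (this is an illustrative socket pair, not the actual journald socket):

```python
import errno
import socket

# A connected AF_UNIX datagram pair: rx plays a stalled journald that
# never drains its queue, tx plays a logging client like chronyd.
rx, tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
tx.setblocking(False)  # non-blocking so we observe EAGAIN instead of hanging

queued = 0
try:
    while True:
        tx.send(b"<30>chronyd[373]: chronyd started")
        queued += 1
except OSError as e:
    # With a blocking socket, this is the point where a client would
    # sit in sendto() for seconds, as seen in the strace above.
    assert e.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS)

print("queue filled after", queued, "datagrams")
rx.close()
tx.close()
```

The exact number of datagrams depends on the kernel's unix-socket queue limits, but it is always finite, which is why a busy boot can fill it faster than a stalled journald drains it.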
I have another data point, although I'm not entirely sure it's related. Booting my machine today, systemd-logind failed to start due to timeouts (I am currently booting my system with verbose=1):

Sep 05 12:11:59 zion systemd-logind[769]: New seat seat0.
Sep 05 12:11:59 zion systemd-logind[769]: Watching system buttons on /dev/input/event1 (Power Button)
Sep 05 12:11:59 zion systemd-logind[769]: Watching system buttons on /dev/input/event0 (Power Button)
Sep 05 12:11:59 zion systemd-logind[695]: Failed to get system D-Bus connection: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Sep 05 12:11:59 zion systemd-logind[695]: Failed to fully start up daemon: Connection refused
Sep 05 12:11:59 zion systemd[1]: Unit systemd-logind.service entered failed state.
Sep 05 12:11:59 zion systemd[1]: systemd-logind.service holdoff time over, scheduling restart.

Similar timeout messages come up for different services:

Sep 05 12:11:59 zion dbus-daemon[721]: dbus[721]: [system] Activating systemd to hand-off: service name='org.freedesktop.ColorManager' unit='colord.service'
Sep 05 12:11:59 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.ColorManager': timed out
[…]
Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.ColorManager': timed out
Sep 05 12:12:24 zion dbus-daemon[721]: dbus[721]: [system] Failed to activate service 'org.freedesktop.Accounts': timed out

At this point, I reset the machine. The next boot was different and surprisingly fast. My suspicion is that it correlates with the truncated journal.
Oddly, I cannot find the message from journald about renaming and recreating the journal file after truncation (due to the reset) in the journalctl logs. The boot time was way better than anything in the past:

Startup finished in 2.288s (kernel) + 15.886s (initrd) + 44.452s (userspace) = 1min 2.627s

Another data point: when searching for the logs for this comment, journalctl blocked for nearly one minute before showing output for journalctl --since today --full. During that time it did some I/O on the hard drive, mainly reading, but incredibly slowly (only 600 kByte to 1 MByte/s; measured with conky). The drive is able to do up to 140 MByte/s. So either the files accessed by journalctl are horribly fragmented in the filesystem (btrfs), or journalctl accesses lots of different files, or the file format itself is fragmented, causing lots of expensive seeks. For consecutive reads, the effective read rate easily goes up to 60 or even over 100 MByte/s, sometimes more.

journalctl says it will keep up to 4 GBytes of journal in one file:

Sep 05 12:16:25 zion systemd-journal[424]: Allowing system journal files to grow to 4.0G.

Is it possible to lower that number? I could imagine that it's just too large to be performant. Does journald load the whole syslog into memory upon boot, or iterate over it for any other reason?

Same here. I have a fast system with an SSD drive. The system could easily boot in under 20 secs, but Fedora 19 takes sometimes 30 secs, sometimes 40 secs, and sometimes it doesn't get to the login screen at all (then I can only switch to another tty and log in in text mode). I have installed smartmontools and lm-sensors (with all sensors checked). Maybe "smartd" takes a long time (see log), but there must be other problems as well. The plymouth boot sequence (the little Fedora logo in the middle) doesn't appear in 95% of boot sessions. Only a black screen appears, and after a long time the login manager is shown. And sometimes it remains black (I sometimes waited 5 min.).
I attached the logs for 5 boot sessions:
session 1: ca. 48 secs (graphical login was possible)
session 2: ca. 52 secs (no graphical login screen)
session 3: ca. 37 secs (graphical login was possible)
session 4: ca. 45 secs (no graphical login screen)
session 5: ca. 37 secs (graphical login was possible)
and the lsmod output.

If there is no login screen visible, gdm.service takes long; if graphical login is possible, gdm.service takes under 1 sec. So I see the following problematic things:
- plymouth-quit-wait.service
- accounts-daemon.service
- chronyd.service
- firewalld.service
- smartd
- lm-sensors
- gnome window manager (the logs contain the German word "Fensterverwalter", i.e. window manager)
Maybe one of these is problematic.

Created attachment 794734 [details]
Rolle: system-analyze blame Session1
Created attachment 794735 [details]
Rolle: system-analyze blame Session2
Created attachment 794736 [details]
Rolle: system-analyze blame Session3
Created attachment 794737 [details]
Rolle: system-analyze blame Session4
Created attachment 794738 [details]
Rolle: system-analyze blame Session5
Created attachment 794740 [details]
Rolle: system-analyze Session2
Created attachment 794741 [details]
Rolle: system-analyze Session3
Created attachment 794742 [details]
Rolle: system-analyze Session4
Created attachment 794743 [details]
Rolle: system-analyze Session5
Created attachment 794744 [details]
Rolle: journalctl -b Session1
Created attachment 794745 [details]
Rolle: journalctl -b Session2
Created attachment 794746 [details]
Rolle: journalctl -b Session3
Created attachment 794747 [details]
Rolle: journalctl -b Session4
Created attachment 794748 [details]
Rolle: journalctl -b Session5
Created attachment 794750 [details]
Rolle: lsmod
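Earlier in the thread a commenter asked whether journald's "Allowing system journal files to grow to 4.0G" cap can be lowered. It can, via /etc/systemd/journald.conf; a sketch assuming the journald.conf(5) keys of this systemd generation (the values below are illustrative, not recommendations):

```ini
# /etc/systemd/journald.conf — assumed keys, see journald.conf(5)
[Journal]
SystemMaxUse=200M       # total disk space the persistent journal may use
SystemMaxFileSize=20M   # rotate individual journal files far below 4G
```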
I'd really like to know what happens inside the dbus daemon when the boot is that slow. Can anybody get me an strace -tt of dbus-daemon when this happens?

*** Bug 981669 has been marked as a duplicate of this bug. ***

*** Bug 985076 has been marked as a duplicate of this bug. ***

bug 1006386 and bug 1005399 are likely the same problem.

Something appears to be stalling polkit/accounts-daemon/dbus/journald for a long time; I wonder, though, what it might be. I'd be thankful for any reports of what these processes might be working on when the system stalls. I'd be particularly interested in strace -tt of these processes while they block and until things continue again. Also good would be the full gdb bt of those processes at that point. Are you guys using anything weird NSS-related like IPA/sssd on the systems where this happens?

Also: https://bugs.freedesktop.org/show_bug.cgi?id=68217

Can this be an interaction between systemd-logind providing service only after subscribing to "org.freedesktop.systemd1.Manager", but "org.freedesktop.systemd1.Manager" not being available for some time?

I tried running strace on dbus-daemon by adding '/bin/strace ...' to the dbus.service file. This causes the system to fail to boot, as dbus-daemon dies. Any hints on how to do this? Actual 'ExecStart' line I used:

ExecStart=/bin/strace -tt -o /var/tmp/dbus-strace /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

So, here's my wild but educated guess at what is going on: journald is written carefully to never require any other service to be running. This is because most services connect to it to log something, and if journald ever blocked on some IPC call to another service, and that service then logged something back to journald and blocked on that to finish, we'd run into a deadlock.

Now, even though journald is carefully written that way, there's one place where an IPC call can sneak in: we try to resolve the "systemd-journal" group name to a GID (with getgrnam()) so that we can chgrp() all journal files to it. In most setups that is without risk, since getgrnam() will load the "files" NSS module, which will just go to /etc/group and return the requested data. However, in some setups the NSS lookup might actually involve IPC, and those are the cases where sssd/IPA or suchlike are used, or where nscd is enabled.

Now, this will not necessarily always result in a deadlock, since logging to journald is asynchronous, which means that as long as there is room in the log socket buffer no deadlock will happen. But if a lot of things happen at the same time, the log socket buffer is easily full (since on Linux it will only take 15 datagrams or so by default), and there you go.

Now, the question is what to do about this. We could just hardcode the resolved GID into journald. This would work fine on Fedora since the GID is assigned fixed anyway, but it would still be ugly. Or we could require that /var/log/journal is owned by the group, and then simply read the GID from the dir before applying it to the journal files... Neither of those solutions do I particularly like, I must say.

Anyway, I figure if you guys remove nscd/sssd/IPA/... from your NSS setup, things will not stall anymore. Could you test this?

Created attachment 797390 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and with nscd disabled (debug_level=debug)
Not sure if this is exactly what you requested, but I disabled nscd.service and rebooted (sssd.service was already disabled, and I don't believe I have any IPA stuff other than sssd-ipa on my system).
The result seems similar to other "20 second slower boots".
Lennart, any idea why the timestamps are so mixed?

[   60.521021] tlondon.localhost.org colord[516]: Executing: /usr/libexec/colord
[   21.854473] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   21.856712] tlondon.localhost.org systemd[1]: Accepted connection on private bus.
[   21.856967] tlondon.localhost.org systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
[   21.857056] tlondon.localhost.org systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[   22.172371] tlondon.localhost.org systemd[1]: Received SIGCHLD from PID 415 (systemd-tmpfile).
[   22.174503] tlondon.localhost.org systemd[1]: Got SIGCHLD for process 415 (systemd-tmpfile)
[   60.703582] tlondon.localhost.org sshd[531]: Server listening on 0.0.0.0 port 22.
[   60.703941] tlondon.localhost.org sshd[531]: Server listening on :: port 22.
[   22.176395] tlondon.localhost.org systemd[1]: Child 415 died (code=exited, status=0/SUCCESS)

Hmm, so guys, can you attach the full output of "strace -o getent-group getent group systemd-journal", please, on a system where the delay happens at boot? This should tell us if the NSS lookup results in IPC!

(In reply to Tom London from comment #56)
> Not sure if this is exactly what you requested, but I disabled nscd.service
> and rebooted (sssd.service was already disabled, and I don't believe I have
> any IPA stuff other than sssd-ipa on my system).

Please also disable nscd.socket, or de-install the entire package, and try again.

I guess nscd cannot be socket activated without gross hacks that would be needed in systemd code. The nscd.socket probably needs to be entirely removed from the nscd package.

(In reply to Kay Sievers from comment #59)
> I guess nscd cannot be socket activated, without gross hacks that would
> be needed in systemd code. The nscd.socket probably needs to be entirely
> removed from the nscd package.
Nah, I think systemd must be capable of supporting nscd being socket activatable. That's something to fix on our side. No finger pointing here please, except at ourselves.

(In reply to Harald Hoyer from comment #57)
> Lennart, any idea, why the timestamps are so mixed?

Possibly because messages have been for quite some time in the queue. To make more sense of this it would be interesting to see "-o verbose" output of that stream fragment, which will show timestamps for when systemd got the messages and when they were enqueued.

(In reply to Tom London from comment #53)
> I tried running strace on dbus-daemon by adding '/bin/strace ...' to the
> dbus.service file.
>
> This causes the system to fail to boot, as dbus-daemon dies.
>
> Any hints on how to do this?

See https://bugzilla.redhat.com/show_bug.cgi?id=1006386#c18

I posted the output of 'strace dbus-daemon', 'systemd-analyze plot', and 'journalctl -o short-monotonic -ab' to https://bugzilla.redhat.com/show_bug.cgi?id=1006386

(In reply to Lennart Poettering from comment #61)
> (In reply to Harald Hoyer from comment #57)
> > Lennart, any idea, why the timestamps are so mixed?
>
> Possibly because messages have been for quite some time in the queue. To
> make more sense of this it would be interesting to see "-o verbose" output
> of that stream fragment which will show timestamps when systemd got the
> messages and when they were enqueued.

Attaching below the output of 'journalctl -o verbose -ab', taken when 'journalctl -o short-monotonic -ab' showed mis-sorted entries.

Created attachment 797658 [details]
Output of "journalctl -o verbose -ab"; booted with journal, removed nscd, debug_level=debug
Big improvement in boot speed since the last software update: from 35 s to 20 s. No big stalls anymore. Boot plot added.

Created attachment 801323 [details]
Boot plot (systemd-analyze plot)
*** Bug 1005399 has been marked as a duplicate of this bug. ***

With kdm, my computers would just hang: no graphical session, sometimes not even the ability to switch to vt2. I had to reboot endlessly to get to the kdm screen on the desktop, and on the laptop the only solution was the hack of removing /var/log/journal, booting into multi-user.target, and running startx :-(

Since my switch to gdm and the latest systemd update, both my laptop (i686) and my desktop (x86_64) boot to graphical on the first try! I still have the subjective impression that boot is slow and that the drives are working a lot more than they used to, but I can use my computers normally without resorting to hacks :-)

I am looking at nearly a 2 minute startup on ext4.
Startup finished in 830ms (kernel) + 10.623s (initrd) + 1min 31.282s (userspace) = 1min 42.736s
20.002s plymouth-quit-wait.service
17.646s NetworkManager.service
17.130s accounts-daemon.service
16.981s haveged.service
16.948s rtkit-daemon.service
14.319s gdm.service
2.916s akmods.service
2.900s systemd-cryptsetup@luks\x2dbc1e37d2\x2d243b\x2d4f9b\x2d91d0\x2df8fa0632a0ca.service
2.758s akmods-shutdown.service
1.589s systemd-fsck-root.service
1.405s fedora-loadmodules.service
1.303s lvm2-monitor.service
1.152s systemd-fsck@dev-disk-by\x2duuid-965cbc1b\x2d83ec\x2d4ca5\x2d99a6\x2d65d03c9649bb.service
1.052s chronyd.service
976ms dev-disk-by\x2duuid-106ad9a0\x2d39bf\x2d44eb\x2da085\x2dd324f3e151be.swap
719ms systemd-binfmt.service
652ms psacct.service
629ms systemd-udev-trigger.service
619ms sys-kernel-debug.mount
619ms dev-hugepages.mount
619ms dev-mqueue.mount
581ms proc-sys-fs-binfmt_misc.mount
505ms tmp.mount
470ms sys-kernel-config.mount
458ms lm_sensors.service
405ms plymouth-start.service
356ms systemd-ask-password-wall.service
331ms systemd-udevd.service
325ms boot.mount
323ms systemd-random-seed-load.service
295ms systemd-user-sessions.service
295ms ip6tables.service
295ms iptables.service
295ms atieventsd.service
294ms mcelog.service
288ms abrt-ccpp.service
287ms systemd-logind.service
232ms systemd-readahead-replay.service
232ms systemd-readahead-collect.service
196ms systemd-remount-fs.service
186ms systemd-sysctl.service
181ms polkit.service
120ms systemd-fsck@dev-disk-by\x2duuid-ef7e08a0\x2d9ca2\x2d43e2\x2da275\x2dece3eddff8b9.service
105ms auditd.service
102ms boot-efi.mount
89ms colord.service
81ms systemd-tmpfiles-setup.service
67ms udisks2.service
63ms plymouth-read-write.service
26ms home.mount
11ms sshd.service
8ms fedora-readonly.service
7ms rpcbind.service
6ms systemd-readahead-done.service
6ms NetworkManager-dispatcher.service
5ms lvm2-lvmetad.service
5ms upower.service
4ms systemd-tmpfiles-setup-dev.service
4ms systemd-update-utmp-runlevel.service
3ms systemd-cryptsetup@luks\x2d1d184e06\x2dc1c6\x2d4bc0\x2db980\x2dd862a37cbcfc.service
3ms systemd-localed.service
3ms systemd-journal-flush.service
3ms systemd-vconsole-setup.service
2ms sys-fs-fuse-connections.mount
Created attachment 803028 [details]
josephwagner-boot.xml
Created attachment 803029 [details]
josephwagner-journal-boutput.txt
I regret to report that my previous relief was short-lived. I have been unable to boot my desktop computer just now. I tried 2-3 times and no gdm ever appeared. I had no choice but to resort to the hack of removing /var/log/journal.

*** Bug 1011895 has been marked as a duplicate of this bug. ***

I had an error in booting on F19 today, and slow boot as well. Due to other issues I was directed to delete /var/log/journal and try again. It fixed most issues and boot time seems much faster. May be worth a try.

I installed the ssd and kernel updates from this morning on both computers. I have had /var/log/journal in place on both computers for a number of days. Things are really looking good; I hope I haven't spoken too soon ;-). The desktop booted to gdm right away on my F19 KDE system; the laptop booted normally on its F19 KDE system on the first try.

Attaching my own data point, with 2 SVG boot plots booting the same 3.12.0-rc4 kernel on ext4, before and after cp'ing /dev/null onto /var/log/journal/system.journal and erasing old user journal files. Difference: 28.4 s vs 65.9 s boot time. All of that time difference is spent with the disk LED blinking frantically.
Before cleanup:

[root@xbox ~]# ls -l /var/log/journal.bak/697c9225e38c4cc3b5666e109076911d/
total 140968
-rw-r-----+ 1 root systemd-journal   4005888 Sep 16 17:45 system~
-rw-r-----+ 1 root systemd-journal 118161408 Oct  8 15:34 system.journal
-rw-r-----+ 1 root systemd-journal   3796992 Aug 26 22:32 user-1000.journal
-rw-r-----+ 1 root systemd-journal   3735552 Sep 16 17:46 user-41~
-rw-r-----+ 1 root systemd-journal   6971392 Oct  8 15:30 user-41.journal
-rw-r-----+ 1 root systemd-journal   3747840 Aug 26 22:32 user-42.journal
-rw-r-----+ 1 root systemd-journal   3756032 Aug 26 19:02 user-992.journal

After cleanup, one reboot (testing without the avahi-daemon/bluetooth services) plus the 28.4 s boot (reinstating avahi-daemon/bluetooth):

[root@xbox ~]# ls -l /var/log/journal/697c9225e38c4cc3b5666e109076911d/
total 11548
-rw-r-----+ 1 root systemd-journal 4825088 Oct  8 15:50 system.journal
-rw-r-----+ 1 root systemd-journal 6983680 Oct  8 15:48 user-41.journal

Created attachment 809392 [details]
ASuardi: SVG before truncating journal
Created attachment 809393 [details]
ASuardi: SVG after truncating journal
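The workaround used in the comments above (removing or truncating files under /var/log/journal) throws the logs away. A minimal sketch of a gentler variant, assuming the default persistent-journal layout: move the directory aside so the old, possibly huge journal files remain available for later inspection. The script rehearses the steps on a throwaway directory; on a real system JOURNAL_DIR would be /var/log/journal, the commands would need root, and systemd-journald would need a restart afterwards.

```shell
# Rehearse the "move the journal aside" workaround on a scratch directory.
# On a real system:   JOURNAL_DIR=/var/log/journal   (as root), then
#                     systemctl restart systemd-journald
JOURNAL_DIR="$(mktemp -d)/journal"
mkdir -p "$JOURNAL_DIR"
: > "$JOURNAL_DIR/system.journal"    # stand-in for the real journal file

BACKUP_DIR="${JOURNAL_DIR}.bak"
mv "$JOURNAL_DIR" "$BACKUP_DIR"      # old logs stay readable via journalctl -D
mkdir "$JOURNAL_DIR"                 # journald starts a fresh journal here

ls "$BACKUP_DIR"
```

This keeps the oversized system.journal around as evidence for the bug instead of deleting it outright; `journalctl -D <dir>` can still read the moved files.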
Clearing the journal resolved the issue for me.

# rm -fr -- /var/log/journal/*

Before fix:
Startup finished in 830ms (kernel) + 10.623s (initrd) + 1min 31.282s (userspace) = 1min 42.736s

After fix:
Startup finished in 816ms (kernel) + 11.194s (initrd) + 22.596s (userspace) = 34.606s

Just to say that I've been experiencing this problem after a clean install of Fedora 19. I've not tried

# rm -fr -- /var/log/journal/*

It would be good to know whether we have figured out what the issue is and, if removing the journal logs works, why that is the case, and whether it is a real fix or just masks the underlying problem.

The problem was entirely resolved for me, i.e., normal boot without removing the journal, but today, after updating to systemd-204-17, the problem has returned, insofar as it was necessary to boot twice to get to gdm.

I believe I'm seeing the problem as described in comment #55. The top few results from systemd-analyze blame are below. The requested strace will be attached after this comment. I'm running the pre-release of Fedora 20, but I was seeing the same thing under 19, only worse. Clearing /var/log/journal made the boot time bearable (~30 sec as compared to 2+ min), but it's still not what it should be. I do run a FreeIPA server and the affected machine is a client of it. nscd is not installed, but IPA through sssd is. I don't have the time to remove IPA and test right now, but I can later if it is still needed.

16.755s firewalld.service
9.955s sssd.service
9.673s accounts-daemon.service
7.281s rtkit-daemon.service
5.781s systemd-logind.service
2.524s rsyslog.service
2.305s NetworkManager.service
1.880s chronyd.service
1.581s home.mount
1.518s abrt-ccpp.service
1.442s certmonger.service
1.402s polkit.service
1.207s systemd-tmpfiles-setup.service
1.154s systemd-tmpfiles-setup-dev.service
1.128s systemd-update-utmp.service
1.100s mcelog.service

Created attachment 835580 [details]
strace of getent group
Is it possible that this is, at least partially, a filesystem fragmentation issue? I am seeing over 20000 fragments for one journal file on my btrfs.

This bug looks like a duplicate of the F20 bug #1006386 and its F19 clone, bug #1039200.

*** This bug has been marked as a duplicate of bug 1039200 ***
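The fragmentation observation above can be reproduced with filefrag from e2fsprogs, which reports the number of extents a file occupies; a journal file in tens of thousands of extents forces constant seeking on rotating disks, matching the "disk LED blinking frantically" symptom. The snippet below is a sketch: it demonstrates the command on a small scratch file (the commenter's exact measurement tool is not stated in the thread), and on an affected system the real check would target the journal files themselves. Note that filefrag needs a filesystem supporting the FIEMAP/FIBMAP ioctls.

```shell
# Demonstrate an extent-count check on a scratch file.
# On an affected system, run instead:
#   filefrag /var/log/journal/*/*.journal
f="$(mktemp ./fragcheck.XXXXXX)"
dd if=/dev/zero of="$f" bs=4096 count=64 2>/dev/null
report="$(filefrag "$f" 2>&1 || echo "filefrag unavailable or unsupported here")"
echo "$report"
rm -f "$f"
```

A freshly written small file should show only a handful of extents, so the contrast with a 20000-fragment journal file would be immediate.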