1006386 – Journal flushing often slow, can prevent system booting correctly

Bug 1006386 - Journal flushing often slow, can prevent system booting correctly

Summary: Journal flushing often slow, can prevent system booting correctly

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	AcceptedBlocker
Duplicates (2):	962748 987242 (view as bug list)
Depends On:
Blocks:	F20FinalBlocker 998573 1029604
TreeView+	depends on / blocked

Reported:	2013-09-10 14:25 UTC by Tom London
Modified:	2014-09-24 15:46 UTC (History)
CC List:	33 users (show)
Fixed In Version:	systemd-208-9.fc20
Clone Of:
Clones:	1029604 1039200 (view as bug list)
Environment:
Last Closed:	2013-12-06 05:55:59 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Output of "systemd-analyze plot" of system without /var/log/journal (systemd.log_level=debug) (74.38 KB, image/svg+xml) 2013-09-10 14:26 UTC, Tom London	no flags	Details
Output of "journalctl -o short-monotonic -ab" of boot with no /var/log/journal (debug_level=debug) (572.44 KB, text/plain) 2013-09-10 14:27 UTC, Tom London	no flags	Details
Output of "systemd-analyze plot" of system with /var/log/journal (systemd.log_level=debug) (147.52 KB, image/svg+xml) 2013-09-10 14:28 UTC, Tom London	no flags	Details
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal (systemd.log_level=debug) (571.77 KB, text/plain) 2013-09-10 14:29 UTC, Tom London	no flags	Details
Output of "systemd-analyze plot" of system with /var/log/journal and with auditd disabled (log_level=debug) (146.54 KB, image/svg+xml) 2013-09-11 02:14 UTC, Tom London	no flags	Details
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and with auditd disabled (debug_level=debug) (932.93 KB, text/plain) 2013-09-11 02:15 UTC, Tom London	no flags	Details
Output of "strace -tt -p $(pidof dbus-daemon)" running with /var/log/journal, removed nscd and systemd.log_level=debug (2.14 MB, text/plain) 2013-09-14 14:46 UTC, Tom London	no flags	Details
Output of "systemd-analyze plot" of system with /var/log/journal, no nscd, debug-shell (149.64 KB, image/svg+xml) 2013-09-14 14:47 UTC, Tom London	no flags	Details
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and nscd removed (debug_level=debug) (934.49 KB, text/plain) 2013-09-14 14:48 UTC, Tom London	no flags	Details
Output of "strace -tt -p $(pidof systemd-journald)" running with /var/log/journal in "auto" and systemd.log_level=debug (3.11 MB, text/plain) 2013-11-08 14:46 UTC, Tom London	no flags	Details
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal (debug_level=debug) (8.81 MB, text/plain) 2013-11-08 14:47 UTC, Tom London	no flags	Details
Output of "strace -T -tt -p $(pidof systemd-journald)" running with /var/log/journal in "auto" (4.18 MB, text/plain) 2013-11-09 15:01 UTC, Tom London	no flags	Details
Output of "strace -T -tt -p $(pidof systemd-journald)" running with /var/log/journal in "auto" (3.63 MB, text/plain) 2013-11-11 14:49 UTC, Tom London	no flags	Details
Output of "strace -T -tt -p $(pidof systemd-journald)" running with /var/log/journal in "auto" (4.18 MB, text/plain) 2013-11-12 14:23 UTC, Tom London	no flags	Details
output from journalctl -ab (249.09 KB, text/x-log) 2013-11-21 10:53 UTC, Petr Schindler	no flags	Details
Output of "perf report -g" (7.95 MB, text/plain) 2013-11-23 01:05 UTC, Tom London	no flags	Details
View All

Description Tom London 2013-09-10 14:25:17 UTC

Description of problem:
Running latest Rawhide, x86_64 system.

Booting with /var/log/journal existing takes 27 seconds longer that when I boot with no /var/log/journal: I did "mv /var/log/journal /var/log/journal.old" just before reboot).

Booting without /var/log journal reports taking 25.4 seconds.
Booting with /var/log/journal reports taking 52.0 seconds.

[These are consecutive reboots, about 2 minutes apart.]

This occurs every time I boot with /var/log/journal.

I finally got booting with "systemd.log_level=debug" to work: appears I also needed "debug" as a kernel commandline option.

I attach the output of "journalctl -o short-monotonic -ab" and "systemctl analyze plot" booting both without /var/log/journal and then booting with /var/log/journal.

Version-Release number of selected component (if applicable):
systemd-206-11.fc21.x86_64
kernel-3.12.0-0.rc0.git16.2.fc21.x86_64
dracut-032-23.git20130904.fc21.x86_64

How reproducible:
Every boot....

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Tom London 2013-09-10 14:26:22 UTC

Created attachment 796028 [details]
Output of "systemd-analyze plot" of system without /var/log/journal (systemd.log_level=debug)

Comment 2 Tom London 2013-09-10 14:27:15 UTC

Created attachment 796029 [details]
Output of "journalctl -o short-monotonic -ab" of boot with no /var/log/journal (debug_level=debug)

Comment 3 Tom London 2013-09-10 14:28:39 UTC

Created attachment 796030 [details]
Output of "systemd-analyze plot" of system with /var/log/journal (systemd.log_level=debug)

Comment 4 Tom London 2013-09-10 14:29:46 UTC

Created attachment 796031 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal (systemd.log_level=debug)

Comment 5 Harald Hoyer 2013-09-10 16:45:41 UTC

[   21.404905] tlondon.localhost.org systemd-journal[294]: Permanent journal is using 33.9M (max 4.0G, leaving 4.0G of free 83.8G, current limit 4.0G).
[   20.491636] tlondon.localhost.org systemd[1]: Accepted connection on private bus.
[   51.691885] tlondon.localhost.org auditd[418]: Started dispatcher: /sbin/audispd pid: 424
[   51.692546] tlondon.localhost.org audispd[424]: priority_boost_parser called with: 4
[   51.692868] tlondon.localhost.org audispd[424]: max_restarts_parser called with: 10
[   51.693189] tlondon.localhost.org audispd[424]: audispd initialized with q_depth=150 and 1 active plugins
[   51.693984] tlondon.localhost.org auditd[418]: Init complete, auditd 2.3.2 listening for events (startup state enable)


Hmm... is auditd the culprit? Does it parse all the journal entries? Does it store a lot of journal entries?

Comment 6 Tom London 2013-09-11 02:13:35 UTC

Maybe not....

I disabled auditd ('systemctl disable auditd.service') and rebooted with /var/log/journal and with 'systemd.log_level=debug'.

[root@tlondon ~]# systemctl status auditd.service
auditd.service - Security Auditing Service
   Loaded: loaded (/usr/lib/systemd/system/auditd.service; disabled)
   Active: inactive (dead)

[root@tlondon ~]# 

[root@tlondon ~]# systemd-analyze
Startup finished in 2.985s (kernel) + 7.764s (initrd) + 1min 40.495s (userspace) = 1min 51.246s
[root@tlondon ~]# 

I attach below both 'systemd-analyze plot' and 'journalctl -o short-monotonic -ab':

[   20.076868] tlondon.localhost.org systemd[1]: Forked /usr/bin/systemctl as 41
2
[   20.077062] tlondon.localhost.org systemd[1]: systemd-journal-flush.service c
hanged dead -> start
[   20.077093] tlondon.localhost.org systemd[1]: Set up jobs progress timerfd.
[   20.077871] tlondon.localhost.org systemd[1]: Received SIGCHLD from PID 406 (
mount).
[   21.363126] tlondon.localhost.org systemd-journal[282]: Permanent journal is 
using 37.9M (max 4.0G, leaving 4.0G of free 83.8G, current limit 4.0G).
[   20.078424] tlondon.localhost.org systemd[1]: Got SIGCHLD for process 406 (mo
unt)
[   49.890559] tlondon.localhost.org alsactl[419]: alsactl 1.0.27.2 daemon start
ed
[   49.912065] tlondon.localhost.org bluetoothd[420]: Bluetooth daemon 5.8
[   49.960747] tlondon.localhost.org nscd[424]: 424 cannot stat() file `/etc/netgroup': No such file or directory
[   49.976228] tlondon.localhost.org /usr/sbin/gpm[436]: *** info [daemon/startup.c(136)]:
[   49.976631] tlondon.localhost.org /usr/sbin/gpm[436]: Started gpm successfully. Entered daemon mode.
[   49.984994] tlondon.localhost.org acpid[440]: starting up with netlink and the input layer

Comment 7 Tom London 2013-09-11 02:14:34 UTC

Created attachment 796207 [details]
Output of "systemd-analyze plot" of system with /var/log/journal and with auditd disabled (log_level=debug)

Comment 8 Tom London 2013-09-11 02:15:35 UTC

Created attachment 796208 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and with auditd disabled (debug_level=debug)

Comment 9 Harald Hoyer 2013-09-11 06:06:41 UTC

Ok, from viewing the plot, it seems to be the systemd-logind.service.


[   50.345844] tlondon.localhost.org systemd-logind[452]: Failed to enable subscription: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Comment 10 Tom London 2013-09-11 13:46:17 UTC

Something for me to do/test?

"blame" says:

         31.171s NetworkManager.service
         31.067s bluetooth.service
         30.985s avahi-daemon.service
         30.608s accounts-daemon.service
         30.602s systemd-logind.service
         30.334s haveged.service
         30.318s rtkit-daemon.service
         30.277s gdm.service
         10.253s mcelog.service
         10.049s abrt-ccpp.service
          1.768s systemd-vconsole-setup.service
          1.230s systemd-binfmt.service
          1.171s systemd-fsck-root.service
          1.163s colord.service
          1.085s proc-sys-fs-binfmt_misc.mount
          1.078s dev-mqueue.mount
          1.069s sys-kernel-debug.mount
          1.066s dev-hugepages.mount
          1.052s udisks2.service
          1.028s kmod-static-nodes.service

Something I can/should disable?

Comment 11 Harald Hoyer 2013-09-12 09:03:17 UTC

you could check, if "audit=0" or "selinux=0" makes a difference.

Comment 12 Harald Hoyer 2013-09-12 11:58:40 UTC

Hmm, systemd-logind needs dbus.
[   21.216899] systemd[452]: Executing: /usr/lib/systemd/systemd-logind
[   21.260298] systemd[457]: Executing: /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
[   49.984185] systemd[1]: systemd-logind.service's D-Bus name org.freedesktop.login1 now registered by :1.1
[   49.984243] systemd[1]: systemd-logind.service changed start -> running
[   50.073982] dbus[457]: [system] Successfully activated service 'org.freedesktop.systemd1'
[   50.345844] systemd-logind[452]: Failed to enable subscription: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

logind.c
manager_connect_bus()

        r = bus_method_call_with_reply(
                        m->bus,
                        "org.freedesktop.systemd1",
                        "/org/freedesktop/systemd1",
                        "org.freedesktop.systemd1.Manager",
                        "Subscribe",
                        NULL,
                        &error,
                        DBUS_TYPE_INVALID);
        if (r < 0) {
                log_error("Failed to enable subscription: %s", bus_error(&error, r));
                dbus_error_free(&error);
        }

Comment 13 Tom London 2013-09-12 13:20:58 UTC

(In reply to Harald Hoyer from comment #11)
> you could check, if "audit=0" or "selinux=0" makes a difference.

In comment #6 above, I tried booting with auditd disabled:

[   13.331231] tlondon.localhost.org systemd[1]: Installed new job systemd-logind.service/start as 203
[   50.345844] tlondon.localhost.org systemd-logind[452]: Failed to enable subscription: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[   50.346219] tlondon.localhost.org systemd-logind[452]: New seat seat0.
[   50.346530] tlondon.localhost.org systemd-logind[452]: Watching system buttons on /dev/input/event2 (Power Button)

So, presuming this is similar to "audit=0", would guess that auditd is not causative. 

I've previously tried booting both with "enforcing=0" and "enforcing=1" (my default), with no apparent differences....

I'll test this again and post...

Comment 14 Harald Hoyer 2013-09-12 13:23:31 UTC

(In reply to Tom London from comment #13)
> I'll test this again and post...

no need to.. it's more likely a dbus problem.

Comment 15 Zbigniew Jędrzejewski-Szmek 2013-09-12 13:34:49 UTC

(In reply to Harald Hoyer from comment #14)
> (In reply to Tom London from comment #13)
> > I'll test this again and post...
> 
> no need to.. it's more likely a dbus problem.

systemd-journald.service has Requires=dbus.socket, After=dbus.socket, so the dependency should be satisfied. How long is the timeout? Here
the delta between starting logind and it reporting failure is at most one second.

Zbyszek

Comment 16 Lennart Poettering 2013-09-13 03:38:14 UTC

Sounds suspiciously like bug 983688.

Comment 17 Tom London 2013-09-13 13:55:07 UTC

Hints on how to 'strace /bin/dbus-daemon'?

I tried changing /lib/systemd/system/dbus.service like:

--- dbus.service	2013-09-13 06:30:55.432573031 -0700
+++ dbus.service.strace	2013-09-13 06:32:21.282499760 -0700
@@ -4,6 +4,6 @@
 After=syslog.target
 
 [Service]
-ExecStart=/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
+ExecStart=/bin/strace -tt -o /var/tmp/dbus-strace /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
 ExecReload=/bin/dbus-send --print-reply --system --type=method_call --dest=org.freedesktop.DBus / org.freedesktop.DBus.ReloadConfig
 OOMScoreAdjust=-900

[i.e., adding /bin/strace -tt -o FILE ...rest of line...]

but rebooting (or restarting service) fails miserably.

Comment 18 Lennart Poettering 2013-09-13 20:06:21 UTC

do "systemctl enable debug-shell". Then reboot. This will give you a debug shell unconditionally very early onon Alt-F9. Then do "strace -p $PID" where PID is the PID of the system dbus instance. You could use `pidof dbus-daemon`, but that will only work as long as a single dbus instance is running.

Comment 19 Tom London 2013-09-14 14:44:38 UTC

Thanks for the info/hint.

Here is what I did: I ran 'yum remove nscd' and 'systemctl enable debug-shell'. I also created a helper script '/bin/strace-dbus.sh':

[root@tlondon ~]# cat /bin/strace-dbus.sh 
#!/bin/bash

/bin/strace -tt -p $(pidof dbus-daemon) -o /var/tmp/strace-dbus-$$ &
[root@tlondon ~]# 

I then rebooted with 'systemd.debug_level=debug debug enforcing=0' (the last just in case debug-shell and/or strace would otherwise run into SELINUX issues).

I allowed the boot to proceed past the "Welcome to Fedora 21 (Rawhide)" message, Alt-F9, and ran 'strace-dbus.sh'.

It said it attached to something, more journal-type messages spewed on the console, and after a short while I got Gnome login.

I attach below 3 files:

1. strace-dbus-568, the output of 'strace -tt -p $(pidof debus-daemon) -o ...'. [Actually, this is a truncated copy of that file; I first did 'cp /var/tmp/strace-dbus-568 ~'.

2. Output of 'systemd-analyze plot'

3. Output of 'journalctl -o short-monotonic -ab'

Hope this helps.

Let me know if I need to rerun this, etc.

Comment 20 Tom London 2013-09-14 14:46:18 UTC

Created attachment 797655 [details]
Output of "strace -tt -p $(pidof dbus-daemon)" running with /var/log/journal, removed nscd and systemd.log_level=debug

Comment 21 Tom London 2013-09-14 14:47:30 UTC

Created attachment 797656 [details]
Output of "systemd-analyze plot" of system with /var/log/journal, no nscd, debug-shell

Comment 22 Tom London 2013-09-14 14:48:25 UTC

Created attachment 797657 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and nscd removed (debug_level=debug)

Comment 23 Tom London 2013-10-01 13:26:56 UTC

Any more info I can provide here?

Issue remains with latest Rawhide.

Comment 24 Michal Schmidt 2013-11-08 08:55:55 UTC

(In reply to Tom London from comment #20)
> Output of "strace -tt -p $(pidof dbus-daemon)"

> 07:32:21.958038 sendto(13, "<29>Sep 14 07:32:07 dbus[523]: ["..., 97,
>                        MSG_NOSIGNAL, NULL, 0) = 97
> 07:32:50.732313 open("/sys/fs/selinux/access", O_RDWR) = 14

dbus-daemon was blocked for half a minute trying to send a log line.
So the next debugging step should be to strace systemd-journald to find out what it is doing during that time.

Comment 25 Tom London 2013-11-08 14:46:04 UTC

Created attachment 821648 [details]
Output of "strace -tt -p $(pidof systemd-journald)" running with /var/log/journal in "auto" and systemd.log_level=debug

Repeated above steps: "enabled" persistent journal ("auto" setting in journal.conf), rebooted with "enforcing=0 systemd.log_level=debug debug", entered debug shell, and ran "strace-journald.sh"

Here is "strace-journald.sh":

[root@tlondon ~]# cat /bin/strace-journald.sh 
#!/bin/bash

/bin/strace -tt -p $(pidof systemd-journald) -o /var/tmp/strace-systemd-journald-$$ &
[root@tlondon ~]# 

Also attaching output of "journalctl -o short-monotonic -ab" below.

Comment 26 Tom London 2013-11-08 14:47:35 UTC

Created attachment 821653 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal (debug_level=debug)

Comment 27 Michal Schmidt 2013-11-08 16:29:30 UTC

(In reply to Tom London from comment #25)
> Output of "strace -tt -p $(pidof systemd-journald)"

Thanks. The strace started and 06:34:13.29 and it shows that systemd-journald was kept busy until 06:34:33.88 (ie. for 20 seconds) flushing the journal from /run to /var. Then it deleted /run/log/journal and performed fsync() (which took 3 more seconds). Only then it started processing input events.

The question is why is the copying from /run to /var so slow?
A typical iteration (there were 233 of them) in the strace looks like this:

06:34:13.293319 mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_SHARED, 16, 0x2635000) = 0x7f00a5944000
06:34:13.293391 munmap(0x7f00a5289000, 2138112) = 0
06:34:13.469752 fstatfs(16, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=118588712, f_bfree=25014566, f_bavail=18984845, f_files=30130176, f_ffree=29529075, f_fsid={107853651, -1872570476}, f_namelen=255, f_frsize=4096}) = 0
06:34:13.469881 fstat(16, {st_mode=S_IFREG|0640, st_size=44261376, ...}) = 0
06:34:13.469991 fallocate(16, 0, 44261376, 4096) = 0
06:34:13.470132 fstat(16, {st_mode=S_IFREG|0640, st_size=44265472, ...}) = 0

The mmap/munmap pair comes from journal_file_move_to() -> mmap_cache_get() -> add_mmap() [-> make_room()].

fstatfs/fstat/fallocate/fstat come from journal_file_allocate().

There is a significant delay between munmap() and the subsequent fstatfs(). I wonder if it's the munmap() itself that takes long, or is it the work that systemd-journald does in between.

Tom, could you repeat the strace, but with the -T option added to see how long each syscall took?

Comment 28 Tom London 2013-11-08 17:11:45 UTC

Certainly.

Will do so and post this evening when I get back from my day job...

Comment 29 Michal Schmidt 2013-11-08 17:22:56 UTC

Additionally, the strace shows systemd-journald uses a silly pattern of overlapping mmaps.

Looking only at mmap() calls:

06:34:13.293319 mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_SHARED, 16, 0x2635000) = 0x7f00a5944000
06:34:13.470218 mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_SHARED, 16, 0x2636000) = 0x7f0099082000
06:34:13.671046 mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_SHARED, 16, 0x2637000) = 0x7f00a5092000
06:34:13.886431 mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_SHARED, 16, 0x2638000) = 0x7f00a5543000
06:34:14.427309 mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_SHARED, 16, 0x2639000) = 0x7f0098c81000
...

See that it creates mmap windows 4100 KB large. But every time the mmap offset moves only by a single page (0x1000 = 4K). systemd-journald keeps 64 of these windows mmapped. Therefore most of these windows must be overlapping.

It also uses fallocate to append 4 KB pieces to the end of the file.

Wouldn't it be much better to fallocate 4 MB pieces and have significantly less (or even no) overlap of the mmap windows?

Comment 30 Michal Schmidt 2013-11-08 17:35:28 UTC

Another question pops up in my head. Why does systemd-journald go to the trouble of copying items from the /run journal to the /var journal one by one? Couldn't it simply copy the journal file(s) as a whole?

Comment 31 Michal Schmidt 2013-11-08 18:03:44 UTC

Lennart answered my question from comment #30 on IRC:

<poettering> mschmidt: the reason we read the files from /run object-by-object is simply that i wanted to reduce the number of files we have around, simply by merging them when we copy
<poettering> mschmidt: usually the files in /run will be smaller, since the disk space is not that big
<poettering> mschmidt: so, by reading the objects one-by-one we repack everything into larger files
<poettering> which has the benefit that [...] we keep identical objects around only once


And he's now been made aware of the inefficient mmap pattern, so I'm expecting a fix for that soon :-)

Comment 32 Tom London 2013-11-09 15:01:23 UTC

Created attachment 821903 [details]
Output of "strace -T -tt -p $(pidof systemd-journald)" running with /var/log/journal in "auto"

Per request, strace with -T in addition to -tt.

Comment 33 Tom London 2013-11-11 14:49:08 UTC

Created attachment 822444 [details]
Output of "strace -T -tt -p $(pidof systemd-journald)" running with /var/log/journal in "auto"

Another example of "strace -T -t -p $(pidof systemd-journald)"

Comment 34 Tom London 2013-11-12 14:23:26 UTC

Created attachment 822998 [details]
Output of "strace -T -tt -p $(pidof systemd-journald)" running with /var/log/journal in "auto"

Another example of "strace -T -tt -p $(pidor systemd-journald)" ....

Are these helpful?

More I should/can provide?

Comment 35 Karel Volný 2013-11-12 16:29:56 UTC

just FTR, this is basically bug #967521 ...

Comment 36 Rex Dieter 2013-11-12 16:35:06 UTC

kinda, though turned out bug #967521 was more about tracking some breakage (in plymouth and kdm) caused by boot slowdowns.

Comment 37 Karel Volný 2013-11-14 10:20:36 UTC

(In reply to Rex Dieter from comment #36)
> kinda, though turned out bug #967521 was more about tracking some breakage
> (in plymouth and kdm) caused by boot slowdowns.

yeah, it got pretty messed, basically at least three issues that should have been tracked as three separate bugs ...

1) slow journal blocking other things - let this one be primary bug for this

2) kdm timeout for X server - discussed along bug #967521

3) plymouth/plymout-quit-wait breakage - bug #983796 suggest to add "Before=graphical.target" and bug #967521 removes the timeout

what else I'm seeing is dbus communication being blocked

Comment 38 Karel Volný 2013-11-14 10:23:08 UTC

*** Bug 962748 has been marked as a duplicate of this bug. ***

Comment 39 Fedora Blocker Bugs Application 2013-11-14 11:16:24 UTC

Proposed as a Blocker for 20-final by Fedora user kvolny using the blocker tracking app because:

Proposing as a final blocker as per discussion with jreznik.
This has been already proposed and rejected under the bug number 967521.

I have read the meeting log, and I have to clarify a few things that were missed:

1) This is basically systemd problem (well, systemd-journald, see comments 29-31 to bug 1006386). As a _consequence_ of this one, dbus is blocked and some other problems surface, including the one that was released as a "fix" for bug 967521 in plymouth package. But the core bug is present even if plymouth is not installed at all.

2) It is NOT "just making startup slow". Services often fail to start completely or hang in non-working state, so that kdm/gdm never comes up, sometimes it fails so miserably that you even cannot login on console and you're left to pray that at least ctrl-alt-del works.

3) "How often" - I don't have the stats neither, but I have experienced the problem in its worst form on two laptops out of three I administer; the original bug 967521 has 78 people in CC now, bug 983796 has 14 poeple, and there seem to be dozens of duplicates (I haven't gone through every suspicious bug yet). Considering how many people bother to use Bugzilla out of all the users, I'd suspect this is a major plague.

Basically, the problem affects *every single system*, only the consequences differ based on the speed of hardware and how many messages the system logs.

4) "if it's just 'cycle and reboot and it works the next time" - wrong. In some cases, people are lucky that they are able to get kdm/gdm in a few tries. But the problem gets worse over time up to the point when kdm/gdm never comes up. No more rebooting helps.

5) Yes, the bug is not new in F20. But you seem to have missed the point that systemd-journald is now being forced on everyone, all the proposed workarounds for the bug would break newly installed systems. As a further reading about logging issues, I recommend comment 190 to bug 967521 and comment 6 to bug 998573. IMHO these are valid points not to have systemd-journald as default for F20, i.e. to revert the whole "No Default Syslog" (sic!) change.

Comment 40 Mike Ruckman 2013-11-14 17:57:21 UTC

Discussed in 2013-11-14 Blocker Review Meeting [1]. We would like to see more information from systemd developers before deciding a blocker bug question. Also, we would like to get confirmation this affects F20, because this report only mentions Rawhide and the linked (fixed) reports go to F19.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-14/

Comment 41 Andre Robatino 2013-11-14 19:35:02 UTC

If this bug is why I don't get the gdm screen at least half the time in my F20 VirtualBox guest, then yes, it affects F20. (When it happens, and I go to a VT, it does show a gdm process running, but I still have to run "systemctl restart gdm" to actually get the gdm screen.) I'd be happy to provide logs, since it's easy to reproduce. Just tell me which ones.

Comment 42 Scott Schmit 2013-11-14 23:41:46 UTC

(In reply to Karel Volný from comment #37)
> yeah, it got pretty messed, basically at least three issues that should have
> been tracked as three separate bugs ...

When this first showed up, nobody knew what the problem was other than that "rm -rf /var/log/journal" fixed it.  You can't file good bugs when no one knows what the problem even is.  That it's early in the boot sequence doesn't help either.

Comment 43 Karel Volný 2013-11-15 09:25:27 UTC

(In reply to Andre Robatino from comment #41)
> If this bug is why I don't get the gdm screen at least half the time in my
> F20 VirtualBox guest, then yes, it affects F20. (When it happens, and I go
> to a VT, it does show a gdm process running, but I still have to run
> "systemctl restart gdm" to actually get the gdm screen.) I'd be happy to
> provide logs, since it's easy to reproduce. Just tell me which ones.

the ultimate proof would be strace of the process (see comment #27), but I guess it would be enough to see if the output of `systemd-analyze blame` is similar to that in comment #10, i.e. multiple services (including gdm in your case) reporting tenths of seconds spent?

Comment 44 Andre Robatino 2013-11-15 18:00:52 UTC

First providing output of the beginning of `systemd-analyze blame' for F20 after graphical gdm fails to appear. For comparison, also providing it for Rawhide after graphical gdm appears. I believe that gdm fails to come up more often on F20 than on Rawhide. I'll look into providing the strace output for F20 as well.

F20, graphical gdm does not appear:

         22.335s firewalld.service
         18.094s lvm2-pvscan@8:2.service
         15.078s plymouth-quit-wait.service
         15.000s accounts-daemon.service
         11.887s ModemManager.service
         10.678s rtkit-daemon.service
         10.210s avahi-daemon.service
          6.868s chronyd.service
          6.491s lm_sensors.service
          5.763s vboxadd.service
          5.705s vboxadd-x11.service
          5.615s systemd-user-sessions.service
          5.455s systemd-logind.service
          5.348s mcelog.service
          5.232s abrt-ccpp.service
          4.927s rsyslog.service
          4.522s gdm.service
          2.817s polkit.service
          2.458s vboxadd-service.service
          2.233s systemd-udev-settle.service
          2.159s nfs-lock.service
          1.802s proc-fs-nfsd.mount
          1.376s var-lib-nfs-rpc_pipefs.mount

Rawhide, graphical gdm appears:

    5min 30.832s dnf-makecache.service
         23.010s firewalld.service
         20.290s plymouth-quit-wait.service
         10.958s accounts-daemon.service
          9.188s ModemManager.service
          7.662s vboxadd.service
          7.411s rtkit-daemon.service
          6.975s avahi-daemon.service
          6.595s vboxadd-x11.service
          6.366s systemd-logind.service
          6.179s systemd-tmpfiles-clean.service
          6.049s user
          5.710s gdm.service
          3.897s rsyslog.service
          2.706s systemd-udev-settle.service
          2.475s vboxadd-service.service
          2.422s polkit.service
          2.097s chronyd.service
          1.934s proc-fs-nfsd.mount
          1.763s NetworkManager.service
          1.427s lvm2-pvscan@8:2.service
          1.409s lvm2-monitor.service
          1.405s lm_sensors.service

Comment 45 Andre Robatino 2013-11-15 22:04:07 UTC

The info on how the strace output was gotten was not sufficiently complete for me to be confident on reproducing it. Does the above suffice?

Comment 46 Andre Robatino 2013-11-16 13:35:05 UTC

F20. graphical gdm appears:

         26.491s firewalld.service
         18.584s accounts-daemon.service
         18.534s lvm2-pvscan@8:2.service
         16.674s plymouth-quit-wait.service
         14.506s ModemManager.service
         12.161s rtkit-daemon.service
          8.738s avahi-daemon.service
          7.803s systemd-logind.service
          6.826s vboxadd-x11.service
          6.416s chronyd.service
          6.229s mcelog.service
          5.278s rsyslog.service
          4.825s vboxadd.service
          4.758s polkit.service
          4.698s systemd-user-sessions.service
          4.544s lm_sensors.service
          3.865s abrt-ccpp.service
          3.552s gdm.service
          3.176s nfs-lock.service
          2.979s bluetooth.service
          2.395s lvm2-monitor.service
          2.349s systemd-udev-settle.service
          2.089s proc-fs-nfsd.mount

Comment 47 Tom London 2013-11-16 17:32:22 UTC

(In reply to Andre Robatino from comment #45)
> The info on how the strace output was gotten was not sufficiently complete
> for me to be confident on reproducing it. Does the above suffice?

Here is what I did to generate the straces.

First (with hints from Lennart):

1. do "systemctl enable debug-shell" and reboot.
2. During the boot sequence, enter Alt-F9; this should bring up a debug shell. Try to do this before the boot sequence pauses/hangs (about 20-25 seconds on my laptop).
3. in the debug shell, enter "strace -T -tt -p $(pidof systemd-journald) -o /var/tmp/strace-systemd-journald-$$ &"
4. that should produce the strace output in /var/tmp/strace* so it is available after boot.

Comment 48 Andre Robatino 2013-11-16 20:04:04 UTC

It may be a Heisenbug, gdm insists on coming up now. I'll try again later. Just to be clear, I'll try to enter Ctrl-F9 (the key sequence for getting to VT9 in vbox) repeatedly so I get to VT9 as fast as possible. Once I do that, does it matter how quickly I type the strace command in 3.)? It's not remembered across reboots, so I can't just up-arrow. Your instructions suggest that it doesn't matter, and that the only step that's time-critical is 2.) But I would think that when strace attaches to a process, it only grabs what happens after you attach. In this case, time would be critical in 3.) as well, and I would have to create a script with a short name and put the command in it.

Comment 49 Andre Robatino 2013-11-16 20:10:59 UTC

Also, to other testers who aren't seeing this, try testing it on slow hardware. Testers tend to have faster-than-average hardware, so they don't see these bugs, then they hit ordinary users after release. There might be various ways to slow down fast hardware, like using a boot argument to limit the RAM, for example.

Comment 50 Scott Schmit 2013-11-17 07:15:57 UTC

I also see it if I've left my system running for a long time--giving journald a lot of logs from the last boot to work with when I boot up.

Comment 51 Michal Schmidt 2013-11-19 10:56:40 UTC

I believe the traces provided by Tom London are sufficient. No need to attach more at this point.

Comment 52 Adam Williamson 2013-11-20 19:05:52 UTC

Discussed at 2013-11-20 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-20/f20-blocker-review.2013-11-20-17.00.log.txt . We still haven't had a systemd developer confirm Karel's evaluation / description from comment #39; if we take that description at face value we would take this bug as a blocker, but we all felt that we'd like to have a systemd developer confirm it first. Michal, could you or another dev state whether you agree or disagree with Karel's evaluation of the nature and significance of the issue? Thanks!

Comment 53 Petr Schindler 2013-11-21 10:53:10 UTC

Created attachment 827126 [details]
output from journalctl  -ab

I see this problem quite a lot. My boot ends with end screen and no gdm appears. You can find journal from my last boot when this happened attached.

Comment 54 Michal Schmidt 2013-11-21 17:26:42 UTC

I saw the slowness on Petr Schindler's machine. We confirmed that the major slowness comes from the copying of the runtime journal to the persistent journal. Masking systemd-journal-flush.service made it go away. After triggering the flush manually later after boot (SIGUSR1 to journald) there was harddisk activity for several seconds.

From the provided straces with "-T" it appears that the inefficiency described in comment #29 cannot explain the big delay incurred by journald. Yes, the mmap usage is suboptimal, but still very little time in total is spent in the syscalls. The fallocate in 4KB increments does not seem to be a problem either, the writes are not synchronous.

What we do not see in the strace are the accesses to the mmap'd areas. I suspect the problem is in reading bits of the persistent journal file, not in the writes. Perhaps journald does a lot of reads of bits of the persistent journal file in a random-like pattern. blktrace would tell.

If that's the case, a workaround could be to limit the maximum size of an individual journal file ("SystemMaxFileSize=" in /etc/systemd/journald.conf) to some low value (Petr's systemd.journal was about 80MB, so setting the key to "10M" could mitigate the issue).

(In reply to Fedora Blocker Bugs Application from comment #39)

> 1) This is basically systemd problem (well, systemd-journald, see comments
> 29-31 to bug 1006386). As a _consequence_ of this one, dbus is blocked and
> some other problems surface, including the one that was released as a "fix"
> for bug 967521 in plymouth package. But the core bug is present even if
> plymouth is not installed at all.

True. The slowness of systemd-journald affects DBus-using services hard, because when dbus-daemon itself blocks on a write to the journal, the clients tend to hit timeouts and give up.

> 2) It is NOT "just making startup slow". Services often fail to start
> completely or hang in non-working state, so that kdm/gdm never comes up,

True (I observed this on Pets Schindler's machine).

> sometimes it fails so miserably that you even cannot login on console and
> you're left to pray that at least ctrl-alt-del works.

I haven't seen this. Not sure why ctrl-alt-del wouldn't work.

> 3) "How often" - I don't have the stats neither, but I have experienced the
> problem in its worst form on two laptops out of three I administer; the
> original bug 967521 has 78 people in CC now, bug 983796 has 14 poeple, and
> there seem to be dozens of duplicates (I haven't gone through every
> suspicious bug yet). Considering how many people bother to use Bugzilla out
> of all the users, I'd suspect this is a major plague.
> Basically, the problem affects *every single system*, only the consequences
> differ based on the speed of hardware and how many messages the system logs.

I'm not seeing this on my systems, but sure, the number of CCs is telling.

> 4) "if it's just 'cycle and reboot and it works the next time" - wrong. In
> some cases, people are lucky that they are able to get kdm/gdm in a few
> tries. But the problem gets worse over time up to the point when kdm/gdm
> never comes up. No more rebooting helps.

Could be.

> 5) Yes, the bug is not new in F20. But you seem to have missed the point
> that systemd-journald is now being forced on everyone

systemd-journald has been in Fedora since F17 and you could never remove it. In F17 and F18 only runtime journal was stored by default.
In F19 journal persistence was enabled by default. F20 does not change anything in this regard.

> all the proposed workarounds for the bug would break newly installed systems.

I disagree. Disabling journal persistence does not result in a broken system. Neither does limiting the persistent journal file size.

> As a further
> reading about logging issues, I recommend comment 190 to bug 967521

That comment is mostly a rant about timestamps and chronological order. (Note that wherever possible, the journal stores not just the receive timestamp, but also the sending timestamp. And *of course* the receive timestamp can be delayed by tens of seconds when journald blocks that long.)

> and comment 6 to bug 998573.

No idea what Jóhann was talking about there. In any case, irrelevant to this BZ.

> IMHO these are valid points not to have
> systemd-journald as default for F20, i.e. to revert the whole "No Default
> Syslog" (sic!) change.

Whether we install rsyslog or not by default has nothing to do with this issue.
rsyslog does not replace journald, it merely supplements it.

Comment 55 Adam Williamson 2013-11-21 17:34:39 UTC

Thanks, Michal. Given all that I think I'd vote +1 blocker at least provisionally; this is one of those ones we might have to give up and waive if we just can't come up with a plausible fix in a reasonable time, but it certainly seems like a serious problem that ought to be fixed in a final release.

Comment 56 Jóhann B. Guðmundsson 2013-11-21 17:36:59 UTC

+1 blocker

Comment 57 Tom London 2013-11-23 01:05:52 UTC

Created attachment 828047 [details]
Output of "perf report -g"

Not sure its helpful, but I decided to run 'perf record -g' from the debug shell, and then ran 'perf report -g >perf.txt' from graphical terminal window.

Here are the top few lines from 'perf report' (actually, all lines with a percentage >= 2.0):

+   9.10%  systemd-journal  [kernel.kallsyms]      [k] __lock_acquire
+   6.57%  systemd-journal  [kernel.kallsyms]      [k] __kernel_text_address
+   4.78%  systemd-journal  [kernel.kallsyms]      [k] print_context_stack
+   3.98%  systemd-journal  [kernel.kallsyms]      [k] avtab_search_node
+   3.94%  systemd-journal  [kernel.kallsyms]      [k] __module_text_address
+   3.76%  systemd-journal  [kernel.kallsyms]      [k] __module_address
+   3.49%  systemd-journal  [kernel.kallsyms]      [k] sched_clock_local
+   3.17%  systemd-journal  [kernel.kallsyms]      [k] is_module_text_address
+   2.41%  systemd-journal  [kernel.kallsyms]      [k] lock_release
+   2.34%  systemd-journal  [kernel.kallsyms]      [k] local_clock
+   2.33%  systemd-journal  [kernel.kallsyms]      [k] native_read_tsc
+   2.02%  systemd-journal  [kernel.kallsyms]      [k] memchr_inv


Is this in any way useful? More I can provide?

Comment 58 Tom London 2013-11-23 01:08:23 UTC

Forgot to be explicit: I ran 'perf record' on $(pidof systemd-journald).

Comment 59 Karel Volný 2013-11-25 11:33:08 UTC

(In reply to Michal Schmidt from comment #54)
> > sometimes it fails so miserably that you even cannot login on console and
> > you're left to pray that at least ctrl-alt-del works.
> 
> I haven't seen this. Not sure why ctrl-alt-del wouldn't work.

I saw that (no text login prompt at all) about once in five tries on a machine which was in the state that kdm never came up - but only when fooling around with strace which slowed the startup further

yes, ctrl-alt-del had worked, take the "pray that" part with a grain of salt :-)

> > 5) Yes, the bug is not new in F20. But you seem to have missed the point
> > that systemd-journald is now being forced on everyone
> 
> systemd-journald has been in Fedora since F17 and you could never remove it.
> In F17 and F18 only runtime journal was stored by default.
> In F19 journal persistence was enabled by default. F20 does not change
> anything in this regard.
> 
> > all the proposed workarounds for the bug would break newly installed systems.
> 
> I disagree. Disabling journal persistence does not result in a broken
> system. Neither does limiting the persistent journal file size.
>
> > IMHO these are valid points not to have
> > systemd-journald as default for F20, i.e. to revert the whole "No Default
> > Syslog" (sic!) change.
> 
> Whether we install rsyslog or not by default has nothing to do with this
> issue.
> rsyslog does not replace journald, it merely supplements it.

pardon my ignorance then, but ...

I thought that the way things work now is that journald "eats" all messages then it resends them to rsyslog which writes /var/log/messages

at least on my system:

[root@unused-4-148 ~]# fuser -v /var/log/messages
                     UŽIVATEL    PID PŘÍSTUP PŘÍKAZ
/var/log/messages:   root        544 f.... abrt-watch-log
                     root        566 F.... rsyslogd

so, journald not keeping logs(*) & no rsyslog = no logs (from before last boot) at all

which is a major system breakage, IMHO(!)

(*) unless I got the "Disabling journal persistence" part wrong too; the other workaround being "keep deleting logs (/var/log/journal) manually from time to time until this bug gets fixed" and similar (yes, limiting the size closely to zero seems to me almost the same, or, maybe it is even worse)

Comment 60 Jaroslav Reznik 2013-11-25 13:32:35 UTC

Michal, Petr - have you tried that suggested "fix" by limiting the size of an individual journal file? Any results?

Comment 61 Petr Schindler 2013-11-25 14:51:29 UTC

I've tried right now. I limited the size to 10 MB and it boots every time. I've tried it few times, so the size could raise above 10MB and it worked correctly and created new journal. So this fix helped to me.

Comment 62 Karel Volný 2013-11-26 12:11:19 UTC

(In reply to Petr Schindler from comment #61)
> I've tried right now. I limited the size to 10 MB and it boots every time.
> I've tried it few times, so the size could raise above 10MB and it worked
> correctly and created new journal.

sooo ... do I read this correctly that it doesn't rotate like I thought but rather keeps adding files?

> So this fix helped to me.

in this case, and if the above is true, and if we have some configuration option to limit the number of files and total size(*), I think reasonable "solution" for F20 would be to ship default configuration that imposes this limit

(*) it raised above 10 MiB after a few boots? - whoa, the whole /var/log on my home machine has only about 11 MiB after a few months with text logs (some bits compressed after logrotate run)

Comment 63 Petr Schindler 2013-11-26 13:48:55 UTC

(In reply to Karel Volný from comment #62)
> (In reply to Petr Schindler from comment #61)
> > I've tried right now. I limited the size to 10 MB and it boots every time.
> > I've tried it few times, so the size could raise above 10MB and it worked
> > correctly and created new journal.
> 
> sooo ... do I read this correctly that it doesn't rotate like I thought but
> rather keeps adding files?

Yes, it creates new system.journal before the old could have above 10M. Old one is moved to something like system@<some hash>.journal

For me it was done when old one had 9MB.

> 
> > So this fix helped to me.
> 
> in this case, and if the above is true, and if we have some configuration
> option to limit the number of files and total size(*), I think reasonable
> "solution" for F20 would be to ship default configuration that imposes this
> limit
> 
> (*) it raised above 10 MiB after a few boots? - whoa, the whole /var/log on
> my home machine has only about 11 MiB after a few months with text logs
> (some bits compressed after logrotate run)

It took about 8 consequential reboots (just start system and reboot). :) It grows pretty fast.

Comment 64 Lennart Poettering 2013-11-26 22:32:58 UTC

Hmm, does this occur on both ext4 and btrfs?

Comment 65 Lennart Poettering 2013-11-27 01:04:56 UTC

In journald git I made two changes now:

We will no longer allocate new space piecemeal, but in 8mb chunks.

More importantly I added a bisection cache that should minimize the negative effects of the bisection logic to find something on disk. 

The changes are not easy to backport, so I am looking for somebody who is happy to run a git version of things (need to compile yourself...)

Other than that, here's a quick hack, I'd like you to try out: please create a file /etc/sysctl.d/dgram-qlen.conf with the contents:

net.unix.max_dgram_qlen=1000

And reboot. Does this have an impact on boot times? If so this might be a dirty work-around for F20...

Comment 66 Michal Schmidt 2013-11-27 13:43:11 UTC

(In reply to Lennart Poettering from comment #65)
> The changes are not easy to backport, so I am looking for somebody who is
> happy to run a git version of things (need to compile yourself...)

The backport did not seem too hard. Here's a scratch build in progress:
http://koji.fedoraproject.org/koji/taskinfo?taskID=6231761

It adds the following patches on top of systemd-208-6.fc20:
      journal: when appending to journal file, allocate larger blocks at once
      journal: make table const
      journald: keep statistics on how of we hit/miss the mmap cache
      journal: optimize bisection logic a bit by caching the last position
      journal: fix iteration when we go backwards from the beginning of an array chain element
      journal: allow journal_file_copy_entry() to work on non-local files
      journal: simplify pre-allocation logic
      journald: mention how long we needed to flush to /var in the logs

Comment 67 Tom London 2013-11-27 14:49:35 UTC

I downgraded to 

libgudev1-208-6.fc20.bz1006386.x86_64
systemd-208-6.fc20.bz1006386.x86_64
systemd-ui-2-2.fc20.x86_64
systemd-python-208-6.fc20.bz1006386.x86_64
systemd-devel-208-6.fc20.bz1006386.x86_64
systemd-libs-208-6.fc20.bz1006386.x86_64
libgudev1-devel-208-6.fc20.bz1006386.x86_64
systemd-journal-gateway-208-6.fc20.bz1006386.x86_64
systemd-debuginfo-208-6.fc20.bz1006386.x86_64

and rebooted.

No joy. Notice the "Time spent on flushing to /var is 29.462836s for 1258 entries."

Same pause as before. Here is a snipped from 'journalctl -short-monotonic -ab':

[   29.617063] tlondon.localhost.org audispd[604]: max_restarts_parser called with: 10
[   29.618813] tlondon.localhost.org audispd[604]: audispd initialized with q_depth=150 and 1 active plugins
[   29.611355] tlondon.localhost.org kernel: type=1305 audit(1385563239.479:9): audit_pid=600 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[   32.636629] tlondon.localhost.org systemd-journal[347]: Permanent journal is using 107.0M (max 4.0G, leaving 4.0G of free 84.8G, current limit 4.0G).
[   62.100998] tlondon.localhost.org systemd-journal[347]: Time spent on flushing to /var is 29.462836s for 1258 entries.
[   65.465772] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
[   65.514602] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   65.516513] tlondon.localhost.org systemd[1]: Started Recreate Volatile Files and Directories.
[   65.518354] tlondon.localhost.org systemd[1]: Starting Update UTMP about System Reboot/Shutdown...
[   65.520256] tlondon.localhost.org systemd[1]: Started Update UTMP about System Reboot/Shutdown.
[   65.522576] tlondon.localhost.org systemd[1]: Starting System Initialization.

Comment 68 Tom London 2013-11-27 14:59:03 UTC

On the other hand, the patch to /etc/systcl.d/dgram-qlen.conf appears to have worked much better.

Snippet from reboot with /etc/sysctl.d/dgram-qlen.conf.

Notice "Time spent on flushing to /var is 28.992ms for 1254 entries.":

[   29.867105] tlondon.localhost.org systemd[1]: Mounted /boot.
[   29.868869] tlondon.localhost.org systemd[1]: Starting Local File Systems.
[   29.878818] tlondon.localhost.org systemd[1]: Reached target Local File Systems.
[   29.888912] tlondon.localhost.org systemd[1]: Started Mark the need to relabel after reboot.
[   29.890701] tlondon.localhost.org systemd[1]: Started Reconfigure the system on administrator request.
[   29.898601] tlondon.localhost.org systemd[1]: Started Relabel all filesystems, if necessary.
[   29.900420] tlondon.localhost.org systemd[1]: Starting Recreate Volatile Files and Directories...
[   29.905307] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   29.907054] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Jo
urnal to Persistent Storage...
[   29.916899] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
[   29.947814] tlondon.localhost.org auditctl[599]: No rules
[   29.949431] tlondon.localhost.org auditctl[599]: AUDIT_STATUS: enabled=0 flag=1 pid=0 rate_limit=0 backlog_limit=320 lost=0 backlog=0
[   29.955208] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   30.163366] tlondon.localhost.org auditd[598]: Started dispatcher: /sbin/audispd pid: 606
[   32.566778] tlondon.localhost.org systemd-journal[346]: Permanent journal is using 128.1M (max 4.0G, leaving 4.0G of free 84.8G, current limit 4.0G).
[   32.598060] tlondon.localhost.org systemd-journal[346]: Time spent on flushing to /var is 28.992ms for 1254 entries.
[   33.157423] tlondon.localhost.org audispd[606]: priority_boost_parser called with: 4
[   33.239789] tlondon.localhost.org audispd[606]: max_restarts_parser called with: 10

Comment 69 Michal Schmidt 2013-11-27 16:14:19 UTC

(In reply to Tom London from comment #68)
> On the other hand, the patch to /etc/systcl.d/dgram-qlen.conf appears to
> have worked much better.
> 
> Snippet from reboot with /etc/sysctl.d/dgram-qlen.conf.
> 
> Notice "Time spent on flushing to /var is 28.992ms for 1254 entries.":

Wow. Is the result repeatable (both with & without dgram-qlen.conf)? I expected the max_dgram_qlen change would significantly reduce the chance of clients blocking on journald, but I did not expect this huge speedup of the flushing itself.

Comment 70 Tom London 2013-11-27 17:37:31 UTC

I only had time to test this once this morning before I was off to work.

I will repeat this when I get home later this afternoon.

Comment 71 Adam Williamson 2013-11-27 18:32:56 UTC

Discussed at 2013-11-27 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-27/f20-blocker-review-3.2013-11-27-17.01.log.txt . Accepted as a blocker due to the ability to prevent boot from succeeding. Alpha criterion: "A system installed with a release-blocking desktop must boot to a log in screen where it is possible to log in to a working desktop using a user account created during installation or a 'first boot' utility." - https://fedoraproject.org/wiki/Fedora_20_Alpha_Release_Criteria#Expected_installed_system_boot_behavior

Comment 72 Tom London 2013-11-28 01:18:32 UTC

Well, this is more complicated that I thought.

I performed 2 consecutive boots of my system, both with Storage=auto, both with systemd-208-6.fc20.bz1006386.x86_64.

The first boot with /etc/sysctl.d/dgram-qlen.conf:

journalctl-sysctl.txt:[   28.452541] tlondon.localhost.org systemd-journal[346]: Time spent on flushing to /var is 31.384ms for 1252 entries.

and the second with dgram-qlen.conf moved to /etc:

journalctl-nosysctl.txt:[   30.412865] tlondon.localhost.org systemd-journal[346]: Time spent on flushing to /var is 28.565ms for 1253 entries.

So, I'm now confused....

Comment 73 Michal Schmidt 2013-11-28 09:25:52 UTC

Tom,
I suspect that your /var/log/journal/$MACHINE_ID/system.journal got rotated recently. Since the file is now small, you're not seeing the pathological case now.

Comment 74 Tom London 2013-11-29 17:16:46 UTC

OK, I've attempted to rebase my system.

I "upgraded" to systemd-208-6.fc21.x86_64, restored /etc/sysctl.d/dgram-qlen.conf to /etc/sysctl.d/.

Here is what I see on a fresh reboot:

[   26.367819] tlondon.localhost.org systemd[1]: Started Relabel all filesystems
, if necessary.
[   26.369617] tlondon.localhost.org systemd[1]: Starting Recreate Volatile File
s and Directories...
[   26.403304] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write
 Out Runtime Data...
[   26.407412] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
[   26.410198] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
[   29.339436] tlondon.localhost.org systemd-journal[358]: Permanent journal is using 13.0M (max 4.0G, leaving 4.0G of free 84.9G, current limit 4.0G).
[   46.264863] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
[   46.313668] tlondon.localhost.org systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   46.315449] tlondon.localhost.org systemd[1]: Started Security Auditing Service.

So, about 20 seconds to transition from "Starting" to "Started" for "Tell Plymouth to Write out Runtime Data."

Comment 75 Tom London 2013-11-29 18:08:51 UTC

Here is another boot (above was Permissive, this one is enforcing), this time I move /etc/sysctl.d/dgram-qlen.conf away.  Similar timings:

[   26.916182] tlondon.localhost.org systemd[1]: Starting Recreate Volatile Files and Directories...
[   26.917914] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   26.932669] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
[   26.935363] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
[   27.346229] tlondon.localhost.org systemd-journal[346]: Permanent journal is using 16.2M (max 4.0G, leaving 4.0G of free 84.4G, current limit 4.0G).
[   46.565052] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
[   46.613994] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   46.615775] tlondon.localhost.org systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   46.617253] tlondon.localhost.org systemd-tmpfiles[560]: Unable to fix label of /run/log/journal: No such file or directory
[   46.618994] tlondon.localhost.org systemd-tmpfiles[560]: Unable to fix label of /run/log/journal/6f397d582f6f6691f48a1fa74b338324: No such file or directory
[   46.622110] tlondon.localhost.org systemd[1]: Started Recreate Volatile Files and Directories.

Comment 76 Tom London 2013-11-30 16:59:48 UTC

Latest boot with:

kernel-3.13.0-0.rc2.git0.1.fc21.x86_64
systemd-208-6.fc21.x86_64
and
[root@tlondon ~]# cat /etc/sysctl.d/dgram-qlen.conf 
net.unix.max_dgram_qlen=1000
[root@tlondon ~]# 

Still "pauses" ...

[   19.102656] tlondon.localhost.org systemd[1]: Mounted /boot.
[   19.102924] tlondon.localhost.org systemd[1]: Starting Local File Systems.
[   19.103200] tlondon.localhost.org systemd[1]: Reached target Local File Systems.
[   19.103587] tlondon.localhost.org systemd[1]: Started Mark the need to relabel after reboot.
[   19.103887] tlondon.localhost.org systemd[1]: Started Reconfigure the system on administrator request.
[   19.104228] tlondon.localhost.org systemd[1]: Started Relabel all filesystems, if necessary.
[   19.104489] tlondon.localhost.org systemd[1]: Starting Recreate Volatile Files and Directories...
[   19.104748] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   19.105015] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
[   19.114838] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
[   19.971575] tlondon.localhost.org systemd-journal[342]: Permanent journal is using 28.2M (max 4.0G, leaving 4.0G of free 84.1G, current limit 4.0G).
[   46.416973] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
[   46.466278] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   46.466653] tlondon.localhost.org systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   46.467026] tlondon.localhost.org systemd[1]: Started Recreate Volatile Files and Directories.
[   46.467423] tlondon.localhost.org systemd[1]: Starting Update UTMP about System Reboot/Shutdown...
[   46.467956] tlondon.localhost.org systemd[1]: Started Update UTMP about System Reboot/Shutdown.
[   46.468278] tlondon.localhost.org systemd[1]: Starting System Initialization.
[   46.468661] tlondon.localhost.org systemd[1]: Reached target System Initialization.

Comment 77 Adam Williamson 2013-12-04 00:17:33 UTC

Is this going anywhere? Final Go/No-Go is Thursday, which means we really should have built the final RC today, and this bug's sitting here...

Comment 78 Zbigniew Jędrzejewski-Szmek 2013-12-04 00:43:25 UTC

The update with those patches is building... I think that the results above are inconclusive, so let's push it out and let more people test.

Comment 79 Adam Williamson 2013-12-04 01:52:56 UTC

Well, those patches and more :/

"- Backport patches (#1023041, #1036845"

neither of those is an accepted (or even proposed) blocker or FE bug.

"- HWDB update
 - Some small new features: nspawn --drop-capability=, running PID 1 under valgrind, "yearly" and "annually" in calendar specifications
 - Some small documentation and logging updates"

none of those changes relates to a blocker or FE bug.

During freeze, we are only supposed to make changes that fix FE or blocker bugs. That's what 'freeze' means. When you do this, we have the choice of taking a bunch of unnecessary and possibly destabilizing changes, or not fixing a blocker bug. Neither of those choices is a great choice. :/

In future, during freezes, when we need a build to fix an FE or blocker bug, it'd be appreciated if you could submit a build that *only* fixes FE/blocker issues.

Comment 80 Zbigniew Jędrzejewski-Szmek 2013-12-04 02:07:16 UTC

(In reply to Adam Williamson from comment #79)
> Well, those patches and more :/
Ooops, I completly forgot to exclude those patches. Do you want me to revert those extra patches?

Comment 81 Adam Williamson 2013-12-04 02:24:01 UTC

that would be really helpful, in fact - thanks.

Comment 82 Fedora Update System 2013-12-04 03:31:33 UTC

systemd-208-8.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/systemd-208-8.fc20

Comment 83 Jaroslav Reznik 2013-12-04 11:30:23 UTC

Does this build contain limit of the size of the log file? As Petr confirmed it solves the issue? Petr, could you retest it with this build?

Comment 84 Zbigniew Jędrzejewski-Szmek 2013-12-04 13:53:47 UTC

(In reply to Jaroslav Reznik from comment #83)
> Does this build contain limit of the size of the log file?
No.

> As Petr confirmed it solves the issue?
Yes, but limiting MaxFileSize=10 also has downsides, it makes things less efficient in general, even if it helps here.

> Petr, could you retest it with this build?
That would be nice.

Comment 85 Tom London 2013-12-04 14:29:31 UTC

Just to be clear, should we be testing this with or without to previously proposed /etc/sysctl.d/dlen-qlen.conf file?

Comment 86 Zbigniew Jędrzejewski-Szmek 2013-12-04 14:34:11 UTC

Without. Although if you could test both, this wouldn't hurt too :)

Comment 87 Tom London 2013-12-04 14:54:21 UTC

Fresh boot with systemd-208-8.fc21.x86_64 (and no /etc/sysctl.d/dgram-qlen.conf).

Guess this is the interesting line:

[   26.185554] tlondon.localhost.org systemd-journal[355]: Time spent on flushing to /var is 7.558988s for 1225 entries.

Believe this is much better than before....

BTW, this entry continues to be incorrectly sorted in the 'short-monotonic' output:

[   17.751438] tlondon.localhost.org kernel: type=1305 audit(1386168291.399:11):
 audit_pid=572 old=0 auid=4294967295 ses=4294967295
                                              subj=system_u:system_r:auditd_t:s0
 res=1



[   17.215805] tlondon.localhost.org systemd[1]: Starting Local File Systems.
[   17.216045] tlondon.localhost.org systemd[1]: Reached target Local File Systems.
[   17.216262] tlondon.localhost.org systemd[1]: Started Mark the need to relabel after reboot.
[   17.219584] tlondon.localhost.org systemd[1]: Started Reconfigure the system on administrator request.
[   17.220168] tlondon.localhost.org systemd[1]: Started Relabel all filesystems, if necessary.
[   17.220402] tlondon.localhost.org systemd[1]: Starting Recreate Volatile Files and Directories...
[   17.220621] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   17.224298] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
[   17.224544] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
[   17.303631] tlondon.localhost.org systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   17.329517] tlondon.localhost.org auditctl[573]: No rules
[   17.329770] tlondon.localhost.org auditctl[573]: AUDIT_STATUS: enabled=0 flag=1 pid=0 rate_limit=0 backlog_limit=320 lost=0 backlog=0
[   17.330796] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   18.626116] tlondon.localhost.org systemd-journal[355]: Permanent journal is using 39.8M (max 4.0G, leaving 4.0G of free 83.6G, current limit 4.0G).
[   26.185554] tlondon.localhost.org systemd-journal[355]: Time spent on flushing to /var is 7.558988s for 1225 entries.
[   27.086592] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
[   27.477893] tlondon.localhost.org systemd[1]: Started Recreate Volatile Files and Directories.
[   27.478173] tlondon.localhost.org systemd[1]: Starting Update UTMP about System Reboot/Shutdown...
[   27.478432] tlondon.localhost.org systemd[1]: Started Update UTMP about System Reboot/Shutdown.
[   27.478674] tlondon.localhost.org systemd[1]: Starting System Initialization.
[   27.478924] tlondon.localhost.org systemd[1]: Reached target System Initialization.
[   27.479187] tlondon.localhost.org systemd[1]: Starting Daily Cleanup of Temporary Directories.
[   27.479429] tlondon.localhost.org systemd[1]: Started Daily Cleanup of Temporary Directories.
[   27.479669] tlondon.localhost.org systemd[1]: Starting Timers.
[   27.479919] tlondon.localhost.org systemd[1]: Reached target Timers.
[   27.480171] tlondon.localhost.org systemd[1]: Starting Manage Sound Card State (restore and store)...

Comment 88 Tom London 2013-12-04 15:03:11 UTC

    Results after an immediate reboot:

    Time reported flushing:
    [   27.328031] tlondon.localhost.org systemd-journal[345]: Time spent on flushing to /var is 6.173021s for 1232 entries.

    Still seems to have from 18->27 seconds till "Time spent" is reported, and from 27->40 seconds until "Started Trigger Flushing" is reported.

    [   18.370261] tlondon.localhost.org systemd[1]: Started Reconfigure the system on administrator request.
    [   18.370488] tlondon.localhost.org systemd[1]: Started Relabel all filesystems, if necessary.
    [   18.370708] tlondon.localhost.org systemd[1]: Starting Recreate Volatile Files and Directories...
    [   18.370930] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
    [   18.371167] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
    [   18.360222] tlondon.localhost.org kernel: SELinux: initialized (dev sda1, type ext4), uses xattr
    [   18.375407] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
    [   21.154742] tlondon.localhost.org systemd-journal[345]: Permanent journal is using 56.0M (max 4.0G, leaving 4.0G of free 83.6G, current limit 4.0G).
    [   27.328031] tlondon.localhost.org systemd-journal[345]: Time spent on flushing to /var is 6.173021s for 1232 entries.
    [   40.971999] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
    [   41.021420] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
    [   41.021686] tlondon.localhost.org systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
    [   41.022170] tlondon.localhost.org systemd[1]: Started Recreate Volatile Files and Directories.
    [   41.022436] tlondon.localhost.org systemd[1]: Starting Update UTMP about System Reboot/Shutdown...

Comment 89 Petr Schindler 2013-12-04 16:51:16 UTC

I updated to systemd-208-8.fc20 and it seems to work quite fine. At least it boots now to gdm.

Comment 90 Fedora Update System 2013-12-05 00:37:40 UTC

Package systemd-208-8.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-208-8.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-22704/systemd-208-8.fc20
then log in and leave karma (feedback).

Comment 91 Fedora Update System 2013-12-06 05:55:59 UTC

systemd-208-9.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 92 René 2013-12-19 16:20:52 UTC

Are you sure this bug is fixed? On a fresh installation (KDE spin installed on blank partitions - no upgrade) booting took over 30 seconds when I first looked into optimizing boot times. After deleting the journal, boot times went down to under 20 seconds (according to systemd-analyze). Setting max_dgram_qlen to 1000 before that didn't help...

Though I always booted successfully, but spending one third of the boot with flushing the journal doesn't seem right to me.

I didn't adjust the default settings that might be related to this (besides disabling rsyslog, which got installed by default - I thought this shouldn't be active by default anymore?). The journal was about 50 MB before deletion.

René

Comment 93 Adam Williamson 2013-12-19 17:35:17 UTC

"Though I always booted successfully"

That was the issue we focused on.

"but spending one third of the boot with flushing the journal doesn't seem right to me."

it shouldn't happen all the time, but I think the devs do intend to keep working on it. Zbigniew, would a new report for the flush still sometimes delaying the boot be worthwhile, or unnecessary?

Comment 94 Karel Volný 2014-01-28 10:43:35 UTC

*** Bug 987242 has been marked as a duplicate of this bug. ***

Comment 95 Tomi Leppikangas 2014-02-09 20:27:06 UTC

Booting still slow in Fedora 20 with systemd-208-9.fc20.i686

Setting max_dgram_qlen doesn't help.

Abount 20 sec spend in flushing every boot:
Feb 02 23:54:20 xxx systemd-journal[278]: Time spent on flushing to /var is 24.789893s for 955 entries.
Feb 03 17:52:12 xxx systemd-journal[263]: Time spent on flushing to /var is 22.292291s for 959 entries.
Feb 05 18:24:07 xxx systemd-journal[261]: Time spent on flushing to /var is 27.376598s for 968 entries.
Feb 07 16:52:33 xxx systemd-journal[275]: Time spent on flushing to /var is 25.655816s for 955 entries.
Feb 07 17:23:36 xxx systemd-journal[272]: Time spent on flushing to /var is 24.069075s for 950 entries.


One thing i notised that there is lots of extens in journal file:

# filefrag /var/log/journal/ee72231613031058e546ebc5662ee8fe/system.journal 
/var/log/journal/ee72231613031058e546ebc5662ee8fe/system.journal: 5466 extents found

Comment 96 Ahmad Samir 2014-02-10 09:13:50 UTC

(In reply to Adam Williamson from comment #93)
> "Though I always booted successfully"
> 
> That was the issue we focused on.
> 
> "but spending one third of the boot with flushing the journal doesn't seem
> right to me."
> 
> it shouldn't happen all the time, but I think the devs do intend to keep
> working on it. Zbigniew, would a new report for the flush still sometimes
> delaying the boot be worthwhile, or unnecessary?

This question hasn't been answered yet, is a new report needed for the slow boot due to the journal flushing?

Comment 97 Tomi Leppikangas 2014-02-27 16:10:48 UTC

Today bootting was faster. 

Journald reports:
[   10.060962] rak046 systemd-journal[269]: Time spent on flushing to /var is 21.452ms for 945 entries.

thats ms, 1000x speed difference.

What have changed:
systemd updated from systemd-208-9.fc20.i686
 to systemd-208-14.fc20.i686.
journal rotated, now only 2 extents:

/var/log/journal/ee72231613031058e546ebc5662ee8fe/system.journal: 2 extents found

old has more:
/var/log/journal/ee72231613031058e546ebc5662ee8fe/system: 5467 extents found

I think problem comes back when file gets bigger again.

Comment 98 Goffredo Baroncelli 2014-06-10 20:03:38 UTC

I experienced a similar slowness. My root filesystem is btrfs which suffers a lot the fragmentation.

I solved doing 

     sudo btrfs fi defrag /var/log/journal/*/*

My boot time decreased of 20 seconds (from 50s to 30s) !

I am trying to do each week

Note You need to log in before you can comment on or make changes to this bug.

abetakehiko
ahmadsamir3891
alick9188
awilliam
bob53181
bugzilla
collura
harald
ignatenko
i.grok
johannbg
jreznik
kreijack
kvolny
lnykryn
michele
mlichvar
mruckman
mschmidt
msekleta
plautrba
pschindl
rdieter
rene.schaffrath+rhbz
robatino
systemd-maint
theo148
tomek
tomi.leppikangas
vpavlin
yaroslav.sapozhnik
yjcoshc
zbyszek