Bug 1008559

Summary: Loaded units and determined initial transaction in *3h* 1min 3.384174s.
Product: [Fedora] Fedora Reporter: Nicolas Mailhot <nicolas.mailhot>
Component: systemdAssignee: systemd-maint
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: harald, johannbg, lnykryn, mschmidt, msekleta, plautrba, systemd-maint, vpavlin, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-14 20:24:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
dmesg at rescue mode log in
none
journalctl at rescue mode log in
none
journalctl after init 5 and service restarts
none
journalctl with systemd.log_target=kmsg none

Description Nicolas Mailhot 2013-09-16 15:15:59 UTC
Description of problem:
system takes ages to boot and then drops into rescue mode

After toot log in an init 5 fixes the system (sometimes also need to restard the gdm and postfix services)

kernel, dracut and mdadm version do not seem to mater (it fails the same way with mdadm 3.2 and 3.3, with old initramfses and recent ones)


Version-Release number of selected component (if applicable):
dracut-033-3.git20130913.fc21.x86_64
kernel-3.12.0-0.rc0.git22.1.fc21.x86_64
kernel-3.12.0-0.rc0.git26.1.fc21.x86_64
kernel-headers-3.12.0-0.rc0.git26.1.fc21.x86_64
kernel-modules-extra-3.12.0-0.rc0.git26.1.fc21.x86_64
kernel-tools-3.12.0-0.rc0.git26.1.fc21.x86_64
kernel-tools-libs-3.12.0-0.rc0.git26.1.fc21.x86_64
mdadm-3.2.6-21.fc20.x86_64
systemd-207-1.fc21.x86_64
systemd-libs-207-1.fc21.x86_64
systemd-python-207-1.fc21.x86_64

Comment 1 Nicolas Mailhot 2013-09-16 15:17:00 UTC
Created attachment 798312 [details]
dmesg at rescue mode log in

Comment 2 Nicolas Mailhot 2013-09-16 15:17:57 UTC
Created attachment 798313 [details]
journalctl at rescue mode log in

Comment 3 Michal Schmidt 2013-09-16 15:20:29 UTC
According to the dmesg timestamps, almost every line printed took about 60 seconds. Please try adding "systemd.log_target=kmsg". See if it makes the logging go faster.

Comment 4 Nicolas Mailhot 2013-09-16 15:22:41 UTC
Created attachment 798316 [details]
journalctl after init 5 and service restarts

Comment 5 Nicolas Mailhot 2013-09-16 15:44:09 UTC
(In reply to Michal Schmidt from comment #3)
> According to the dmesg timestamps, almost every line printed took about 60
> seconds. Please try adding "systemd.log_target=kmsg". See if it makes the
> logging go faster.

Why would it cause
 Loaded units and determined initial transaction in *3h* 1min 3.384174s ?
Also, as I wrote before it fails the same way on older initramfs and their kernels used to work fine

(not that I'm against experimenting, but here every experiment costs more than 3 hours)

Comment 6 Michal Schmidt 2013-09-16 15:48:50 UTC
(In reply to Nicolas Mailhot from comment #5)
> (In reply to Michal Schmidt from comment #3)
> > According to the dmesg timestamps, almost every line printed took about 60
> > seconds. Please try adding "systemd.log_target=kmsg". See if it makes the
> > logging go faster.
> 
> Why would it cause
>  Loaded units and determined initial transaction in *3h* 1min 3.384174s ?

Simply by spending 1 minute printing every "Installed new job ..." line (and other debug lines). It then takes 3 hours to print 180 lines.
I suspect it is getting blocked on writing to the journal socket and timing out. By logging to kmsg you'd avoid this blockage.

I think we need to rethink this commit:

commit 8b18fdc19531ba56d0bdfe34c62870997a9bcc96
Author: Lennart Poettering <lennart>
Date:   Wed Jul 24 07:24:05 2013 +0200

    core: synchronously block when logging

Comment 7 Harald Hoyer 2013-09-16 16:27:39 UTC
[    0.417227] acpi PNP0A03:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[    0.417234] acpi PNP0A03:00: Unable to request _OSC control (_OSC support mask: 0x08)
[    0.419907] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
…

[    7.748042] ata8: link is slow to respond, please be patient (ready=0)
[   11.448041] ata7: device not ready, treating as offline
[   11.460610] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   11.483048] ata8: device not ready, treating as offline
[   11.495651] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
…
[  627.698045] ======================================================
[  627.698046] [ INFO: possible circular locking dependency detected ]
[  627.698047] 3.12.0-0.rc0.git26.1.fc21.x86_64 #1 Not tainted
[  627.698048] -------------------------------------------------------
[  627.698049] kworker/0:1/26 is trying to acquire lock:
[  627.698055]  (&fb_info->lock){+.+.+.}, at: [<ffffffff813cd45d>] lock_fb_info+0x1d/0x50
[  627.698056] 
but task is already holding lock:
[  627.698059]  (console_lock){+.+.+.}, at: [<ffffffff8146c173>] console_callback+0x13/0x180
[  627.698060] which lock already depends on the new lock.


something is completely broken on your system.

Comment 8 Nicolas Mailhot 2013-09-16 16:45:11 UTC
(In reply to Harald Hoyer from comment #7)
> [    0.417227] acpi PNP0A03:00: ACPI _OSC support notification failed,
> disabling PCIe ASPM
> [    0.417234] acpi PNP0A03:00: Unable to request _OSC control (_OSC support
> mask: 0x08)
> [    0.419907] acpi PNP0A03:00: fail to add MMCONFIG information, can't
> access extended PCI configuration space under this bridge.
> …
> 
> [    7.748042] ata8: link is slow to respond, please be patient (ready=0)
> [   11.448041] ata7: device not ready, treating as offline
> [   11.460610] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [   11.483048] ata8: device not ready, treating as offline
> [   11.495651] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> …
> [  627.698045] ======================================================
> [  627.698046] [ INFO: possible circular locking dependency detected ]
> [  627.698047] 3.12.0-0.rc0.git26.1.fc21.x86_64 #1 Not tainted
> [  627.698048] -------------------------------------------------------
> [  627.698049] kworker/0:1/26 is trying to acquire lock:
> [  627.698055]  (&fb_info->lock){+.+.+.}, at: [<ffffffff813cd45d>]
> lock_fb_info+0x1d/0x50
> [  627.698056] 
> but task is already holding lock:
> [  627.698059]  (console_lock){+.+.+.}, at: [<ffffffff8146c173>]
> console_callback+0x13/0x180
> [  627.698060] which lock already depends on the new lock.
> 
> 
> something is completely broken on your system.

The SATA warnings are normal (early-days SATA dvd player, Linux complained about it for years)

The circular locking is new, but quite unsurprising on a 3.12-rc0 rawhide kernel with debugging enabled. It will hopefully get fixed as usual. It's unrelated to the current problem (fails the same on old initramfses with more stable kernels)

Comment 9 Jóhann B. Guðmundsson 2013-09-16 16:51:32 UTC
The sata warnings are due to broken dvd player you should replace it or unplug it.

Comment 10 Jóhann B. Guðmundsson 2013-09-16 16:54:36 UTC
What's the time without development kernel as in the latest 3.11 stable ( kernel-3.11.1-300.fc20 )

Comment 11 Nicolas Mailhot 2013-09-16 17:10:25 UTC
(In reply to Michal Schmidt from comment #6)
> (In reply to Nicolas Mailhot from comment #5)
> > (In reply to Michal Schmidt from comment #3)
> > > According to the dmesg timestamps, almost every line printed took about 60
> > > seconds. Please try adding "systemd.log_target=kmsg". See if it makes the
> > > logging go faster.
> > 
> > Why would it cause
> >  Loaded units and determined initial transaction in *3h* 1min 3.384174s ?
> 
> Simply by spending 1 minute printing every "Installed new job ..." line (and
> other debug lines). It then takes 3 hours to print 180 lines.
> I suspect it is getting blocked on writing to the journal socket and timing
> out. By logging to kmsg you'd avoid this blockage.
> 
> I think we need to rethink this commit:
> 
> commit 8b18fdc19531ba56d0bdfe34c62870997a9bcc96
> Author: Lennart Poettering <lennart>
> Date:   Wed Jul 24 07:24:05 2013 +0200
> 
>     core: synchronously block when logging


Looks this was the culprit indeed, boot times are semi-normal again
> Loaded units and determined initial transaction in 1.955201s

It also makes the circular locking warning go away (not sure if the bug is fixed or hidden)

Comment 12 Nicolas Mailhot 2013-09-16 17:11:23 UTC
(In reply to Jóhann B. Guðmundsson from comment #9)
> The sata warnings are due to broken dvd player you should replace it or
> unplug it.

It works good enough except for the boot warning…

Comment 13 Nicolas Mailhot 2013-09-16 17:12:54 UTC
Created attachment 798361 [details]
journalctl with systemd.log_target=kmsg

Comment 14 Nicolas Mailhot 2013-09-16 17:14:44 UTC
(In reply to Nicolas Mailhot from comment #11)

> Looks this was the culprit indeed, boot times are semi-normal again
> > Loaded units and determined initial transaction in 1.955201s
> 
> It also makes the circular locking warning go away (not sure if the bug is
> fixed or hidden)

So, is this a kernel bug or are systemd expectations wrong?

Comment 15 Lennart Poettering 2013-09-29 18:30:16 UTC
I am pretty sure this is fixed by this commit upstream:

http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journald-server.c?id=4608af4333d0f7f5f8e3bc632801b04ef07d246d

Comment 16 Fedora Update System 2013-10-02 12:13:01 UTC
systemd-208-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/systemd-208-1.fc20