Bug 990323

Summary: systemd-journald fails after resume from hibernate
Product: [Fedora] Fedora Reporter: Dimitris <dimitris.on.linux>
Component: systemdAssignee: systemd-maint
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 19CC: johannbg, lnykryn, msekleta, notting, pedrogfrancisco, plautrba, systemd-maint, ulrich.hobelmann, vpavlin, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-10-21 21:30:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dimitris 2013-07-30 22:37:12 UTC
Description of problem:

Before hibernate:

$ systemctl status --failed
$

After resume:

$ systemctl status --failed
systemd-journald.service -> '/org/freedesktop/systemd1/unit/systemd_2djournald_2eservice'
systemd-journald.service - Journal Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static)
   Active: failed (Result: start-limit) since Tue 2013-07-30 15:23:39 PDT; 8s ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
  Process: 3773 ExecStart=/usr/lib/systemd/systemd-journald (code=killed, signal=USR1)
   Status: "Shutting down..."

Before "systemctl start systemd-journal-flush.service":

Jul 30 15:22:19 gaspode systemd-logind[658]: Hibernate key pressed.
Jul 30 15:22:19 gaspode NetworkManager[779]: <info> sleep requested (sleeping: no  enabled: yes)
Jul 30 15:22:19 gaspode NetworkManager[779]: <info> sleeping or disabling...
Jul 30 15:22:19 gaspode NetworkManager[779]: <info> (enp0s25): device state change: activated -> unmanaged (reason 'sleeping') [100 10 37]
Jul 30 15:22:19 gaspode NetworkManager[779]: <info> (enp0s25): deactivating device (reason 'sleeping') [37]
Jul 30 15:22:19 gaspode NetworkManager[779]: <info> (enp0s25): canceled DHCP transaction, DHCP client pid 3592
Jul 30 15:22:19 gaspode NetworkManager[779]: <info> (enp0s25): cleaning up...
Jul 30 15:22:19 gaspode NetworkManager[779]: <info> (enp0s25): taking down device.
Jul 30 15:22:19 gaspode NetworkManager[779]: <info> (wlp3s0): device state change: unavailable -> unmanaged (reason 'sleeping') [20 10 37]
Jul 30 15:22:19 gaspode dbus[666]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jul 30 15:22:19 gaspode dbus[666]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.nm-dispatcher.service': Unit dbus-org.freedesktop.nm-dispatcher.service f
Jul 30 15:22:19 gaspode NetworkManager[779]: <warn> Dispatcher failed: (32) Unit dbus-org.freedesktop.nm-dispatcher.service failed to load: No such file or directory. See system 
Jul 30 15:22:21 gaspode systemd[1]: Starting Sleep.
Jul 30 15:22:21 gaspode systemd[1]: Reached target Sleep.
Jul 30 15:22:21 gaspode systemd[1]: Starting Hibernate...
Jul 30 15:22:21 gaspode systemd-sleep[3730]: Failed to open configuration file /etc/systemd/sleep.conf: No such file or directory
Jul 30 15:22:21 gaspode systemd-sleep[3730]: Spawned /usr/lib/systemd/system-sleep/notify-upower.sh as 3731
Jul 30 15:22:21 gaspode kernel: PM: Hibernation mode set to 'platform'
Jul 30 15:22:21 gaspode systemd-sleep[3730]: /usr/lib/systemd/system-sleep/notify-upower.sh exited successfully.
Jul 30 15:22:21 gaspode systemd-sleep[3730]: Suspending system...
Jul 30 15:22:21 gaspode kernel: PM: Marking nosave pages: [mem 0x0009e000-0x000fffff]
Jul 30 15:22:21 gaspode kernel: PM: Marking nosave pages: [mem 0xbd1a1000-0xbd1a6fff]
Jul 30 15:22:21 gaspode kernel: PM: Marking nosave pages: [mem 0xbd2b7000-0xbd30efff]
Jul 30 15:22:21 gaspode kernel: PM: Marking nosave pages: [mem 0xbd3c6000-0xbd6fefff]
Jul 30 15:22:21 gaspode kernel: PM: Marking nosave pages: [mem 0xbd700000-0xffffffff]
Jul 30 15:22:21 gaspode kernel: PM: Basic memory bitmaps created
Jul 30 15:22:21 gaspode kernel: PM: Syncing filesystems ... done.
Jul 30 15:23:38 gaspode systemd-journal[3759]: Allowing system journal files to grow to 4.0G.
Jul 30 15:23:38 gaspode systemd-journal[3759]: Journal started
Jul 30 15:23:38 gaspode systemd[1]: Time has been changed
Jul 30 15:23:38 gaspode systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Jul 30 15:23:38 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Jul 30 15:23:38 gaspode systemd[1]: systemd-journald.service: main process exited, code=killed, status=10/USR1
Jul 30 15:23:38 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Jul 30 15:23:38 gaspode systemd-sleep[3730]: System resumed.
Jul 30 15:23:38 gaspode systemd-sleep[3730]: Spawned /usr/lib/systemd/system-sleep/notify-upower.sh as 3762
Jul 30 15:23:38 gaspode systemd-sleep[3730]: /usr/lib/systemd/system-sleep/notify-upower.sh exited successfully.
Jul 30 15:23:38 gaspode systemd[1]: Started Hibernate.
Jul 30 15:23:38 gaspode systemd[1]: Requested transaction contradicts existing jobs: File exists
Jul 30 15:23:38 gaspode systemd[1]: Service sleep.target is not needed anymore. Stopping.
Jul 30 15:23:38 gaspode systemd[1]: Stopping Sleep.
Jul 30 15:23:38 gaspode systemd-journal[3766]: Allowing system journal files to grow to 4.0G.
Jul 30 15:23:38 gaspode systemd-journal[3766]: Journal started
Jul 30 15:23:38 gaspode NetworkManager[779]: <info> wake requested (sleeping: yes  enabled: yes)
Jul 30 15:23:38 gaspode NetworkManager[779]: <info> waking up and re-enabling...
Jul 30 15:23:38 gaspode NetworkManager[779]: <info> (enp0s25): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Jul 30 15:23:38 gaspode NetworkManager[779]: <info> (enp0s25): bringing up device.
Jul 30 15:23:38 gaspode systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Jul 30 15:23:38 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Jul 30 15:23:39 gaspode systemd-journal[3770]: Allowing system journal files to grow to 4.0G.
Jul 30 15:23:39 gaspode systemd-journal[3770]: Journal started
Jul 30 15:23:39 gaspode systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Jul 30 15:23:39 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.

After "systemctl start systemd-journal-flush.service":

$ systemctl status --failed
$

-- Logs begin at Wed 2013-06-26 10:12:19 PDT, end at Tue 2013-07-30 15:29:18 PDT. --
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: wrong host start response
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: wrong host start response
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: wrong host start response
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: wrong host start response
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: wrong host start response
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: we can't read the message slots =00000000.
Jul 30 15:28:47 gaspode kernel: mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.

[... lots more of the same mei_me kernel spam, see #917081 ...]

Jul 30 15:28:47 gaspode systemd-journal[3980]: Journal started
Jul 30 15:28:47 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Jul 30 15:28:48 gaspode kernel: mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
Jul 30 15:28:48 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
Jul 30 15:29:03 gaspode dbus[666]: [system] Activating service name='org.freedesktop.ModemManager' (using servicehelper)
Jul 30 15:29:03 gaspode dbus[666]: [system] Activated service 'org.freedesktop.ModemManager' failed: Process /lib64/dbus-1/dbus-daemon-launch-helper received signal 13
Jul 30 15:29:18 gaspode kernel: mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
Jul 30 15:29:18 gaspode kernel: mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING


Version-Release number of selected component (if applicable):

systemd.x86_64 204-9.fc19

Also:
kernel.x86_64 3.10.3-300.fc19

How reproducible:

Every time.

Steps to Reproduce:
See above

Actual results:


Expected results:


Additional info:

Comment 1 Dimitris 2013-07-30 23:03:46 UTC
FWIW, I cannot reproduce this under kernels 3.9.9-302.fc19.x86_64 or 3.11.0-0.rc3.git0.1.fc20.x86_64.

Comment 2 Zbigniew Jędrzejewski-Szmek 2013-07-31 13:09:27 UTC
It looks like sd-journald.service is started during hibernation. What is its status before hibernation? And does it start succesfully after hibernation?

Comment 3 Dimitris 2013-07-31 16:48:05 UTC
It was running before I initiated hibernation.  Just after resume:

$ systemctl status systemd-journald.service 
systemd-journald.service - Journal Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static)
   Active: active (running) since Tue 2013-07-30 16:00:28 PDT; 17h ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
 Main PID: 406 (systemd-journal)
   Status: "Shutting down..."
   CGroup: name=systemd:/system/systemd-journald.service
           └─406 /usr/lib/systemd/systemd-journald

Jul 30 16:00:28 gaspode systemd-journal[406]: Allowing runtime journal files to grow to 393.9M.
Jul 30 16:00:28 gaspode systemd-journal[406]: Journal started
Jul 30 16:00:30 gaspode systemd-journal[406]: Allowing system journal files to grow to 4.0G.
Jul 30 16:00:42 gaspode systemd-journal[406]: Forwarding to syslog missed 60 messages.

Note the Status: line.  Then, wait a few seconds, and:

$ systemctl status systemd-journald.service 
systemd-journald.service - Journal Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static)
   Active: failed (Result: start-limit) since Wed 2013-07-31 09:39:40 PDT; 34s ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
  Process: 2895 ExecStart=/usr/lib/systemd/systemd-journald (code=exited, status=1/FAILURE)
   Status: "Shutting down..."

Last lines in the log (before restarting), showing the repeated failures that led it to stay failed:

Jul 31 09:39:40 gaspode systemd-journal[2883]: Allowing system journal files to grow to 4.0G.
Jul 31 09:39:40 gaspode systemd-journal[2883]: Journal started
Jul 31 09:39:39 gaspode systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Jul 31 09:39:39 gaspode systemd[1]: systemd-journald.service: main process exited, code=killed, status=10/USR1
Jul 31 09:39:39 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Jul 31 09:39:40 gaspode systemd-journal[2887]: Allowing system journal files to grow to 4.0G.
Jul 31 09:39:40 gaspode systemd-journal[2887]: Journal started
Jul 31 09:39:40 gaspode systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Jul 31 09:39:40 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Jul 31 09:39:40 gaspode systemd-journal[2891]: Allowing system journal files to grow to 4.0G.
Jul 31 09:39:40 gaspode systemd-journal[2891]: Journal started
Jul 31 09:39:40 gaspode systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Jul 31 09:39:40 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Jul 31 09:39:40 gaspode systemd-journal[2895]: Allowing system journal files to grow to 4.0G.
Jul 31 09:39:40 gaspode systemd-journal[2895]: Journal started
Jul 31 09:39:40 gaspode systemd[1]: systemd-journald.service: main process exited, code=exited, status=1/FAILURE
Jul 31 09:39:40 gaspode systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.

Comment 4 Dimitris 2013-07-31 19:12:21 UTC
This is looking more than a little kernel-triggered.  As mentioned above I can't reproduce with 3.9 or 3.11 kernels, but with 3.10.4-300.fc19.x86_64 from testing after hibernate resume:

> $ systemctl status systemd-journald.service 
> systemd-journald.service - Journal Service
>    Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static)
>    Active: failed (Result: start-limit) since Wed 2013-07-31 12:04:17 PDT; 4min 50s ago
>      Docs: man:systemd-journald.service(8)
>            man:journald.conf(5)
>   Process: 5899 ExecStart=/usr/lib/systemd/systemd-journald (code=killed, signal=USR1)
>    Status: "Shutting down..."

Comment 5 Dimitris 2013-08-03 05:32:10 UTC
This looks like a race that was exposed bug 917081.  Patching that one makes this go away too.

Comment 6 Lennart Poettering 2013-09-29 18:00:42 UTC
This should be fixed by upstream commit

http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journald-server.c?id=5843c5ebb4341382ae9c87e93c2c87467e573548

i.e. the kernel is generating POLLERR if kmsg runs over (i.e. when a lot of messages have been generated in a very short time), and journald exited on this.

Comment 7 Zbigniew Jędrzejewski-Szmek 2013-10-21 21:30:59 UTC
Previous update for F19 had this patch.