Bug 1187695 - systemd "Looping too fast" after reexec with broken dbus socket
Summary: systemd "Looping too fast" after reexec with broken dbus socket
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Michal Schmidt
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-01-30 17:04 UTC by Tomáš Bžatek
Modified: 2016-12-20 13:11 UTC
CC List: 12 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-12-20 13:11:27 UTC
Type: Bug
Embargoed:


Attachments
strace -p 1 fragment (1.65 KB, text/plain)
2015-01-30 17:04 UTC, Tomáš Bžatek
journalctl -b (227.84 KB, text/plain)
2015-01-30 17:05 UTC, Tomáš Bžatek

Description Tomáš Bžatek 2015-01-30 17:04:47 UTC
Created attachment 986084
strace -p 1 fragment

Description of problem:
This is the first time I've seen this problem; the system was working properly before. I routinely ran "dnf update" on my rawhide system, with about 700 updates (~2 months since the last update). The update process eventually got stuck, and ps showed a stuck "firewall-cmd --reload --quiet", so I killed it with SIGTERM. The update process continued but got stuck again a few packages later. This time ps showed a stuck "systemctl condrestart xxxx.service". Killing it with SIGTERM did nothing, and neither did SIGKILL. It finished on its own after some time, and the update process continued until it got stuck on another "systemctl condrestart yyyyy.service"...

From that point on, any request to systemd incl. "reboot" gets stuck.

Version-Release number of selected component (if applicable):
systemd-216-11.fc22.x86_64 (running)
systemd-218-3.fc22.x86_64 (freshly updated by dnf update)
kernel-3.18.0-0.rc0.git9.1.fc22.x86_64

Additional info:
This is a VM under KVM, 8 logical CPUs, x86_64, Haswell, started by "qemu-system-x86_64 -enable-kvm -cpu host -smp 8 -m 8G"

The symptoms match http://lists.freedesktop.org/archives/systemd-devel/2014-December/025867.html except for the memory usage, which is low. According to top, systemd uses between 10 and 50 % CPU.
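The 10-50 % figure above comes from top. A minimal Python sketch of the same measurement, assuming a Linux /proc filesystem, samples the utime and stime fields of /proc/<pid>/stat twice and divides the difference by the clock tick rate. The function names are illustrative only and are not part of any tool mentioned in this report.

```python
import os
import time


def cpu_ticks(pid: int) -> int:
    """Return utime + stime (in clock ticks) for a process, read from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The comm field (field 2) is wrapped in parentheses and may contain
    # spaces, so only split the text that follows the last ')'.
    rest = data[data.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime is field 14, stime is field 15.
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime


def cpu_percent(pid: int, interval: float = 1.0) -> float:
    """Approximate a process's CPU usage over `interval` seconds, similar to top's %CPU."""
    hz = os.sysconf("SC_CLK_TCK")
    before = cpu_ticks(pid)
    time.sleep(interval)
    after = cpu_ticks(pid)
    return 100.0 * (after - before) / hz / interval
```

For example, cpu_percent(1) gives an approximate figure for PID 1. On most systems /proc/1/stat is world-readable, so this does not require root.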

Have a look at the attached "strace -p 1" fragment. It shows PID 1 reading from fds 4 and 14, which are:
  File: ‘/proc/1/fd/4’ -> ‘anon_inode:[eventpoll]’
  File: ‘/proc/1/fd/14’ -> ‘anon_inode:[timerfd]’
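The fd-to-target mapping above was obtained by running stat on the /proc/1/fd symlinks. A small Python sketch that resolves every descriptor of a process the same way might look like the following. It assumes Linux /proc and, for PID 1, root privileges (listing another process's fd directory normally requires them); fd_targets and describe are hypothetical helper names.

```python
import os


def fd_targets(pid: int) -> dict[int, str]:
    """Map each open file descriptor of `pid` to the target of its /proc/<pid>/fd symlink."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def describe(pid: int, fds: list[int]) -> list[str]:
    """Format selected descriptors as '<symlink path> -> <target>' lines."""
    targets = fd_targets(pid)
    return [f"/proc/{pid}/fd/{fd} -> {targets.get(fd, '<closed>')}" for fd in fds]
```

Run as root, describe(1, [4, 14]) should produce lines in the same shape as the stat output quoted above. An epoll instance resolves to anon_inode:[eventpoll] and a timerfd to anon_inode:[timerfd].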

Comment 1 Tomáš Bžatek 2015-01-30 17:05:51 UTC
Created attachment 986086
journalctl -b

Comment 2 Jan Synacek 2015-02-02 07:22:19 UTC

*** This bug has been marked as a duplicate of bug 1186018 ***

Comment 3 Jan Synacek 2015-02-02 07:24:02 UTC
I'm about 90% sure that this is a dup of bug 1186018. If killing "dbus-send" doesn't work around the issue, please reopen this bug.

Comment 4 Michal Schmidt 2015-03-13 17:54:35 UTC
Unduplicating. Bug 1186018 is related, but it's not the same bug.

I can now reproduce it using these steps:
1. Lose the DBus system bus socket by following the steps from
   https://bugzilla.redhat.com/show_bug.cgi?id=1186018#c7
2. Tell systemd to reexec:
   kill -TERM 1

After 25 seconds I see the first error message:
   Failed to register match for Disconnected message: Connection timed out
This repeats a couple more times. Eventually the "Looping too fast" messages start appearing too.
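To check whether a given boot hit this failure mode, one could count how often the two messages quoted in the reproducer appear in PID 1's journal. The sketch below assumes journalctl is installed and matches only on the message fragments quoted in this report; count_symptoms and journal_lines_for_pid1 are hypothetical helper names.

```python
import subprocess
from collections import Counter
from collections.abc import Iterable

# Message fragments quoted in this bug report, matched as plain substrings.
PATTERNS = {
    "disconnected_match": "Failed to register match for Disconnected message",
    "looping": "Looping too fast",
}


def count_symptoms(lines: Iterable[str]) -> Counter:
    """Count how many journal lines contain each known symptom fragment."""
    counts = Counter()
    for line in lines:
        for key, fragment in PATTERNS.items():
            if fragment in line:
                counts[key] += 1
    return counts


def journal_lines_for_pid1() -> list[str]:
    """Return the current boot's journal messages logged by PID 1 (requires journalctl)."""
    out = subprocess.run(
        ["journalctl", "-b", "_PID=1", "--no-pager", "-o", "cat"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()
```

Calling count_symptoms(journal_lines_for_pid1()) after the reexec should show the "disconnected_match" count growing first, followed by "looping", consistent with the ordering described above.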

(I have no reproducer or explanation for the unkillable systemctl observed by Tomáš. It should not be possible to cause that from userspace.)

Comment 5 Michal Sekletar 2015-03-17 17:04:38 UTC
It seems to me that this bug was caused by updating dbus (to a version that changed the unit file option ListenStream) and systemd in a single dnf transaction.

I also think the patch for bug 1186018 should fix this one too, because we would not end up with two sockets after daemon reload in the first place.

It would be great if systemd could handle cases like this one in a more robust manner than entering a "busy" loop after reexec. However, with kdbus this entire class of issues should go away.

Comment 6 Michal Schmidt 2015-03-18 12:07:15 UTC
(In reply to Michal Sekletar from comment #5)
Michal, I agree with everything you wrote there. Still, I find the failure mode interesting enough to make me want to look into it further. And that's why I reopened this BZ. I'm assigning it to myself so you don't have to waste your time on it.

Comment 8 Jan Kurik 2015-07-15 14:34:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it may also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 9 Fedora End Of Life 2016-11-24 11:24:28 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Fedora End Of Life 2016-12-20 13:11:27 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

