Description of problem:
I have, thus far, upgraded two of my three systems from Fedora Server 28 to Fedora Server 29 and found that after booting them up on 29, systemd is wholly unresponsive to systemctl commands - they all print "Failed to [insert action here]: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)" and exit.
The first system (virt03) just...stopped doing it after several reboots when I was trying to troubleshoot it. I have no explanation whatsoever as to why it stopped occurring.
The second system (virt01) is still doing it. I've found that if I boot to rescue.target I can activate every service that's wanted by multi-user.target and have no issue, but when I activate multi-user.target itself, the problem occurs.
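The manual check described above could be scripted roughly as follows. This is a minimal sketch, not the exact commands used on virt01: the function name `start_wants_one_by_one` and the `SYSTEMCTL` override variable are hypothetical, and it assumes it is run from rescue.target, where systemctl still responds.

```shell
#!/bin/sh
# Sketch only: start each unit wanted by a target one at a time, so that a
# unit which makes PID 1 stop responding can be identified.
# SYSTEMCTL is a hypothetical override, useful for pointing at a test stub.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

start_wants_one_by_one() {
    target=$1
    # "show -p Wants --value" prints the wanted units as one
    # space-separated line, so plain word splitting is enough here.
    for unit in $($SYSTEMCTL show -p Wants --value "$target"); do
        echo "starting $unit"
        # If systemd has stopped answering, systemctl itself gives up after
        # the D-Bus activation timeout and returns non-zero.
        if ! $SYSTEMCTL start "$unit"; then
            echo "failed at: $unit" >&2
            return 1
        fi
    done
}

# Usage (from rescue.target): start_wants_one_by_one multi-user.target
```

On virt01 a loop like this would be expected to finish cleanly, since only activating multi-user.target itself triggered the problem.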
Both systems have similar sets of software installed and are being used for the same purpose - members of Ceph and Kubernetes clusters. I removed a couple packages from virt01 last night while trying to troubleshoot it. I'll attach the output of rpm -qa | sort from both, along with a diff of the two.
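For anyone wanting to reproduce the comparison, a sketch along these lines lists the packages present on only one host. The helper names and the file names in the usage comment are placeholders, and this is not necessarily how the attached diff was produced.

```shell
#!/bin/sh
# Sketch only: compare two package lists saved with
#   LC_ALL=C rpm -qa | LC_ALL=C sort > HOST.txt
# comm requires both inputs to be sorted in the same collation order,
# hence the fixed C locale.

# Packages listed only in the first file.
only_in_first() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Packages listed only in the second file.
only_in_second() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Usage: only_in_first virt01.txt virt03.txt
```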
A keen reader may notice that I skipped a box named virt02 :) That is correct: virt02 is only a Kubernetes node and Ceph mon/mds; it has no OSDs, while the other two do. I'm going to try to upgrade it tonight and see what it does, and will report back.
While trying to find similar bug reports, I came across https://bugzilla.redhat.com/show_bug.cgi?id=1548417 - it MIGHT be the same, but the logging I see from `journalctl -f` (attached) from the time when I activated multi-user.target on virt01 doesn't match the reporter's all that closely.
Any other details I can provide, please ask. I'm at a total loss as to how to troubleshoot this one, so I'm sure what I have isn't as helpful as it could potentially be.
Version-Release number of selected component (if applicable):
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid
How reproducible:
I'm not sure yet. It's happened to me on 100% of boxes that I've upgraded from Fedora 28 to Fedora 29, but my sample size is 2. Not super scientific.
Steps to Reproduce:
1. Have a Fedora Server 28 system running kubelet (from the Fedora repo's kubernetes-node package) and Ceph (from Ceph's repo, located at http://download.ceph.com/rpm-mimic/el7)
2. Upgrade to Fedora Server 29 with `dnf upgrade --refresh && dnf system-upgrade download --releasever=29 && dnf system-upgrade reboot`
3. On boot, observe this issue
Actual results:
systemd is unresponsive in multi-user.target
Expected results:
systemd is responsive in multi-user.target
Created attachment 1506844 [details]
virt01: rpm -qa
Created attachment 1506845 [details]
virt03: rpm -qa
Created attachment 1506846 [details]
rpm -qa diff (left is virt01, right is virt03)
Created attachment 1506847 [details]
I was a bit delayed, but got the last box (virt02) upgraded to Fedora 29 last night. It did not exhibit the issue. I'll attach the `rpm -qa` output from that one as well.
Created attachment 1507302 [details]
virt02: rpm -qa
#1548417 was a memory corruption in some udev / device hashmap code called when parsing the mount table. In your logs, I see bus_process_object, so it's responding to a dbus message. Maybe https://github.com/systemd/systemd/issues/10716 is related?
It could be related. Taking a look at the logs he posted (https://pastebin.com/b9wZt0s6), though, his systemd is catching a SIGSEGV while mine is catching a SIGABRT. We all know how these things can end up intertwined, though ;)
It is worth noting - Lennart mentions that the issue on https://github.com/systemd/systemd/issues/10716 is probably fallout from https://github.com/systemd/systemd/commit/a7a7163df7fc8a9f794f6803b2f6c9c9b0745a1f, which is intended to fix a race condition between daemon-reload and other commands. In both that issue and my issue, the crash occurs during a period with frequent daemon-reloads. My logs show six of them in the two seconds prior to the crash. It's hard to ignore all the similarities.
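To check another machine for the same pattern, the number of daemon-reloads in a saved journal excerpt can be counted with a sketch like the one below. It assumes that PID 1 logs each reload as a line containing `systemd[1]: Reloading.`, and `count_reloads` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: count daemon-reload events logged by PID 1 in a saved
# journal excerpt (for example, one written with `journalctl -b > boot.log`).
# Assumption: each reload shows up as a line containing "systemd[1]: Reloading."
count_reloads() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so the exit status is deliberately ignored.
    grep -c 'systemd\[1\]: Reloading\.' "$1" || true
}

# Usage: count_reloads boot.log
```

Narrowing the excerpt to the seconds before the crash (for instance with journalctl's `--since` and `--until` options) makes the count comparable to the six reloads mentioned above.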
Is there any way to roll back to systemd 238 on Fedora 29 while we await a fix?
Update - it looks like `dnf install --downgrade systemd-239.3` makes it go away until a fix is released :)
Sorry, I lost the terminal and got the command wrong from memory. The correct one is `dnf install systemd-239-3.fc29`
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 29 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.
Thank you for reporting this bug and we are sorry it could not be fixed.