Created attachment 1758380 [details]
Picture of screen showing the frozen state

Description of problem:
During update of systemd on Fedora 34 the system completely froze.

Version-Release number of selected component (if applicable):
systemd 247.3-2.fc34 updating to 247.3-3.fc34

How reproducible:
Don't know

Steps to Reproduce:
1. dnf update systemd

Actual results:
System freezing

Expected results:
Normal update

Additional info:
Xfce environment running in graphical mode.
Created attachment 1758381 [details] Running "journalctl -b -1 -p err" after system successfully rebooted
When I run "systemctl daemon-reexec", PID 1 appears to fail to establish a connection to D-Bus. In fact, running "lsof -p 1" shows no sockets at all opened by PID 1.
Feb 20 07:38:39 fc33 systemd[1]: Assertion 'p->n_ref > 0' failed at src/shared/varlink.c:386, function varlink_unref(). Aborting.
Feb 20 07:38:40 fc33 systemd-coredump[4513]: Process 4512 (systemd) of user 0 dumped core.

Stack trace of thread 4512:
#0 0x00007f38e5ee058b kill (libc.so.6 + 0x3d58b)
#1 0x000055c7b55d348e crash (/usr/lib/systemd/systemd (deleted) + 0x4848e)
#2 0x00007f38e6085960 __restore_rt (libpthread.so.0 + 0x13960)
#3 0x00007f38e5ee0292 raise (libc.so.6 + 0x3d292)
#4 0x00007f38e5ec98a4 abort (libc.so.6 + 0x268a4)
#5 0x00007f38e62218d2 n/a (/usr/lib/systemd/libsystemd-shared-247.so (deleted) + 0x728d2)
Feb 20 07:38:40 fc33 systemd[1]: Caught <ABRT>, dumped core as pid 4512.
Feb 20 07:38:40 fc33 systemd[1]: Freezing execution.
Feb 20 07:38:40 fc33 systemd-oomd[367]: Failed to connect to /run/systemd/io.system.ManagedOOM: Connection refused
Feb 20 07:38:40 fc33 systemd-oomd[367]: Failed to acquire varlink connection
Feb 20 07:38:40 fc33 systemd-oomd[367]: Event loop failed: Connection refused
Feb 20 07:39:39 fc33 abrt-notification[4650]: Process 4512 (systemd) crashed in ??()

Oops, that's https://github.com/systemd/systemd/issues/18025.
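For anyone checking whether they hit the same crash, here is a minimal sketch that filters journal text for the assertion message quoted above. The function name find_varlink_assert is made up for illustration, and the usage line assumes the crash happened in the previous boot and that its journal is still readable.

```shell
#!/bin/sh
# find_varlink_assert (hypothetical helper): read journal text on stdin and
# print every line that contains the varlink refcount assertion.
# Exit status is 0 if at least one line matched, non-zero otherwise.
find_varlink_assert() {
    # -F: treat the pattern as a fixed string, not a regular expression
    grep -F "Assertion 'p->n_ref > 0' failed"
}

# Usage (assumption: the crash was in the previous boot):
#   journalctl -b -1 -p err | find_varlink_assert
```

If the function prints nothing, the freeze may have a different cause than the one tracked here.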
Proposed as a Freeze Exception for 34-beta by Fedora user pwalter using the blocker tracking app because: systemd crashing during update makes it very difficult to test Beta.
Is this happening to other people too?
(In reply to Zbigniew Jędrzejewski-Szmek from comment #5)
> Is this happening to other people too?

I saw a crash during an update only once. When it crashed, I was in graphical mode; since then I only upgrade in multi-user mode.

"systemctl daemon-reexec" consistently crashes when running in a virtual machine, under both qemu/kvm and VirtualBox. It does not crash on bare hardware on my system. The "systemctl daemon-reexec" crash is seen in versions 247 and 248 on Fedora 34.
I am seeing this as well. I have systemd-247.3-2.fc34.x86_64 installed in a VM, and updating systemd ends with the whole update hanging. I've been working around this by reverting to an older VM snapshot and updating with dnf update -x 'systemd*'
Well, the correct "workaround" is to update offline. Like we recommend everyone does :D It works OK in an offline update.
+3 in https://pagure.io/fedora-qa/blocker-review/issue/272 , marking accepted, but as noted there I'm only inclined to pull a very targeted and safe fix for this.
(In reply to Adam Williamson from comment #8)
> Well, the correct "workaround" is to update offline. Like we recommend
> everyone does :D It works OK in an offline update.

That would still run "systemctl daemon-reexec" from the post-install scriptlet, wouldn't it?
I cannot reproduce this. Maybe there's some additional ingredient missing. Also, the backtrace in Comment 3 is not useful, since it was generated after files have already been replaced. Villy, please attach the coredump if you have it. Also, you say that you are able to reproduce this on 'daemon-reload'. Which versions? Can you provide the backtrace or coredump? I'd assume that this is fixed with https://github.com/systemd/systemd/commit/9807fdc1da8e037ddedfa4e2c6d2728b6e60051e, except that you're saying it's also with 248.
(In reply to Zbigniew Jędrzejewski-Szmek from comment #11)
> Also, you say that you are able to reproduce this on 'daemon-reload'. Which
> versions?

Daemon-reexec not reload.

> Can you provide the backtrace or coredump?

Later today.
Created attachment 1760694 [details] New version systemd
Created attachment 1760695 [details] Old version of systemd
Created attachment 1760696 [details] Raw crash dump
Created attachment 1760697 [details] Raw crash dump
*** Bug 1934549 has been marked as a duplicate of this bug. ***
(In reply to Zbigniew Jędrzejewski-Szmek from comment #5)
> Is this happening to other people too?

Zbigniew, I reported a crash with the same failed assertion, "Assertion 'p->n_ref > 0' failed at src/shared/varlink.c:386, function varlink_unref()", with the full trace, in https://bugzilla.redhat.com/show_bug.cgi?id=1930900 . It happened during the systemd-247.3-3.fc34 post-install scriptlet when I ran sudo dnf reinstall systemd*247.3*. Many systemd services timed out during the transaction, and the update took several minutes when it would normally take only a few seconds. The system couldn't switch VTs, reboot, or shut down normally after that.

I also reported a systemd abort during the systemd-247.3-3.fc34 post-install scriptlet when running sudo dnf upgrade, in https://bugzilla.redhat.com/show_bug.cgi?id=1930793 . The trace of the systemd crash showed a "corrupted double-linked list" in malloc_printerr at malloc.c:5626 in glibc-2.32.9000-29.fc34.x86_64 in frame 6.

A systemd crash with the same trace and other characteristics also happened when I upgraded to systemd-248~rc2-1.fc34.

I can provide more information if it would help.

Thanks.
*** Bug 1936027 has been marked as a duplicate of this bug. ***
Quite a lot in bug 1936027 before I found this. But the gist is that the coredump produced is bad and can't be used by coredumpctl. And I can reproduce it 100% on Fedora Workstation. Do a graphical login, launch GNOME Terminal, and 'dnf update *rpm' in a dir that has the systemd rpms. It never happens if I do a 'dnf offline-upgrade' or GNOME Software initiated pkoffline update. Maybe it has something to do with a full user stack being used; or the update being initiated from inside that user stack *shrug*. But in all cases the coredump isn't usable.
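For reference, the offline path mentioned in this thread can be sketched roughly as follows. This assumes the dnf offline-upgrade subcommands from dnf-plugins-core are installed; the run helper, the DRY_RUN variable, and the offline_update name are made up for illustration so the commands can be previewed without executing anything.

```shell
#!/bin/sh
# run (hypothetical helper): execute a command, or only print it when
# DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# offline_update (hypothetical name): download pending updates, then reboot
# into the offline update environment where they are applied.
# Assumption: the "offline-upgrade" plugin command is available.
offline_update() {
    run sudo dnf offline-upgrade download &&
    run sudo dnf offline-upgrade reboot
}

# Preview the commands without running them:
#   DRY_RUN=1 offline_update
```

Whether the post-install scriptlet still triggers the crash in the offline environment is a separate question raised earlier in this bug; the sketch only shows how the offline update is started.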
I actually tried setting the scratch build I did for the resolver fix as a 'workaround' in openQA so the test that fails intermittently would work reliably (I hoped)...and that ran into this problem too. The openQA tests install the builds set as 'workarounds' before running the test proper, but when they tried to do that, things blew up, like so: https://openqa.fedoraproject.org/tests/801288#step/_console_wait_login_2/3 There's a bunch of "Failed to" <do something> errors, the "reboot" command after the update completes doesn't work, and logging into another console doesn't work. I can't tell if anything outright *crashed* because the test wasn't able to log into another console to find out. I can fiddle with it some more next week if desired.
$ sudo reboot
Failed to open initctl fifo: No such device or address
Failed to talk to init daemon.

Requires reboot -f or sysrq+b
Created attachment 1761097 [details]
coredumpctl gdb, thread apply all bt full

I got most of thread 2, thread 1 looks unusable.
You can do 'systemctl stop systemd-oomd' to work around the issue ;) https://github.com/systemd/systemd/pull/18915 should fix the crash.
yup, that does indeed seem to work.
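Until a fixed build is available, the workaround above can be combined with the update in one step. This is only a sketch: the run helper, the DRY_RUN variable, and the function name are made up for illustration, and the update command is the one from the original report.

```shell
#!/bin/sh
# run (hypothetical helper): execute a command, or only print it when
# DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# update_systemd_without_oomd (hypothetical name): stop systemd-oomd first,
# then update systemd. The unit is only stopped, not disabled, so it is
# started again on the next boot if it was enabled.
update_systemd_without_oomd() {
    run sudo systemctl stop systemd-oomd &&
    run sudo dnf update systemd
}

# Preview the commands without running them:
#   DRY_RUN=1 update_systemd_without_oomd
```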
*** Bug 1936559 has been marked as a duplicate of this bug. ***
*** Bug 1937407 has been marked as a duplicate of this bug. ***
*** Bug 1937134 has been marked as a duplicate of this bug. ***
FEDORA-2021-7bd2ec6c13 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-7bd2ec6c13
FEDORA-2021-7bd2ec6c13 has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-7bd2ec6c13` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-7bd2ec6c13 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
*** Bug 1937504 has been marked as a duplicate of this bug. ***
*** Bug 1930900 has been marked as a duplicate of this bug. ***
*** Bug 1930793 has been marked as a duplicate of this bug. ***
Howdy, I'm looking forward to installing this fix... I have seen this problem twice on native hardware... in the past two weeks or so... same version of systemd (#1 was the upgrade to systemd... I thought it was a fluke... #2 was a dnf reinstall systemd...). I would bet money that it could happen again if I do a dnf reinstall systemd. George...
(In reply to George R. Goffe from comment #34)
> I'm looking forward to installing this fix...
>
> I have seen this problem twice on native hardware... in the past two weeks
> or so... same version of systemd (#1 was the upgrade to systemd... I thought
> it was a fluke... #2 was a dnf reinstall systemd...).
>
> I would bet money that it could happen again if I do a dnf reinstall systemd.

It was really fixed.

1. If you do not want to wait, or to use the testing repo:
   # systemctl stop systemd-oomd ; systemctl disable systemd-oomd ; dracut -f
2. # dnf update --enablerepo=updates-testing systemd* ; dracut -f ; # and init 6
I'll take that in cash, thanks.
FEDORA-2021-c2bfa5e4f6 has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-c2bfa5e4f6` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-c2bfa5e4f6 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-c2bfa5e4f6 has been pushed to the Fedora 34 stable repository. If problem still persists, please make note of it in this bug report.
I think we can close this now, the update was tested to fix it and is in stable...
(In reply to Adam Williamson from comment #39)
> I think we can close this now, the update was tested to fix it and is in
> stable...

I can no longer cause a failure when calling "systemctl daemon-reexec" even when systemd-oomd is running. So the real problem seems to be fixed as well.