Bug 1937134

Summary: during system upgrade, systemd crashed and froze execution
Product: Fedora
Reporter: George R. Goffe <grgoffe>
Component: systemd
Assignee: systemd-maint
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: rawhide
CC: fedoraproject, filbranden, flepied, kasong, lnykryn, msekleta, ssahani, s, systemd-maint, yuwatana, zbyszek, z
Hardware: x86_64   
OS: Linux   
Last Closed: 2021-03-11 11:45:22 UTC
Type: Bug

Description George R. Goffe 2021-03-09 23:48:16 UTC
Description of problem:
During a dnf reinstall operation, systemd crashed and froze ("Freezing execution"), leaving the system unusable. There now appears to be a compressed (.zst) core file: core.systemd.0.1892a9b8dcdc435eabe81c6803bc3022.23063.1615322033000000.zst

It seems to me that a component this IMPORTANT to system operation SHOULD BE "PROTECTED" so that its crash cannot take down the whole system and force a power cycle to recover.


Version-Release number of selected component (if applicable):
systemd-248~rc2-3.fc35.x86_64

How reproducible:
I would bet that running "dnf reinstall systemd'*'" would trigger it.

Actual results:
A system on which reboot, halt, and other commands are NOT executed.

Additional info:

Mar  9 12:33:17 fc35 dbus-broker-launch[1203]: Noticed file-system modification, trigger reload.
Mar  9 12:33:17 fc35 dbus-broker-launch[1203]: Noticed file-system modification, trigger reload.
Mar  9 12:33:18 fc35 dbus-broker-launch[1203]: Noticed file-system modification, trigger reload.
Mar  9 12:33:28 fc35 pcp-pmie[2043]: Severe demand for real memory 7.3pgsout/s@fc35
Mar  9 12:33:53 fc35 systemd[1]: Reexecuting.
Mar  9 12:33:53 fc35 systemd[1]: Assertion 'p->n_ref > 0' failed at src/shared/varlink.c:386, function varlink_unref(). Aborting.


Mar  9 12:33:53 fc35 systemd-coredump[23064]: Due to PID 1 having crashed coredump collection will now be turned off.
Mar  9 12:33:53 fc35 systemd-coredump[23064]: Removed old coredump core.systemd.0.cf08c085a4e8421ca9756796437265ca.401188.1615265350000000.zst.
Mar  9 12:33:53 fc35 abrt-dump-journal-core[1293]: Failed to obtain all required information from journald
Mar  9 12:33:53 fc35 abrt-dump-journal-core[1293]: Failed to obtain all required information from journald
Mar  9 12:33:55 fc35 systemd-coredump[23064]: Process 23063 (systemd) of user 0 dumped core.

Stack trace of thread 23063:
#0  0x00007f2c393c258b kill (libc.so.6 + 0x3d58b)
#1  0x0000564b922abb58 crash (systemd + 0x45b58)
#2  0x00007f2c395674b0 __restore_rt (libpthread.so.0 + 0x134b0)
#3  0x00007f2c393c2292 raise (libc.so.6 + 0x3d292)
#4  0x00007f2c393ab8a4 abort (libc.so.6 + 0x268a4)
#5  0x00007f2c3974af26 log_assert_failed.cold (libsystemd-shared-248.so + 0x76f26)
#6  0x00007f2c397efc7f varlink_unref (libsystemd-shared-248.so + 0x11bc7f)
#7  0x0000564b922fd5f0 manager_free.part.0 (systemd + 0x975f0)
#8  0x0000564b922a8635 main (systemd + 0x42635)
#9  0x00007f2c393acb75 __libc_start_main (libc.so.6 + 0x27b75)
#10 0x0000564b922ab70e _start (systemd + 0x4570e)

Mar  9 12:33:55 fc35 systemd[1]: Caught <ABRT>, dumped core as pid 23063.
Mar  9 12:33:59 fc35 systemd[1]: Freezing execution.
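The assertion reported at src/shared/varlink.c:386 checks that an object's reference count is still positive when varlink_unref() is called. A count that has already reached zero usually means a reference was dropped one time too many, or that an already-freed object was used again; the resulting abort() matches frames #3 to #5 in the trace above. The Python sketch below only illustrates that general reference-counting pattern. The names RefCounted, ref, unref and freed are hypothetical, this is not systemd's varlink implementation, and it says nothing about the actual root cause (tracked in bug 1931034). Unlike C, the Python object is never really released, so the second unref deterministically sees n_ref == 0 instead of hitting undefined behaviour.

```python
# Hypothetical illustration of the reference-counting pattern behind an
# "Assertion 'p->n_ref > 0' failed" abort. All names are invented for this
# sketch; this is NOT systemd's varlink code.


class RefCounted:
    """An object that tracks how many owners hold a reference to it."""

    def __init__(self, name):
        self.name = name
        self.n_ref = 1      # the creator owns the first reference
        self.freed = False  # stands in for "memory released" in C

    def ref(self):
        """Take an additional reference and return the object."""
        if self.n_ref <= 0:
            raise AssertionError(f"ref() on freed object {self.name}")
        self.n_ref += 1
        return self


def unref(p):
    """Drop one reference; mark the object freed when the count hits zero.

    Passing None is a no-op. None is always returned so callers can write
    `p = unref(p)` and not keep a dangling pointer around.
    """
    if p is None:
        return None
    # The invariant reported as violated in the crash log. An explicit raise
    # is used instead of `assert` so the check still runs under `python -O`.
    if p.n_ref <= 0:
        raise AssertionError(f"Assertion 'p->n_ref > 0' failed for {p.name}")
    p.n_ref -= 1
    if p.n_ref == 0:
        p.freed = True  # in C, this is where the memory would be released
    return None


# Usage: two pointers to one object, but only one of them owns a reference.
conn = RefCounted("varlink-connection")
stale = conn            # a second pointer that does NOT own a reference
conn = unref(conn)      # count 1 -> 0, the object is "freed"
try:
    unref(stale)        # dropping the same reference a second time
except AssertionError as err:
    print(err)          # Assertion 'p->n_ref > 0' failed for varlink-connection
```

In the C original an extra unref like this reads freed memory, so the assertion is a best-effort safety net; in PID 1 it turns silent memory corruption into an immediate abort.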

Comment 1 George R. Goffe 2021-03-10 17:09:18 UTC
Hi,

I am NOT a happy camper right now. I lost a major part of a backup drive due to the power cycle of this system... seemingly related to this freeze situation. e2fsck found a mangled superblock and then used a different block... the rest is SAD history. Grrrrr...

Is anyone working this bug?

George...

Comment 2 Zbigniew Jędrzejewski-Szmek 2021-03-11 11:45:22 UTC
I'm sorry to hear that. Yeah, there's a patch; I'm building a version for F34 and rawhide now.

> I lost a major part of a backup drive due to the power cycle of this system... seemingly related to this freeze situation.

Hmm, that is not expected. When pid1 freezes, the machine is still usable in the sense that, although nothing that needs to communicate with pid1 will work, the kernel and the rest of userspace are still functional.
Please use 'poweroff -f' or 'reboot -f' to shut down or reboot the machine in such situations.

*** This bug has been marked as a duplicate of bug 1931034 ***