Bug 1548417 - systemd crashed and froze execution
Summary: systemd crashed and froze execution
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 27
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-23 12:58 UTC by Tom Horsley
Modified: 2018-11-21 13:31 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-21 13:31:54 UTC
Type: Bug
Embargoed:



Description Tom Horsley 2018-02-23 12:58:44 UTC
Description of problem:

I found my system in a relatively unusable state yesterday, with this in the log file:

Feb 22 01:59:02 tomh systemd[19027]: Closed D-Bus User Message Bus Socket.
Feb 22 01:59:02 tomh systemd[19027]: Reached target Shutdown.
Feb 22 01:59:02 tomh systemd[19027]: Starting Exit the Session...
Feb 22 01:59:02 tomh systemd[19027]: Received SIGRTMIN+24 from PID 19045 (kill).
Feb 22 02:00:01 tomh systemd[1]: Caught <ABRT>, dumped core as pid 19306.
Feb 22 02:00:01 tomh systemd[1]: Freezing execution.
Feb 22 02:00:01 tomh systemd-logind[832]: Failed to start user slice user-1000.slice, ignoring: Message recipient disconnected from message bus without replying (org.freedesktop.DBus.Error.NoReply)
Feb 22 02:00:26 tomh dbus-daemon[842]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)

Checking out the core file I see this:

(gdb) core-file ./core.19306
[New LWP 19306]
Reading symbols from /usr/lib/systemd/systemd...Reading symbols from /usr/lib/debug/usr/lib/systemd/systemd-234-9.fc27.x86_64.debug...done.
done.
warning: Ignoring non-absolute filename: <linux-vdso.so.1>
Missing separate debuginfo for linux-vdso.so.1
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/1d/6aee6bfed597c75c5c959502d5dc574aadf8e7
Missing separate debuginfo for /lib64/libudev.so.1
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/63/a671987ec4228503f63e2d795fc2940135756f
warning: the debug information found in "/usr/lib/debug//lib64/libpthread-2.26.so.debug" does not match "/lib64/libpthread.so.0" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//usr/lib64/libpthread-2.26.so.debug" does not match "/lib64/libpthread.so.0" (CRC mismatch).

[New LWP 1]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: the debug information found in "/usr/lib/debug//lib64/libc-2.26.so.debug" does not match "/lib64/libc.so.6" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//usr/lib64/libc-2.26.so.debug" does not match "/lib64/libc.so.6" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//lib64/librt-2.26.so.debug" does not match "/lib64/librt.so.1" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//usr/lib64/librt-2.26.so.debug" does not match "/lib64/librt.so.1" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//lib64/ld-2.26.so.debug" does not match "/lib64/ld-linux-x86-64.so.2" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//usr/lib64/ld-2.26.so.debug" does not match "/lib64/ld-linux-x86-64.so.2" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//lib64/libdl-2.26.so.debug" does not match "/lib64/libdl.so.2" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//usr/lib64/libdl-2.26.so.debug" does not match "/lib64/libdl.so.2" (CRC mismatch).

Missing separate debuginfo for /lib64/libudev.so.1
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/63/a671987ec4228503f63e2d795fc2940135756f.debug
warning: the debug information found in "/usr/lib/debug//lib64/libm-2.26.so.debug" does not match "/lib64/libm.so.6" (CRC mismatch).

warning: the debug information found in "/usr/lib/debug//usr/lib64/libm-2.26.so.debug" does not match "/lib64/libm.so.6" (CRC mismatch).

Core was generated by `/usr/lib/systemd/systemd --switched-root --system --deserialize 24'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fd875f27957 in kill () from /lib64/libc.so.6
[Current thread is 1 (LWP 19306)]
Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.8.2-1.fc27.x86_64 cryptsetup-libs-1.7.5-3.fc27.x86_64 device-mapper-libs-1.02.144-1.fc27.x86_64 glibc-2.26-24.fc27.x86_64 iptables-libs-1.6.1-4.fc27.x86_64 kmod-libs-25-1.fc27.x86_64 libacl-2.2.52-18.fc27.x86_64 libattr-2.4.47-21.fc27.x86_64 libblkid-2.30.2-1.fc27.x86_64 libcap-2.25-7.fc27.x86_64 libcap-ng-0.7.8-5.fc27.x86_64 libgcc-7.3.1-2.fc27.x86_64 libgcrypt-1.8.2-1.fc27.x86_64 libgpg-error-1.27-3.fc27.x86_64 libidn-1.33-4.fc27.x86_64 libmount-2.30.2-1.fc27.x86_64 libpcap-1.8.1-6.fc27.x86_64 libseccomp-2.3.3-1.fc27.x86_64 libselinux-2.7-3.fc27.x86_64 libsepol-2.7-2.fc27.x86_64 libuuid-2.30.2-1.fc27.x86_64 lz4-libs-1.8.0-1.fc27.x86_64 pam-1.3.0-6.fc27.x86_64 pcre2-10.30-6.fc27.x86_64 xz-libs-5.2.3-4.fc27.x86_64 zlib-1.2.11-4.fc27.x86_64
(gdb) bt
#0  0x00007fd875f27957 in kill () from /lib64/libc.so.6
#1  0x000055fea4cace56 in crash.lto_priv.214 (sig=6) at ../src/core/main.c:190
#2  <signal handler called>
#3  0x00007fd875f2766b in raise () from /lib64/libc.so.6
#4  0x00007fd875f29381 in abort () from /lib64/libc.so.6
#5  0x00007fd875f71a57 in __libc_message () from /lib64/libc.so.6
#6  0x00007fd875f789aa in malloc_printerr () from /lib64/libc.so.6
#7  0x00007fd875f7b518 in _int_free () from /lib64/libc.so.6
#8  0x00007fd875baf640 in hashmap_clear_free_free (h=h@entry=0x55fea55f0fe8)
    at ../src/basic/hashmap.c:899
#9  0x00007fd875baf71e in hashmap_free_free_free (h=<optimized out>, 
    h=<optimized out>) at ../src/basic/hashmap.c:852
#10 0x00007fd875be718e in ordered_hashmap_free_free_free ()
    at ../src/basic/hashmap.h:125
#11 sd_device_unref (device=<optimized out>, device=<optimized out>)
    at ../src/libsystemd/sd-device/sd-device.c:86
#12 0x00007fd875bf778d in udev_device_unref (udev_device=<optimized out>, 
    udev_device=<optimized out>) at ../src/libudev/libudev-device.c:552
#13 0x000055fea4c77825 in udev_device_unrefp () at ../src/shared/udev-util.h:26
#14 device_found_node (m=0x55fea55dbd70, node=<optimized out>, 
    add=<optimized out>, found=DEVICE_FOUND_MOUNT, now=<optimized out>)
    at ../src/core/device.c:816
#15 0x000055fea4c6b75a in mount_load_proc_self_mountinfo (
    m=m@entry=0x55fea55dbd70, set_flags=set_flags@entry=true)
    at ../src/core/mount.c:1627
#16 0x000055fea4c6c0ae in mount_dispatch_io (source=<optimized out>, 
    fd=<optimized out>, revents=<optimized out>, userdata=0x55fea55dbd70)
    at ../src/core/mount.c:1794
#17 0x00007fd875be1590 in source_dispatch (s=s@entry=0x55fea55de270)
    at ../src/libsystemd/sd-event/sd-event.c:2272
#18 0x00007fd875be17da in sd_event_dispatch (e=e@entry=0x55fea55da930)
    at ../src/libsystemd/sd-event/sd-event.c:2631
#19 0x00007fd875be1957 in sd_event_run (e=0x55fea55da930, 
    timeout=18446744073709551615) at ../src/libsystemd/sd-event/sd-event.c:2690
#20 0x000055fea4c975a2 in manager_loop (m=0x55fea55dbd70)
    at ../src/core/manager.c:2291
#21 0x000055fea4bf9ce1 in main (argc=5, argv=<optimized out>)
    at ../src/core/main.c:1937

Why isn't systemd two processes: a traditional init that does nothing but reap zombies, and that can restart the complicated chunk of code when it crashes like this?
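
To make the question concrete, here is a rough sketch of the split being asked about. It is not how systemd is structured, and the names supervise, worker and max_restarts are made up for illustration. It also ignores the hard part: a real service manager would have to hand its state over to the restarted process (the crashed PID 1 above was itself started with --deserialize).

```python
import os


def supervise(worker, max_restarts=3):
    """Run worker() in a child process and restart it after a crash.

    Returns how many times the worker had to be restarted.
    """
    restarts = 0
    while True:
        pid = os.fork()
        if pid == 0:
            # Child: stands in for the complicated service-manager code.
            try:
                worker()
            except BaseException:
                os._exit(1)
            os._exit(0)

        # Parent: the "traditional init" part. It only waits for children,
        # reaping any zombie until the worker itself has exited.
        while True:
            reaped, status = os.waitpid(-1, 0)
            if reaped == pid:
                break

        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            return restarts  # clean exit, nothing to restart
        if restarts >= max_restarts:
            return restarts  # give up rather than loop forever
        restarts += 1
```

With a worker that always dies from SIGABRT, as PID 1 did here, supervise(os.abort, max_restarts=2) restarts it twice and then gives up.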


Version-Release number of selected component (if applicable):
systemd-234-9.fc27.x86_64


How reproducible:
Just happened once so far

Steps to Reproduce:
1. No idea (but it seems to have been triggered by cron, since it happened exactly on the hour).

Actual results:
systemd got hosed

Expected results:
systemd keeps running

Additional info:

Comment 1 Zbigniew Jędrzejewski-Szmek 2018-02-23 16:26:45 UTC
Thanks, that's a pretty good backtrace. I'll take a look at the code, but no promises: looks like some kind of memory corruption, and those are hard to debug, especially when non-repeatable.

> Why isn't systemd two processes: a traditional init that does nothing but reap zombies, and that can restart the complicated chunk of code when it crashes like this?

This has been discussed endlessly on the internets. This isn't the venue for this kind of discussion.

Comment 2 Zbigniew Jędrzejewski-Szmek 2018-11-21 13:31:54 UTC
We are trying to clean up the old udev code upstream, but it's going slowly (ETA is F30 or F31). F27 is too old at this point. Sorry for not being able to handle this.

