Bug 1885553
| Summary: | "systemd --user" can dump core upon session closing | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> |
| Component: | systemd | Assignee: | David Tardon <dtardon> |
| Status: | CLOSED ERRATA | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.1 | CC: | chorn, dtardon, hmatsumo, jamacku, pkhedeka, qguo, sbroz, systemd-maint-list |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
| Target Release: | 8.0 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | systemd-239-42.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-18 14:54:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
At the immediate level: the crash should be solved. Looking that from a higher level, - why is pulseaudio triggered when someone logs in via SSH? - why is pulseaudio trying to activate bluetooth? If all of this is happening on a desktop, a user logging into wayland/xorg makes sense to trigger pulseaudio/bluetooth. Here in this case also these packages make sense: customer might want to run Gnome on vncserver, and that depends on pulseaudio and bluetooth. But at least bluetooth hardware is not in this server, maybe bluetooth rpms should detect that upon package installation and properly disable bluetooth daemons? Hi Christian, Below is what happens when a user logs in through SSH or Xorg/Wayland 1. user@<uid>.service executes this spawns "systemd --user" 2. "systemd --user" sets up pulseaudio.socket (listening on /run/user/<uid>/pulse/native) as part of reaching sockets.target this is because pulseaudio.socket is installed (/usr/lib/systemd/user/sockets.target.wants/pulseaudio.socket) 3. "systemd --user" sets up pulseaudio.service as part of reaching default.target this is because pulseaudio.service is installed (/etc/systemd/user/default.target.wants/pulseaudio.service) To avoid step 3 which is the root cause, we could just modify the pulseaudio.service and preset to not have it enabled automatically. This should not be required since there is the pulseaudio socket installed, so if someone requires sounds, it will start it. Regarding pulseaudio and bluetooth, I guess this happens because there may be bluetooth backends for pulseaudio. Thanks, Renaud. I was just wondering if this whole "let's just start this and that in case it's needed" stack here can be made more bullet proof. When customers log in via SSH, they would not expect pulseaudio triggering bluetooth, as that hardware does not even exist in their server. Also having no pulseaudio spawned would go unnoticed by at least 99% of the users logging in via ssh. fix merged to github master branch -> https://github.com/systemd-rhel/rhel-8/pull/122 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: systemd security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1611 |
Description of problem: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Core was generated by `/usr/lib/systemd/systemd --user'. ... (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f14adf6ccf5 in __GI_abort () at abort.c:79 #2 0x00007f14af6c0fbc in log_assert_failed_unreachable_realm (realm=realm@entry=LOG_REALM_SYSTEMD, text=text@entry=0x7f14af817750 "Unknown state", file=file@entry=0x7f14af8172d8 "../src/libsystemd/sd-bus/sd-bus.c", line=line@entry=2976, func=func@entry=0x7f14af818330 <__PRETTY_FUNCTION__.15777> "bus_process_internal") at ../src/basic/log.c:818 #3 0x00007f14af7a0db7 in bus_process_internal (bus=0x555d82fbc570, hint_priority=<optimized out>, priority=<optimized out>, ret=0x0) at ../src/libsystemd/sd-bus/sd-bus.c:2976 #4 0x00007f14af7a1122 in bus_ensure_running (bus=0x555d82fbc570) at ../src/libsystemd/sd-bus/sd-bus.c:2068 #5 0x00007f14af7a22d9 in sd_bus_flush (bus=0x555d82fbc570) at ../src/libsystemd/sd-bus/sd-bus.c:3100 #6 0x0000555d81b86ad7 in destroy_bus (bus=0x555d82fa4128, m=0x555d82fa3e40) at ../src/core/dbus.c:1087 #7 destroy_bus (m=0x555d82fa3e40, bus=0x555d82fa4128) at ../src/core/dbus.c:1046 #8 0x0000555d81bb5f05 in manager_recheck_dbus (m=0x555d82fa3e40) at ../src/core/manager.c:3858 #9 0x0000555d81beb6c7 in unit_notify (u=0x555d82feb450, os=UNIT_ACTIVE, ns=UNIT_DEACTIVATING, flags=<optimized out>) at ../src/core/unit.c:2480 #10 0x0000555d81bcc317 in service_stop (u=<optimized out>) at ../src/core/service.c:2382 #11 0x0000555d81b9879d in job_perform_on_unit (j=j@entry=0x7ffe74189548) at ../src/core/job.c:540 #12 0x0000555d81b9b75a in job_run_and_invalidate (j=<optimized out>) at ../src/core/job.c:597 #13 0x0000555d81bb276c in manager_dispatch_run_queue (source=<optimized out>, userdata=<optimized out>) at ../src/core/manager.c:2058 #14 manager_dispatch_run_queue (source=<optimized out>, userdata=0x555d82fa3e40) at ../src/core/manager.c:2047 #15 0x00007f14af7c8f42 in source_dispatch (s=s@entry=0x555d82fa48e0) at ../src/libsystemd/sd-event/sd-event.c:3147 #16 0x00007f14af7cadd5 in sd_event_dispatch (e=0x555d82fa4610) at ../src/libsystemd/sd-event/sd-event.c:3522 #17 0x00007f14af7caf60 in sd_event_run (e=0x555d82fa4610, timeout=18446744073709551615) at ../src/libsystemd/sd-event/sd-event.c:3579 #18 0x0000555d81bb8064 in manager_loop (m=m@entry=0x555d82fa3e40) at ../src/core/manager.c:2828 #19 0x0000555d81b7230e in invoke_main_loop (ret_error_message=0x7ffe74189718, ret_switch_root_init=<synthetic pointer>, ret_switch_root_dir=<synthetic pointer>, ret_fds=0x7ffe74189728, ret_shutdown_verb=<synthetic pointer>, ret_retval=<synthetic pointer>, ret_reexecute=<synthetic pointer>, m=0x555d82fa3e40) at ../src/core/main.c:1630 #20 main (argc=2, argv=0x7ffe741899f8) at ../src/core/main.c:2417 -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- From above, we can see that "systemd --user" died due to an assertion failure. -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- (gdb) f 11 #11 0x0000555d81b9879d in job_perform_on_unit (j=j@entry=0x7ffe74189548) at ../src/core/job.c:540 540 r = unit_stop(u); (gdb) p u->id $1 = 0x555d82fc7340 "dbus.service" (gdb) p u->fragment_ fragment_mtime fragment_path (gdb) p u->fragment_path $2 = 0x555d82fec480 "/usr/lib/systemd/user/dbus.service" 2925 static int bus_process_internal(sd_bus *bus, bool hint_priority, int64_t priority, sd_bus_message **ret) { 2926 int r; : 2943 switch (bus->state) { : 2975 default: 2976 assert_not_reached("Unknown state"); 2977 } (gdb) p bus->state $4 = 21853 154 enum bus_state { 155 BUS_UNSET, 156 BUS_WATCH_BIND, /* waiting for the socket to appear via inotify */ 157 BUS_OPENING, /* the kernel's connect() is still not ready */ 158 BUS_AUTHENTICATING, /* we are currently in the "SASL" authorization phase of dbus */ 159 BUS_HELLO, /* we are waiting for the Hello() response */ 160 BUS_RUNNING, 161 BUS_CLOSING, 162 BUS_CLOSED, 163 _BUS_STATE_MAX, 164 }; (gdb) p bus->input_fd $6 = -1372549952 (gdb) p bus->n_ref $7 = {_value = 2197747697} -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- "systemd --user" was stopping user's "dbus.service" unit, very likely because "systemd --user" was shutting down due to user having no sessions anymore on the system. The assertion failed because bus pointer contains corrupted memory (the state is invalid, so is the input_fs and likely the number of messages pending on the bus). -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- (gdb) f 6 #6 0x0000555d81b86ad7 in destroy_bus (bus=0x555d82fa4128, m=0x555d82fa3e40) at ../src/core/dbus.c:1087 1087 sd_bus_flush(*bus); (gdb) p *bus $9 = (sd_bus *) 0x0 -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Frame 6 suggests that the "bus" object was freed (and partially reused) for something else, since the bus pointer (0x555d82fa4128) doesn't point to 0x555d82fbc570 (the pointer that created the crash) but 0x0. It looks like the following Fedora BZ is being hit: - https://bugzilla.redhat.com/show_bug.cgi?id=1575340 The BZ comment from systemd developer seems to indicate that this was fixed with systemd-238, which is older than what RHEL 8 provides (based on systemd-239: " There was a significant reworking of the bus code in https://github.com/systemd/systemd/commit/4502c40399. That patch is in systemd-238, and I was confused, because the backtraces in those three bugs show the old call sequence. So what must have happened here, is that an upgrade from older systemd was made to systemd-238, and then the older systemd crashed during reexec. That said, there seems to be a memory corruption here, and destroy_bus is called on a bus that has already been deallocated. I think/hope the patch above fixes this. There's another issue with the code, uncovered by this failure. See upstream PR https://github.com/systemd/systemd/pull/9640. Nevertheless, that patch wouldn't fix this problem anyway. I'll close this on the assumption that 4502c40399 is enough to fix this. If a similar backtrace happens again, this will mean I was wrong ;) " Hence it looks like the developer is wrong and the issue is still present. Very recently a similar isue was found Upstream: https://github.com/systemd/systemd/issues/17025 So far it's hard to say if it's same root cause however. Version-Release number of selected component (if applicable): systemd-239-18.el8_1.4.x86_64 (RHEL8.1) but also likely RHEL8.2 How reproducible: Don't know. Happened at least one on customer's system