RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1885553 - "systemd --user" can dump core upon session closing
Summary: "systemd --user" can dump core upon session closing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: systemd
Version: 8.1
Hardware: All
OS: Linux
medium
medium
Target Milestone: rc
: 8.0
Assignee: David Tardon
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-10-06 11:44 UTC by Renaud Métrich
Modified: 2024-10-01 16:56 UTC (History)
8 users (show)

Fixed In Version: systemd-239-42.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 14:54:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github systemd-rhel rhel-8 pull 122 0 None closed (#1885553) sd-bus: break the loop in bus_ensure_running() if the bus is not conn… 2021-02-16 00:33:34 UTC
Github systemd systemd issues 17025 0 None closed coredump when exec "systemctl stop user@0" 2021-02-16 00:33:34 UTC
Red Hat Knowledge Base (Solution) 5463351 0 None None None 2020-10-06 23:43:30 UTC

Description Renaud Métrich 2020-10-06 11:44:38 UTC
Description of problem:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Core was generated by `/usr/lib/systemd/systemd --user'.
...
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f14adf6ccf5 in __GI_abort () at abort.c:79
#2  0x00007f14af6c0fbc in log_assert_failed_unreachable_realm (realm=realm@entry=LOG_REALM_SYSTEMD, 
    text=text@entry=0x7f14af817750 "Unknown state", 
    file=file@entry=0x7f14af8172d8 "../src/libsystemd/sd-bus/sd-bus.c", line=line@entry=2976, 
    func=func@entry=0x7f14af818330 <__PRETTY_FUNCTION__.15777> "bus_process_internal") at ../src/basic/log.c:818
#3  0x00007f14af7a0db7 in bus_process_internal (bus=0x555d82fbc570, hint_priority=<optimized out>, 
    priority=<optimized out>, ret=0x0) at ../src/libsystemd/sd-bus/sd-bus.c:2976
#4  0x00007f14af7a1122 in bus_ensure_running (bus=0x555d82fbc570) at ../src/libsystemd/sd-bus/sd-bus.c:2068
#5  0x00007f14af7a22d9 in sd_bus_flush (bus=0x555d82fbc570) at ../src/libsystemd/sd-bus/sd-bus.c:3100
#6  0x0000555d81b86ad7 in destroy_bus (bus=0x555d82fa4128, m=0x555d82fa3e40) at ../src/core/dbus.c:1087
#7  destroy_bus (m=0x555d82fa3e40, bus=0x555d82fa4128) at ../src/core/dbus.c:1046
#8  0x0000555d81bb5f05 in manager_recheck_dbus (m=0x555d82fa3e40) at ../src/core/manager.c:3858
#9  0x0000555d81beb6c7 in unit_notify (u=0x555d82feb450, os=UNIT_ACTIVE, ns=UNIT_DEACTIVATING, flags=<optimized out>)
    at ../src/core/unit.c:2480
#10 0x0000555d81bcc317 in service_stop (u=<optimized out>) at ../src/core/service.c:2382
#11 0x0000555d81b9879d in job_perform_on_unit (j=j@entry=0x7ffe74189548) at ../src/core/job.c:540
#12 0x0000555d81b9b75a in job_run_and_invalidate (j=<optimized out>) at ../src/core/job.c:597
#13 0x0000555d81bb276c in manager_dispatch_run_queue (source=<optimized out>, userdata=<optimized out>)
    at ../src/core/manager.c:2058
#14 manager_dispatch_run_queue (source=<optimized out>, userdata=0x555d82fa3e40) at ../src/core/manager.c:2047
#15 0x00007f14af7c8f42 in source_dispatch (s=s@entry=0x555d82fa48e0) at ../src/libsystemd/sd-event/sd-event.c:3147
#16 0x00007f14af7cadd5 in sd_event_dispatch (e=0x555d82fa4610) at ../src/libsystemd/sd-event/sd-event.c:3522
#17 0x00007f14af7caf60 in sd_event_run (e=0x555d82fa4610, timeout=18446744073709551615)
    at ../src/libsystemd/sd-event/sd-event.c:3579
#18 0x0000555d81bb8064 in manager_loop (m=m@entry=0x555d82fa3e40) at ../src/core/manager.c:2828
#19 0x0000555d81b7230e in invoke_main_loop (ret_error_message=0x7ffe74189718, 
    ret_switch_root_init=<synthetic pointer>, ret_switch_root_dir=<synthetic pointer>, ret_fds=0x7ffe74189728, 
    ret_shutdown_verb=<synthetic pointer>, ret_retval=<synthetic pointer>, ret_reexecute=<synthetic pointer>, 
    m=0x555d82fa3e40) at ../src/core/main.c:1630
#20 main (argc=2, argv=0x7ffe741899f8) at ../src/core/main.c:2417
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

From above, we can see that "systemd --user" died due to an assertion failure.

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) f 11
#11 0x0000555d81b9879d in job_perform_on_unit (j=j@entry=0x7ffe74189548) at ../src/core/job.c:540
540	                        r = unit_stop(u);
(gdb) p u->id
$1 = 0x555d82fc7340 "dbus.service"
(gdb) p u->fragment_
fragment_mtime  fragment_path   
(gdb) p u->fragment_path 
$2 = 0x555d82fec480 "/usr/lib/systemd/user/dbus.service"

2925 static int bus_process_internal(sd_bus *bus, bool hint_priority, int64_t priority, sd_bus_message **ret) {
2926         int r;
 :
2943         switch (bus->state) {
 :
2975         default:
2976                 assert_not_reached("Unknown state");
2977         }

(gdb) p bus->state
$4 = 21853

154 enum bus_state {
155         BUS_UNSET,
156         BUS_WATCH_BIND,      /* waiting for the socket to appear via inotify */
157         BUS_OPENING,         /* the kernel's connect() is still not ready */
158         BUS_AUTHENTICATING,  /* we are currently in the "SASL" authorization phase of dbus */
159         BUS_HELLO,           /* we are waiting for the Hello() response */
160         BUS_RUNNING,
161         BUS_CLOSING,
162         BUS_CLOSED,
163         _BUS_STATE_MAX,
164 };

(gdb) p bus->input_fd 
$6 = -1372549952
(gdb) p bus->n_ref 
$7 = {_value = 2197747697}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

"systemd --user" was stopping user's "dbus.service" unit, very likely because "systemd --user" was shutting down due to user having no sessions anymore on the system.
The assertion failed because bus pointer contains corrupted memory (the state is invalid, so is the input_fs and likely the number of messages pending on the bus).

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) f 6
#6  0x0000555d81b86ad7 in destroy_bus (bus=0x555d82fa4128, m=0x555d82fa3e40) at ../src/core/dbus.c:1087
1087	                sd_bus_flush(*bus);
(gdb) p *bus
$9 = (sd_bus *) 0x0
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Frame 6 suggests that the "bus" object was freed (and partially reused) for something else, since the bus pointer (0x555d82fa4128) doesn't point to 0x555d82fbc570 (the pointer that created the crash) but 0x0.

It looks like the following Fedora BZ is being hit:

- https://bugzilla.redhat.com/show_bug.cgi?id=1575340

The BZ comment from systemd developer seems to indicate that this was fixed with systemd-238, which is older than what RHEL 8 provides (based on systemd-239:
"
There was a significant reworking of the bus code in https://github.com/systemd/systemd/commit/4502c40399. That patch is in systemd-238, and I was confused, because the backtraces in those three bugs show the old call sequence. So what must have happened here, is that an upgrade from older systemd was made to systemd-238, and then the older systemd crashed during reexec.
That said, there seems to be a memory corruption here, and destroy_bus is called on a bus that has already been deallocated. I think/hope the patch above fixes this.
There's another issue with the code, uncovered by this failure. See upstream PR https://github.com/systemd/systemd/pull/9640. Nevertheless, that patch wouldn't fix this problem anyway.
I'll close this on the assumption that 4502c40399 is enough to fix this. If a similar backtrace happens again, this will mean I was wrong ;)
"

Hence it looks like the developer is wrong and the issue is still present.

Very recently a similar isue was found Upstream: https://github.com/systemd/systemd/issues/17025
So far it's hard to say if it's same root cause however.


Version-Release number of selected component (if applicable):

systemd-239-18.el8_1.4.x86_64 (RHEL8.1) but also likely RHEL8.2

How reproducible:

Don't know. Happened at least one on customer's system

Comment 1 Christian Horn 2020-10-15 06:22:24 UTC
At the immediate level: the crash should be solved.

Looking that from a higher level,
- why is pulseaudio triggered when someone logs in via SSH?
- why is pulseaudio trying to activate bluetooth?
  If all of this is happening on a desktop, a user logging into wayland/xorg
  makes sense to trigger pulseaudio/bluetooth.

Here in this case also these packages make sense: customer might want to
run Gnome on vncserver, and that depends on pulseaudio and bluetooth.  But
at least bluetooth hardware is not in this server, maybe bluetooth rpms
should detect that upon package installation and properly disable bluetooth
daemons?

Comment 2 Renaud Métrich 2020-10-15 06:46:59 UTC
Hi Christian,

Below is what happens when a user logs in through SSH or Xorg/Wayland

1. user@<uid>.service executes

  this spawns "systemd --user"

2. "systemd --user" sets up pulseaudio.socket (listening on /run/user/<uid>/pulse/native) as part of reaching sockets.target

  this is because pulseaudio.socket is installed (/usr/lib/systemd/user/sockets.target.wants/pulseaudio.socket)

3. "systemd --user" sets up pulseaudio.service as part of reaching default.target

  this is because pulseaudio.service is installed (/etc/systemd/user/default.target.wants/pulseaudio.service)

To avoid step 3 which is the root cause, we could just modify the pulseaudio.service and preset to not have it enabled automatically.
This should not be required since there is the pulseaudio socket installed, so if someone requires sounds, it will start it.


Regarding pulseaudio and bluetooth, I guess this happens because there may be bluetooth backends for pulseaudio.

Comment 3 Christian Horn 2020-10-15 06:56:41 UTC
Thanks, Renaud.

I was just wondering if this whole "let's just start this and that
in case it's needed" stack here can be made more bullet proof.
When customers log in via SSH, they would not expect pulseaudio
triggering bluetooth, as that hardware does not even exist in their 
server.  Also having no pulseaudio spawned would go unnoticed by
at least 99% of the users logging in via ssh.

Comment 4 Lukáš Nykrýn 2020-11-02 13:49:28 UTC
fix merged to github master branch -> https://github.com/systemd-rhel/rhel-8/pull/122

Comment 11 errata-xmlrpc 2021-05-18 14:54:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: systemd security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1611


Note You need to log in before you can comment on or make changes to this bug.