Bug 2158724 - systemd can send non-UTF8 messages over DBus, causing DBus to disconnect it
Summary: systemd can send non-UTF8 messages over DBus, causing DBus to disconnect it
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: systemd
Version: 8.7
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Michal Sekletar
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-01-06 10:20 UTC by Renaud Métrich
Modified: 2023-07-24 07:44 UTC (History)
7 users (show)

Fixed In Version: systemd-239-77.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github redhat-plumbers systemd-rhel8 pull 389 0 None open (#2158724) core/mount: escape invalid UTF8 char in dbus reply 2023-07-10 14:30:14 UTC
Red Hat Issue Tracker RHELPLAN-144102 0 None None None 2023-01-06 10:26:01 UTC
Red Hat Knowledge Base (Solution) 6993080 0 None None None 2023-01-06 10:27:52 UTC

Description Renaud Métrich 2023-01-06 10:20:41 UTC
Description of problem:

This is a continuation of BZ #2158167 (related to PID recycling) and the condition making the latter be hit from time to time.

We have a customer seeing the scopes started for crond and sshd not getting destroyed inside systemd-logind when last PID exits.
This causes systemd-logind to "leak" sessions (state files remaining in /run/systemd/sessions in CLOSING state) and eventually systemd-logind to hit BZ #2158167 when PID recycling happens.
With the fix for BZ #2158167, the customer sees "Maximum number of sessions reached (8192)" after 1 day or so.

I requested an strace of systemd and systemd-logind to try to understand the condition where this would happen.
It appears the issue is due to having an issue within systemd which cannot abandon the scope:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Scope session-50854.scope is not running, cannot abandon.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Due to this, systemd-logind never deletes its internal data related to the session.

I don't know how to reproduce the condition for now. Comparing customer's strace and my strace, I see a lot of differences in DBus messages being issued by systemd to systemd-logind:

Customer:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1     18:36:02.172067 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1H\0\0\0\24\35\3\0q\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0JobNew\0\0\10\1g\0\3uos"..., iov_len=136}, {iov_base="\347.]\0%\0\0\0/org/freedesktop/systemd1/job/6106855\0\0\0\23\0\0\0session-50854.scope\0", iov_len=72}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 208 <0.000013>
1     18:36:02.179388 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1Q\0\0\0\26\35\3\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\n\0\0\0JobRemoved\0\0\0\0\0\0"..., iov_len=144}, {iov_base="\347.]\0%\0\0\0/org/freedesktop/systemd1/job/6106855\0\0\0\23\0\0\0session-50854.scope\0\4\0\0\0done\0", iov_len=81}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 225 <0.000018>
1     18:36:02.623647 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0015\0\0\0N\35\3\0o\0\0\0\1\1o\0\37\0\0\0/org/freedesktop/systemd1/agent\0\2\1s\0\36\0\0\0org.freedesktop.systemd1.Agent\0\0\3\1s\0\10\0\0\0Released\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=128}, {iov_base="0\0\0\0/user.slice/user-11469.slice/session-50854.scope\0", iov_len=53}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 181 <0.000029>
1     18:36:02.628222 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1>\0\0\0P\35\3\0_\0\0\0\5\1u\0\263i\0\0\6\1s\0\n\0\0\0:1.7022504\0\0\0\0\0\0\4\1s\0(\0\0\0org.freedesktop.systemd1.ScopeNotRunning\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=112}, {iov_base="9\0\0\0Scope session-50854.scope is not running, cannot abandon.\0", iov_len=62}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 174 <0.000017>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Mine:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1     11:09:19.108569 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\365\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.108642 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1@\0\0\0\366\0\0\0q\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0JobNew\0\0\10\1g\0\3uos"..., iov_len=136}, {iov_base="\240\2\0\0!\0\0\0/org/freedesktop/systemd1/job/672\0\0\0\17\0\0\0session-3.scope\0", iov_len=64}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.113703 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1I\0\0\0\367\0\0\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\n\0\0\0JobRemoved\0\0\0\0\0\0"..., iov_len=144}, {iov_base="\240\2\0\0!\0\0\0/org/freedesktop/systemd1/job/672\0\0\0\17\0\0\0session-3.scope\0\4\0\0\0done\0", iov_len=73}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 217 <0.000008>
1     11:09:19.225074 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\2\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 211 <0.000013>
1     11:09:19.226435 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1-\0\0\0\3\1\0\0o\0\0\0\1\1o\0\37\0\0\0/org/freedesktop/systemd1/agent\0\2\1s\0\36\0\0\0org.freedesktop.systemd1.Agent\0\0\3\1s\0\10\0\0\0Released\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=128}, {iov_base="(\0\0\0/user.slice/user-0.slice/session-3.scope\0", iov_len=45}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 173 <0.000019>
1     11:09:19.226783 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\5\1\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 203 <0.000015>
1     11:09:19.226857 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\6\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.227418 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\10\1\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.227492 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\t\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

MAJOR DIFFERENCE: I don't see any UnitNew/UnitRemoved messages on customer's system


Version-Release number of selected component (if applicable):

systemd-239.68.el8
Suspected to be also seen on older systemd (we had a case with another customer in the past, before issue disappeared completely)

How reproducible:

Often/Always on customer system, cannot reproduce locally

Comment 16 Renaud Métrich 2023-01-20 12:38:51 UTC
UPDATED DESCRIPTION:

The DBus message sent upon mounting a CIFS share can be broken, depending on the output of `/proc/self/mountinfo` and `/run/mount/utab` systemd relies on to build the message:
in case the CIFS share is presented by a NetApp appliance and contains non-ascii characters (e.g. "ü") and charset "iso8859-1" is used, systemd sends a non-UTF8 message through the bus, causing DBus to disconnect it.

Example of CIFS share listed in /proc/self/mountinfo (redacted):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
//netapp.server/path/to/share/containing/umlaut_xxx�xxx /mount/point cifs rw,relatime,vers=3.0,xxx,iocharset=iso8859-1,xxx 0 0
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

I think we need to ensure that systemd always sends properly formatted messages to avoid any DBus breakage.

See also related BZs:
- #2162660 (DBus)
- #2162655 (Kernel/CIFS)

Comment 17 Renaud Métrich 2023-01-20 14:57:28 UTC
REPRODUCER BELOW:

We can reproduce through having a CIFS share, which we then hack (using systemtap) to push a non-UTF8 character, see below.

CIFS setup (local):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# yum -y install samba
# cat >> /etc/samba/smb.conf << EOF
[cifs]
	path = /cifs
	browseable = Yes
	read only = No
	guest ok = Yes
EOF

# mkdir -p /cifs/ssXss
# echo "//localhost/cifs/ssXss	/mnt			cifs	guest,noauto	0 0" >> /etc/fstab

# systemctl enable smb --now
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

SYSTEMTAP SCRIPT (let it run):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# yum -y install systemtap
# stap-prep

# stap -v -g -e 'probe module("cifs").statement("cifs_show_devname@fs/cifs/cifsfs.c:480") { $devname[19] = 129; printf("new devname: %s\n", kernel_string_quoted($devname)); }'
...
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The script replaces "ssXss" by "ssüss" once mount is done.

Corresponding source in kernel-4.18.0-425.10.1.el8_7.x86_64 (we are interested in hacking "devname" variable to report something not UTF8 in /proc/mounts):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 475 static int cifs_show_devname(struct seq_file *m, struct dentry *root)
 476 {
 477         struct cifs_sb_info *cifs_sb = CIFS_SB(root->d_sb);
 478         char *devname = kstrdup(cifs_sb->ctx->source, GFP_KERNEL);
 479 
 480         if (devname == NULL)
 481                 seq_puts(m, "none");
 482         else {
 483                 convert_delimiter(devname, '/');
 484                 /* escape all spaces in share names */
 485                 seq_escape(m, devname, " \t");
 486                 kfree(devname);
 487         }
 488         return 0;
 489 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

####################
 REPRODUCING STEPS:
####################

1. Check there is some "subscribed" client in systemd (i.e. systemd-logind)

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# gdb -ex "set pagination off" -ex "set confirm off" -ex "f 3" -ex "p m->subscribed" -ex "q" /usr/lib/systemd/systemd 1
[...]
#3  0x0000563e699efb62 in manager_loop (m=m@entry=0x563e6a06ae50) at ../src/core/manager.c:2898
2898	                r = sd_event_run(m->event, wait_usec);
$1 = (sd_bus_track *) 0x563e6a238130
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Start systemtap and let it run

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# stap -v -g -e 'probe module("cifs").statement("cifs_show_devname@fs/cifs/cifsfs.c:480") { $devname[19] = 129; printf("new devname: %s\n", kernel_string_quoted($devname)); }'
...
Pass 5: starting run.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Mount the local CIFS share

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# mount /mnt
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Systemtap will report:
new devname: "//localhost/cifs/ss\201ss"

showing it's hacking the output as planned.

4. Login on the system as a new user (to trigger some dbus traffic AND new mounts due to the new user)

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
$ ssh admin@localhost true
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

5. Check systemd got disconnected from the bus (it has no "subscribed" clients anymore)

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# gdb -ex "set pagination off" -ex "set confirm off" -ex "f 3" -ex "p m->subscribed" -ex "q" /usr/lib/systemd/systemd 1
...
#3  0x0000563e699efb62 in manager_loop (m=m@entry=0x563e6a06ae50) at ../src/core/manager.c:2898
2898	                r = sd_event_run(m->event, wait_usec);
$1 = (sd_bus_track *) 0x0
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

At step 5 we see the leaks starts happening, sessions in /run/systemd/sessions are not destroyed anymore since systemd-logind doesn't receive notifications anymore:

# ssh admin@localhost true
# ssh admin@localhost true
# ssh admin@localhost true
# ll /run/systemd/sessions/
total 24
-rw-r--r-- 1 root root 322 Jan 20 15:49 1
-rw-r--r-- 1 root root 322 Jan 20 15:49 2
-rw-r--r-- 1 root root 306 Jan 20 15:54 5
-rw-r--r-- 1 root root 252 Jan 20 15:56 6
-rw-r--r-- 1 root root 252 Jan 20 15:56 7
-rw-r--r-- 1 root root 252 Jan 20 15:56 8

Comment 18 Plumber Bot 2023-07-13 22:12:00 UTC
fix merged to github main branch -> https://github.com/redhat-plumbers/systemd-rhel8/pull/389


Note You need to log in before you can comment on or make changes to this bug.