Bug 2208240 - systemd can send non-UTF8 messages over DBus, causing DBus to disconnect it
Summary: systemd can send non-UTF8 messages over DBus, causing DBus to disconnect it
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: systemd
Version: 9.2
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Michal Sekletar
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-05-18 10:57 UTC by Michal Sekletar
Modified: 2023-07-24 13:48 UTC (History)
3 users (show)

Fixed In Version: systemd-252-16.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github redhat-plumbers systemd-rhel9 pull 167 0 None open (#2208240) core/mount: escape invalid UTF8 char in dbus reply 2023-05-18 15:00:41 UTC
Red Hat Issue Tracker RHELPLAN-157587 0 None None None 2023-05-18 10:58:41 UTC

Description Michal Sekletar 2023-05-18 10:57:33 UTC
This bug was initially created as a copy of Bug #2158724

I am copying this bug because: 



Description of problem:

This is a continuation of BZ #2158167 (related to PID recycling) and the condition making the latter be hit from time to time.

We have a customer seeing the scopes started for crond and sshd not getting destroyed inside systemd-logind when last PID exits.
This causes systemd-logind to "leak" sessions (state files remaining in /run/systemd/sessions in CLOSING state) and eventually systemd-logind to hit BZ #2158167 when PID recycling happens.
With the fix for BZ #2158167, the customer sees "Maximum number of sessions reached (8192)" after 1 day or so.

I requested an strace of systemd and systemd-logind to try to understand the condition where this would happen.
It appears the issue is due to having an issue within systemd which cannot abandon the scope:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Scope session-50854.scope is not running, cannot abandon.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Due to this, systemd-logind never deletes its internal data related to the session.

I don't know how to reproduce the condition for now. Comparing customer's strace and my strace, I see a lot of differences in DBus messages being issued by systemd to systemd-logind:

Customer:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1     18:36:02.172067 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1H\0\0\0\24\35\3\0q\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0JobNew\0\0\10\1g\0\3uos"..., iov_len=136}, {iov_base="\347.]\0%\0\0\0/org/freedesktop/systemd1/job/6106855\0\0\0\23\0\0\0session-50854.scope\0", iov_len=72}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 208 <0.000013>
1     18:36:02.179388 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1Q\0\0\0\26\35\3\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\n\0\0\0JobRemoved\0\0\0\0\0\0"..., iov_len=144}, {iov_base="\347.]\0%\0\0\0/org/freedesktop/systemd1/job/6106855\0\0\0\23\0\0\0session-50854.scope\0\4\0\0\0done\0", iov_len=81}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 225 <0.000018>
1     18:36:02.623647 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0015\0\0\0N\35\3\0o\0\0\0\1\1o\0\37\0\0\0/org/freedesktop/systemd1/agent\0\2\1s\0\36\0\0\0org.freedesktop.systemd1.Agent\0\0\3\1s\0\10\0\0\0Released\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=128}, {iov_base="0\0\0\0/user.slice/user-11469.slice/session-50854.scope\0", iov_len=53}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 181 <0.000029>
1     18:36:02.628222 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1>\0\0\0P\35\3\0_\0\0\0\5\1u\0\263i\0\0\6\1s\0\n\0\0\0:1.7022504\0\0\0\0\0\0\4\1s\0(\0\0\0org.freedesktop.systemd1.ScopeNotRunning\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=112}, {iov_base="9\0\0\0Scope session-50854.scope is not running, cannot abandon.\0", iov_len=62}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 174 <0.000017>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Mine:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1     11:09:19.108569 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\365\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.108642 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1@\0\0\0\366\0\0\0q\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0JobNew\0\0\10\1g\0\3uos"..., iov_len=136}, {iov_base="\240\2\0\0!\0\0\0/org/freedesktop/systemd1/job/672\0\0\0\17\0\0\0session-3.scope\0", iov_len=64}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.113703 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1I\0\0\0\367\0\0\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\n\0\0\0JobRemoved\0\0\0\0\0\0"..., iov_len=144}, {iov_base="\240\2\0\0!\0\0\0/org/freedesktop/systemd1/job/672\0\0\0\17\0\0\0session-3.scope\0\4\0\0\0done\0", iov_len=73}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 217 <0.000008>
1     11:09:19.225074 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\2\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 211 <0.000013>
1     11:09:19.226435 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1-\0\0\0\3\1\0\0o\0\0\0\1\1o\0\37\0\0\0/org/freedesktop/systemd1/agent\0\2\1s\0\36\0\0\0org.freedesktop.systemd1.Agent\0\0\3\1s\0\10\0\0\0Released\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=128}, {iov_base="(\0\0\0/user.slice/user-0.slice/session-3.scope\0", iov_len=45}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 173 <0.000019>
1     11:09:19.226783 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\5\1\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 203 <0.000015>
1     11:09:19.226857 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\6\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.227418 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\10\1\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.227492 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\t\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

MAJOR DIFFERENCE: I don't see any UnitNew/UnitRemoved messages on customer's system


Version-Release number of selected component (if applicable):

systemd-239.68.el8
Suspected to be also seen on older systemd (we had a case with another customer in the past, before issue disappeared completely)

How reproducible:

Often/Always on customer system, cannot reproduce locally


Note You need to log in before you can comment on or make changes to this bug.