Bug 2208240

Summary: systemd can send non-UTF8 messages over DBus, causing DBus to disconnect it
Product: Red Hat Enterprise Linux 9 Reporter: Michal Sekletar <msekleta>
Component: systemdAssignee: Michal Sekletar <msekleta>
Status: ON_QA --- QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 9.2CC: dtardon, jamacku, systemd-maint-list
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: systemd-252-16.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michal Sekletar 2023-05-18 10:57:33 UTC
This bug was initially created as a copy of Bug #2158724

I am copying this bug because: 



Description of problem:

This is a continuation of BZ #2158167 (related to PID recycling) and the condition making the latter be hit from time to time.

We have a customer seeing the scopes started for crond and sshd not getting destroyed inside systemd-logind when last PID exits.
This causes systemd-logind to "leak" sessions (state files remaining in /run/systemd/sessions in CLOSING state) and eventually systemd-logind to hit BZ #2158167 when PID recycling happens.
With the fix for BZ #2158167, the customer sees "Maximum number of sessions reached (8192)" after 1 day or so.

I requested an strace of systemd and systemd-logind to try to understand the condition where this would happen.
It appears the issue is due to having an issue within systemd which cannot abandon the scope:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Scope session-50854.scope is not running, cannot abandon.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Due to this, systemd-logind never deletes its internal data related to the session.

I don't know how to reproduce the condition for now. Comparing customer's strace and my strace, I see a lot of differences in DBus messages being issued by systemd to systemd-logind:

Customer:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1     18:36:02.172067 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1H\0\0\0\24\35\3\0q\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0JobNew\0\0\10\1g\0\3uos"..., iov_len=136}, {iov_base="\347.]\0%\0\0\0/org/freedesktop/systemd1/job/6106855\0\0\0\23\0\0\0session-50854.scope\0", iov_len=72}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 208 <0.000013>
1     18:36:02.179388 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1Q\0\0\0\26\35\3\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\n\0\0\0JobRemoved\0\0\0\0\0\0"..., iov_len=144}, {iov_base="\347.]\0%\0\0\0/org/freedesktop/systemd1/job/6106855\0\0\0\23\0\0\0session-50854.scope\0\4\0\0\0done\0", iov_len=81}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 225 <0.000018>
1     18:36:02.623647 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0015\0\0\0N\35\3\0o\0\0\0\1\1o\0\37\0\0\0/org/freedesktop/systemd1/agent\0\2\1s\0\36\0\0\0org.freedesktop.systemd1.Agent\0\0\3\1s\0\10\0\0\0Released\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=128}, {iov_base="0\0\0\0/user.slice/user-11469.slice/session-50854.scope\0", iov_len=53}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 181 <0.000029>
1     18:36:02.628222 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1>\0\0\0P\35\3\0_\0\0\0\5\1u\0\263i\0\0\6\1s\0\n\0\0\0:1.7022504\0\0\0\0\0\0\4\1s\0(\0\0\0org.freedesktop.systemd1.ScopeNotRunning\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=112}, {iov_base="9\0\0\0Scope session-50854.scope is not running, cannot abandon.\0", iov_len=62}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 174 <0.000017>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Mine:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1     11:09:19.108569 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\365\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.108642 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1@\0\0\0\366\0\0\0q\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0JobNew\0\0\10\1g\0\3uos"..., iov_len=136}, {iov_base="\240\2\0\0!\0\0\0/org/freedesktop/systemd1/job/672\0\0\0\17\0\0\0session-3.scope\0", iov_len=64}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.113703 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1I\0\0\0\367\0\0\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\n\0\0\0JobRemoved\0\0\0\0\0\0"..., iov_len=144}, {iov_base="\240\2\0\0!\0\0\0/org/freedesktop/systemd1/job/672\0\0\0\17\0\0\0session-3.scope\0\4\0\0\0done\0", iov_len=73}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 217 <0.000008>
1     11:09:19.225074 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\2\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 211 <0.000013>
1     11:09:19.226435 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1-\0\0\0\3\1\0\0o\0\0\0\1\1o\0\37\0\0\0/org/freedesktop/systemd1/agent\0\2\1s\0\36\0\0\0org.freedesktop.systemd1.Agent\0\0\3\1s\0\10\0\0\0Released\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=128}, {iov_base="(\0\0\0/user.slice/user-0.slice/session-3.scope\0", iov_len=45}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 173 <0.000019>
1     11:09:19.226783 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\5\1\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 203 <0.000015>
1     11:09:19.226857 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\6\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.227418 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\10\1\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
1     11:09:19.227492 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\t\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

MAJOR DIFFERENCE: I don't see any UnitNew/UnitRemoved messages on customer's system


Version-Release number of selected component (if applicable):

systemd-239.68.el8
Suspected to be also seen on older systemd (we had a case with another customer in the past, before issue disappeared completely)

How reproducible:

Often/Always on customer system, cannot reproduce locally