Description of problem: This is a continuation of BZ #2158167 (related to PID recycling) and the condition making the latter be hit from time to time. We have a customer seeing the scopes started for crond and sshd not getting destroyed inside systemd-logind when last PID exits. This causes systemd-logind to "leak" sessions (state files remaining in /run/systemd/sessions in CLOSING state) and eventually systemd-logind to hit BZ #2158167 when PID recycling happens. With the fix for BZ #2158167, the customer sees "Maximum number of sessions reached (8192)" after 1 day or so. I requested an strace of systemd and systemd-logind to try to understand the condition where this would happen. It appears the issue is due to having an issue within systemd which cannot abandon the scope: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Scope session-50854.scope is not running, cannot abandon. -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Due to this, systemd-logind never deletes its internal data related to the session. I don't know how to reproduce the condition for now. Comparing customer's strace and my strace, I see a lot of differences in DBus messages being issued by systemd to systemd-logind: Customer: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 1 18:36:02.172067 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1H\0\0\0\24\35\3\0q\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0JobNew\0\0\10\1g\0\3uos"..., iov_len=136}, {iov_base="\347.]\0%\0\0\0/org/freedesktop/systemd1/job/6106855\0\0\0\23\0\0\0session-50854.scope\0", iov_len=72}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 208 <0.000013> 1 18:36:02.179388 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1Q\0\0\0\26\35\3\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\n\0\0\0JobRemoved\0\0\0\0\0\0"..., iov_len=144}, {iov_base="\347.]\0%\0\0\0/org/freedesktop/systemd1/job/6106855\0\0\0\23\0\0\0session-50854.scope\0\4\0\0\0done\0", iov_len=81}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 225 <0.000018> 1 18:36:02.623647 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0015\0\0\0N\35\3\0o\0\0\0\1\1o\0\37\0\0\0/org/freedesktop/systemd1/agent\0\2\1s\0\36\0\0\0org.freedesktop.systemd1.Agent\0\0\3\1s\0\10\0\0\0Released\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=128}, {iov_base="0\0\0\0/user.slice/user-11469.slice/session-50854.scope\0", iov_len=53}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 181 <0.000029> 1 18:36:02.628222 sendmsg(19<UNIX:[397683477->397687257]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1>\0\0\0P\35\3\0_\0\0\0\5\1u\0\263i\0\0\6\1s\0\n\0\0\0:1.7022504\0\0\0\0\0\0\4\1s\0(\0\0\0org.freedesktop.systemd1.ScopeNotRunning\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=112}, {iov_base="9\0\0\0Scope session-50854.scope is not running, cannot abandon.\0", iov_len=62}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 174 <0.000017> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Mine: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 1 11:09:19.108569 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\365\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...> 1 11:09:19.108642 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1@\0\0\0\366\0\0\0q\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\6\0\0\0JobNew\0\0\10\1g\0\3uos"..., iov_len=136}, {iov_base="\240\2\0\0!\0\0\0/org/freedesktop/systemd1/job/672\0\0\0\17\0\0\0session-3.scope\0", iov_len=64}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...> 1 11:09:19.113703 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1I\0\0\0\367\0\0\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\n\0\0\0JobRemoved\0\0\0\0\0\0"..., iov_len=144}, {iov_base="\240\2\0\0!\0\0\0/org/freedesktop/systemd1/job/672\0\0\0\17\0\0\0session-3.scope\0\4\0\0\0done\0", iov_len=73}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 217 <0.000008> 1 11:09:19.225074 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\2\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 211 <0.000013> 1 11:09:19.226435 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1-\0\0\0\3\1\0\0o\0\0\0\1\1o\0\37\0\0\0/org/freedesktop/systemd1/agent\0\2\1s\0\36\0\0\0org.freedesktop.systemd1.Agent\0\0\3\1s\0\10\0\0\0Released\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=128}, {iov_base="(\0\0\0/user.slice/user-0.slice/session-3.scope\0", iov_len=45}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 173 <0.000019> 1 11:09:19.226783 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\5\1\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 203 <0.000015> 1 11:09:19.226857 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\6\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...> 1 11:09:19.227418 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\10\1\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...> 1 11:09:19.227492 sendmsg(51<UNIX:[17880->24141]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1K\0\0\0\t\1\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0"..., iov_len=136}, {iov_base="\17\0\0\0session-3.scope\0002\0\0\0/org/freedesktop/systemd1/unit/session_2d3_2escope\0", iov_len=75}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- MAJOR DIFFERENCE: I don't see any UnitNew/UnitRemoved messages on customer's system Version-Release number of selected component (if applicable): systemd-239.68.el8 Suspected to be also seen on older systemd (we had a case with another customer in the past, before issue disappeared completely) How reproducible: Often/Always on customer system, cannot reproduce locally
UPDATED DESCRIPTION: The DBus message sent upon mounting a CIFS share can be broken, depending on the output of `/proc/self/mountinfo` and `/run/mount/utab` systemd relies on to build the message: in case the CIFS share is presented by a NetApp appliance and contains non-ascii characters (e.g. "ü") and charset "iso8859-1" is used, systemd sends a non-UTF8 message through the bus, causing DBus to disconnect it. Example of CIFS share listed in /proc/self/mountinfo (redacted): -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- //netapp.server/path/to/share/containing/umlaut_xxx�xxx /mount/point cifs rw,relatime,vers=3.0,xxx,iocharset=iso8859-1,xxx 0 0 -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- I think we need to ensure that systemd always sends properly formatted messages to avoid any DBus breakage. See also related BZs: - #2162660 (DBus) - #2162655 (Kernel/CIFS)
REPRODUCER BELOW: We can reproduce through having a CIFS share, which we then hack (using systemtap) to push a non-UTF8 character, see below. CIFS setup (local): -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # yum -y install samba # cat >> /etc/samba/smb.conf << EOF [cifs] path = /cifs browseable = Yes read only = No guest ok = Yes EOF # mkdir -p /cifs/ssXss # echo "//localhost/cifs/ssXss /mnt cifs guest,noauto 0 0" >> /etc/fstab # systemctl enable smb --now -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- SYSTEMTAP SCRIPT (let it run): -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # yum -y install systemtap # stap-prep # stap -v -g -e 'probe module("cifs").statement("cifs_show_devname@fs/cifs/cifsfs.c:480") { $devname[19] = 129; printf("new devname: %s\n", kernel_string_quoted($devname)); }' ... -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- The script replaces "ssXss" by "ssüss" once mount is done. Corresponding source in kernel-4.18.0-425.10.1.el8_7.x86_64 (we are interested in hacking "devname" variable to report something not UTF8 in /proc/mounts): -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 475 static int cifs_show_devname(struct seq_file *m, struct dentry *root) 476 { 477 struct cifs_sb_info *cifs_sb = CIFS_SB(root->d_sb); 478 char *devname = kstrdup(cifs_sb->ctx->source, GFP_KERNEL); 479 480 if (devname == NULL) 481 seq_puts(m, "none"); 482 else { 483 convert_delimiter(devname, '/'); 484 /* escape all spaces in share names */ 485 seq_escape(m, devname, " \t"); 486 kfree(devname); 487 } 488 return 0; 489 } -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- #################### REPRODUCING STEPS: #################### 1. Check there is some "subscribed" client in systemd (i.e. systemd-logind) -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # gdb -ex "set pagination off" -ex "set confirm off" -ex "f 3" -ex "p m->subscribed" -ex "q" /usr/lib/systemd/systemd 1 [...] #3 0x0000563e699efb62 in manager_loop (m=m@entry=0x563e6a06ae50) at ../src/core/manager.c:2898 2898 r = sd_event_run(m->event, wait_usec); $1 = (sd_bus_track *) 0x563e6a238130 -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 2. Start systemtap and let it run -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # stap -v -g -e 'probe module("cifs").statement("cifs_show_devname@fs/cifs/cifsfs.c:480") { $devname[19] = 129; printf("new devname: %s\n", kernel_string_quoted($devname)); }' ... Pass 5: starting run. -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 3. Mount the local CIFS share -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # mount /mnt -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Systemtap will report: new devname: "//localhost/cifs/ss\201ss" showing it's hacking the output as planned. 4. Login on the system as a new user (to trigger some dbus traffic AND new mounts due to the new user) -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- $ ssh admin@localhost true -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 5. Check systemd got disconnected from the bus (it has no "subscribed" clients anymore) -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # gdb -ex "set pagination off" -ex "set confirm off" -ex "f 3" -ex "p m->subscribed" -ex "q" /usr/lib/systemd/systemd 1 ... #3 0x0000563e699efb62 in manager_loop (m=m@entry=0x563e6a06ae50) at ../src/core/manager.c:2898 2898 r = sd_event_run(m->event, wait_usec); $1 = (sd_bus_track *) 0x0 -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- At step 5 we see the leaks starts happening, sessions in /run/systemd/sessions are not destroyed anymore since systemd-logind doesn't receive notifications anymore: # ssh admin@localhost true # ssh admin@localhost true # ssh admin@localhost true # ll /run/systemd/sessions/ total 24 -rw-r--r-- 1 root root 322 Jan 20 15:49 1 -rw-r--r-- 1 root root 322 Jan 20 15:49 2 -rw-r--r-- 1 root root 306 Jan 20 15:54 5 -rw-r--r-- 1 root root 252 Jan 20 15:56 6 -rw-r--r-- 1 root root 252 Jan 20 15:56 7 -rw-r--r-- 1 root root 252 Jan 20 15:56 8
fix merged to github main branch -> https://github.com/redhat-plumbers/systemd-rhel8/pull/389