Bug 2231597

Summary: reboot.target times out when having lingering users
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: systemdAssignee: systemd maint <systemd-maint>
Status: NEW --- QA Contact: Frantisek Sumsal <fsumsal>
Severity: high Docs Contact:
Priority: high    
Version: 8.8CC: hartsjc, qguo, systemd-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Renaud Métrich 2023-08-12 10:00:05 UTC
Description of problem:

We have a customer seeing constantly the reboot.target unit time out after 30 minutes when having lingering users.
During the 30 minutes, we can see a dead loop of DBus communication between systemd-logind and systemd related to the lingering user unit "user" collecting.

I cannot reproduce the issue during reboot, but it's very easily reproducible by amending "user-runtime-dir" to not stop automatically, by setting "StopWhenUnneeded=false", then stopping "user" unit from the shell.
This makes the following happen:

1. *systemd* stops "user", which sends a "UnitRemoved" message among others

2. *systemd-logind* enqueues the user in the garbage collection queue, then queries the state of "user" ("ActiveState" message)

   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   840   10:05:05.397060 sendmsg(12<UNIX:[23940->22913]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\0014\0\0\0\306\r\0\0\250\0\0\0\1\1o\0004\0\0\0/org/freedesktop/systemd1/unit/user_401001_2eservice\0\0\0\0\3\1s\0\3\0\0\0Get\0\0\0\0\0\2\1s\0\37\0\0\0org.freedesktop.DBus.Properties\0\6\1s\0\30\0\0\0org.freedesktop.systemd1\0\0\0\0\0\0\0\0\10\1g\0\2ss\0", iov_len=184}, {iov_base="\35\0\0\0org.freedesktop.systemd1.Unit\0\0\0\v\0\0\0ActiveState\0", iov_len=52}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 236 <0.000036>
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. *systemd* answers by `inactive` but also sends messages "UnitNew/UnitRemoved"

   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   1     10:05:05.402891 sendmsg(45<UNIX:[22545->22821]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\21\0\0\0r\34\0\0\37\0\0\0\5\1u\0\306\r\0\0\6\1s\0\4\0\0\0:1.5\0\0\0\0\10\1g\0\1v\0\0", iov_len=48}, {iov_base="\1s\0\0\10\0\0\0inactive\0", iov_len=17}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 65 <0.000029>
   1     10:05:05.402994 sendmsg(40<UNIX:[19275->12637]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="PRIORITY=7\nSYSLOG_FACILITY=3\nCODE_FILE=../src/core/manager.c\nCODE_LINE=1181\nCODE_FUNC=manager_dispatch_gc_unit_queue\nUNIT=user\nSYSLOG_IDENTIFIER=systemd\n", iov_len=166}, {iov_base="MESSAGE=", iov_len=8}, {iov_base="user: Collecting.", iov_len=30}, {iov_base="\n", iov_len=1}], msg_iovlen=4, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL <unfinished ...>
   1     10:05:05.403073 sendmsg(45<UNIX:[22545->22821]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1Q\0\0\0s\34\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", iov_len=128}, {iov_base="\21\0\0\0user\0\0\0004\0\0\0/org/freedesktop/systemd1/unit/user_401001_2eservice\0", iov_len=81}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
   1     10:05:05.403192 sendmsg(45<UNIX:[22545->22821]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1Q\0\0\0t\34\0\0x\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0\10\1g\0\2so\0", iov_len=136}, {iov_base="\21\0\0\0user\0\0\0004\0\0\0/org/freedesktop/systemd1/unit/user_401001_2eservice\0", iov_len=81}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL <unfinished ...>
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. *systemd-logind* then queries the state of "user-runtime-dir"

   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   840   10:05:05.403202 sendmsg(12<UNIX:[23940->22913]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\0014\0\0\0\307\r\0\0\270\0\0\0\1\1o\0D\0\0\0/org/freedesktop/systemd1/unit/user_2druntime_2ddir_401001_2eservice\0\0\0\0\3\1s\0\3\0\0\0Get\0\0\0\0\0\2\1s\0\37\0\0\0org.freedesktop.DBus.Properties\0\6\1s\0\30\0\0\0org.freedesktop.systemd1\0\0\0\0\0\0\0\0\10\1g\0\2ss\0", iov_len=200}, {iov_base="\35\0\0\0org.freedesktop.systemd1.Unit\0\0\0\v\0\0\0ActiveState\0", iov_len=52}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 252 <0.000047>
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

   This is due to having the following code for user_unit_active() function (line 554):
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   546 static bool user_unit_active(User *u) {
    :
   554         FOREACH_STRING(i, u->service, u->runtime_dir_service, u->slice) {
   555                 r = manager_unit_is_active(u->manager, i);
   556                 if (r < 0)
   557                         log_debug_errno(r, "Failed to determine whether unit '%s' is active, ignoring", u->service);
   558                 if (r != 0)
   559                         return true;
   560         }
    :
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

5. *systemd* this time answers it's active, since "StopWhenUnneeded=false" was set for the reproducer

   During the shutdown, the same thing is happening since the runtime dir is still there.

   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   1     10:05:05.403527 sendmsg(45<UNIX:[22545->22821]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\17\0\0\0u\34\0\0\37\0\0\0\5\1u\0\307\r\0\0\6\1s\0\4\0\0\0:1.5\0\0\0\0\10\1g\0\1v\0\0", iov_len=48}, {iov_base="\1s\0\0\6\0\0\0active\0", iov_len=15}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 63 <0.000028>
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

6. Because the unit is active, *systemd-logind* won't garbage collect the user

   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   565 bool user_may_gc(User *u, bool drop_not_started) {
    :
   584         /* Is this a user that shall stay around forever ("linger")? Before we say "no" to GC'ing for lingering users, let    's check
   585          * if any of the three units that we maintain for this user is still around. If none of them is,
   586          * there's no need to keep this user around even if lingering is enabled. */
   587         if (user_check_linger_file(u) > 0 && user_unit_active(u))
   588                 return false;
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

7. *systemd-logind* then processes the 3rd message from *systemd* about unit "user" (the "UnitRemoved" message in step 3)

   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   840   10:05:05.403400 <... recvmsg resumed>{msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\v\0\0\0UnitRemoved\0\0\0\0\0\10\1g\0\2so\0\7\1s\0\4\0\0\0:1.0\0\0\0\0\21\0\0\0user\0\0\0004\0\0\0/org/freedesktop/systemd1/unit/user_401001_2eservice\0", iov_len=209}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 209 <0.000007>
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

   This makes *systemd-logind* enqueues the user into the garbage collection queue again, causing step 2 to occur again in loop.

The issue is present in Fedora 38 as well (systemd-253.7-1.fc38.x86_64).

The customer proposes to change the order of the for loop on line 554 as shown below:

   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   546 static bool user_unit_active(User *u) {
    :
   554         FOREACH_STRING(i, u->slice, u->runtime_dir_service, u->service) {
   555                 r = manager_unit_is_active(u->manager, i);
   556                 if (r < 0)
   557                         log_debug_errno(r, "Failed to determine whether unit '%s' is active, ignoring", u->service);
   558                 if (r != 0)
   559                         return true;
   560         }
    :
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

This change fixes the issue because the slice will be up, causing no "UnitNew/UnitRemoved" to be sent by *systemd* (step 3).
Additionally, systemd-logind will have less job to do to determine whether the user is still active or not, since the slice is the last thing being taking down.
This change won't change the logic.

Version-Release number of selected component (if applicable):

systemd-253.7-1.fc38.x86_64 (Fedora 38)
systemd-239 (RHEL8)

How reproducible:

Always

Steps to Reproduce:
1. Create a lingering user

   # useradd lingering
   # loginctl enable-linger lingering

2. Change "StopWhenUnneeded=true" into "StopWhenUnneeded=false" in user-runtime-dir unit

   # systemctl edit user-runtime-dir@$(id -u lingering).service --full
   ...

3. Reboot

4. Stop the user unit

   # systemctl stop user@$(id -u lingering).service

Actual results:

"Deadloop" between systemd-logind and systemd, systemd taking a lot of CPU forever:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                
      1 root      20   0  172532  10484   7960 R  60.0   0.3   1:01.52 systemd                                                
    690 root      20   0  310892 157412 156292 R  40.0   4.1   0:32.78 systemd-journal                                        
    810 dbus      20   0   56564   5304   4564 R  26.7   0.1   0:19.90 dbus-daemon                                            
   3310 root      20   0  140504  54696   7248 S  20.0   1.4   0:05.70 systemd-logind                                         

Expected results:

No CPU looping.
Quick reboot when issue happens on reboot (cannot reproduce for now).