Bug 2047187

Summary: [spec] user slice unit can fail on logout - invalid unit
Product: Red Hat Enterprise Linux 8
Reporter: Steve Traylen <steve.traylen>
Component: systemd
Assignee: David Tardon <dtardon>
Status: CLOSED MIGRATED
QA Contact: Frantisek Sumsal <fsumsal>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: CentOS Stream
CC: bstinson, djuarezg, dtardon, jwboyer, mezhang, msekleta, systemd-maint-list
Target Milestone: rc
Keywords: Improvement, MigratedToJIRA, Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-21 11:27:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Steve Traylen 2022-01-27 11:13:22 UTC
Description of problem:

A user slice unit can go into a failed state:

# systemctl status user
● user - User Manager for UID 12345
   Loaded: loaded (/usr/lib/systemd/system/user@.service; static; vendor preset: disabled)
   Active: failed (Result: timeout) since Thu 2022-01-27 11:43:46 CET; 16min ago
  Process: 3691600 ExecStart=/usr/lib/systemd/systemd --user (code=killed, signal=KILL)
 Main PID: 3691600 (code=killed, signal=KILL)
   Status: "Startup finished in 171ms."

Jan 27 09:38:20 host.example.org systemd[3691600]: Started Mark boot as successful.
Jan 27 11:41:46 host.example.org systemd[1]: Stopping User Manager for UID 12345...
Jan 27 11:41:46 host.example.org systemd[3691600]: /usr/lib/systemd/user/systemd-exit.service:16: Failed to parse failure action specifier, ignoring: exit-force
Jan 27 11:41:46 host.example.org systemd[3691600]: systemd-exit.service: Service lacks both ExecStart= and ExecStop= setting. Refusing.
Jan 27 11:41:46 host.example.org systemd[3691600]: Failed to enqueue exit.target job: Unit systemd-exit.service has a bad unit file setting.
Jan 27 11:43:46 host.example.org systemd[1]: user: State 'stop-sigterm' timed out. Killing.
Jan 27 11:43:46 host.example.org systemd[1]: user: Killing process 3691600 (systemd) with signal SIGKILL.
Jan 27 11:43:46 host.example.org systemd[1]: user: Killing process 3691623 (krenew) with signal SIGKILL.
Jan 27 11:43:46 host.example.org systemd[1]: user: Failed with result 'timeout'.
Jan 27 11:43:46 host.example.org systemd[1]: Stopped User Manager for UID 12345.



In particular, that "Failed to parse failure action specifier" line looks bad.

The file /usr/lib/systemd/user/systemd-exit.service changed in this version of the package, systemd-239-55.el8.x86_64.

It was:
[Unit]
Description=Exit the Session
Documentation=man:systemd.special(7)
DefaultDependencies=no
Requires=shutdown.target
After=shutdown.target

[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl --force exit

and is now

[Unit]
Description=Exit the Session
Documentation=man:systemd.special(7)
DefaultDependencies=no
Requires=shutdown.target
After=shutdown.target
SuccessAction=exit-force

The change appears to be:

https://github.com/systemd/systemd/commit/a400bd8c2a6285576edf8e2147e1d17aab129501

Version-Release number of selected component (if applicable):

systemd-239-55.el8.x86_64

How reproducible:

Tricky. I have not managed to recreate this on demand.

I do see it happen often, for many users.

Steps to Reproduce:
1. User logs in.
2. User logs out - I don't know how exactly; I will try to find out.
3. The user slice can go into a failed state.

Actual results:

Bad user slice as above.


Expected results:

User slice should close cleanly.

Additional info:

Comment 1 Steve Traylen 2022-01-27 15:56:58 UTC
These are probably user sessions that were started before systemd was upgraded.

Can the existing systemd --user instances be respawned by calling `systemctl --user daemon-reexec` or something similar on package upgrade?
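For illustration, a hedged sketch of what such an upgrade hook might look like (hypothetical, not an actual systemd scriptlet; the UIDs are canned here so the sketch runs anywhere, and the real runuser invocation is only shown in a comment):

```shell
# Hypothetical package-upgrade hook: for each active user@UID.service,
# re-exec that user's manager so it picks up the new unit syntax.
# On a real host the UID list could come from something like:
#   systemctl show --property=User 'user@*' | sed -n 's/^User=//p'
# Here it is canned so the sketch runs anywhere.
uids='1000 12345'

for uid in $uids; do
    # A real hook would run, as root, something like:
    #   runuser "$(id -un "$uid")" -c 'systemctl --user daemon-reexec'
    # Dry run: just report the intended action.
    echo "would re-exec user manager for uid $uid"
done
```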

Comment 2 David Tardon 2022-02-07 10:11:44 UTC
(In reply to Steve Traylen from comment #1)
> This is probably user sessions that were started before systemd was upgraded.

Yes.
 
> Can the existing systemd --user instances be respawned by calling
> `systemctl --user daemon-reexec` or something on package upgrade.

Doing it for all active user instances is not that simple. Unless I'm missing something, we'd have to do something like the following (which probably works, but it isn't pretty):

for u in $(systemctl show -P User 'user@*'); do
    runuser "$(id -un "$u")" -c 'systemctl --user daemon-reexec'
done

Comment 3 David Tardon 2022-05-17 08:58:43 UTC
*** Bug 2086989 has been marked as a duplicate of this bug. ***

Comment 4 David Tardon 2022-05-19 14:14:05 UTC
Actually there is a simpler way to reexec all user managers than the one I proposed in comment 2:

systemctl kill -s SIGRTMIN+25 $(systemctl show -P Id 'user@*')
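When scripting this, the unit names come from `systemctl show` output; on systemd 239 the `-P` shorthand may not be available, but `--property=` plus a little text processing works. A minimal sketch of the extraction step alone, with canned input so it runs anywhere (the sample UIDs are made up):

```shell
# Extract bare unit names from "Id=user@UID.service" lines, i.e. the
# output format of:
#   systemctl show --property=Id 'user@*'
# The input is canned here; the sed strips the "Id=" prefix.
sample='Id=user@1000.service
Id=user@12345.service'

units=$(printf '%s\n' "$sample" | sed -n 's/^Id=//p')

# These names would then be passed to:
#   systemctl kill -s SIGRTMIN+25 <units>
printf '%s\n' "$units"
```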

Comment 5 David Tardon 2022-05-23 13:44:25 UTC
The root cause is a change in the way exit from a user session is performed, made between systemd-239-43 and 239-44. The new mechanism--the use of SuccessAction=exit-force in systemd-exit.service--is only recognized by the updated systemd. Because we don't reexec user instances on update, any such instance that was started before systemd was updated is still running the old systemd. Hence the user session fails to exit itself and is eventually killed after a timeout.

There is practically no harm from this: only the timeout and the user instance ending up in a failed state. It doesn't affect the system's operation in any way. The user can log in again without any problem--and will then be running the updated systemd, so the issue won't happen again.

Comment 6 RHEL Program Management 2023-09-21 11:23:53 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 7 RHEL Program Management 2023-09-21 11:27:39 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.