Bug 1967962

Summary: Annoying error message "Failed to create session: Start job for unit user-0.slice failed with 'canceled'" when sudoing rapidly
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: systemdAssignee: David Tardon <dtardon>
Status: CLOSED CURRENTRELEASE QA Contact: Frantisek Sumsal <fsumsal>
Severity: high Docs Contact:
Priority: high    
Version: 8.4CC: broose, chorn, dtardon, msekleta, qguo, systemd-maint-list
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-18 14:42:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Renaud Métrich 2021-06-04 14:32:40 UTC
Description of problem:

When performing a lot of quick sudo sessions in a row, the following message is seen in the journal at Error level:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[...] sudo[42836]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

This message should be logged at Debug level only (or even removed) because actually there is **no issue at all**.

To do this, systemd-logind needs to be smarter:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
2744 static int session_jobs_reply(Session *s, const char *unit, const char *result) {
 :
2753         if (streq(result, "done"))
2754                 r = session_send_create_reply(s, NULL);
2755         else {
2756                 _cleanup_(sd_bus_error_free) sd_bus_error e = SD_BUS_ERROR_NULL;
2757 
2758                 sd_bus_error_setf(&e, BUS_ERROR_JOB_FAILED, "Start job for unit %s failed with '%s'", unit, result);
2759                 r = session_send_create_reply(s, &e);
2760         }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

On line 2758, it should test if "result" is "canceled" or not and ignore the latter.


Version-Release number of selected component (if applicable):

systemd-239


How reproducible:

Very easily

Steps to Reproduce:
1. Create a script that sudo'es repeatedly

  /usr/local/bin/sudo_loop:
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  #!/bin/sh
  
  while :; do
  	sudo touch /tmp/foo
  	sudo rm /tmp/foo
  	sleep 0.1
  done
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Make sure there is no root session and start the script as a service from an admin account

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  $ sudo systemctl stop user-0.slice
  $ sudo systemd-run --uid=user --unit=sudo_loop.service /usr/local/bin/sudo_loop
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Monitor the journal from the admin account

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  $ journalctl --follow | tee journal.txt | grep "Failed to create session"
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------


Actual results:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Jun 04 16:24:02 vm-rhel8 sudo[42836]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
Jun 04 16:24:03 vm-rhel8 sudo[42861]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
Jun 04 16:24:04 vm-rhel8 sudo[42945]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
Jun 04 16:24:05 vm-rhel8 sudo[42999]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
Jun 04 16:24:06 vm-rhel8 sudo[43025]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
Jun 04 16:24:06 vm-rhel8 sudo[43049]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
 :
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Expected results:

No error message at all (there is no issue at all ...).

Additional info:

I tested the suggestion in BZ #1892833 (build http://people.redhat.com/dtardon/systemd/bz1642460-backport-UserStopDelaySec=/) but this doesn't help.

Comment 1 David Tardon 2021-06-17 11:12:28 UTC
git bisect shows this is fixed by https://github.com/systemd/systemd/pull/9824/commits/25a1ab4ed48b72e974f77a68dcbe3521014787bb . If it still fails with http://people.redhat.com/dtardon/systemd/bz1642460-backport-UserStopDelaySec=/, then I've either made a mistake in the backport or missed an additional PR it relies upon.

Comment 2 Christian Horn 2022-12-07 11:40:03 UTC
- I reproduced https://bugzilla.redhat.com/show_bug.cgi?id=1967962#c0 on rhel8.2GA
- reproduces on rhel8.3 systemd, but not on rhel8.4 systemd and later.
 
8.2 systemd-239-29 issue
8.3 systemd-239-40 issue
8.4 systemd-239-45 fixed
8.5 systemd-239-51 fixed

I see that commit https://github.com/systemd/systemd/pull/9824/commits/25a1ab4ed48b72e974f77a68dcbe3521014787bb
has quite some code changes, and also 239-51 does not have all the changes.
I vote for declaring that systemd-239-45 have the fixes, and closing this bz as CURRENT.

Opinions?

Comment 3 Christian Horn 2022-12-08 06:05:35 UTC
Concluding this is fixed in rhel8.5/systemd-239-51.el8.
bz1892833 is also around this issue.

Comment 4 David Tardon 2023-05-18 14:42:10 UTC
(In reply to Christian Horn from comment #3)
> Concluding this is fixed in rhel8.5/systemd-239-51.el8.

Let's close it then.