Bug 1946453

Summary: Backport upstream commit related to user-runtime-dir@.service
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: systemdAssignee: Jacek Migacz <jmigacz>
Status: CLOSED ERRATA QA Contact: Frantisek Sumsal <fsumsal>
Severity: high Docs Contact:
Priority: high    
Version: 8.3CC: dtardon, dwojewod, fsumsal, hartsjc, jjaburek, miabbott, msekleta, pkhedeka, qguo, rdieter, sbroz, systemd-maint-list, wdh
Target Milestone: rcKeywords: Bugfix, Triaged, ZStream
Target Release: ---Flags: pm-rhel: mirror+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: systemd-239-48.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1990403 (view as bug list) Environment:
Last Closed: 2021-11-09 19:56:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1990403    

Description Renaud Métrich 2021-04-06 07:40:51 UTC
Description of problem:

We need to backport commit d06e8fbce35c2b52ee1d09af4888876d5f2d7ae4 related to user-runtime-dir@.service.
This commit makes sure the user-runtime-dir@.service is really done before starting "systemd --user", which is really required or else race conditions happen leading the "systemd --user" failure.


Version-Release number of selected component (if applicable):

systemd-239-41.el8_3.1.x86_64


How reproducible:

Always using a systemtap script

Steps to Reproduce:

1. Create delay_mount_runtimepath.stp that will delay the mount of /run/user/<UID> operation (here below UID 101166)

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  probe syscall.mount.call {
  	if (argstr =~ "\"/run/user/101166\"") {
  		delay = 1500 + randint(150)
  		printf("PID %lu (%s) entered '%s(%s)', delaying call by %ld\n", pid(), execname(), name, argstr, delay)
  		mdelay(delay)
  	}
  }
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Execute the stap script (requires Guru mode)

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # yum -y install systemtap
  # stap-prep
  # stap -g -v ./delay_mount_runtimepath.stp 
  [...]
  Pass 5: starting run.
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Log in through ssh as the target user

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # ssh rmetrich@localhost true
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------


Actual results:

- Stap output:

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  PID 4281 (systemd-user-ru) entered 'mount("tmpfs", "/run/user/101166", "tmpfs", MS_NOSUID|MS_NODEV, "mode=0700,uid=101166,gid=101166,size=190722048")', delaying call by 1575
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

- Journal:

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  Apr 06 08:57:17.748899 vm-sssd8 systemd[3994]: pam_unix(systemd-user:session): session opened for user rmetrich by (uid=0)
  Apr 06 08:57:19.284443 vm-sssd8 systemd[3994]: Failed to fully start up daemon: Permission denied
  Apr 06 08:57:19.285148 vm-sssd8 systemd[3995]: pam_unix(systemd-user:session): session closed for user rmetrich
  Apr 06 08:57:19.285853 vm-sssd8 systemd[1]: user: Failed with result 'protocol'.
  Apr 06 08:57:19.286015 vm-sssd8 systemd[1]: Failed to start User Manager for UID 101166.
  Apr 06 08:57:19.286443 vm-sssd8 systemd[1]: Stopping /run/user/101166 mount wrapper...
  Apr 06 08:57:19.287687 vm-sssd8 sshd[3991]: pam_systemd(sshd:session): Failed to create session: Start job for unit user failed with 'failed'
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Expected results:

  No error spawning "systemd --user"

Additional info:

  See also BZ #1945569 for hardening this.

Comment 1 Renaud Métrich 2021-05-17 15:03:27 UTC
We have a few customers hitting this now, apparently using lingering helps hit the issue (probably because of the activity during boot).

Comment 3 Renaud Métrich 2021-05-26 18:42:33 UTC
I think QE can use the reproducer I provided as the test case.

Comment 4 Lukáš Nykrýn 2021-06-22 12:27:17 UTC
fix merged to github master branch -> https://github.com/redhat-plumbers/systemd-rhel8/pull/179

Comment 9 Micah Abbott 2021-06-28 19:00:39 UTC
Requesting a backport to 8.4.z

The RHCOS integration tests are hitting a failure similar to what is described here and would be interested in getting this change into 8.4.z

See https://bugzilla.redhat.com/show_bug.cgi?id=1914362 and https://github.com/coreos/coreos-assembler/pull/2261 for additional context

Comment 10 Jiri Jaburek 2021-07-03 12:09:01 UTC
I've also seen

systemd: user-runtime-dir: Failed with result 'start-limit-hit'.
systemd: Failed to start /run/user/5678 mount wrapper.
systemd: Dependency failed for User Manager for UID 5678.
systemd: user: Job user/start failed with result 'dependency'.
sshd: pam_systemd(sshd:session): Failed to create session: Start job for unit user failed with 'dependency'

on a minimal install of RHEL-8.4.0 @ s390x, which seems to be related to this BZ.

(5678 is an unprivileged UID)

Comment 11 Michal Sekletar 2021-07-31 19:59:21 UTC
Approved z-stream request for RHEL-8.4-z.

Comment 18 errata-xmlrpc 2021-11-09 19:56:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4469