Bug 1945569
| Summary: | systemd --user fails to start when /run/user/<UID> is not a mount point | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> |
| Component: | systemd | Assignee: | Michal Sekletar <msekleta> |
| Status: | CLOSED MIGRATED | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.3 | CC: | msekleta, nobody+366555, qguo, systemd-maint-list |
| Target Milestone: | rc | Keywords: | MigratedToJIRA, Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-09-21 11:09:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I experience the same on some of my systems. Other RHEL 8.3 systems with the same systemd version are somehow not affected. At the moment, I don't see a cause or a pattern.

Hi Benjamin,

Can you reproduce this regularly on one of your systems? It may be related to BZ #1946453. If you can reproduce it regularly, please apply the solution in KCS https://access.redhat.com/solutions/5931241.

Renaud.

Yes, the issue occurs after applying KCS https://access.redhat.com/solutions/5931241 (without reboot / reloading systemd / resetting failed states) for some users on a specific production machine. I don't know whether I will find the time to run the stap script from https://bugzilla.redhat.com/show_bug.cgi?id=1946453 to verify further, or to try the workaround on an engineering machine that also shows this occurrence.

Well, you need to reload systemd for the change to take effect, and of course clear the failed units.

My communication here was unfortunate. I meant that I:
- reloaded systemd
- reset the failed states
I did not reboot (the KCS does not suggest it, but I wanted to clarify that).

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
Description of problem:

A customer sometimes sees the following messages in the journal:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
systemd[1]: Starting User Manager for UID XXX...
systemd[78237]: pam_unix(systemd-user:session): session opened for user XXXUSERNAME by (uid=0)
systemd[78237]: Failed to fully start up daemon: Permission denied
systemd[1]: user: Failed with result 'protocol'.
systemd[1]: Failed to start User Manager for UID XXX.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

This marks the user@.service instance for that user as failed, which causes a report to be shown in their monitoring tool.

James was able to reproduce this as well in real conditions. I wasn't; however, I was able to reproduce it by creating the directory /run/user/UID manually, as root, similar to what /usr/lib/systemd/systemd-user-runtime-dir does when a session gets created.

Normally, /usr/lib/systemd/systemd-user-runtime-dir creates the directory, then mounts a TMPFS filesystem on top of it, with permissions set for the user.

For some (still unknown) reason, we suspect that /run/user/UID doesn't get deleted by the time a new session comes in, causing the issue above, since the /run/user/UID directory itself is always owned by root. This is because /usr/lib/systemd/systemd-user-runtime-dir doesn't do anything when it detects that /run/user/UID is already a mount point; in particular, it doesn't mount the TMPFS filesystem, hence the permissions are not correct for "systemd --user" to be able to create its directories later.
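For triage, the state described above (a /run/user/UID directory left behind and owned by root instead of the user) can be detected with a plain stat-based check. The following is a minimal sketch, not systemd code: the function name runtime_dir_owned_by() is hypothetical, and it only looks at the owner of the directory, not at its mode or at what is mounted on it.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if `path` is a directory owned by `uid`,
 * 0 if it is a directory owned by another user,
 * and a negative errno value on error (e.g. -ENOENT). */
int runtime_dir_owned_by(const char *path, uid_t uid) {
        struct stat st;

        /* lstat() so that a symlink is reported as "not a directory"
         * instead of being followed. */
        if (lstat(path, &st) < 0)
                return -errno;
        if (!S_ISDIR(st.st_mode))
                return -ENOTDIR;

        return st.st_uid == uid;
}
```

A result of 0 for runtime_dir_owned_by("/run/user/UID", UID) would correspond to the root-owned leftover directory suspected here.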
Digging into the code, I can see that systemd-user-runtime-dir relies on path_is_mount_point() to find out whether it's a mount point:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
33 static int user_mkdir_runtime_path(const char *runtime_path, uid_t uid, gid_t gid, size_t runtime_dir_size) {
34         int r;
:
45         if (path_is_mount_point(runtime_path, NULL, 0) >= 0)
46                 log_debug("%s is already a mount point", runtime_path);
47         else {
48                 char options[sizeof("mode=0700,uid=,gid=,size=,smackfsroot=*")
:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

But that code doesn't work: it's enough for a parent directory (e.g. "/run") to be a mount point for the condition on line 45 to be true, so execution falls into line 46 and nothing is done. On RHEL systems, "/run" *is* a mount point, so the condition is always true. In that exact scenario, we would need to check that "/run/user/UID" **is** a mount point itself, rather than one of its parents.

Comparing with the upstream code, I don't see a difference; the code is similarly buggy. However, the failing condition in "systemd --user" is not hit there because pam_systemd performs a check, which then avoids spawning the systemd user instance:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
systemd[20313]: pam_systemd(systemd-user:session): Runtime directory '/run/user/101166' is not owned by UID 101166, as it should.
systemd[20313]: pam_systemd(systemd-user:session): Not setting $XDG_RUNTIME_DIR, as the directory is not in order.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Version-Release number of selected component (if applicable):
systemd-239-41.el8_3.1.x86_64

How reproducible:
Always

Steps to Reproduce:
1. create directory as root
   # mkdir -p /run/user/$(id -u rmetrich)
2. ssh to the system
   # ssh rmetrich@localhost true

Actual results:
systemd --user fails

Expected results:
no failure, and /run/user/$(id -u rmetrich) is a TMPFS mount point