Description of problem: When executing a service as a user, using the User=<userid> directive in the service unit (except User=root), the output of the program is not having the _SYSTEMD_UNIT attribute, causing the output not to be linked to the service unit. Example: /usr/local/bin/dummy.sh: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- #!/bin/bash sleep 5 echo "testing $0 with PID $$" -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- /etc/systemd/system/dummy_user.service: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- [Service] Type=forking ExecStart=/bin/bash -c '/usr/local/bin/dummy.sh &' User=local1 -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- --> FAILURE (no _SYSTEMD_UNIT journal attribute to link to the unit) # journalctl -u dummy_user -- Logs begin at Thu 2019-02-14 16:40:42 CET, end at Thu 2019-02-14 17:14:52 CET. -- Feb 14 17:14:43 vm-rhel7 systemd[1]: Starting dummy_user.service... Feb 14 17:14:43 vm-rhel7 systemd[1]: Started dummy_user.service. Version-Release number of selected component (if applicable): systemd-219-62.el7_6.3.x86_64 How reproducible: Always Steps to Reproduce: 1. See above service unit Additional info: Same behaviour on RHEL8-beta (systemd-239-11.el8.x86_64) When User=root, there is no issue: /etc/systemd/system/dummy_root.service: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- [Service] Type=forking ExecStart=/bin/bash -c '/usr/local/bin/dummy.sh &' User=root -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- --> OK # journalctl -u dummy_root -- Logs begin at Thu 2019-02-14 16:40:42 CET, end at Thu 2019-02-14 17:14:52 CET. -- Feb 14 17:14:45 vm-rhel7 systemd[1]: Starting dummy_root.service... Feb 14 17:14:45 vm-rhel7 systemd[1]: Started dummy_root.service. Feb 14 17:14:50 vm-rhel7 bash[7324]: testing /usr/local/bin/dummy.sh with PID 7325 When runuser is used, there is no issue either: /etc/systemd/system/dummy_runuser.service: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- [Service] Type=forking ExecStart=/sbin/runuser local1 /bin/bash -c '/usr/local/bin/dummy.sh &' -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- --> OK # journalctl -u dummy_runuser -- Logs begin at Thu 2019-02-14 16:40:42 CET, end at Thu 2019-02-14 17:14:52 CET. -- Feb 14 17:14:47 vm-rhel7 systemd[1]: Starting dummy_runuser.service... Feb 14 17:14:47 vm-rhel7 runuser[7344]: pam_unix(runuser:session): session opened for user local1 by (uid Feb 14 17:14:47 vm-rhel7 systemd[1]: Started dummy_runuser.service. Feb 14 17:14:52 vm-rhel7 runuser[7344]: testing /usr/local/bin/dummy.sh with PID 7346
That looks again like the problem with the short-lived processes, that the message reaches to journal after the service is already dead. If you add a sleep after the echo, does it fix the issue?
Can't test now (leaving for vacations) but doesn't happen with "runuser" and User=root, so I doubt it's related to short-living processes.
Isn't this a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1658115? I believe we tested it on RHEL-7.6 and it worked there, but I might be wrong.
Hi Jan, I don't know if it's BZ 1658115. The PR is for RHEL8 and code changed quite a lot.
Ok, we (I) need to investigate this a bit more thoroughly.
It's clearly not related to BZ #1658115. After setting some GDB breakpoints, I can see the message goes through this (line numbers from latest RHEL7.6): -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 310 static int stdout_stream_line(StdoutStream *s, char *p, LineBreak line_break) { ... 338 case STDOUT_STREAM_UNIT_ID: 339 if (s->ucred.uid == 0) { 340 if (isempty(p)) 341 s->unit_id = NULL; 342 else { 343 s->unit_id = strdup(p); 344 if (!s->unit_id) 345 return log_oom(); 346 } 347 } 348 349 s->state = STDOUT_STREAM_PRIORITY; 350 return 0; -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- On line 339, when unit is executed by the user, ucred.uid != 0, so s->unit_id remains to NULL (initialized as such in caller), whereas when executed by root (or through "runuser"), ucred.uid == 0, so s->unit_id is set to "p" which is the unit name (e.g. "dummy_root.service"). It's unclear to me why there is that restriction on line 339. RHEL8 code is similar.
Code introduced in commit 62bca2c657bf95fd1f69935eef09915afa5c69d9: journal: set the _SYSTEMD_UNIT field for messages from terminated processes As described in https://bugs.freedesktop.org/show_bug.cgi?id=50184 the journal currently doesn't set fields such as _SYSTEMD_UNIT properly for messages coming from processes that have already terminated. This means among other things that "systemctl status" may not show some of the output of services that wrote messages just before they exited. This patch fixes this by having processes that log to the journal write their unit identifier to journald when the connection to /run/systemd/journal/stdout is opened. Journald stores the unit ID and uses it to fill in _SYSTEMD_UNIT when it cannot be obtained normally (i.e. from the cgroup). To prevent impersonating another unit, this information is only used when the caller is root. This doesn't fix the general problem of getting metadata about messages from terminated processes (which requires some kernel support), but it allows "systemctl status" and similar queries to do the Right Thing for units that log via stdout/stderr. "To prevent impersonating another unit, this information is only used when the caller is root." --> This is why only units running as root get their messages tagged with the unit name.
Again, this can be solved by implementing BZ #1117877
I'm going to move this bug to the RHEL 8 release to match the (somewhat) associated 1117877 tracker. We aren't really going to be able to target this effort in RHEL 7 based on the current point in the lifecycle.
*** Bug 1919275 has been marked as a duplicate of this bug. ***
*** Bug 2061689 has been marked as a duplicate of this bug. ***
*** Bug 1852845 has been marked as a duplicate of this bug. ***
This is still a factor for me in RHEL 9.1. It would be great to see this fixed, enabling cost savings through filtering unnecessary logspam.