Bug 1677370

Summary: Program output not linked to service unit in the journal when service executes as a user
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: systemdAssignee: systemd maint <systemd-maint>
Status: NEW --- QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.6CC: ahogbin, chris, d.perry, dtardon, kwalker, smahanga, systemd-maint-list, systemd-maint
Target Milestone: rcKeywords: Triaged
Target Release: ---Flags: rmetrich: needinfo? (systemd-maint)
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1117877    
Bug Blocks: 1716963, 1719445    

Description Renaud Métrich 2019-02-14 16:16:39 UTC
Description of problem:

When executing a service as a user, using the User=<userid> directive in the service unit (except User=root), the output of the program is not having the _SYSTEMD_UNIT attribute, causing the output not to be linked to the service unit.

Example:

/usr/local/bin/dummy.sh:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
#!/bin/bash
sleep 5
echo "testing $0 with PID $$"
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

/etc/systemd/system/dummy_user.service:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[Service]
Type=forking
ExecStart=/bin/bash -c '/usr/local/bin/dummy.sh &'
User=local1
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

--> FAILURE (no _SYSTEMD_UNIT journal attribute to link to the unit)

# journalctl -u dummy_user
-- Logs begin at Thu 2019-02-14 16:40:42 CET, end at Thu 2019-02-14 17:14:52 CET. --
Feb 14 17:14:43 vm-rhel7 systemd[1]: Starting dummy_user.service...
Feb 14 17:14:43 vm-rhel7 systemd[1]: Started dummy_user.service.


Version-Release number of selected component (if applicable):

systemd-219-62.el7_6.3.x86_64


How reproducible:

Always


Steps to Reproduce:
1. See above service unit


Additional info:

Same behaviour on RHEL8-beta (systemd-239-11.el8.x86_64)


When User=root, there is no issue:


/etc/systemd/system/dummy_root.service:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[Service]
Type=forking
ExecStart=/bin/bash -c '/usr/local/bin/dummy.sh &'
User=root
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

--> OK

# journalctl -u dummy_root
-- Logs begin at Thu 2019-02-14 16:40:42 CET, end at Thu 2019-02-14 17:14:52 CET. --
Feb 14 17:14:45 vm-rhel7 systemd[1]: Starting dummy_root.service...
Feb 14 17:14:45 vm-rhel7 systemd[1]: Started dummy_root.service.
Feb 14 17:14:50 vm-rhel7 bash[7324]: testing /usr/local/bin/dummy.sh with PID 7325



When runuser is used, there is no issue either:

/etc/systemd/system/dummy_runuser.service:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[Service]
Type=forking
ExecStart=/sbin/runuser local1 /bin/bash -c '/usr/local/bin/dummy.sh &'
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

--> OK

# journalctl -u dummy_runuser
-- Logs begin at Thu 2019-02-14 16:40:42 CET, end at Thu 2019-02-14 17:14:52 CET. --
Feb 14 17:14:47 vm-rhel7 systemd[1]: Starting dummy_runuser.service...
Feb 14 17:14:47 vm-rhel7 runuser[7344]: pam_unix(runuser:session): session opened for user local1 by (uid
Feb 14 17:14:47 vm-rhel7 systemd[1]: Started dummy_runuser.service.
Feb 14 17:14:52 vm-rhel7 runuser[7344]: testing /usr/local/bin/dummy.sh with PID 7346

Comment 1 Lukáš Nykrýn 2019-02-15 14:55:03 UTC
That looks again like the problem with the short-lived processes, that the message reaches to journal after the service is already dead. If you add a sleep after the echo, does it fix the issue?

Comment 2 Renaud Métrich 2019-02-15 15:01:57 UTC
Can't test now (leaving for vacations) but doesn't happen with "runuser" and User=root, so I doubt it's related to short-living processes.

Comment 3 Jan Synacek 2019-02-18 08:08:05 UTC
Isn't this a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1658115? I believe we tested it on RHEL-7.6 and it worked there, but I might be wrong.

Comment 5 Renaud Métrich 2019-03-26 10:16:39 UTC
Hi Jan, I don't know if it's BZ 1658115. The PR is for RHEL8 and code changed quite a lot.

Comment 6 Jan Synacek 2019-03-26 10:21:34 UTC
Ok, we (I) need to investigate this a bit more thoroughly.

Comment 8 Renaud Métrich 2019-03-27 10:28:31 UTC
It's clearly not related to BZ #1658115.
After setting some GDB breakpoints, I can see the message goes through this (line numbers from latest RHEL7.6):

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
310 static int stdout_stream_line(StdoutStream *s, char *p, LineBreak line_break) {
...
338         case STDOUT_STREAM_UNIT_ID:
339                 if (s->ucred.uid == 0) {
340                         if (isempty(p))
341                                 s->unit_id = NULL;
342                         else  {
343                                 s->unit_id = strdup(p);
344                                 if (!s->unit_id)
345                                         return log_oom();
346                         }
347                 }
348 
349                 s->state = STDOUT_STREAM_PRIORITY;
350                 return 0;
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

On line 339, when unit is executed by the user, ucred.uid != 0, so s->unit_id remains to NULL (initialized as such in caller), whereas when executed by root (or through "runuser"), ucred.uid == 0, so s->unit_id is set to "p" which is the unit name (e.g. "dummy_root.service").

It's unclear to me why there is that restriction on line 339.

RHEL8 code is similar.

Comment 9 Renaud Métrich 2019-03-27 13:50:55 UTC
Code introduced in commit 62bca2c657bf95fd1f69935eef09915afa5c69d9:

    journal: set the _SYSTEMD_UNIT field for messages from terminated processes
    
    As described in
    
      https://bugs.freedesktop.org/show_bug.cgi?id=50184
    
    the journal currently doesn't set fields such as _SYSTEMD_UNIT
    properly for messages coming from processes that have already
    terminated.  This means among other things that "systemctl status" may
    not show some of the output of services that wrote messages just
    before they exited.
    
    This patch fixes this by having processes that log to the journal
    write their unit identifier to journald when the connection to
    /run/systemd/journal/stdout is opened.  Journald stores the unit ID
    and uses it to fill in _SYSTEMD_UNIT when it cannot be obtained
    normally (i.e. from the cgroup).  To prevent impersonating another
    unit, this information is only used when the caller is root. 
    
    This doesn't fix the general problem of getting metadata about 
    messages from terminated processes (which requires some kernel
    support), but it allows "systemctl status" and similar queries to do 
    the Right Thing for units that log via stdout/stderr.


"To prevent impersonating another unit, this information is only used when the caller is root."

--> This is why only units running as root get their messages tagged with the unit name.

Comment 10 Renaud Métrich 2019-03-27 13:58:24 UTC
Again, this can be solved by implementing BZ #1117877

Comment 11 Kyle Walker 2020-02-18 19:50:40 UTC
I'm going to move this bug to the RHEL 8 release to match the (somewhat) associated 1117877 tracker. We aren't really going to be able to target this effort in RHEL 7 based on the current point in the lifecycle.

Comment 13 David Tardon 2021-01-25 08:59:22 UTC
*** Bug 1919275 has been marked as a duplicate of this bug. ***

Comment 15 David Tardon 2022-03-10 14:40:18 UTC
*** Bug 2061689 has been marked as a duplicate of this bug. ***

Comment 16 David Tardon 2022-03-15 09:22:24 UTC
*** Bug 1852845 has been marked as a duplicate of this bug. ***

Comment 21 Christopher Wawak 2023-04-17 19:07:30 UTC
This is still a factor for me in RHEL 9.1. It would be great to see this fixed, enabling cost savings through filtering unnecessary logspam.