Bug 1833647

Summary: pmlogger service fails to reload properly, printing confusing messages in the log
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: pcp Assignee: Mark Goodwin <mgoodwin>
Status: CLOSED ERRATA QA Contact: Jan Kurik <jkurik>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.3 CC: agerstmayr, jkurik, mcermak, mgoodwin, mnewsome, nathans, patrickm, peter.vreman, pkhedeka
Target Milestone: rc Keywords: Bugfix, Reopened, Triaged, VerifiedUpstream
Target Release: 8.4 Flags: pm-rhel: mirror+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: pcp-5.2.5-1.el8 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-18 15:19:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Set umask in pmlogger and pmie check scripts none

Description Renaud Métrich 2020-05-09 11:20:44 UTC
This bug was initially created as a copy of Bug #1833644

I am copying this bug because it also applies to RHEL 8.

Description of problem:

A customer reports "pmlogger.service" dying with the following messages printed by systemd:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
New main PID XXX does not belong to service, and PID file is not owned by root. Refusing.
New main PID XXX does not belong to service, and PID file is not owned by root. Refusing.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

These messages are triggered when the admin executes "/usr/share/pcp/lib/pmlogger reload" from a shell.

Indeed, this command kills the existing pmlogger daemon (running in the "pmlogger.service" cgroup) and spawns *its own* pmlogger process (running in the session context).
There are 2 issues here:
1. systemd believes pmlogger.service died
2. the new "pmlogger" process is spawned in the wrong cgroup (the cgroup of the shell, not the cgroup of the pmlogger.service unit)

This causes systemd to print the messages above and to start "pmlogger.service" again, which hopefully replaces the "pmlogger" process started by "/usr/share/pcp/lib/pmlogger reload".
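
One way to confirm the second issue is to look at the cgroup of the running pmlogger process. The following is a minimal sketch and not part of PCP: the helper names cgroup_file_in_unit and classify_pid are hypothetical, and it assumes the unit name shows up in the /proc/PID/cgroup listing, as it normally does on a systemd host.

```shell
# Succeed if the cgroup listing in file $1 (same format as /proc/PID/cgroup)
# mentions the systemd unit $2.
cgroup_file_in_unit() {
    grep -qF "/$2" "$1" 2>/dev/null
}

# Print "in-unit" if PID $1 runs inside unit $2, otherwise "outside-unit".
# A PID that does not exist is also reported as "outside-unit".
classify_pid() {
    if cgroup_file_in_unit "/proc/$1/cgroup" "$2"; then
        echo "in-unit"
    else
        echo "outside-unit"
    fi
}
```

Assuming /run/pcp/pmlogger.pid holds just the PID, running classify_pid "$(cat /run/pcp/pmlogger.pid)" pmlogger.service after a manual reload from a shell would be expected to print "outside-unit".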


Version-Release number of selected component (if applicable):

pcp-5.0.2-5.el8.x86_64


How reproducible:

Always


Steps to Reproduce:
1. Execute a reload

  # /usr/share/pcp/lib/pmlogger reload

2. Check the journal

  # journalctl -u pmlogger -b

Actual results:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
pmlogger[7219]: /usr/share/pcp/lib/pmlogger: pmlogger not running
systemd[1]: pmlogger.service holdoff time over, scheduling restart.
systemd[1]: Stopped Performance Metrics Archive Logger.
systemd[1]: Starting Performance Metrics Archive Logger...
pmlogger[7660]: Starting pmlogger ...
systemd[1]: Can't open PID file /run/pcp/pmlogger.pid (yet?) after start: No such file or directory
systemd[1]: New main PID 13013 does not belong to service, and PID file is not owned by root. Refusing.
systemd[1]: New main PID 13013 does not belong to service, and PID file is not owned by root. Refusing.
systemd[1]: Daemon never wrote its PID file. Failing.
systemd[1]: Failed to start Performance Metrics Archive Logger.
systemd[1]: Unit pmlogger.service entered failed state.
systemd[1]: pmlogger.service failed.
systemd[1]: pmlogger.service holdoff time over, scheduling restart.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
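
The "PID file is not owned by root" part of the message refers to the ownership of the PID file itself. A rough way to inspect this from a shell is sketched below. The helper names pidfile_owner_uid and pidfile_root_owned are hypothetical, and "stat -c" assumes GNU coreutils.

```shell
# Print the numeric owner UID of the file $1 (GNU stat syntax).
pidfile_owner_uid() {
    stat -c '%u' "$1"
}

# Succeed only if the file $1 exists and is owned by UID 0 (root).
pidfile_root_owned() {
    [ -e "$1" ] && [ "$(pidfile_owner_uid "$1")" = "0" ]
}
```

For example: pidfile_root_owned /run/pcp/pmlogger.pid || echo "PID file missing or not owned by root"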

Expected results:

No error messages

Comment 1 Renaud Métrich 2020-05-09 11:22:49 UTC
Important: this is even worse on RHEL 8, since pmlogger will wait for 120 seconds before restarting, whereas on RHEL 7 the restart is immediate.

Comment 3 Renaud Métrich 2020-11-26 15:50:31 UTC
I'm reopening this BZ since BZ #1806428 doesn't fix this.

Comment 14 Mark Goodwin 2021-01-14 06:20:53 UTC
Systemd units can explicitly set a umask with the UMask= directive in the [Service] section.

So we could add UMask=0002 in each /usr/lib/systemd/system/pm*.service
Note that's "UMask" with an uppercase 'U' and 'M'. This change should be benign on systems that don't twiddle with the umask.
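
For illustration, such a setting could be tried out with a drop-in rather than by editing the packaged unit files. This is only a sketch: the function write_umask_dropin and the file name umask.conf are made up for this example, the directive is spelled UMask= in systemd, and the target directory is passed as an argument so that nothing is written to the system unless asked.

```shell
# Write a systemd drop-in that sets UMask= to value $2 into directory $1.
write_umask_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nUMask=%s\n' "$2" > "$1/umask.conf"
}
```

A possible use would be write_umask_dropin /etc/systemd/system/pmlogger.service.d 0002, followed by systemctl daemon-reload and a restart of the service.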

Comment 15 Nathan Scott 2021-01-14 07:15:52 UTC
(In reply to Mark Goodwin from comment #14)
> Systemd units can explicitly set a umask with the UMask= directive in the
> [Service] section.
> 
> So we could add UMask=0002 in each /usr/lib/systemd/system/pm*.service
> Note that's "UMask" with an uppercase 'U' and 'M'. This change should be benign on
> systems that don't twiddle with the umask.

That's not the right fix - PCP runs on systems which don't use systemd too.

Comment 16 Nathan Scott 2021-01-14 07:17:25 UTC
Created attachment 1747315 [details]
Set umask in pmlogger and pmie check scripts
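
The attachment itself is not reproduced here. As a generic illustration of the approach, the sketch below shows how setting the umask inside a shell script determines the mode of the files it creates, regardless of the umask inherited from the caller. The helper name create_with_umask and the value 022 are assumptions for this example and are not taken from the patch.

```shell
# Create (or truncate) file $2 with umask $1 in effect.
# The subshell keeps the caller's own umask unchanged.
create_with_umask() {
    ( umask "$1" && : > "$2" )
}
```

With this, create_with_umask 022 on a path that does not exist yet produces a file with mode 644, even when the calling shell runs with a stricter umask such as 077.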

Comment 17 Nathan Scott 2021-01-14 07:18:00 UTC
I'm running tests with the attached patch now.

Comment 37 errata-xmlrpc 2021-05-18 15:19:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcp bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1754