Bug 1833644

Summary: pmlogger service fails to reload properly, printing confusing messages in the log [RHEL-7]
Product: Red Hat Enterprise Linux 7 Reporter: Renaud Métrich <rmetrich>
Component: pcp Assignee: pcp-maint <pcp-maint>
Status: CLOSED WONTFIX QA Contact: Jan Kurik <jkurik>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.8 CC: agerstmayr, jkurik, mgoodwin, molasaga, nathans, patrickm, yuokada
Target Milestone: rc Keywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: pcp-5.1.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-18 05:43:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Renaud Métrich 2020-05-09 10:33:36 UTC
Description of problem:

A customer reports "pmlogger.service" dying with the following messages printed by systemd:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
New main PID XXX does not belong to service, and PID file is not owned by root. Refusing.
New main PID XXX does not belong to service, and PID file is not owned by root. Refusing.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

These messages are triggered when the admin executes "/usr/share/pcp/lib/pmlogger reload" from a shell.

Indeed, this command kills the existing pmlogger daemon (running in the "pmlogger.service" cgroup) and spawns *its own* pmlogger process (running in the session context).
There are 2 issues here:
1. systemd believes pmlogger.service died
2. the new "pmlogger" process is spawned in the wrong cgroup (the cgroup of the shell, not the cgroup of the pmlogger.service unit)

This makes systemd print the messages quoted above and restart the "pmlogger.service" unit, which hopefully replaces the "pmlogger" process started by "/usr/share/pcp/lib/pmlogger reload".
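
The cgroup mismatch described above can be checked directly. The following is a minimal sketch, assuming the cgroup v1 layout used on RHEL 7, where /proc/PID/cgroup contains a line such as "1:name=systemd:/system.slice/pmlogger.service". The function name pid_in_unit and its optional third argument are illustrative only and are not part of pcp or systemd.

```shell
# pid_in_unit PID UNIT [CGROUP_FILE]
# Succeeds if the cgroup membership file for PID mentions "/UNIT".
# CGROUP_FILE defaults to /proc/PID/cgroup; the override exists only so
# the check can be tried against saved sample data (hypothetical helper).
pid_in_unit() {
    pid=$1
    unit=$2
    cgroup_file=${3:-/proc/$pid/cgroup}
    # -F matches the unit name literally, so "." is not a regex wildcard.
    # This is a substring match, so it is a rough check, not a strict one.
    grep -qF -- "/$unit" "$cgroup_file"
}
```

Assuming the PID file holds only the PID, a check after a reload might look like:

  # pid_in_unit "$(cat /run/pcp/pmlogger.pid)" pmlogger.service && echo "in unit cgroup" || echo "outside unit cgroup"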


Version-Release number of selected component (if applicable):

pcp-4.3.2-6.el7.x86_64


How reproducible:

Always


Steps to Reproduce:
1. Execute a reload

  # /usr/share/pcp/lib/pmlogger reload

2. Check the journal

  # journalctl -u pmlogger -b

Actual results:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
pmlogger[7219]: /usr/share/pcp/lib/pmlogger: pmlogger not running
systemd[1]: pmlogger.service holdoff time over, scheduling restart.
systemd[1]: Stopped Performance Metrics Archive Logger.
systemd[1]: Starting Performance Metrics Archive Logger...
pmlogger[7660]: Starting pmlogger ...
systemd[1]: Can't open PID file /run/pcp/pmlogger.pid (yet?) after start: No such file or directory
systemd[1]: New main PID 13013 does not belong to service, and PID file is not owned by root. Refusing.
systemd[1]: New main PID 13013 does not belong to service, and PID file is not owned by root. Refusing.
systemd[1]: Daemon never wrote its PID file. Failing.
systemd[1]: Failed to start Performance Metrics Archive Logger.
systemd[1]: Unit pmlogger.service entered failed state.
systemd[1]: pmlogger.service failed.
systemd[1]: pmlogger.service holdoff time over, scheduling restart.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
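
To confirm the failure without reading the whole journal, the refusal messages can be counted. This is a minimal sketch: count_refusals is a hypothetical helper name, and the pattern is copied from the log lines quoted above.

```shell
# count_refusals: read journal text on stdin and print how many lines
# contain systemd's "does not belong to service" refusal message.
# (Hypothetical helper; the pattern comes from the log excerpt.)
count_refusals() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -cF -- 'does not belong to service' || true
}
```

It can be fed from the journal command used in the reproduction steps:

  # journalctl -u pmlogger -b | count_refusals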

Expected results:

No error messages

Comment 5 Nathan Scott 2020-05-18 05:43:02 UTC
This issue is not of sufficient importance to warrant back-porting to (and potentially destabilising) RHEL 7 at this late stage of its life-cycle.
The fix will be included with RHEL 8.3 and later.