1810110 – pmnewlog generates inaccessible config file for a new instance of pmlogger

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1810110 - pmnewlog generates inaccessible config file for a new instance of pmlogger

Summary: pmnewlog generates inaccessible config file for a new instance of pmlogger

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	pcp
Sub Component:
Version:	8.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	8.3
Assignee:	Mark Goodwin
QA Contact:	Jan Kurik
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2020-03-04 14:55 UTC by Jan Kurik
Modified:	2021-09-17 12:48 UTC (History)
CC List:	5 users (show)
Fixed In Version:	pcp-5.1.0
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-05-04 05:48:41 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Jan Kurik 2020-03-04 14:55:07 UTC

Description of problem:
pmnewlog creates a temporary directory containing a config file for a new instance of pmlogger. Unfortunately this directory is not accessible by pcp:pcp user, so the new instance of pmlogger exits immediately because it can not read the config file.

Version-Release number of selected component (if applicable):
pcp-5.0.2-5.el8

How reproducible:
Always

Steps to Reproduce:
1. Install pcp on a fresh machine with RHEL-8.2
# yum install -y pcp-zeroconf
2. Run pmnewlog script
# /usr/libexec/pcp/bin/pmnewlog -V <archive>

Actual results:
* An error message is printed on terminal:
"Sorry, but this is fatal. No new pmlogger instance has been started."

* File "/var/log/pcp/pmlogger/$(hostname)/pmlogger.log.prior" contains a message similar to this one:

pmlogger: Cannot open config file "/tmp/pcp.IHPHeoWXJ/config": Permission denied

* No new instance of pmlogger is started

Expected results:
* No error message is printed on terminal
* No error message in log files
* A new pmlogger instance is started

Additional info:
This is strange, however having the pmnewlog script fixed will lead to possibly more complicated situation.

The reason is that pmnewlog (triggered from cron via pmlogger_daily script) starts a new instance of pmlooger which is unfortunately outside of systemd control. As such, a successful run of pmnewlog confuses systemd which starts a new instance of pmlogger, in parallel to the one started by pmnewlog. Such behaviour leads to parallel run on two primary pmlogger instances.

Comment 1 Jan Kurik 2020-03-04 15:12:04 UTC

Proposed fix: https://github.com/performancecopilot/pcp/pull/858

Comment 2 Mark Goodwin 2020-03-06 06:40:08 UTC

(In reply to Jan Kurik from comment #0)
[...]
> Additional info:
> This is strange, however having the pmnewlog script fixed will lead to
> possibly more complicated situation.
> 
> The reason is that pmnewlog (triggered from cron via pmlogger_daily script)
> starts a new instance of pmlooger which is unfortunately outside of systemd
> control. As such, a successful run of pmnewlog confuses systemd which starts
> a new instance of pmlogger, in parallel to the one started by pmnewlog. Such
> behaviour leads to parallel run on two primary pmlogger instances.

Here's what actually happens - almost the same as above but in much more detail:

pmlogger_daily is started by the pmlogger_daily.timer (or the poll unit) usually at 00:10am and creates a lock file, and then runs pmnewlog to roll the logs

pmnewlog kills the existing pmlogger and then (initially) sleeps for 5 seconds

meanwhile, the pmlogger systemd service unit (which has restart=always) notices pmlogger has exited

systemd launches a new pmlogger rc script (/usr/share/pcp/lib/pmlogger), which blocks on the lock file (still held by pmlogger_daily)

pmnewlog wakes up and loops until the pmlogger pid that it killed is dead

once it's dead, pmnewlog then determines the new archive name and does a bit of other stuff and then directly runs a new pmlogger process (NOT via the rc script), which forks and daemonises.

pmnewlog then loops waiting for a pmlc connection to the new pmlogger to succeed. 

the pmlogger launched by pmnewlog writes it's pidfile and enters it's main loop, which then allows pmnewlog's pmlc to connect and so pmnewlog resumes.

meanwhile, the pmlogger rc script launched by systemd times out (usually, but not always). systemd notices and reads the pidfile and kills (with SIGINT) the pmlogger that was directly started by pmnewlog (this is where systemd gets confused).

systemd then launches a new pmlogger via the rc script, which first waits until pmlogger_daily releases the lock (which happens once pmnewlog returns)

the new pmlogger runs under systemd until the next pmlogger_daily cycle ....

--

Incredibly, the above is actually benign - we end up with a running pmlogger service under systemd's management. And it also works on non-systemd platforms. But clearly it's ridiculously convoluted!

The fix (on systemd platforms only) will be for pmnewlog to NOT launch a rogue pmlogger after killing the existing process. It will be launched by systemd via the pmlogger rc script soon after pmnewlog kills the existing pmlogger. The newly launched pmlogger rc script will wait on the lock held by pmlogger_daily, which will be released when pmnewlog returns. Once the lock is released, the rc script will launch a new pmlogger process under systemd's control.

Before that fix however, we need to change to type=notify for all PCP services (not just pmlogger). I have patches almost ready for that. Once that's all working, we have to make additional changes using systemd templates for pmlogger farms (and similar for pmie farms too).

Comment 3 Mark Goodwin 2020-03-19 04:07:50 UTC

After we've finished re-working the pmlogger logutil scripts and systemd unit configs, it's likely the problematic pmnewlog script will be retired (instead pmlogger will essentially logroll on SIGUSR1, so there will be no need to launch another one). But we'll keep this open to be sure we set the right modes on the config.

Comment 4 Mark Goodwin 2020-05-04 05:48:41 UTC

pmnewlog was deprecated in pcp-5.1.0 as part of the systemd unit config overhaul to change to type=notify. Hence DUP'ing to the RHEL-8.3 PCP rebase to pcp-5.1+

*** This bug has been marked as a duplicate of bug 1792971 ***

Comment 5 Jan Kurik 2020-06-12 15:37:43 UTC

Verified on pcp-5.1.1-2.el8 build:

# usr/libexec/pcp/bin/pmnewlog -V <archive>
pmnewlog: functionality retired and script will be removed in a future release

Because this is retired now, marking the bug as DEFERRED.

Note You need to log in before you can comment on or make changes to this bug.